This patch includes an initial implementation of a new multi-threaded
runtime. The new runtime aims to increase the scheduler throughput by
speeding up how it dispatches work to peer worker threads. This
implementation improves most benchmarks by roughly 10% when the number
of threads is below 16. As the thread count grows, mutex contention
degrades performance.
Because the new scheduler is not yet ready to replace the old one, the
patch introduces it as an unstable runtime flavor with a warning that it
isn't production ready. Work to improve the scalability of the runtime
will most likely require more intrusive changes across Tokio, so I am
opting to merge with master to avoid larger conflicts.
As an optimization to improve locality, the multi-threaded scheduler
maintains a single slot (the LIFO slot). When a task is scheduled, it
goes into the LIFO slot. The scheduler runs the task in the LIFO slot
before checking the local queue.
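A minimal sketch of the idea; the `Worker` shape, the field names, and
the use of `VecDeque` are illustrative for this sketch, not Tokio's
actual internals:

```rust
use std::collections::VecDeque;

/// Illustrative task handle; Tokio's real task type is far more involved.
struct Task;

/// Simplified per-worker state (hypothetical names).
struct Worker {
    /// Holds the most recently scheduled task.
    lifo_slot: Option<Task>,
    /// FIFO local run queue.
    run_queue: VecDeque<Task>,
}

impl Worker {
    /// Scheduling places the task in the LIFO slot; a task already
    /// occupying the slot is displaced to the back of the local queue.
    fn schedule(&mut self, task: Task) {
        if let Some(prev) = self.lifo_slot.replace(task) {
            self.run_queue.push_back(prev);
        }
    }

    /// The LIFO slot is checked before the local queue, so a freshly
    /// scheduled task runs next while its data is still hot in cache.
    fn next_task(&mut self) -> Option<Task> {
        self.lifo_slot.take().or_else(|| self.run_queue.pop_front())
    }
}
```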
Ping-pong style workloads, where task A notifies task B, which
notifies task A again, can cause starvation as these two tasks
repeatedly schedule each other in the LIFO slot. #5686, a first
attempt at solving this problem, consumes a unit of budget each time a
task is scheduled from the LIFO slot. However, at the time of this
commit, the scheduler allocates 128 units of budget for each chunk of
work. This is relatively high in situations where tasks do not perform many
async operations yet have meaningful poll times (even 5-10 microsecond
poll times can have an outsized impact on the scheduler).
In an ideal world, the scheduler would adapt to the workload it is
executing. As a stopgap, however, this commit limits the number of
times the LIFO slot is prioritized per scheduler tick.
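A sketch of the stopgap, assuming a hypothetical per-tick counter; the
name `MAX_LIFO_POLLS_PER_TICK` and the value 3 are illustrative, not
the actual constant:

```rust
use std::collections::VecDeque;

struct Task;

/// Hypothetical cap: how many times per scheduler tick the LIFO slot
/// may be prioritized before falling back to fair FIFO order.
const MAX_LIFO_POLLS_PER_TICK: u32 = 3;

struct Worker {
    lifo_slot: Option<Task>,
    run_queue: VecDeque<Task>,
    /// LIFO polls consumed during the current tick.
    lifo_polls: u32,
}

impl Worker {
    /// Reset the counter at the start of every scheduler tick.
    fn start_tick(&mut self) {
        self.lifo_polls = 0;
    }

    fn next_task(&mut self) -> Option<Task> {
        if self.lifo_polls < MAX_LIFO_POLLS_PER_TICK {
            // Budget remains: keep favoring the LIFO slot for locality.
            if let Some(task) = self.lifo_slot.take() {
                self.lifo_polls += 1;
                return Some(task);
            }
        } else if let Some(task) = self.lifo_slot.take() {
            // Cap reached: demote the slotted task to the FIFO queue so
            // a ping-pong pair can no longer starve everything else.
            self.run_queue.push_back(task);
        }
        self.run_queue.pop_front()
    }
}
```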
In the multi-threaded scheduler, when there are no tasks on the local
queue, a worker will attempt to pull tasks from the injection queue.
Previously, the worker would attempt to poll only one task from the
injection queue and then continue looking for work from other sources.
This can result in the injection queue backing up when many tasks are
scheduled from outside the runtime.
This patch updates the worker to try to poll more than one task from the
injection queue when it has no more local work. Note that we also don't
want a single worker to poll **all** tasks on the injection queue as
that would result in work becoming unbalanced.
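One way this batching could look, as a sketch: the fair-share formula,
`LOCAL_QUEUE_CAPACITY`, and the half-queue cap below are assumptions
made for illustration, not the exact policy in the patch:

```rust
use std::collections::VecDeque;

struct Task;

/// Illustrative bound; Tokio's local queues are bounded, but the exact
/// capacity and batching policy here are assumptions for this sketch.
const LOCAL_QUEUE_CAPACITY: usize = 256;

/// Decide how many tasks one worker should claim from the shared
/// injection queue in a single pass.
fn batch_size(injected: usize, num_workers: usize) -> usize {
    // Roughly a fair share of the pending injected tasks...
    let fair_share = injected / num_workers + 1;
    // ...capped at half the local queue so one worker never hoards
    // work that idle peers could be running concurrently.
    fair_share.min(LOCAL_QUEUE_CAPACITY / 2)
}

fn pull_from_injection(
    local: &mut VecDeque<Task>,
    injection: &mut VecDeque<Task>,
    num_workers: usize,
) {
    // Move up to `batch_size` tasks; draining the whole injection
    // queue would leave the remaining workers with nothing to do.
    for _ in 0..batch_size(injection.len(), num_workers) {
        match injection.pop_front() {
            Some(task) => local.push_back(task),
            None => break,
        }
    }
}
```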
- Use dtolnay/rust-toolchain instead of actions-rs/toolchain
- Use cargo/cross directly instead of actions-rs/cargo
- Use rustsec/audit-check instead of actions-rs/audit-check
Previously, calling `task::yield_now().await` would yield the current
task to the scheduler, but the scheduler would poll it again before
polling the resource drivers. This behavior can result in starving the
resource drivers.
This patch adds a queue that tracks yielded tasks. The scheduler
notifies those tasks **after** polling the resource drivers.
Refs: #5209
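A simplified, single-threaded sketch of the mechanism; the `Scheduler`
shape and the `deferred` name are illustrative, not the real types:

```rust
use std::collections::VecDeque;

struct Task;

/// Pared-down scheduler loop for illustration.
struct Scheduler {
    run_queue: VecDeque<Task>,
    /// Tasks that yielded during this tick; they become schedulable
    /// again only after the resource drivers have been polled.
    deferred: VecDeque<Task>,
}

impl Scheduler {
    fn tick(&mut self) {
        // Poll every currently scheduled task. A task that calls
        // `yield_now()` is parked in `deferred` instead of being
        // pushed straight back onto the run queue.
        while let Some(task) = self.run_queue.pop_front() {
            self.poll_task(task);
        }

        // With yielded tasks held back, the run queue drains and the
        // drivers are guaranteed to get polled.
        self.poll_drivers();

        // Only now are the yielded tasks re-queued for the next tick.
        self.run_queue.append(&mut self.deferred);
    }

    fn poll_task(&mut self, task: Task) {
        // Placeholder: pretend the task yielded and must be deferred.
        self.deferred.push_back(task);
    }

    fn poll_drivers(&mut self) {
        // Placeholder for polling the I/O and timer drivers.
    }
}
```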
When backporting patches to LTS branches, we often run into CI failures due to
changes in Rust. Newer Rust versions add more lints, which break CI. We really
don't want to also have to backport patches that fix CI, so instead, LTS branches
should pin the stable Rust version in CI (e.g. #4434).
This PR restructures the CI config files to make it a bit easier to set a
specific Rust version in CI.