The tuning test relies on a predictable execution environment. It
assumes that spawning a new task completes reasonably quickly. When
running tests with ASAN, the tuning test will spuriously fail. After
investigating, I believe this is because running tests with ASAN
enabled and without `release` in a low-resource environment (CI)
results in an execution environment that is too slow for the tuning
test to succeed.
This PR restructures `runtime::context` into multiple files by component and feature flag. The goal is to reduce code defined in macros and make each context component more manageable.
There should be no behavior changes except tweaking how the RNG seed is set. Instead of putting it in `set_current`, we set it when entering the runtime. This aligns better with the feature's original intent, enabling users to make a runtime's RNG deterministic. The seed should not be changed by `Handle::enter()`, so there is no need to have the code in `context::set_current`.
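As an illustration, a minimal sketch of the intended use (the `rng_seed` builder API is only available under `tokio_unstable`; treat the exact names here as an assumption):

```rust
use tokio::runtime::{Builder, RngSeed};

fn main() {
    // With a fixed seed, RNG-driven decisions (e.g. `select!` branch
    // ordering) are reproducible. The seed is applied when entering
    // the runtime, so `Handle::enter()` alone does not reset it.
    let seed = RngSeed::from_bytes(b"make the runtime deterministic");
    let _runtime = Builder::new_current_thread()
        .rng_seed(seed)
        .build()
        .unwrap();
}
```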
Removes `Send` from `EnterGuard` (returned by `Handle::enter()`). The
guard type changes a thread-local variable on drop. If the guard is
moved to a different thread, it would modify the wrong thread-local.
This is a **breaking change** but it fixes a bug and prevents incorrect
user behavior. If user code breaks because of this, it is because they
(most likely) have a bug in their code.
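A sketch of the kind of user code this now rejects:

```rust
use tokio::runtime::Runtime;

fn main() {
    let runtime = Runtime::new().unwrap();
    let guard = runtime.handle().enter(); // sets this thread's context

    // Dropping the guard on another thread would restore *that*
    // thread's thread-local, not this one's. With `Send` removed
    // from `EnterGuard`, this fails to compile.
    std::thread::scope(|s| {
        s.spawn(move || {
            drop(guard);
        });
    });
}
```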
If the `Scoped` type is `Sync`, then you can call `set` from two threads in parallel. Since it accesses `inner` without synchronization, this is a data race.
This is a soundness issue for the `Scoped` type, but since this is an internal API and we don't use it incorrectly anywhere, no harm is done.
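For context, a minimal sketch of a `Scoped`-style cell (names illustrative, not Tokio's exact code). Because `set` mutates `inner` through `&self` without synchronization, the type must not be `Sync`:

```rust
use std::cell::Cell;

// Scoped cell: `inner` holds a pointer to a value that is only
// valid for the duration of the `set` call.
struct Scoped<T> {
    inner: Cell<*const T>,
}

impl<T> Scoped<T> {
    fn set<F, R>(&self, value: &T, f: F) -> R
    where
        F: FnOnce() -> R,
    {
        // Unsynchronized read-modify-write: two threads calling
        // `set` in parallel would race on `inner`.
        let prev = self.inner.replace(value as *const T);
        let ret = f();
        self.inner.set(prev); // restore (not panic-safe; sketch only)
        ret
    }
}

// `Cell` is `!Sync`, so `Scoped<T>` is `!Sync` automatically; the
// fix is to avoid any manual `unsafe impl Sync`.
```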
This commit is a step towards the ongoing effort to unify the mutex in
the multi-threaded scheduler. The Inject queue is split into two
structs. `Shared` holds fields that are concurrently accessed, and
`Synced` holds fields that must be locked to access. The multi-threaded
scheduler is responsible for locking `Synced` and passing it in when
needed.
The commit also splits `inject` into multiple files to help reduce the
amount of code defined in macros.
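The shape of the split, as a sketch (field names and the queue representation are illustrative stand-ins for the intrusive list):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};

struct Task; // stand-in for a task handle

// Fields that are accessed concurrently, without the lock.
struct Shared {
    len: AtomicUsize,
}

// Fields that require the scheduler's lock. The scheduler owns the
// mutex and proves it holds it by passing `&mut Synced`.
struct Synced {
    is_closed: bool,
    queue: VecDeque<Task>, // stand-in for the intrusive linked list
}

impl Shared {
    fn is_empty(&self) -> bool {
        self.len.load(Ordering::Acquire) == 0
    }

    fn pop(&self, synced: &mut Synced) -> Option<Task> {
        if synced.is_closed {
            return None;
        }
        let task = synced.queue.pop_front()?;
        self.len.fetch_sub(1, Ordering::Release);
        Some(task)
    }
}
```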
PR #5720 introduced runtime self-tuning. It included a test that
attempts to verify self-tuning logic. The test is heavily reliant on
timing details. This patch attempts to make the test a bit more reliable
by not assuming tuning will converge within a set amount of time.
Previously, `Inject` was defined in `runtime::task`. This was because it
used some internal fns as part of the intrusive linked-list
implementation.
In the future, we want to remove the mutex from Inject and move it to
the scheduler proper (to reduce mutex ops). To set this up, this commit
moves `Inject` to `runtime::scheduler`. To make this work, we have to
`pub(crate)` `task::RawTask` and use that as the interface to access the
next / previous pointers.
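A sketch of the resulting interface (accessor names illustrative): the queue threads its intrusive list through the tasks via `RawTask` instead of reaching into `runtime::task` internals:

```rust
// Illustrative stand-ins for the task header and its raw handle.
struct Header {
    queue_next: Option<RawTask>,
}

#[derive(Clone, Copy)]
struct RawTask(*mut Header);

impl RawTask {
    // The injection queue uses these to maintain its intrusive
    // linked list without touching other task internals.
    unsafe fn get_queue_next(self) -> Option<RawTask> {
        (*self.0).queue_next
    }

    unsafe fn set_queue_next(self, next: Option<RawTask>) {
        (*self.0).queue_next = next;
    }
}
```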
Previously, the deferred task list (list of tasks that yielded and are
waiting to be woken) was stored on the global runtime context. Because
the scheduler is responsible for waking these tasks, it took additional
TLS reads to perform the wake operation.
Instead, this commit moves the list of deferred tasks into the scheduler
context. This makes it easily accessible from the scheduler itself.
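A sketch of the new placement (names illustrative): the deferred wakers live directly on the scheduler context, so waking them requires no extra thread-local lookups:

```rust
use std::cell::RefCell;
use std::task::Waker;

// Per-scheduler context, already in hand on the hot path.
struct SchedulerContext {
    // Tasks that yielded and are waiting to be woken at the end of
    // the scheduler tick.
    defer: RefCell<Vec<Waker>>,
}

impl SchedulerContext {
    fn wake_deferred(&self) {
        for waker in self.defer.borrow_mut().drain(..) {
            waker.wake();
        }
    }
}
```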
In order to reduce the number of mutex operations in the multi-threaded
scheduler hot path, we need to unify the various mutexes into a single
one. To start this work, this commit splits up `Idle` into `Idle` and
`Synced`. The `Synced` component is stored separately in the scheduler's
`Shared` structure.
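The shape of the split mirrors the `Inject` change above (fields illustrative):

```rust
use std::sync::atomic::AtomicUsize;

// Lock-free state stays in `Idle`.
struct Idle {
    num_searching: AtomicUsize,
    num_idle: AtomicUsize,
}

// Lock-guarded state moves into the scheduler's `Shared` structure,
// behind the mutex the scheduler already holds on these paths.
struct Synced {
    sleepers: Vec<usize>, // indices of parked workers
}
```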
Each multi-threaded runtime worker prioritizes pulling tasks off of its
local queue. Every so often, it checks the injection (global) queue for
work submitted there. Previously, "every so often" was a constant
"number of tasks polled" value. Tokio sets a default of 61, but allows
users to configure this value.
If workers are under load with tasks that are slow to poll, the
injection queue can be starved. To prevent starvation in this case, this
commit implements some basic self-tuning. The multi-threaded scheduler
tracks the mean task poll time using an exponentially-weighted moving
average. It then uses this value to pick an interval at which to check
the injection queue.
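A sketch of the mechanism (the constants and formula are illustrative, not Tokio's actual values):

```rust
struct Tuner {
    // Exponentially-weighted moving average of task poll time.
    mean_poll_ns: f64,
}

impl Tuner {
    const ALPHA: f64 = 0.1; // EWMA smoothing factor (assumed)
    // Aim to check the injection queue roughly this often (assumed).
    const TARGET_NS_PER_CHECK: f64 = 200_000.0;

    fn record_poll(&mut self, poll_ns: f64) {
        self.mean_poll_ns =
            Self::ALPHA * poll_ns + (1.0 - Self::ALPHA) * self.mean_poll_ns;
    }

    // Slow polls => shorter interval, so the injection queue is
    // checked more often and cannot be starved.
    fn global_queue_interval(&self) -> u32 {
        (Self::TARGET_NS_PER_CHECK / self.mean_poll_ns).clamp(2.0, 127.0) as u32
    }
}
```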
This commit is a first pass at adding self-tuning to the scheduler.
There are other values in the scheduler that could benefit from
self-tuning (e.g. the maintenance interval). Additionally, the
current-thread scheduler could also benefit from self-tuning. However, we
have reached the point where we should start investigating ways to unify
logic in both schedulers. Adding self-tuning to the current-thread
scheduler will be punted until after this unification.
This commit eliminates the current_thread::CURRENT and multi_thread::current
thread-local variables in favor of using `runtime::context`. This is another step
towards reducing the total number of thread-local variables used by Tokio.
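A sketch of the direction (names illustrative): both schedulers consult one thread-local in `runtime::context` instead of keeping their own:

```rust
use std::cell::RefCell;

struct CurrentThreadHandle; // illustrative handles
struct MultiThreadHandle;

enum Scheduler {
    CurrentThread(CurrentThreadHandle),
    MultiThread(MultiThreadHandle),
}

thread_local! {
    // One TLS slot shared by both schedulers, replacing the
    // per-scheduler thread-local variables.
    static CONTEXT: RefCell<Option<Scheduler>> = RefCell::new(None);
}
```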
As an optimization to improve locality, the multi-threaded scheduler
maintains a single slot (LIFO slot). When a task is scheduled, it goes
into the LIFO slot. The scheduler will run tasks in the LIFO slot first
before checking the local queue.
Ping-pong style workloads, where task A notifies task B, which
notifies task A again, can cause starvation as these two tasks
repeatedly schedule the other in the LIFO slot. #5686, a first
attempt at solving this problem, consumes a unit of budget each time a
task is scheduled from the LIFO slot. However, at the time of this
commit, the scheduler allocates 128 units of budget for each chunk of
work. This is relatively high in situations where tasks do not perform many
async operations yet have meaningful poll times (even 5-10 microsecond
poll times can have an outsized impact on the scheduler).
In an ideal world, the scheduler would adapt to the workload it is
executing. However, as a stopgap, this commit limits the number of
times the LIFO slot is prioritized per scheduler tick.
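A sketch of the stopgap (the cap value is illustrative):

```rust
use std::collections::VecDeque;

struct Task;

const MAX_LIFO_POLLS_PER_TICK: usize = 3; // illustrative cap

struct Core {
    lifo_slot: Option<Task>,
    lifo_polls: usize,
    run_queue: VecDeque<Task>,
}

impl Core {
    fn next_task(&mut self) -> Option<Task> {
        if self.lifo_polls < MAX_LIFO_POLLS_PER_TICK {
            if let Some(task) = self.lifo_slot.take() {
                self.lifo_polls += 1;
                return Some(task);
            }
        }
        // Past the cap, fall back to the FIFO local queue, which
        // breaks the A -> B -> A ping-pong through the LIFO slot.
        self.run_queue.pop_front()
    }
}
```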
In the multi-threaded scheduler, when there are no tasks on the local
queue, a worker will attempt to pull tasks from the injection queue.
Previously, the worker would attempt to poll only one task from the
injection queue, then continue trying to find work from other sources.
This can result in the injection queue backing up when there are many
tasks being scheduled from outside of the runtime.
This patch updates the worker to try to poll more than one task from the
injection queue when it has no more local work. Note that we also don't
want a single worker to poll **all** tasks on the injection queue as
that would result in work becoming unbalanced.
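A sketch of a balanced grab (formula illustrative): each worker takes a share proportional to the worker count, capped so no single worker drains the queue:

```rust
// How many tasks one worker pulls from the injection queue when its
// local queue is empty.
fn batch_size(inject_len: usize, num_workers: usize, cap: usize) -> usize {
    (inject_len / num_workers + 1).min(cap)
}

#[test]
fn splits_work_across_workers() {
    // 100 queued tasks, 4 workers, cap of 16: a worker grabs 16
    // tasks, never the whole queue.
    assert_eq!(batch_size(100, 4, 16), 16);
    assert_eq!(batch_size(8, 4, 16), 3);
}
```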
Previously, the current_thread scheduler used its own injection queue
instead of sharing the same one as the multi-threaded scheduler. This
patch updates the current_thread scheduler to use the same injection
queue as the multi-threaded one (`task::Inject`).
`task::Inject` includes an optimization where it does not need to
acquire the mutex if the queue is empty.
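A sketch of that fast path (structure illustrative; the real queue is an intrusive list):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

struct Task;

struct Inject {
    len: AtomicUsize,
    queue: Mutex<VecDeque<Task>>,
}

impl Inject {
    fn pop(&self) -> Option<Task> {
        // Fast path: an atomic read detects an empty queue without
        // acquiring the mutex.
        if self.len.load(Ordering::Acquire) == 0 {
            return None;
        }
        let mut queue = self.queue.lock().unwrap();
        let task = queue.pop_front()?;
        self.len.fetch_sub(1, Ordering::Release);
        Some(task)
    }
}
```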
These counters are enabled using the `tokio_internal_mt_counters` cfg flag and
are intended to help with debugging performance issues.
Whenever I work on the threaded runtime, I often find myself adding
these counters, then removing them before submitting a PR. I think
keeping them in will save time in the long run and shouldn't impact dev
much.
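The pattern looks roughly like this (counter names illustrative):

```rust
#[cfg(tokio_internal_mt_counters)]
mod counters {
    use std::sync::atomic::{AtomicUsize, Ordering};

    static NUM_NOTIFY_LOCAL: AtomicUsize = AtomicUsize::new(0);

    pub(crate) fn inc_num_notify_local() {
        NUM_NOTIFY_LOCAL.fetch_add(1, Ordering::Relaxed);
    }
}

// Without the cfg flag, the counters compile to no-ops.
#[cfg(not(tokio_internal_mt_counters))]
mod counters {
    pub(crate) fn inc_num_notify_local() {}
}
```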
Adds support for instrumenting the poll times of all spawned tasks. Data is tracked in a histogram. The user must specify the histogram scale and bucket ranges. Implementation-wise, it uses the same strategy as the existing runtime metrics: plain atomic counters. Because instrumenting each poll duration will result in frequent calls to `Instant::now()`, I think it should be an opt-in metric.
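A sketch of the recording strategy (linear buckets shown for simplicity; the user-facing configuration of scale and ranges is richer):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

// Fixed buckets of atomic counters, so recording a poll is
// lock-free (bucket layout illustrative).
struct PollTimeHistogram {
    buckets: Vec<AtomicU64>,
    resolution: Duration, // width of each (linear) bucket
}

impl PollTimeHistogram {
    fn record(&self, poll_time: Duration) {
        let idx = (poll_time.as_nanos() / self.resolution.as_nanos())
            .min(self.buckets.len() as u128 - 1) as usize;
        self.buckets[idx].fetch_add(1, Ordering::Relaxed);
    }
}

// Each poll is bracketed by two `Instant::now()` calls, which is
// why the metric is opt-in.
fn timed_poll<F: FnOnce()>(hist: &PollTimeHistogram, poll: F) {
    let start = Instant::now();
    poll();
    hist.record(start.elapsed());
}
```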