Currently, Tokio runs cross-compilation checks for the
`mips-unknown-linux-gnu` and `mipsel-unknown-linux-musl` target triples.
However, Rust has recently demoted these targets from Tier 2 support to
Tier 3 (see rust-lang/compiler-team#648). Therefore, MIPS toolchains may
not always be available, even in stable releases. This is currently
[breaking our CI builds][1], as Rust 1.72.0 does not contain a standard
library for `mips-unknown-linux-gnu`.
This branch removes these builds from the cross-compilation check's
build matrix. Tokio may still build successfully for MIPS targets, but
we can't easily guarantee support when the stable Rust release train may
or may not be able to build for MIPS targets.
[1]: https://github.com/tokio-rs/tokio/actions/runs/5970263562/job/16197657405?pr=5947#step:3:80
Since MSRV is bumped to 1.63, `Mutex::new` is now usable in const context.
Also use `assert!` in const function to ensure correctness instead of
silently truncating the value and remove cfg `tokio_no_const_mutex_new`.
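
As a rough illustration of the two language features this relies on (not the actual Tokio code; `COUNTER`, `narrow_to_u8`, `SMALL`, and `bump` are hypothetical names), a `const fn` can reject out-of-range values with `assert!` instead of truncating them, and a `Mutex` can be constructed directly in a `static` initializer:

```rust
use std::sync::Mutex;

// `Mutex::new` is a `const fn` as of Rust 1.63, so no lazy initialization is needed.
static COUNTER: Mutex<u32> = Mutex::new(0);

/// Hypothetical helper: narrows `value` to `u8`, failing instead of truncating.
const fn narrow_to_u8(value: usize) -> u8 {
    // In a const context a failed assertion becomes a compile-time error.
    assert!(value <= u8::MAX as usize, "value does not fit in u8");
    value as u8
}

// Evaluated at compile time; `narrow_to_u8(300)` here would fail to compile.
const SMALL: u8 = narrow_to_u8(200);

/// Hypothetical helper: increments the global counter and returns the new value.
fn bump() -> u32 {
    let mut guard = COUNTER.lock().unwrap();
    *guard += 1;
    *guard
}
```

Called at run time with an out-of-range argument, `narrow_to_u8` panics rather than returning a wrapped value.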
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
This patch includes an initial implementation of a new multi-threaded
runtime. The new runtime aims to increase the scheduler throughput by
speeding up how it dispatches work to peer worker threads. This
implementation improves most benchmarks by about 10% when the number of
threads is below 16. As the thread count increases, mutex contention
degrades performance.
Because the new scheduler is not yet ready to replace the old one, the
patch introduces it as an unstable runtime flavor with a warning that it
isn't production ready. Work to improve the scalability of the runtime
will most likely require more intrusive changes across Tokio, so I am
opting to merge with master to avoid larger conflicts.
- Pass `--no-deps` to `cargo-clippy`
- Use `dtolnay/rust-toolchain@stable` instead of
`dtolnay/rust-toolchain@master`
- Use dtolnay/rust-toolchain instead of `rustup` directly
- Use `cargo-nextest` in job test to speed up testing
- Use `cargo-nextest` in job test-unstable to speed up testing
- Use `cargo-nextest` in job test-unstable-taskdump to speed up testing
- Use `cargo-nextest` in job no-atomic-u64 to speed up testing
- Use `cargo-nextest` in job check-unstable-mt-counters
- Run `cargo check --benches` for `benches/` in job test
since the benchmarks are not run
- Run `cargo-check` instead of `cargo-build` in job test-parking_lot
since no test is run
- Run `cargo-check` instead of `cargo-build` in job no-atomic-u64
- Run `Swatinem/rust-cache@v2` after `taiki-e/install-action@v2` to
avoid caching pre-built artifacts downloaded by it.
- Use `Swatinem/rust-cache@v2` in job no-atomic-u64
- Add concurrency group to cancel outdated CI
- Use `taiki-e/setup-cross-toolchain-action@v1` in job cross-test
instead of cross, so that we can use `cargo-nextest` to run tests in
parallel.
Also use `Swatinem/rust-cache@v2` to cache artifacts.
- Use `Swatinem/rust-cache@v2` in job cross-check to speed up CI.
- Fix job `cross-test`: Use `armv5te-unknown-linux-gnueabi` for no-atomic-u64
testing instead of `arm-unknown-linux-gnueabihf`, which actually has
atomic-u64
- Remove use of `cross` in job `cross-check`
Since it does not run any tests, it does not need the `cross-rs`
toolchain, as tokio does not use any external C/C++ libraries that require
`gcc`/`clang` to compile.
- Add more recognizable name for steps in job cross-test
- Split job `test` into `test-{tokio-full, workspace-all-features,
integration-tests-per-feature}`
- Split job `no-atomic-u64` into `no-atomic-u64-{test, check}`
- Parallelize job `features` by using matrix
- Split `cross-test` into `cross-test-{with, without}-parking_lot`
- Speed up job `cross-test-*` and `no-atomic-u64-test` by running
`cargo-test` with `-- --test-threads 1` since `qemu` userspace
emulation has problems running binaries with many threads.
- Speed up workflow `stress-test.yml` and job `valgrind` in workflow `ci.yml`
by passing `--fair-sched=yes` to `valgrind`.
- Speed up job `test-hyper`: Cache `./hyper/target`
instead of caching `./target`, which is non-existent.
- Set `RUST_TEST_THREADS=1` to make sure `libtest` only uses one thread
so that qemu will be happy with the tests.
This is applied to `cross-test-with(out)-parking_lot` and `no-atomic-u64-test`.
- Apply suggestions from code review
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
The tuning test relies on a predictable execution environment. It
assumes that spawning a new task completes reasonably fast. When
running tests with ASAN, the tuning test fails spuriously. After
investigating, I believe the cause is that running tests with ASAN
enabled and without `release` in a low-resource environment (CI) yields
an execution environment that is too slow for the tuning test to succeed.
PR #5720 introduced runtime self-tuning. It included a test that
attempts to verify self-tuning logic. The test is heavily reliant on
timing details. This patch attempts to make the test a bit more reliable
by not assuming tuning will converge within a set amount of time.
Each multi-threaded runtime worker prioritizes pulling tasks off of its
local queue. Every so often, it checks the injection (global) queue for
work submitted there. Previously, "every so often" was a constant
number of polled tasks. Tokio sets a default of 61, but allows
users to configure this value.
If workers are under load with tasks that are slow to poll, the
injection queue can be starved. To prevent starvation in this case, this
commit implements some basic self-tuning. The multi-threaded scheduler
tracks the mean task poll time using an exponentially-weighted moving
average. It then uses this value to pick an interval at which to check
the injection queue.
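
The idea can be sketched roughly as follows. This is a simplified model, not the scheduler's actual code: the smoothing factor, the target time between checks, the clamp bounds, and all names (`PollStats`, `record`, `global_queue_interval`) are assumptions chosen for illustration.

```rust
// All constants below are illustrative assumptions, not values from the patch.
const ALPHA: f64 = 0.1; // weight given to the newest sample
const TARGET_NS_BETWEEN_CHECKS: f64 = 200_000.0; // desired time between checks
const MIN_INTERVAL: u32 = 2;
const MAX_INTERVAL: u32 = 127;

/// Tracks the mean task poll time as an exponentially-weighted moving average.
struct PollStats {
    ewma_ns: f64,
}

impl PollStats {
    fn new(initial_ns: f64) -> Self {
        PollStats { ewma_ns: initial_ns }
    }

    /// Folds one observed poll duration (in nanoseconds) into the average.
    fn record(&mut self, poll_ns: f64) {
        self.ewma_ns = ALPHA * poll_ns + (1.0 - ALPHA) * self.ewma_ns;
    }

    /// Number of task polls to run between injection queue checks.
    /// Slow polls shrink the interval; fast polls grow it, within bounds.
    fn global_queue_interval(&self) -> u32 {
        // Guard against a zero or negative average before dividing.
        let mean_ns = self.ewma_ns.max(1.0);
        let polls = (TARGET_NS_BETWEEN_CHECKS / mean_ns) as u32;
        polls.clamp(MIN_INTERVAL, MAX_INTERVAL)
    }
}
```

With these assumed constants, a worker whose polls average 1 microsecond checks the injection queue at the upper bound, while one whose polls average 100 microseconds checks it after every couple of polls.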
This commit is a first pass at adding self-tuning to the scheduler.
There are other values in the scheduler that could benefit from
self-tuning (e.g. the maintenance interval). Additionally, the
current-thread scheduler could also benefit from self-tuning. However, we
have reached the point where we should start investigating ways to unify
logic in both schedulers. Adding self-tuning to the current-thread
scheduler will be punted until after this unification.
As an optimization to improve locality, the multi-threaded scheduler
maintains a single slot (LIFO slot). When a task is scheduled, it goes
into the LIFO slot. The scheduler will run tasks in the LIFO slot first
before checking the local queue.
Ping-pong style workloads where task A notifies task B, which
notifies task A again, can cause starvation as these two tasks
repeatedly schedule the other in the LIFO slot. #5686, a first
attempt at solving this problem, consumes a unit of budget each time a
task is scheduled from the LIFO slot. However, at the time of this
commit, the scheduler allocates 128 units of budget for each chunk of
work. This is relatively high in situations where tasks do not perform many
async operations yet have meaningful poll times (even 5-10 microsecond
poll times can have an outsized impact on the scheduler).
In an ideal world, the scheduler would adapt to the workload it is
executing. However, as a stopgap, this commit limits the times
the LIFO slot is prioritized per scheduler tick.
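
A toy model of the cap might look like the following. It is a sketch under simplifying assumptions: tasks are plain integer IDs, the limit of 3 is arbitrary, and the counter resets whenever the worker falls back to the local queue. `Worker`, `MAX_LIFO_POLLS_PER_TICK`, and `run_ping_pong` are hypothetical names, not Tokio's.

```rust
use std::collections::VecDeque;

// Arbitrary illustrative limit on consecutive polls from the LIFO slot.
const MAX_LIFO_POLLS_PER_TICK: usize = 3;

struct Worker {
    lifo_slot: Option<u32>,
    local_queue: VecDeque<u32>,
    lifo_polls: usize,
}

impl Worker {
    /// A newly scheduled task takes the LIFO slot; any previous occupant
    /// is pushed to the back of the local queue.
    fn schedule(&mut self, task: u32) {
        if let Some(prev) = self.lifo_slot.replace(task) {
            self.local_queue.push_back(prev);
        }
    }

    /// Prefers the LIFO slot until the cap is hit, then serves the local queue.
    fn next_task(&mut self) -> Option<u32> {
        if self.lifo_polls < MAX_LIFO_POLLS_PER_TICK {
            if let Some(task) = self.lifo_slot.take() {
                self.lifo_polls += 1;
                return Some(task);
            }
        }
        // Cap reached (or slot empty): reset the counter and use the queue.
        self.lifo_polls = 0;
        self.local_queue.pop_front().or_else(|| self.lifo_slot.take())
    }
}

/// Tasks 0 and 1 wake each other on every poll; task 2 waits in the local queue.
/// Returns the order in which tasks were polled.
fn run_ping_pong(steps: usize) -> Vec<u32> {
    let mut worker = Worker {
        lifo_slot: None,
        local_queue: VecDeque::from(vec![2]),
        lifo_polls: 0,
    };
    worker.schedule(0);
    let mut order = Vec::new();
    for _ in 0..steps {
        match worker.next_task() {
            Some(task) => {
                order.push(task);
                match task {
                    0 => worker.schedule(1),
                    1 => worker.schedule(0),
                    _ => {}
                }
            }
            None => break,
        }
    }
    order
}
```

Without the cap, task 2 would never be polled in this model, because tasks 0 and 1 keep refilling the LIFO slot.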
In the multi-threaded scheduler, when there are no tasks on the local
queue, a worker will attempt to pull tasks from the injection queue.
Previously, the worker would only attempt to poll one task from the
injection queue then continue trying to find work from other sources.
This can result in the injection queue backing up when there are many
tasks being scheduled from outside of the runtime.
This patch updates the worker to try to poll more than one task from the
injection queue when it has no more local work. Note that we also don't
want a single worker to poll **all** tasks on the injection queue as
that would result in work becoming unbalanced.
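
One way to pick the batch size is sketched below. The "fair share plus one" formula and the names `batch_size` and `pull_batch` are assumptions for illustration; the patch itself may size the batch differently.

```rust
use std::collections::VecDeque;

/// How many tasks one worker takes from the injection queue at once.
/// Assumed policy: a fair share of the queue plus one, limited by the free
/// space in the local queue (the first task runs immediately, so it does
/// not need a local slot).
fn batch_size(inject_len: usize, num_workers: usize, local_free: usize) -> usize {
    if inject_len == 0 {
        return 0;
    }
    let fair_share = inject_len / num_workers.max(1) + 1;
    fair_share
        .min(inject_len)
        .min(local_free.saturating_add(1))
}

/// Moves a batch from the injection queue to the worker. The first task is
/// returned to be polled right away; the rest go to the local queue.
fn pull_batch(
    inject: &mut VecDeque<u32>,
    local: &mut VecDeque<u32>,
    num_workers: usize,
    local_capacity: usize,
) -> Option<u32> {
    let free = local_capacity.saturating_sub(local.len());
    let n = batch_size(inject.len(), num_workers, free);
    let mut batch = inject.drain(..n);
    let first = batch.next();
    local.extend(batch);
    first
}
```

Dividing by the worker count keeps one idle worker from draining the whole queue while its peers are also looking for work.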
Task dumps are snapshots of runtime state. Taskdumps are collected by
instrumenting Tokio's leaves to conditionally collect backtraces, which
are then coalesced per-task into execution tree traces.
This initial implementation only supports collecting taskdumps from
within the context of a current-thread runtime, and only `yield_now()`
is instrumented.
These flags were previously only needed due to a bug in the `cargo-semver-checks` CLI logic.
The correct behavior (available as of v0.18.3) for `cargo-semver-checks` is to ignore `publish = false` crates when scanning a workspace, *unless* those crates are specifically selected for checking.
All the crates excluded here are `publish = false`, so the default behavior already skips them and every `--exclude` flag is a no-op.
Fixes #5373
Closes #5358
- Add check for no_atomic_u64 & no_const_mutex_new (the condition under which atomic_u64_static_once_cell.rs is compiled)
- Allow unused_imports in TARGET_ATOMIC_U64_PROBE. I also tested the other *_PROBE checks and found no other errors triggered by -D warnings.
- Fix cfg of util::once_cell module
- Use dtolnay/rust-toolchain instead of actions-rs/toolchain
- Use cargo/cross directly instead of actions-rs/cargo
- Use rustsec/audit-check instead of actions-rs/audit-check
This adds CI coverage for a couple of code paths that are not currently
hit in CI:
* no `const fn Mutex::new`
* no `AtomicU64`
This is done by adding some new CFG flags used only for tests in order
to force those code paths.
Previously, calling `task::yield_now().await` would yield the current
task to the scheduler, but the scheduler would poll it again before
polling the resource drivers. This behavior can result in starving the
resource drivers.
This patch creates a queue tracking yielded tasks. The scheduler
notifies those tasks **after** polling the resource drivers.
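
The ordering can be modeled with a small single-threaded sketch. It is not the runtime's code: tasks are integer IDs, every task is assumed to yield each time it is polled, and `Scheduler`, `Event`, and `tick` are hypothetical names.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Event {
    Polled(u32),
    DriverPolled,
}

struct Scheduler {
    run_queue: VecDeque<u32>,
    // Tasks that yielded during this tick; they are not runnable yet.
    deferred: Vec<u32>,
    log: Vec<Event>,
}

impl Scheduler {
    fn new(tasks: Vec<u32>) -> Self {
        Scheduler {
            run_queue: VecDeque::from(tasks),
            deferred: Vec::new(),
            log: Vec::new(),
        }
    }

    /// One scheduler iteration: poll runnable tasks, poll the resource
    /// drivers, and only then make the yielded tasks runnable again.
    fn tick(&mut self) {
        while let Some(task) = self.run_queue.pop_front() {
            self.log.push(Event::Polled(task));
            // Assumption of this sketch: the task yields on every poll.
            self.deferred.push(task);
        }
        self.log.push(Event::DriverPolled);
        self.run_queue.extend(self.deferred.drain(..));
    }
}
```

Because yielded tasks wait in `deferred` rather than going straight back to the run queue, a driver poll always separates two consecutive polls of the same yielding task.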
Refs: #5209
This patch updates CI to use `cross` to run Tokio tests on virtualized
ARM and i686 VMs. Because IPv6 doesn't work in GitHub Actions when
running inside a Docker instance, the IPv6 tests are disabled.