1644 Commits

Author SHA1 Message Date
Sean McArthur
2c870b588f process: refactor OrphanQueue to use a Mutex instead of SegQueue (#1712) 2019-10-30 15:29:04 -07:00
Jon Gjengset
109fd3086b thread-pool: in-place blocking with new scheduler (#1681)
The initial new scheduler PR omitted in-place blocking
support. This patch brings it back.
2019-10-30 08:58:49 -07:00
Sean McArthur
e3261440e5 timer: inline CachePadded type (#1706) 2019-10-29 22:16:11 -07:00
Carl Lerche
2b909d6805
sync: move into tokio crate (#1705)
A step towards collapsing Tokio sub crates into a single `tokio`
crate (#1318).

The sync implementation is now provided by the main `tokio` crate.
Functionality can be opted out of by using the various net related
feature flags.
2019-10-29 15:11:31 -07:00
Carl Lerche
c62ef2d232
executor: move into tokio crate (#1702)
A step towards collapsing Tokio sub crates into a single `tokio`
crate (#1318).

The executor implementation is now provided by the main `tokio` crate.
Functionality can be opted out of by using the various net related
feature flags.
2019-10-28 21:40:29 -07:00
Eliza Weisman
7eb264a0d0
net: replace RwLock<Slab> with a lock free slab (#1625)
## Motivation

The `tokio_net::driver` module currently stores the state associated
with scheduled IO resources in a `Slab` implementation from the `slab`
crate. Because inserting items into and removing items from `slab::Slab`
requires mutable access, the slab must be placed within a `RwLock`. This
has the potential to be a performance bottleneck especially in the context of
the work-stealing scheduler where tasks and the reactor are often located on
the same thread.

`tokio-net` currently reimplements the `ShardedRwLock` type from
`crossbeam` on top of `parking_lot`'s `RwLock` in an attempt to squeeze
as much performance as possible out of the read-write lock around the
slab. This introduces several dependencies that are not used elsewhere.

## Solution

This branch replaces the `RwLock<Slab>` with a lock-free sharded slab
implementation. 

The sharded slab is based on the concept of _free list sharding_
described by Leijen, Zorn, and de Moura in [_Mimalloc: Free List
Sharding in Action_][mimalloc], which describes the implementation of a
concurrent memory allocator. In this approach, the slab is sharded so
that each thread has its own thread-local list of slab _pages_. Objects
are always inserted into the local slab of the thread where the
insertion is performed. Therefore, the insert operation need not be
synchronized.

However, since objects can be _removed_ from the slab by threads other
than the one on which they were inserted, removal operations can still
occur concurrently. Therefore, Leijen et al. introduce a concept of
_local_ and _global_ free lists. When an object is removed on the same
thread it was originally inserted on, it is placed on the local free
list; if it is removed on another thread, it goes on the global free
list for the heap of the thread from which it originated. To find a free
slot to insert into, the local free list is used first; if it is empty,
the entire global free list is popped onto the local free list. Since
the local free list is only ever accessed by the thread it belongs to,
it does not require synchronization at all, and because the global free
list is popped from infrequently, the cost of synchronization has a
reduced impact. A majority of insertions can occur without any
synchronization at all; and removals only require synchronization when
an object has left its parent thread.
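
The local/global free list scheme described above can be sketched roughly as
follows. This is a minimal illustration and not the vendored implementation:
the `Shard` type and its methods are hypothetical names, and a `Mutex<Vec<_>>`
stands in for the lock-free global list to keep the example short.

```rust
use std::sync::Mutex;

/// One shard of a hypothetical slab. It is owned by a single thread, but
/// other threads may free slots that belong to it.
struct Shard {
    /// Free slot indices, touched only by the owning thread.
    local_free: Vec<usize>,
    /// Free slot indices pushed by other threads. A Mutex keeps the sketch
    /// simple; a lock-free slab would use an atomic stack here instead.
    global_free: Mutex<Vec<usize>>,
    /// Next slot index that has never been handed out.
    next: usize,
}

impl Shard {
    fn new() -> Self {
        Shard { local_free: Vec::new(), global_free: Mutex::new(Vec::new()), next: 0 }
    }

    /// Called on the owning thread: pick a slot for an insertion.
    fn alloc(&mut self) -> usize {
        if let Some(idx) = self.local_free.pop() {
            return idx; // fast path, no synchronization
        }
        // Local list is empty: take the entire global list in one step.
        let mut drained = std::mem::take(&mut *self.global_free.lock().unwrap());
        self.local_free.append(&mut drained);
        match self.local_free.pop() {
            Some(idx) => idx,
            None => {
                // Nothing to reuse, so hand out a fresh slot.
                let idx = self.next;
                self.next += 1;
                idx
            }
        }
    }

    /// Called on the owning thread: no synchronization needed.
    fn free_local(&mut self, idx: usize) {
        self.local_free.push(idx);
    }

    /// Called from any other thread.
    fn free_remote(&self, idx: usize) {
        self.global_free.lock().unwrap().push(idx);
    }
}

fn main() {
    let mut shard = Shard::new();
    let a = shard.alloc();
    let b = shard.alloc();
    shard.free_local(a);
    shard.free_remote(b);
    println!("reused: {}, {}", shard.alloc(), shard.alloc());
}
```

Only `free_remote` and the refill step in `alloc` touch the shared list, which
is why most insertions avoid synchronization in this scheme.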

The sharded slab was initially implemented in a separate crate (soon to
be released), vendored in-tree to decrease `tokio-net`'s dependencies.
Some code from the original implementation was removed or simplified,
since it is only necessary to support `tokio-net`'s use case, rather
than to provide a fully generic implementation.

[mimalloc]: https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf

## Performance

These graphs were produced by out-of-tree `criterion` benchmarks of the
sharded slab implementation.


The first shows the results of a benchmark where an increasing number of
items are inserted into and then removed from a slab concurrently by five
threads. It compares the performance of the sharded slab implementation
with a `RwLock<slab::Slab>`:

<img width="1124" alt="Screen Shot 2019-10-01 at 5 09 49 PM" src="https://user-images.githubusercontent.com/2796466/66078398-cd6c9f80-e516-11e9-9923-0ed6292e8498.png">

The second graph shows the results of a benchmark where an increasing
number of items are inserted and then removed by a _single_ thread. It
compares the performance of the sharded slab implementation with an
`RwLock<slab::Slab>` and a `mut slab::Slab`.

<img width="925" alt="Screen Shot 2019-10-01 at 5 13 45 PM" src="https://user-images.githubusercontent.com/2796466/66078469-f0974f00-e516-11e9-95b5-f65f0aa7e494.png">

Note that while the `mut slab::Slab` (i.e. no read-write lock) is
(unsurprisingly) faster than the sharded slab in the single-threaded
benchmark, the sharded slab outperforms the un-contended
`RwLock<slab::Slab>`. This case, where the lock is uncontended and only
accessed from a single thread, represents the best case for the current
use of `slab` in `tokio-net`, since the lock cannot be conditionally
removed in the single-threaded case.

These benchmarks demonstrate that, while the sharded approach introduces
a small constant-factor overhead, it offers significantly better
performance across concurrent accesses.

## Notes

This branch removes the following dependencies from `tokio-net`:
- `parking_lot`
- `num_cpus`
- `crossbeam-utils`
- `slab`

This branch adds the following dev-dependencies:
- `proptest`
- `loom`

Note that these dev dependencies were used to implement tests for the
sharded-slab crate out-of-tree, and were necessary in order to vendor
the existing tests. Alternatively, since the implementation is tested
externally, we _could_ remove these tests in order to avoid picking up
dev-dependencies. However, this means that we should try to ensure that
`tokio-net`'s vendored implementation doesn't diverge significantly from
upstream's, since it would be missing a majority of its tests.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2019-10-28 11:30:45 -07:00
Geoff Shannon
1195263584 Fix docs links: Redux (#1698) 2019-10-27 09:37:07 -07:00
Carl Lerche
bccb713d98
thread-pool: test additional shutdown cases (#1697)
This adds an extra spawned task during the thread-pool shutdown loom
test. This results in additional cases being tested, primarily tasks
being stolen.
2019-10-26 22:15:39 -07:00
Linus Färnstrand
474befd23c chore: use argument position impl trait (#1690) 2019-10-26 08:40:38 -07:00
Carl Lerche
987ba7373c
io: move into tokio crate (#1691)
A step towards collapsing Tokio sub crates into a single `tokio`
crate (#1318).

The `io` implementation is now provided by the main `tokio` crate.
Functionality can be opted out of by using the various net related
feature flags.
2019-10-26 08:02:49 -07:00
Carl Lerche
227533d456
net: move into tokio crate (#1683)
A step towards collapsing Tokio sub crates into a single `tokio`
crate (#1318).

The `net` implementation is now provided by the main `tokio` crate.
Functionality can be opted out of by using the various net related
feature flags.
2019-10-25 12:50:15 -07:00
Jon Gjengset
03a9378297 Make blocking pool non-static and use for thread pool (#1678)
Previously, support for `blocking` was done through a static `POOL` that
would spawn threads on demand. While this made the pool accessible at
all times, it made it hard to configure, and it was impossible to keep
multiple blocking pools.

This patch changes `blocking` to instead use a "default" global like the
ones used for timers, executors, and the like. There is now
`blocking::with_pool`, which is used by both thread-pool workers and the
current-thread runtime to ensure that a pool is available to tasks.

This patch also changes `ThreadPool` to spawn its worker threads on the
blocking pool rather than as free-standing threads. This is in
preparation for the coming in-place blocking work.

One downside of this change is that thread names are no longer
"semantic". All threads are named by the pool name, and individual
threads are not (currently) given names with numerical suffixes like
before.
2019-10-24 14:17:47 -07:00
Carl Lerche
99940aeeb4
chore: remove tracing. (#1680)
Historically, logging has been added haphazardly. Here, we entirely
remove logging as none of it is particularly useful. In the future, we
will add tracing back in order to expose useful data to the user of
Tokio.
2019-10-23 11:04:14 -07:00
Carl Lerche
cfc15617a5
codec: move into tokio-util (#1675)
Related to #1318, Tokio APIs that are "less stable" are moved into a new
`tokio-util` crate. This crate will mirror `tokio` and provide
additional APIs that may require a greater rate of breaking changes.

As examples require `tokio-util`, they are moved into a separate
crate (`examples`). This has the added advantage of being able to avoid
example only dependencies in the `tokio` crate.
2019-10-22 10:13:49 -07:00
Carl Lerche
b8cee1a60a
timer: move tokio-timer into tokio crate (#1674)
A step towards collapsing Tokio sub crates into a single `tokio`
crate (#1318).

The `timer` implementation is now provided by the main `tokio` crate.
The `timer` functionality may still be excluded from the build by
skipping the `timer` feature flag.
2019-10-21 16:45:13 -07:00
Kevin Leimkuhler
c9bcbe77b9
net: Eagerly bind resources to reactors (#1666)
## Motivation

The `tokio_net` resources can be created outside of a runtime due to how tokio
has been used with futures to date. For example, this allows a `TcpStream` to be
created, and later passed into a runtime:

```
let stream = TcpStream::connect(...).and_then(|socket| {
    // do something
});
tokio::run(stream);
```

In order to support this functionality, the reactor was lazily bound to the
resource on the first call to `poll_read_ready`/`poll_write_ready`. This
required a lot of additional complexity in the binding logic to support.

With the tokio 0.2 common case, this is no longer necessary and can be removed.
All resources are expected to be created from within a runtime, and creating
one elsewhere should panic.

Closes #1168

## Solution

The `tokio_net` crate now assumes there to be a `CURRENT_REACTOR` set on the
worker thread creating a resource; this can be assumed if called within a tokio
runtime. If there is no current reactor, the application will panic with a "no
current reactor" message.
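
The "current reactor" assumption can be pictured with a thread-local slot that
is set for the duration of a closure and read eagerly when a resource is
created. This is only a sketch of the pattern: `ReactorHandle`, `with_reactor`,
and `current` are hypothetical names, not the `tokio_net` API.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a reactor handle.
#[derive(Clone, Debug, PartialEq)]
struct ReactorHandle(u32);

thread_local! {
    /// The reactor that resources created on this thread bind to.
    static CURRENT_REACTOR: RefCell<Option<ReactorHandle>> = RefCell::new(None);
}

/// Runs `f` with `handle` installed as the current reactor, restoring the
/// previous value afterwards (also when `f` panics).
fn with_reactor<R>(handle: ReactorHandle, f: impl FnOnce() -> R) -> R {
    struct Reset(Option<ReactorHandle>);
    impl Drop for Reset {
        fn drop(&mut self) {
            let prev = self.0.take();
            CURRENT_REACTOR.with(|c| *c.borrow_mut() = prev);
        }
    }

    let prev = CURRENT_REACTOR.with(|c| c.borrow_mut().replace(handle));
    let _reset = Reset(prev);
    f()
}

/// Eagerly looks up the reactor; panics when called outside `with_reactor`.
fn current() -> ReactorHandle {
    CURRENT_REACTOR
        .with(|c| c.borrow().clone())
        .expect("no current reactor")
}

fn main() {
    let handle = with_reactor(ReactorHandle(1), current);
    println!("bound to {:?}", handle);
}
```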

With this assumption, all the unsafe code and atomics have been removed from
`tokio_net::driver::Registration` as they are no longer needed.

There is no longer any reason to pass in handles to the family of `from_std` methods on `net` resources. `Handle::current` therefore has a more restricted, private use: it is only called in `driver::Registration::new`.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2019-10-21 16:20:06 -07:00
Carl Lerche
978013a215
fs: move into tokio (#1672)
A step towards collapsing Tokio sub crates into a single `tokio`
crate (#1318).

The `fs` implementation is now provided by the main `tokio` crate. The
`fs` functionality may still be excluded from the build by skipping the
`fs` feature flag.
2019-10-21 15:49:00 -07:00
madmaxio
6aa6ebb5bc io: Take struct re-export to main crate (#1670) 2019-10-21 10:03:05 -07:00
Jonathas Conceição
4bee94eb06 runtime: update doc regarding runtime::run function helper (#1671) 2019-10-21 10:02:36 -07:00
Carl Lerche
ed5a94eb2d
executor: rewrite the work-stealing thread pool (#1657)
This patch is a ground up rewrite of the existing work-stealing thread
pool. The goal is to reduce overhead while simplifying code when
possible.

At a high level, the following architectural changes were made:

- The local run queues were switched for bounded circular buffer queues.
- Reduce cross-thread synchronization.
- Refactor task constructs to use a single allocation and always include
  a join handle (#887).
- Simplify logic around putting workers to sleep and waking them up.

**Local run queues**

Move away from crossbeam's implementation of the Chase-Lev deque. This
implementation included unnecessary overhead as it supported
capabilities that are not needed for the work-stealing thread pool.
Instead, a fixed-size circular buffer is used for the local queue. When
the local queue is full, half of the tasks contained in it are moved to
the global run queue.
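
The overflow rule can be illustrated with a small single-threaded model. This
is a sketch under assumptions: `Worker`, `LOCAL_CAPACITY`, and the choice to
move the *older* half are hypothetical, and a `VecDeque` stands in for the
fixed-size atomic buffer a real scheduler would use.

```rust
use std::collections::VecDeque;

/// Hypothetical capacity, kept tiny so the overflow is easy to observe.
const LOCAL_CAPACITY: usize = 4;

/// A worker with a bounded local run queue of task ids.
struct Worker {
    local: VecDeque<u32>,
}

impl Worker {
    fn new() -> Self {
        Worker { local: VecDeque::with_capacity(LOCAL_CAPACITY) }
    }

    /// Pushes a task, spilling half of the local queue to `global` when full.
    fn push(&mut self, task: u32, global: &mut VecDeque<u32>) {
        if self.local.len() == LOCAL_CAPACITY {
            // Assumption: the oldest half is the half that gets moved.
            global.extend(self.local.drain(..LOCAL_CAPACITY / 2));
        }
        self.local.push_back(task);
    }
}

fn main() {
    let mut global = VecDeque::new();
    let mut worker = Worker::new();
    for task in 0..5 {
        worker.push(task, &mut global);
    }
    println!("local = {:?}, global = {:?}", worker.local, global);
}
```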

**Reduce cross-thread synchronization**

This is done via many small improvements. Primarily, an upper bound is
placed on the number of concurrent stealers. Limiting the number of
stealers results in lower contention. Secondly, the rate at which
workers are notified and woken up is throttled. This also reduces
contention by preventing many threads from racing to steal work.

**Refactor task structure**

Now that Tokio is able to target a rust version that supports
`std::alloc` as well as `std::task`, the pool is able to optimize how
the task structure is laid out. Now, a single allocation per task is
required and a join handle is always provided enabling the spawner to
retrieve the result of the task (#887).

**Simplifying logic**

When possible, complexity is reduced in the implementation. This is done
by using locks and other simpler constructs in cold paths. The set of
sleeping workers is now represented as a `Mutex<VecDeque<usize>>`.
Instead of optimizing access to this structure, we reduce the amount the
pool must access this structure.

Secondly, we have (temporarily) removed `threadpool::blocking`. This
capability will come back later, but the original implementation was way
more complicated than necessary.

**Results**

The thread pool benchmarks have improved significantly:

Old thread pool:

```
test chained_spawn ... bench:   2,019,796 ns/iter (+/- 302,168)
test ping_pong     ... bench:   1,279,948 ns/iter (+/- 154,365)
test spawn_many    ... bench:  10,283,608 ns/iter (+/- 1,284,275)
test yield_many    ... bench:  21,450,748 ns/iter (+/- 1,201,337)
```

New thread pool:

```
test chained_spawn ... bench:     147,943 ns/iter (+/- 6,673)
test ping_pong     ... bench:     537,744 ns/iter (+/- 20,928)
test spawn_many    ... bench:   7,454,898 ns/iter (+/- 283,449)
test yield_many    ... bench:  16,771,113 ns/iter (+/- 733,424)
```

Real-world benchmarks improve significantly as well. This is testing the hyper hello
world server using: `wrk -t1 -c50 -d10`:

Old scheduler:

```
Running 10s test @ http://127.0.0.1:3000
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   371.53us   99.05us   1.97ms   60.53%
    Req/Sec   114.61k     8.45k  133.85k    67.00%
  1139307 requests in 10.00s, 95.61MB read
Requests/sec: 113923.19
Transfer/sec:      9.56MB
```

New scheduler:

```
Running 10s test @ http://127.0.0.1:3000
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   275.05us   69.81us   1.09ms   73.57%
    Req/Sec   153.17k    10.68k  171.51k    71.00%
  1522671 requests in 10.00s, 127.79MB read
Requests/sec: 152258.70
Transfer/sec:     12.78MB
```
2019-10-19 11:09:40 -07:00
Steven Fackler
2a181320b7 fs: add read_to_string (#1664) 2019-10-16 15:47:37 -07:00
Taiki Endo
4c97e9dc28
fs: remove unnecessary trait and lifetime bounds (#1655) 2019-10-15 19:02:34 +09:00
Jon Gjengset
1cae04f8b3
macros: Use more consistent runtime names (#1628)
As discussed in #1620, the attribute names for `#[tokio::main]` and
`#[tokio::test]` aren't great. Specifically, they both use
`single_thread` and `multi_thread`, as opposed to names that match the
runtime names: `current_thread` and `threadpool`. This PR changes the
former to the latter.

Fixes #1627.
2019-10-12 12:55:39 -04:00
John-John Tedro
29f35df7f8 Remove incorrect FusedFuture impl on Delay (#1652)
`is_terminated` must return `false` until the future has been polled at least once to make sure that the associated block in select is called even after the delay has elapsed.

You use `Delay` in a `select!` by [fusing it](https://docs.rs/futures-preview/0.3.0-alpha.19/futures/future/trait.FutureExt.html#method.fuse):

```rust
let delay = tokio::timer::delay(/* ... */);
let delay = delay.fuse();

select! {
    _ = delay => {
        /* work here */
    }
}
```
2019-10-11 15:45:44 -04:00
Ivan Petkov
741bef8fe1
tokio: move signal and process reexports to crate root (#1643) 2019-10-11 11:00:39 -07:00
Carl Lerche
804dbd6f8e
sync: fix mem leak in oneshot on task migration (#1648)
When polling the task, the current waker is saved to the oneshot state.
When the handle is migrated to a new task and polled again, the waker
must be swapped from the old waker to the new waker. In some cases, there
is a potential for the old waker to leak.
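
The waker swap can be sketched with plain `std::task` types. This is not the
oneshot code: `Slot` and `Noop` are hypothetical, and the example only shows
that replacing the stored waker drops the old one instead of leaking it.

```rust
use std::sync::Arc;
use std::task::{Wake, Waker};

/// A waker that does nothing; it exists only so reference counts can be observed.
struct Noop;

impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for the state that stores the receiver's waker.
struct Slot {
    waker: Option<Waker>,
}

impl Slot {
    /// Stores `current` unless the stored waker already wakes the same task.
    fn register(&mut self, current: &Waker) {
        let same_task = self
            .waker
            .as_ref()
            .map_or(false, |old| old.will_wake(current));
        if !same_task {
            // Assigning drops the previous waker, releasing its reference.
            self.waker = Some(current.clone());
        }
    }
}

fn main() {
    let first = Arc::new(Noop);
    let second = Arc::new(Noop);
    let mut slot = Slot { waker: None };

    slot.register(&Waker::from(Arc::clone(&first)));
    slot.register(&Waker::from(Arc::clone(&second))); // task "migrated"

    // Only `first` itself remains: the slot's old clone was dropped.
    println!("first refs = {}", Arc::strong_count(&first));
}
```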

This bug was caught by loom with the recently added memory leak
detection.
2019-10-10 12:00:22 -07:00
Eliza Weisman
69fe65e972 io: add AsyncBufReadExt::split (#1642)
add a `split` method to `AsyncBufReadExt`, analogous to `std::io::BufRead::split`.
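
For reference, the synchronous `std::io::BufRead::split` that the new method
mirrors behaves as below. The helper name `split_on` is made up for this
illustration, and the example uses only the standard library, not the async
API added here.

```rust
use std::io::{BufRead, Cursor};

/// Splits an in-memory byte slice on `delim` using `BufRead::split`.
/// The delimiter itself is not included in the returned segments.
fn split_on(input: &[u8], delim: u8) -> Vec<Vec<u8>> {
    Cursor::new(input)
        .split(delim)
        .map(|segment| segment.expect("reading from memory cannot fail"))
        .collect()
}

fn main() {
    for segment in split_on(b"a,b,,c", b',') {
        println!("{:?}", String::from_utf8_lossy(&segment));
    }
}
```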
2019-10-09 13:17:07 -07:00
Jonathan Bastien-Filiatrault
b8913ec7c0 executor: accurate idle thread tracking for the blocking pool (#1621)
Use a counter to count notifications. This protects against spurious
wakeups by pthreads and other libraries. The state transitions now
track num_idle precisely.
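
A rough model of counting notifications, assuming a `Mutex` plus `Condvar`
pair: a waiter only proceeds when it can consume a recorded notification, so a
spurious wakeup simply loops. The `Pool` and `State` types and their method
names are hypothetical.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Default)]
struct State {
    /// Threads currently parked and available for work.
    num_idle: usize,
    /// Notifications sent but not yet consumed by a woken thread.
    num_notify: usize,
}

#[derive(Default)]
struct Pool {
    state: Mutex<State>,
    condvar: Condvar,
}

impl Pool {
    /// Wakes one idle thread. Returns false if no thread was idle.
    fn notify_one(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        if state.num_idle == 0 {
            return false;
        }
        state.num_idle -= 1;
        state.num_notify += 1;
        self.condvar.notify_one();
        true
    }

    /// Parks the calling thread until it consumes a real notification.
    fn wait_idle(&self) {
        let mut state = self.state.lock().unwrap();
        state.num_idle += 1;
        // A spurious wakeup leaves `num_notify` at zero, so we keep waiting.
        while state.num_notify == 0 {
            state = self.condvar.wait(state).unwrap();
        }
        state.num_notify -= 1;
    }
}

fn main() {
    let pool = Arc::new(Pool::default());
    let worker = {
        let pool = Arc::clone(&pool);
        thread::spawn(move || pool.wait_idle())
    };
    // Retry until the worker has registered itself as idle.
    while !pool.notify_one() {
        thread::yield_now();
    }
    worker.join().unwrap();
}
```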
2019-10-07 14:04:28 -07:00
Eliza Weisman
8aa520e2bd io: add missing utility functions (#1632)
The standard library's `io` module has small utilities such as `repeat`,
`empty`, and `sink`, which return `Read` and `Write` implementations.
These can come in handy in some circumstances. `tokio::io` has no
equivalents that implement `AsyncRead`/`AsyncWrite`.

This commit adds `repeat`, `empty`, and `sink` helpers to `tokio::io`.
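
As a reminder of what the standard-library counterparts do, here is a small
synchronous example. The `demo` function is just an illustrative wrapper; the
async helpers are expected to behave analogously but are not shown.

```rust
use std::io::{self, Read};

/// Exercises `repeat`, `empty`, and `sink` from `std::io`.
/// Returns (bytes read from repeat, bytes read from empty, bytes copied to sink).
fn demo() -> io::Result<(Vec<u8>, usize, u64)> {
    // `repeat` yields one byte forever; `take` bounds it to three bytes.
    let mut repeated = Vec::new();
    io::repeat(b'a').take(3).read_to_end(&mut repeated)?;

    // `empty` is always at end-of-file.
    let mut nothing = Vec::new();
    let read = io::empty().read_to_end(&mut nothing)?;

    // `sink` accepts and discards everything written to it.
    let copied = io::copy(&mut &b"discarded"[..], &mut io::sink())?;

    Ok((repeated, read, copied))
}

fn main() -> io::Result<()> {
    let (repeated, read, copied) = demo()?;
    println!("{:?} {} {}", repeated, read, copied);
    Ok(())
}
```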
2019-10-07 14:02:04 -07:00
Nick Stott
ab2f71a612 chore: fix a comment typo (#1633) 2019-10-07 09:20:57 -07:00
Taiki Endo
42a5cb1508 timer: test arm on targets with target_has_atomic less than 64 (#1634) 2019-10-07 09:19:44 -07:00
Taiki Endo
2b4b0619d7 chore: update Cirrus CI config to test on beta (#1636) 2019-10-07 09:18:38 -07:00
Taiki Endo
55caddb9ce chore: do not trigger CI on std-future branch (#1635) 2019-10-07 09:17:27 -07:00
Vojtech Kral
aefaef3abf tcp: export Incoming type (#1602) 2019-10-02 11:12:05 -07:00
Jon Gjengset
c78c9168d7 macros: allow selecting runtime in tokio::test attr (#1620)
In the past, it was not possible to choose to use the multi-threaded
tokio `Runtime` in tests, which meant that any test that transitively
used `executor::threadpool::blocking` would fail with

```
'blocking' annotation used from outside the context of a thread pool
```

This patch adds a runtime annotation attribute to `#[tokio::test]` just
like `#[tokio::main]` has, which lets users opt in to the threadpool
runtime over `current_thread` (the default).
2019-10-02 10:58:34 -07:00
Jonathan Bastien-Filiatrault
9e1eef829a chore: annotate prelude re-exports as doc(no_inline) (#1601)
Fixes #1593 by making "use as _" linked in the documentation.
2019-10-02 10:55:35 -07:00
Taiki Endo
f48980ae52 chore: update rust-toolchain to use beta (#1619) 2019-10-01 10:13:38 -04:00
Douman
a1d1eb5eb3 macros: Allow arguments in non-main functions 2019-10-01 13:15:46 +02:00
Jon Gjengset
5efe31f2ed Prepare for release of 0.2.0-alpha.6 (#1617)
Note that `tokio-timer` and `tokio-tls` become 0.3.0-alpha.6 (not 0.2.0)
tokio-0.2.0-alpha.6
2019-09-30 18:35:52 -04:00
Jon Gjengset
5ce5a0a0e0
Fix for rust-lang/rust#64477 (#1618)
`foo(format!(...)).await` no longer compiles. There's a fix in
rust-lang/rust#64856, but this works around the problem.
2019-09-30 17:17:14 -04:00
Jon Gjengset
5fd5329497 Create BufStream from a BufReader + BufWriter (#1609)
This is handy if developers want to construct the inner buffers with a
particular capacity, and still end up with a `BufStream` at the end.
2019-09-30 14:22:59 -04:00
Taiki Endo
3b8ee2d991 chore: update futures-preview to 0.3.0-alpha.19 (#1610) 2019-09-30 13:32:37 -04:00
Jon Gjengset
7c341f45e0 chore: move CI to beta (#1615) 2019-09-27 09:51:45 -07:00
Jon Gjengset
611b4e11a7
Make Barrier::wait future Send (#1611)
It wasn't before. Now it is. And that is better.
2019-09-26 18:26:24 -04:00
Taiki Endo
159abb375f
chore: update pin-project to 0.4 (#1603) 2019-09-27 04:51:28 +09:00
Carl Lerche
032b39487c sync: add spin_loop_hint to atomic waker (#1608)
The algorithm backing `AtomicWaker` effectively uses a spin lock backed
by notifying & yielding the current task. This adds a `spin_loop_hint`
annotation to cover this case.

While, in practice, the omission of `spin_loop_hint` would not cause
problems, there are platforms that do not handle spin locks very well
and could enter a deadlock in pathological cases.
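
A generic busy-wait loop with a spin hint looks roughly like this. The sketch
uses `std::hint::spin_loop`, the current name for what was
`std::sync::atomic::spin_loop_hint` at the time of this commit; `wait_for` is
a hypothetical helper and does not reproduce the `AtomicWaker` algorithm.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Spins until `flag` becomes true, telling the CPU that this is a busy-wait.
fn wait_for(flag: &AtomicBool) {
    while !flag.load(Ordering::Acquire) {
        // Hint only: it may lower power use or yield to a sibling hardware thread.
        std::hint::spin_loop();
    }
}

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    let setter = {
        let flag = Arc::clone(&flag);
        thread::spawn(move || flag.store(true, Ordering::Release))
    };
    wait_for(&flag);
    setter.join().unwrap();
}
```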
2019-09-26 15:16:34 -04:00
Hung-I Wang
b71b7b36be fs: update the doc comment of File::sync_data (#1596) 2019-09-25 09:05:50 -07:00
Taiki Endo
c4567f741a io: add get_*/into_inner methods to BufStream (#1598) 2019-09-25 09:17:43 -04:00
Sean McArthur
18cef1901f tokio: add rt-current-thread optional feature
- Adds a minimum `rt-current-thread` optional feature that exports
  `tokio::runtime::current_thread`.
- Adds a `macros` optional feature to enable the `#[tokio::main]` and
  `#[tokio::test]` attributes.
- Adjusts `#[tokio::main]` macro to select a runtime "automatically" if
  a specific strategy isn't specified. Allows using the macro with only
  the rt-current-thread feature.
2019-09-24 12:17:04 -07:00
Taiki Endo
c81447fdcc
io: remove unsafe pin-projections and remove manual Unpin implementations (#1588)
* Removes most pin-projection related unsafe code.

* Removes manual Unpin implementations.
  As references always implement Unpin, there is no need to implement
  Unpin manually.

* Adds tests to check that `Unpin` requirements do not change accidentally,
  because changing them would be a breaking change.
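
A compile-time check of this kind might look like the following. The names are
hypothetical (`ReadAdapter` is not a type from this patch); the point is that a
struct holding only a reference is `Unpin` even when the referent is not.

```rust
use std::marker::PhantomPinned;

/// Compiles only if `T: Unpin`; calling it is the whole test.
fn assert_unpin<T: Unpin>() {}

/// A type that deliberately opts out of `Unpin`.
#[allow(dead_code)]
struct NotUnpin {
    _pin: PhantomPinned,
}

/// Hypothetical adapter that borrows its reader, as many I/O futures do.
#[allow(dead_code)]
struct ReadAdapter<'a, R: ?Sized> {
    reader: &'a mut R,
}

fn main() {
    // `&mut R` is always `Unpin`, so no manual `impl Unpin` is required.
    assert_unpin::<ReadAdapter<'static, NotUnpin>>();
}
```

If a later change added a field that is not `Unpin`, this call would stop
compiling, which surfaces the breaking change in CI.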
2019-09-25 01:17:06 +09:00