Files
tracing/tracing-macros
Eliza Weisman c0c61d1cf4 core, tracing: make disabled events way faster (#326)
## Motivation

When `tracing` is not in use, no default subscriber will be set, and
tracing instrumentation points can be skipped. Currently, the
instrumentation macros detect this by testing if the callsite's
`Interest` is `never`. However, this is significantly slower than how
the `log` crate detects that there is no logger: we must call a function
through a trait object, do a relaxed load, construct an `Interest` from
an integer, and then test on the value of the `Interest`. Furthermore,
we must construct the callsite _at all_ in order to do this. Meanwhile, 
when no logger is set, `log` can skip log records with only a single 
relaxed atomic load.

## Solution

This branch adds a new private-API method in `tracing_core::dispatcher`
that allows us to check if a dispatcher has ever been set. If this is
false, then `tracing` has not yet been initialized and the callsite can
be elided entirely. Representing this as an atomic bool lets us perform
a single relaxed load and branch, without having to mess around with
`Interest`s and callsite vtables and so on. In cases where `tracing`
will never be used, this will always be false.

This the event macro performance with no subscriber set on par with the
`log` crate's performance when no logger is set. Spans are a bit slower
because they must still construct a disabled span object, but the span
macros are somewhat faster in this case than they were previously.

## Benchmark results

Before:

![Screen Shot 2019-09-04 at 2 59 06 PM](https://user-images.githubusercontent.com/2796466/64294816-965d9b00-cf24-11e9-9e90-b6d3ee337da8.png)

After:

![Screen Shot 2019-09-04 at 2 58 34 PM](https://user-images.githubusercontent.com/2796466/64294827-9d84a900-cf24-11e9-9223-fa39dfcef46b.png)

Note that when no subscriber is set, skipping an event previously
took an average of around 2 ns, and skipping a span took around 
5 ns. Now, skipping an event takes around 850 ps (about the
same as skipping a `log` macro when there's no logger), while 
skipping a span macro takes around 2 ns.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* tracing, core: improve no-subscriber performance significantly

This branch adds a new private API to `tracing-core` for checking if a
subscriber has ever been set. If a subscriber has never been set, we can
elide almost all of the event macro, and more of the span macro than we
were previously able to. Unlike checking a cached interest, which must
synchronize with the callsite registry, checking if a subscriber has
been set may be a relaxed rather than acquire load.

The performance of the `event!` macro (and the related `trace!`,
`info!`, etc macros) when no subscriber has ever been set (i.e., tracing
is not being used) is now comparable with that of the `log` crate, at
approximately 600 picoseconds (on my MacBook). As per `criterion`:

```
no_subscriber/event     time:   [610.43 ps 621.06 ps 633.20 ps]
                        change: [-64.247% -62.295% -60.263%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

  no_subscriber/log       time:   [689.22 ps 707.34 ps 728.25 ps]
                        change: [+27.149% +33.128% +40.141%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

no_subscriber_field/event
                        time:   [623.28 ps 635.73 ps 648.88 ps]
                        change: [-59.666% -58.074% -56.649%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
no_subscriber_field/log time:   [621.73 ps 632.07 ps 643.30 ps]
                        change: [+1.6194% +4.3288% +7.1454%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

```

The worse performance of the `log` crate without fields than with fields
is probably statistical noise that criterion failed to eliminate.

The span macros cannot achieve this level of performance currently, as
the callsite must still be constructed in order to emit `log` events
when that feature flag is enabled. Future work will involve tuning the
`span!` macro some more. However, performance is improved by this change.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* optimize disabled spans when log support is off
* tracing: add benchmarks for getting the subscriber

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2019-09-04 18:38:13 -07:00
..