Many architectures support 64-bit integers, but some Rust targets for those architectures intentionally use a 32-bit pointer size. Since the SWAR chunk size was tied to the pointer width, those targets took a performance hit, so this PR keeps the chunk size at 64 bits there.
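The idea can be sketched as follows; `Chunk` and `contains_byte` are hypothetical names for illustration, not serde_json's actual internals, assuming the classic zero-byte SWAR test:

```rust
// Minimal sketch, not serde_json's actual code: pin the SWAR chunk to
// 64 bits instead of deriving it from the pointer-sized `usize`.
type Chunk = u64;

// 0x0101_0101_0101_0101: a 1 in every byte lane.
const ONE_BYTES: Chunk = Chunk::MAX / 255;

/// Returns true if any byte of `chunk` equals `needle`,
/// using the classic "has zero byte" SWAR test.
fn contains_byte(chunk: Chunk, needle: u8) -> bool {
    let x = chunk ^ (ONE_BYTES * needle as Chunk);
    x.wrapping_sub(ONE_BYTES) & !x & (ONE_BYTES << 7) != 0
}

fn main() {
    let chunk = Chunk::from_le_bytes(*b"abc\"defg");
    assert!(contains_byte(chunk, b'"'));
    assert!(!contains_byte(chunk, b'\n'));
}
```

On a 32-bit-pointer target, a `usize`-based chunk would scan only four bytes per iteration; fixing `Chunk` at `u64` keeps it at eight bytes per step wherever 64-bit integer operations are cheap.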
Closes #877.
This is a good time to make ByteBuf parsing more consistent, since I'm rewriting it anyway. This commit integrates the changes from #877 and also correctly handles a leading surrogate followed by a surrogate pair.
This does not affect performance significantly.
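The tricky case can be illustrated with std's own UTF-16 decoder; this is a sketch of the required semantics, not the parser's actual code, and `decode_utf16_lossy` is a hypothetical helper:

```rust
// A lone leading surrogate immediately followed by a complete surrogate
// pair, e.g. the escapes \uD834 \uD834 \uDD1E. The first \uD834 is
// unpaired (its successor is another *leading* surrogate), so it must
// become U+FFFD, and decoding must retry pairing from the second \uD834
// rather than skipping past it.
fn decode_utf16_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

fn main() {
    // U+1D11E MUSICAL SYMBOL G CLEF is the pair D834 DD1E.
    let units = [0xD834, 0xD834, 0xDD1E];
    assert_eq!(decode_utf16_lossy(&units), "\u{FFFD}\u{1D11E}");
}
```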
Co-authored-by: Luca Casonato <hello@lcas.dev>
When ignoring *War and Peace* (in Russian), this increases performance
from 640 MB/s to 1080 MB/s (+70%).
When parsing into String, the gains are more modest but still significant: from 275 MB/s to 320 MB/s (+16%).
warning: this expression creates a reference which is immediately dereferenced by the compiler
--> tests/test.rs:2515:9
|
2515 | &"\"\t\n\r\"",
| ^^^^^^^^^^^^^ help: change this to: `"\"\t\n\r\""`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
= note: `-W clippy::needless-borrow` implied by `-W clippy::all`
= help: to override `-W clippy::all` add `#[allow(clippy::needless_borrow)]`
This is not backed by benchmarks, but it seems reasonable that we'd be
more starved for cache than CPU in IO-bound tasks. It also simplifies
code a bit and frees up some memory, which is probably a good thing.
Translating an index into a line/column pair takes considerable time. Notably, the JSON benchmark, modified to run on malformed data, spends around 50% of its CPU time generating the error object.
While the cold path is generally assumed to be slow, such a drastic pessimization may be unexpected, especially when a faster implementation exists.
Using the vectorized routines provided by the memchr crate increases the performance of the failure path by roughly 2x on average.
Old implementation:
                         DOM       STRUCT
data/canada.json         122 MB/s  168 MB/s
data/citm_catalog.json   135 MB/s  195 MB/s
data/twitter.json        142 MB/s  226 MB/s
New implementation:
                         DOM       STRUCT
data/canada.json         216 MB/s  376 MB/s
data/citm_catalog.json   238 MB/s  736 MB/s
data/twitter.json        210 MB/s  492 MB/s
In comparison, the performance of the happy path is:
                         DOM       STRUCT
data/canada.json         283 MB/s  416 MB/s
data/citm_catalog.json   429 MB/s  864 MB/s
data/twitter.json        275 MB/s  541 MB/s
While this introduces a new dependency, memchr is much faster to compile
than serde, so compile time does not increase significantly.
Additionally, memchr provides a more efficient SWAR-based implementation
of both the memchr and count routines even without std, providing
benefits for embedded uses as well.
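The line/column translation can be sketched as below, using plain std scans with comments marking where the memchr routines slot in; `line_col` and its 1-based column convention are illustrative assumptions, not serde_json's actual API:

```rust
// Hedged sketch: translate a byte offset of a parse error into a
// (line, column) pair by examining only the prefix before the error.
// The commit replaces bytewise scans like these with memchr's
// vectorized routines, as noted in the comments.
fn line_col(input: &[u8], offset: usize) -> (usize, usize) {
    let prefix = &input[..offset];
    // Vectorized equivalent: memchr::memchr_iter(b'\n', prefix).count()
    let line = 1 + prefix.iter().filter(|&&b| b == b'\n').count();
    // Vectorized equivalent: memchr::memrchr(b'\n', prefix)
    let col = match prefix.iter().rposition(|&b| b == b'\n') {
        Some(pos) => offset - pos,
        None => offset + 1,
    };
    (line, col)
}

fn main() {
    let input = b"{\n  \"a\": nul\n}";
    // The bogus `nul` token starts at offset 9: line 2, column 8.
    assert_eq!(line_col(input, 9), (2, 8));
    assert_eq!(line_col(b"nul", 2), (1, 3));
}
```

Counting newlines and finding the last one both run over the same prefix, which is exactly the shape that benefits from SIMD (or, without std, SWAR) scanning.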
warning: can be more succinctly written as a byte str
--> tests/test.rs:1108:13
|
1108 | &[b'"', b'\n', b'"'],
| ^^^^^^^^^^^^^^^^^^^^ help: try: `b"\"\n\""`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#byte_char_slices
= note: `-W clippy::byte-char-slices` implied by `-W clippy::all`
= help: to override `-W clippy::all` add `#[allow(clippy::byte_char_slices)]`
warning: can be more succinctly written as a byte str
--> tests/test.rs:1112:13
|
1112 | &[b'"', b'\x1F', b'"'],
| ^^^^^^^^^^^^^^^^^^^^^^ help: try: `b"\"\x1F\""`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#byte_char_slices