x86_64 can load unaligned words in a single cache line as fast as
aligned words. Even when crossing cache or page boundaries it is just as
fast to do an unaligned word read instead of multiple byte reads.
Also add a couple more tests & benchmarks.
This adds the truncdfsf2 intrinsic and a corresponding fuzz test case. The
implementation of trunc is generic to make it easy to add truncdfhs2 and
truncsfhf2 if rust ever gets `f16` support.
* mem: Move mem* functions to separate directory
Signed-off-by: Joe Richey <joerichey@google.com>
* memcpy: Create separate memcpy.rs file
Signed-off-by: Joe Richey <joerichey@google.com>
* benches: Add benchmarks for mem* functions
This allows comparing the "normal" implementations to the
implementations provided by this crate.
Signed-off-by: Joe Richey <joerichey@google.com>
* mem: Add REP MOVSB/STOSB implementations
The assembly generated seems correct:
https://rust.godbolt.org/z/GGnec8
Signed-off-by: Joe Richey <joerichey@google.com>
* mem: Add documentations for REP string insturctions
Signed-off-by: Joe Richey <joerichey@google.com>
* Use quad-word rep string instructions
Signed-off-by: Joe Richey <joerichey@google.com>
* Prevent panic when compiled in debug mode
Signed-off-by: Joe Richey <joerichey@google.com>
* Add tests for mem* functions
Signed-off-by: Joe Richey <joerichey@google.com>
* Add build/test with the "asm" feature
Signed-off-by: Joe Richey <joerichey@google.com>
* Add byte length to Bencher
Signed-off-by: Joe Richey <joerichey@google.com>
* Allow FFI-unsafe warnings for u128/i128
Handle new warnings on nightly, and we shouldn't need to worry about
these with compiler-builtins since this is tied to a particular compiler.
* Clean up crate attributes
* No need for stability marker
* Rustdoc docs not used for this crate
* Remove old build-system related cruft from rustc itself.
* Run `cargo fmt`
All tests are moved to a separate crate in this repository to enable features by
default. Additionally the test generation is moved to a seprate build script and
simplified to reduce the amount of boilerplate needed per test.
Overall this should still be testing everything, just in a different location!