These seem to have been introduced by recent LLVM changes.
* The instruction limit for vld*/vst* has been raised. This is not a
significant issue, since the limit is only used for testing.
* vld*/vst* instructions are generated with overly strict alignments:
https://github.com/rust-lang/stdarch/issues/1217
* vtbl/vtbx intrinsics are failing intrinsic-test for unknown reasons.
This avoids a flood of warnings when testing the
armv7-unknown-linux-gnueabihf target.
Under this target, we would pass -Ctarget-feature=+neon when building
intrinsic-test, but it is compiled for the host (and this tool doesn't
need Neon even if the host _is_ Armv7).
This also sets --target when running the 'hex' example, since that
seems more appropriate than always building it for the host.
This involves moving from the ACLE intrinsic definitions (which aren't
available for SVE at this point) to a JSON file. This was derived from
ARM's documentation[^1], and then relicensed under `MIT OR Apache-2.0` for
use in this repository.
[^1]: https://developer.arm.com/architectures/instruction-sets/intrinsics
* Sync with the latest LLVM which has a few new intrinsic names
* Move explicit tests back to `assert_instr` since `assert_instr` now
supports specifying const-generic arguments inline.
* Enable tests where wasmtime implements the instruction as well as LLVM.
* Ensure there are tests for all functions that can be tested at this
time (those that aren't unimplemented in wasmtime).
There are still a number of `assert_instr` tests that are commented out,
either because the instruction is unimplemented in wasmtime at the moment
or because LLVM doesn't have an implementation for it yet.
Lots of time has passed and lots has happened since the simd128 support
was first added to this crate. Things are starting to settle down now so
this commit syncs the Rust intrinsic definitions with the current
specification (https://github.com/WebAssembly/simd). Unfortunately not
everything can be enabled just yet but everything is in the pipeline for
getting enabled soon.
This commit also applies a major revamp to how intrinsics are tested.
The intention is that the setup should be much more lightweight and/or
easy to work with after this commit.
At a high-level, the changes here are:
* Testing with node.js and `#[wasm_bindgen]` has been removed. Instead
intrinsics are tested with Wasmtime which has a nearly complete
implementation of the SIMD spec (and soon fully complete!)
* Testing is switched to `wasm32-wasi` to make idiomatic Rust bits a bit
easier to work with (e.g. `panic!`)
* Testing of this crate's simd128 feature for wasm is re-enabled. This
will run on CI and both compile and execute intrinsics. This should
bring wasm intrinsics to the same level of parity as x86 intrinsics,
for example.
* New wasm intrinsics have been added:
* `iNNxMM_loadAxA_{s,u}`
* `vNNxMM_load_splat`
* `v8x16_swizzle`
* `v128_andnot`
* `iNNxMM_abs`
* `iNNxMM_narrow_*_{u,s}`
* `iNNxMM_bitmask` - commented out until LLVM is updated to LLVM 11
* `iNNxMM_widen_*_{u,s}` - commented out until
bytecodealliance/wasmtime#1994 lands
* `iNNxMM_{max,min}_{u,s}`
* `iNNxMM_avgr_u`
* Some wasm intrinsics have been removed:
* `i64x2_trunc_*`
* `f64x2_convert_*`
* `i8x16_mul`
* The `v8x16.shuffle` instruction is exposed. This is done through a
`macro` (not `macro_rules!`, but `macro`). This is intended to be
somewhat experimental and unstable until we decide otherwise. This
instruction has 16 immediate-mode expressions and is as a result
unsuited to the existing `constify_*` logic of this crate. I'm hoping
that we can game out over time what a macro might look like and/or
look for better solutions. For now, though, what's implemented is the
first of its kind in this crate (an architecture-specific macro), so
some extra scrutiny looking at it would be appreciated.
* Lots of `assert_instr` annotations have been fixed for wasm.
* All wasm simd128 tests are uncommented and passing now.
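For readers unfamiliar with the `v8x16.shuffle` instruction mentioned in the list above, here is a minimal scalar sketch of its lane-selection semantics. The function name `shuffle_model` is hypothetical and is not part of this crate. Unlike the real instruction, whose 16 indices must be compile-time immediates, this model takes them as a runtime array purely for illustration.

```rust
/// Scalar reference model of a 16-lane byte shuffle over two inputs.
/// Each index selects one byte from the 32-byte concatenation of `a` and `b`.
/// (Hypothetical helper for illustration; not an API of this crate.)
fn shuffle_model(a: [u8; 16], b: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (lane, &i) in idx.iter().enumerate() {
        assert!(i < 32, "shuffle index out of range");
        // Indices 0..16 pick from `a`, indices 16..32 pick from `b`.
        out[lane] = if i < 16 { a[i as usize] } else { b[(i - 16) as usize] };
    }
    out
}

fn main() {
    let a: [u8; 16] = core::array::from_fn(|i| i as u8); // 0, 1, ..., 15
    let b: [u8; 16] = core::array::from_fn(|i| 100 + i as u8); // 100, ..., 115
    // Interleave the low halves of `a` and `b`.
    let idx = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23];
    println!("{:?}", shuffle_model(a, b, idx));
}
```

Because every index has to be known at compile time in the real instruction, a function taking 16 ordinary arguments cannot express it directly, which is why a macro is used here.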
This is still missing tests for new intrinsics and it's also missing
tests for various corner cases. I hope to get to those later as the
upstream spec itself gets closer to stabilization.
In the meantime, however, I went ahead and updated the `hex.rs` example
with a wasm implementation using intrinsics. With it I got some very
impressive speedups using Wasmtime:
test benches::large_default ... bench: 213,961 ns/iter (+/- 5,108) = 4900 MB/s
test benches::large_fallback ... bench: 3,108,434 ns/iter (+/- 75,730) = 337 MB/s
test benches::small_default ... bench: 52 ns/iter (+/- 0) = 2250 MB/s
test benches::small_fallback ... bench: 358 ns/iter (+/- 0) = 326 MB/s
In other words, under Wasmtime, hex encoding using SIMD is about 15x faster
on 1MB chunks and about 7x faster on small (<128 byte) chunks.
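The speedup figures follow directly from the ns/iter columns of the benchmark output above. As a quick check, the sketch below divides the fallback time by the SIMD time for each size; the helper name `speedup` is made up for this example.

```rust
/// Ratio of fallback time to SIMD time, both given in ns/iter.
/// (Hypothetical helper for illustration.)
fn speedup(fallback_ns: f64, simd_ns: f64) -> f64 {
    fallback_ns / simd_ns
}

fn main() {
    // Numbers copied from the benchmark output above.
    let large = speedup(3_108_434.0, 213_961.0);
    let small = speedup(358.0, 52.0);
    println!("large: {:.1}x, small: {:.1}x", large, small);
}
```

This gives roughly 14.5x for the large case and 6.9x for the small case, consistent with the rounded figures quoted.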
All of these intrinsics are still unstable and will continue to be so
presumably until the simd proposal in wasm itself progresses to a later
stage. Additionally we'll still want to sync with clang on intrinsic
names (or decide not to) at some point in the future.
* wasm: Unconditionally expose SIMD functions
This commit unconditionally exposes SIMD functions from the `wasm32`
module. This is done in such a way that the standard library does not
need to be recompiled to access SIMD intrinsics and use them. This,
hopefully, is the long-term story for SIMD in WebAssembly in Rust.
It's unlikely that all WebAssembly runtimes will end up implementing
SIMD so the standard library is unlikely to use SIMD any time soon, but
we want to make sure it's easily available to folks! This commit enables
all this by ensuring that SIMD is available to the standard library,
regardless of compilation flags.
This'll come with the same caveats as x86 support, where it doesn't make
sense to call these functions unless you're enabling simd support one
way or another locally. Additionally, as with x86, if you don't call
these functions then the instructions won't show up in your binary.
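As an illustration of the caveat above, one way a caller might guard use of the intrinsics is with `#[cfg]` on `target_feature = "simd128"`, falling back to scalar code otherwise. This is only a sketch: the function `add4` is hypothetical, and the intrinsic names (`i32x4`, `i32x4_add`, `i32x4_extract_lane`) are assumed to be those of present-day `core::arch::wasm32`, which may differ from the names at the time of this commit.

```rust
/// Adds two 4-lane vectors lane by lane, using SIMD when it is enabled
/// at compile time. (Hypothetical helper; intrinsic names are assumed.)
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn add4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use core::arch::wasm32::*;
    let v = i32x4_add(
        i32x4(a[0], a[1], a[2], a[3]),
        i32x4(b[0], b[1], b[2], b[3]),
    );
    [
        i32x4_extract_lane::<0>(v),
        i32x4_extract_lane::<1>(v),
        i32x4_extract_lane::<2>(v),
        i32x4_extract_lane::<3>(v),
    ]
}

/// Scalar fallback for every other target or when simd128 is not enabled.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
fn add4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    [
        a[0].wrapping_add(b[0]),
        a[1].wrapping_add(b[1]),
        a[2].wrapping_add(b[2]),
        a[3].wrapping_add(b[3]),
    ]
}

fn main() {
    println!("{:?}", add4([1, 2, 3, 4], [10, 20, 30, 40]));
}
```

With this shape, a build without simd128 never references the SIMD functions, so no SIMD instructions end up in the binary.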
While I was here I went ahead and expanded the WebAssembly-specific
documentation for the wasm32 module as well, ensuring that the current
state of SIMD/Atomics is documented.
* Attempt to fix tests on master
* Make all doctests use items from the real `std` rather than this
crate, it's just easier
* Handle debuginfo weirdness by flagging functions as `no_mangle` that
we're looking for instructions within.
* Handle double underscores in symbol names
This commit:
* renames `coresimd` to `core_arch` and `stdsimd` to `std_detect`
* makes `std_detect` independent of `core_arch` - it is now a freestanding
`no_std` library that only depends on `core`
* moves the top-level `coresimd` and `stdsimd` directories into their
corresponding crates in `crates/{core_arch, std_detect}` - this simplifies
creating crates.io releases of these crates
We historically have run single-threaded verbose tests because we were
faulting all over the place due to bugs in rustc itself, primarily
around calling conventions and passing values around. Those bugs have
all since been fixed so we should be clear to run multithreaded tests
quietly on CI nowadays!
Closes #621
* Update representation of `v128`
* Rename everything with new naming convention of underscores and no
modules/impls
* Remove no longer necessary `wasm_simd128` feature
* Remove `#[target_feature]` attributes (use `#[cfg]` instead)
* Update `assert_instr` tests
* Update some implementations as LLVM has evolved
* Allow some more esoteric syntax in `#[assert_instr]`
* Adjust the safety of APIs where appropriate
* Remove macros in favor of hand-coded implementations
* Comment out the tests for now as there's no known runtime for these
yet
* fix _mm_castsi128_pd and _mm_castpd_si128 impls
The _mm_castX_Y SSE intrinsics are "reinterpreting" casts; LLVM's
simd_cast is a "converting" cast. Replace simd_cast with mem::transmute.
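The difference between the two kinds of cast can be seen without any SIMD types. The sketch below uses plain `[i64; 2]` and `[f64; 2]` arrays as stand-ins for the vector types; the function names `converting` and `reinterpreting` are hypothetical and only model the behaviour described above.

```rust
use core::mem;

/// Value-converting cast: each integer becomes the nearest float.
/// (Models what a converting cast does; hypothetical helper.)
fn converting(v: [i64; 2]) -> [f64; 2] {
    [v[0] as f64, v[1] as f64]
}

/// Bit-reinterpreting cast: the 128 bits are left untouched.
/// (Models what the `_mm_castX_Y` intrinsics are meant to do.)
fn reinterpreting(v: [i64; 2]) -> [f64; 2] {
    // SAFETY: both types are 16 bytes and every bit pattern is a valid f64.
    unsafe { mem::transmute::<[i64; 2], [f64; 2]>(v) }
}

fn main() {
    // 0x3FF0_0000_0000_0000 is the IEEE 754 bit pattern of 1.0_f64.
    let v = [0x3FF0_0000_0000_0000_i64, 0];
    println!("converting:     {:?}", converting(v));
    println!("reinterpreting: {:?}", reinterpreting(v));
}
```

The converting version yields a very large float for the first lane, while the reinterpreting version yields `1.0`, which is the behaviour the cast intrinsics are expected to have.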
Fixes #55249
* Temporarily pin CI
* Fix i686 segfaults
* Fix wasm CI
Output of `wasm2wat` has changed!
* Fix AppVeyor with an older nightly