608 Commits

Author SHA1 Message Date
bjorn3
b93f41cbb3
Constify all x86 rustc_args_required_const intrinsics (#876) 2020-07-19 15:45:51 +01:00
Alex Crichton
770964adac
Update and revamp wasm32 SIMD intrinsics (#874)
Lots of time and lots of things have happened since the simd128 support
was first added to this crate. Things are starting to settle down now so
this commit syncs the Rust intrinsic definitions with the current
specification (https://github.com/WebAssembly/simd). Unfortuantely not
everything can be enabled just yet but everything is in the pipeline for
getting enabled soon.

This commit also applies a major revamp to how intrinsics are tested.
The intention is that the setup should be much more lightweight and/or
easy to work with after this commit.

At a high-level, the changes here are:

* Testing with node.js and `#[wasm_bindgen]` has been removed. Instead
  intrinsics are tested with Wasmtime which has a nearly complete
  implementation of the SIMD spec (and soon fully complete!)

* Testing is switched to `wasm32-wasi` to make idiomatic Rust bits a bit
  easier to work with (e.g. `panic!)`

* Testing of this crate's simd128 feature for wasm is re-enabled. This
  will run on CI and both compile and execute intrinsics. This should
  bring wasm intrinsics to the same level of parity as x86 intrinsics,
  for example.

* New wasm intrinsics have been added:
  * `iNNxMM_loadAxA_{s,u}`
  * `vNNxMM_load_splat`
  * `v8x16_swizzle`
  * `v128_andnot`
  * `iNNxMM_abs`
  * `iNNxMM_narrow_*_{u,s}`
  * `iNNxMM_bitmask` - commented out until LLVM is updated to LLVM 11
  * `iNNxMM_widen_*_{u,s}` - commented out until
    bytecodealliance/wasmtime#1994 lands
  * `iNNxMM_{max,min}_{u,s}`
  * `iNNxMM_avgr_u`

* Some wasm intrinsics have been removed:
  * `i64x2_trunc_*`
  * `f64x2_convert_*`
  * `i8x16_mul`

* The `v8x16.shuffle` instruction is exposed. This is done through a
  `macro` (not `macro_rules!`, but `macro`). This is intended to be
  somewhat experimental and unstable until we decide otherwise. This
  instruction has 16 immediate-mode expressions and is as a result
  unsuited to the existing `constify_*` logic of this crate. I'm hoping
  that we can game out over time what a macro might look like and/or
  look for better solutions. For now, though, what's implemented is the
  first of its kind in this crate (an architecture-specific macro), so
  some extra scrutiny looking at it would be appreciated.

* Lots of `assert_instr` annotations have been fixed for wasm.

* All wasm simd128 tests are uncommented and passing now.

This is still missing tests for new intrinsics and it's also missing
tests for various corner cases. I hope to get to those later as the
upstream spec itself gets closer to stabilization.

In the meantime, however, I went ahead and updated the `hex.rs` example
with a wasm implementation using intrinsics. With it I got some very
impressive speedups using Wasmtime:

    test benches::large_default  ... bench:     213,961 ns/iter (+/- 5,108) = 4900 MB/s
    test benches::large_fallback ... bench:   3,108,434 ns/iter (+/- 75,730) = 337 MB/s
    test benches::small_default  ... bench:          52 ns/iter (+/- 0) = 2250 MB/s
    test benches::small_fallback ... bench:         358 ns/iter (+/- 0) = 326 MB/s

or otherwise using Wasmtime hex encoding using SIMD is 15x faster on 1MB
chunks or 7x faster on small <128byte chunks.

All of these intrinsics are still unstable and will continue to be so
presumably until the simd proposal in wasm itself progresses to a later
stage. Additionaly we'll still want to sync with clang on intrinsic
names (or decide not to) at some point in the future.

* wasm: Unconditionally expose SIMD functions

This commit unconditionally exposes SIMD functions from the `wasm32`
module. This is done in such a way that the standard library does not
need to be recompiled to access SIMD intrinsics and use them. This,
hopefully, is the long-term story for SIMD in WebAssembly in Rust.

It's unlikely that all WebAssembly runtimes will end up implementing
SIMD so the standard library is unlikely to use SIMD any time soon, but
we want to make sure it's easily available to folks! This commit enables
all this by ensuring that SIMD is available to the standard library,
regardless of compilation flags.

This'll come with the same caveats as x86 support, where it doesn't make
sense to call these functions unless you're enabling simd support one
way or another locally. Additionally, as with x86, if you don't call
these functions then the instructions won't show up in your binary.

While I was here I went ahead and expanded the WebAssembly-specific
documentation for the wasm32 module as well, ensuring that the current
state of SIMD/Atomics are documented.
2020-07-18 13:32:52 +01:00
Ivan Tham
7f78306761
Add _mm_loadu_si64 (#870)
Co-authored-by: Amanieu d'Antras <amanieu@gmail.com>
2020-07-16 18:01:46 +01:00
Daniel Smith
5bfcdc0d57
Implement AVX512f floating point comparisons (#869)
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
2020-07-15 20:06:38 +01:00
Shamir Khodzha
78135e1774
added f32 and f64 unaligned stores and loads from avx512f set (#873) 2020-07-11 09:02:07 +01:00
Daniel Smith
02e1736720
Fix or equals integer comparisons (#872) 2020-07-04 05:41:25 +01:00
Daniel Smith
0108cb216a
Make function signatures consistent (#871) 2020-07-04 03:27:06 +01:00
Daniel Smith
5ff50904d8
Add AVX 512f gather, scatter and compare intrinsics (#866)
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
2020-06-16 17:49:21 +01:00
Jethro Beekman
a214956fe5 Fix x86 extract_epi{8,16} functions
* Update Intel intrinsics definitions with the latest version
* Update _mm256_extract_epi{8,16} to match latest definition
* Fix _mm_extract_epi16 sign extension

Fixes #867
2020-06-09 12:29:01 +01:00
Narek Galstyan
6f8baeb427 Clarify documentation about wasm32 target_feature gates 2020-06-04 09:01:01 +02:00
Daniel Smith
9b3358fc66 Add missing spaces 2020-05-31 19:46:40 +01:00
Daniel Smith
05cf0ce56b s/unsigned/signed/ for epi64 2020-05-31 19:46:40 +01:00
Daniel Smith
dde41d5863 Fix comparison comments 2020-05-31 19:46:40 +01:00
Daniel Smith
e0d2a25d24 Add 64 bit AVX512f le and ge comparisons 2020-05-30 21:50:51 +01:00
Mahmut Bulut
f4cdbb3005 Disable bootstrap for stage0 2020-05-29 21:29:04 +01:00
Mahmut Bulut
4541757677 feature detection 2020-05-29 19:05:48 +01:00
Mahmut Bulut
5b8bd0661a Fix cancellation code arithmetic 2020-05-29 19:05:48 +01:00
Mahmut Bulut
17e4b29dfd Implementation for Aarch64 TME intrinsics 2020-05-29 19:05:48 +01:00
Daniel Smith
a50a216567 Add signed variants 2020-05-29 00:07:03 +01:00
Daniel Smith
d94bc946eb Add gt and eq unsigned variants 2020-05-29 00:07:03 +01:00
Daniel Smith
22a73da688 Add mask variant to cmplt 2020-05-29 00:07:03 +01:00
Daniel Smith
b8e492f5a0 finish/fix adding avx512f to x86_64 2020-05-29 00:07:03 +01:00
Daniel Smith
fa03c0cdaf rustfmt 2020-05-29 00:07:03 +01:00
Daniel Smith
c382acd251 Only check for the instruction prefix since MSVC and Clang use different instructions 2020-05-29 00:07:03 +01:00
Daniel Smith
ad2fe20a87 Use correct instruction 2020-05-29 00:07:03 +01:00
Daniel Smith
2d717c3623 Fix stdarch-verify test 2020-05-29 00:07:03 +01:00
Daniel Smith
7ab646ef42 Move 64 bit function based on stdarch-verify 2020-05-29 00:07:03 +01:00
Daniel Smith
48b086a827 Add __mmask8 type 2020-05-29 00:07:03 +01:00
Daniel Smith
e0ffa88fe7 Add one AVX512f comparison and the intrinsics needed to test it 2020-05-29 00:07:03 +01:00
Daniel Smith
7a29fcc1c8 Convert __mmask16 to use an unsigned type 2020-05-28 22:24:46 +01:00
Amanieu d'Antras
079ce26eb7 Fix CI issues caused by updated nightly
Rust bug: https://github.com/rust-lang/rust/issues/72545
2020-05-28 17:20:07 +01:00
Marko Mijalkovic
15154a882d Use fp64 detection instead of OS blacklist 2020-05-07 20:48:47 +01:00
Marko Mijalkovic
66ef866b34 Fix code style 2020-05-07 20:48:47 +01:00
Marko Mijalkovic
aaee0709b3 Fix building libcore for the Sony PSP
Building the MIPS MSA module for non-fp64 targets fails with an LLVM
error. This commit blacklists PSP targets from MSA support in order to
fix building libcore.
2020-05-07 20:48:47 +01:00
Daniel Verkamp
d9a67ea922
Manually preserve rbx across cpuid instruction (#851)
* Manually preserve rbx across cpuid instruction

This fixes an issue observed when using __cpuid and __cpuid_count with
Address Sanitizer enabled: the generated code uses the rbx register to
access ASAN tracking information without reloading it after cpuid,
resulting in a segfault since the rbx register is overwritten by cpuid
(https://crbug.com/1072045).

This seems like a compiler backend bug, and indeed there is a
long-standing LLVM bug report about a very similar issue:
https://bugs.llvm.org/show_bug.cgi?id=17907

To work around this issue, we can manually preserve the rbx register
contents in the inline assembly.  This is the approach taken by LLVM's
own host cpuid detection code (lib/Host/Support.cpp).  The original rbx
value is stashed in rsi, which is then swapped with rbx to restore the
original value as well as keep the output ebx value from the CPUID
instruction to be used as an output of the inline assembly.

The rbx clobber is also removed; this seems ineffective, and it
conflicts with the ebx output of the inline assembly (ebx is a
subregister of rbx): "Note that clobbering named registers that are also
present in output constraints is not legal."
(https://llvm.org/docs/LangRef.html#clobber-constraints)

* Add link to LLVM bug in cpuid workaround comment
2020-04-29 01:50:13 +01:00
Tobias Kortkamp
a69b5ec7ae Unbreak non-x86 build on FreeBSD
error[E0432]: unresolved import `self::arm::check_for`
  --> src/libstd/../stdarch/crates/std_detect/src/detect/os/freebsd/mod.rs:11:17
   |
11 |         pub use self::arm::check_for;
   |                 ^^^^^^^^^^^^^^^^^^^^ no `check_for` in `std_detect::detect::os::arm`

error[E0425]: cannot find value `detect_features` in module `self::os`
   --> src/libstd/../stdarch/crates/std_detect/src/detect/mod.rs:121:37
    |
121 |     cache::test(x as u32, self::os::detect_features)
    |                                     ^^^^^^^^^^^^^^^ not found in `self::os`
    |
help: possible candidate is found in another module, you can import it into scope
    |
20  | use crate::std_detect::detect::os::arm::detect_features;
2020-04-24 12:45:05 +01:00
Amanieu d'Antras
1f32017c84 Rustfmt 2020-04-24 00:36:01 +01:00
Amanieu d'Antras
39fc893f6b Stabilize all remaining x86 features for feature detection 2020-04-24 00:36:01 +01:00
Amanieu d'Antras
04c1a9a9e9
Use llvm_asm! instead of asm! (#846) 2020-04-09 00:05:10 +01:00
Heinz N. Gies
70f3623b52
Implement additional ARM NEON intriniscs (#792) 2020-04-07 20:06:38 +01:00
Linus Färnstrand
d7a1dbd509 Replace all std::<primitive>::MIN/MAX with just <primitive>::MIN/MAX 2020-04-04 09:51:11 -07:00
Linus Färnstrand
f14b746319 Replace all max/min_value() with MAX/MIN 2020-04-04 09:51:11 -07:00
Linus Färnstrand
e0533a30d3 Stop importing int/float modules 2020-04-04 09:51:11 -07:00
Makoto Kato
d5d3117b9b
Support crc32 even if on arm32 (#834)
CRC32 is supported on A32 and T32.
2020-03-30 16:38:23 +01:00
Linus Färnstrand
b852344de5
Replace module MIN/MAX and min/max_value() with assoc consts (#843) 2020-03-29 17:08:21 +01:00
Amanieu d'Antras
c554b42b2a
Fix CI (#845)
* Use ubuntu 18.04 instead of 18.10 for MIPS CI

* Fix WASM CI
2020-03-29 15:15:59 +01:00
Makoto Kato
09ef01ade1
Add crypto target feature detection to arm32 (#833) 2020-03-29 12:28:17 +01:00
Jack O'Connor
e367bcd7f9
re-stabilize the AVX-512 features that were stabilized in Rust 1.27.0 (#842)
* re-stabilize the AVX-512 features that were stabilized in Rust 1.27.0

https://github.com/rust-lang/stdarch/pull/739 added per-feature
stabilization of runtime CPU feature detection. In so doing, it
de-stabilized some detection features that had been stable since Rust
1.27.0, breaking some published crates (on nightly). This commit
re-stabilizes the subset of AVX-512 detection features that were
included in 1.27.0 (that is, the pre-Ice-Lake subset). Other instruction
sets (MMX in particular) remain de-stabilized, pending a decision about
whether should ever stabilize them.

See https://github.com/rust-lang/rust/issues/68905.

* add a comment explaining feature detection stability

* adjust stabilizations to match most recent proposal

https://github.com/rust-lang/rust/issues/68905#issuecomment-595376319
2020-03-19 14:29:50 +00:00
Tyg13
9ab5dc0873
Remove unnecessary parens. (#839) 2020-01-30 13:15:36 +01:00
Aleksey Kladov
0bd16446db Fix race condition in feature cache on 32 platforms (#837)
* Fix race condition in feature cache on 32 platforms

If we observe that the second word is initialized, we can't really
assume that the first is initialized as well. So check each word
separately.

* Use stronger atomic ordering

Better SeqCst than sorry!

* Use two caches on x64 for simplicity
2020-01-28 21:53:17 +01:00