
mirror of https://github.com/rust-lang/rust.git synced 2025-10-03 02:40:40 +00:00

Author	SHA1	Message	Date
Amanieu d'Antras	4fe088329c	Work around CI failures for the ARM target These seem to have been introduced by recent LLVM changes. * The instruction limit for vld/vst has been raised. This is not a significant issue, it is only used for testing. * vld/vst instructions are generated with overly strict alignments: https://github.com/rust-lang/stdarch/issues/1217 * vtbl/vtbx intrinsics are failing intrinsic-test for unknown reasons.	2023-11-30 07:48:09 +00:00
Jacob Bramley	3324de54c2	Don't pass target-features to host tests. This avoids a flood of warnings when testing the armv7-unknown-linux-gnueabihf target. Under this target, we would pass -Ctarget-features=+neon when building intrinsic-test, but it is compiled for the host (and this tool doesn't need Neon even if the host _is_ Armv7). This also sets --target when running the 'hex' example, since that seems more appropriate than always building it for the host.	2023-11-01 14:33:48 +01:00
Eduardo Sánchez Muñoz	69ff2e3a37	Explicitly disable SSE3 for x86_64	2023-10-01 17:57:00 +01:00
Gijs Burghoorn	8a23f93e8b	Fix: #1464 for rv64 zk	2023-09-22 10:08:56 +08:00
Gijs Burghoorn	f4ee8f0282	Fix: Testing for RISC-V Zb intrinsics	2023-09-01 18:32:40 +02:00
Gijs Burghoorn	d1229d008b	Fix: Add proper flags for RISCV64 ci	2023-08-31 23:12:32 +02:00
Amanieu d'Antras	55ef711226	Disable vld2q_dup_f32 test in CI This is broken due to rust-lang/rust#112460.	2023-06-20 18:20:19 +02:00
Adam Gemmell	0125fa17c8	Remove ACLE submodule This involves moving from the ACLE intrinsic definitions (which aren't available for SVE at this point) to a JSON file. This was derived from ARM's documentation[^1], and then relicensed under `MIT OR Apache-2.0` for use in this repository. [^1]: https://developer.arm.com/architectures/instruction-sets/intrinsics	2023-05-15 17:34:11 +02:00
Luca Barbato	fa4e478dbe	Skip vec_expte tests since they trip qemu	2023-04-24 19:02:22 -07:00
Amanieu d'Antras	39849dd6c6	Import the asm! macro from core::arch (#1265 )	2021-12-09 23:50:37 +00:00
Amanieu d'Antras	937978eeef	Update the intrinsic checker tool (#1258 )	2021-12-04 13:03:30 +00:00
Jamie Cunliffe	b04e740f24	Handle intrinsics with constraints in the test tool. (#1237 )	2021-11-05 01:47:31 +00:00
Hans Kratz	26cce19427	Make dedup guard optional (#1215 )	2021-09-20 17:19:05 +01:00
Jamie Cunliffe	bd0e352338	Intrinsic test tool to compare neon intrinsics with C (#1170 )	2021-09-09 19:16:45 +01:00
Alex Crichton	8ed0d3cbd5	More wasm SIMD updates * Sync with the latest LLVM which has a few new intrinsic names * Move explicit tests back to `assert_instr` since `assert_instr` now supports specifying const-generic arguments inline. * Enable tests where wasmtime implements the instruction as well as LLVM. * Ensure there are tests for all functions that can be tested at this time (those that aren't unimplemented in wasmtime). There's still a number of `assert_instr` tests that are commented out. These are either because they're unimplemented in wasmtime at the moment or LLVM doesn't have an implementation for the instruction yet.	2021-03-21 09:24:39 +00:00
Alex Crichton	e35da555f8	Update WebAssembly SIMD/Atomics (#1073 )	2021-03-11 23:30:30 +00:00
kangshan1157	936e1add97	Implement avx512bf16 intrinsics (#998 )	2021-02-10 23:29:27 +00:00
Makoto Kato	e020a85ff0	Run CI for i686-pc-windows-msvc (#934 )	2020-10-25 01:32:27 +01:00
Alex Crichton	770964adac	Update and revamp wasm32 SIMD intrinsics (#874 ) Lots of time and lots of things have happened since the simd128 support was first added to this crate. Things are starting to settle down now so this commit syncs the Rust intrinsic definitions with the current specification (https://github.com/WebAssembly/simd). Unfortunately not everything can be enabled just yet but everything is in the pipeline for getting enabled soon. This commit also applies a major revamp to how intrinsics are tested. The intention is that the setup should be much more lightweight and/or easy to work with after this commit. At a high level, the changes here are: * Testing with node.js and `#[wasm_bindgen]` has been removed. Instead intrinsics are tested with Wasmtime which has a nearly complete implementation of the SIMD spec (and soon fully complete!) * Testing is switched to `wasm32-wasi` to make idiomatic Rust bits a bit easier to work with (e.g. `panic!`) * Testing of this crate's simd128 feature for wasm is re-enabled. This will run on CI and both compile and execute intrinsics. This should bring wasm intrinsics to the same level of parity as x86 intrinsics, for example. * New wasm intrinsics have been added: * `iNNxMM_loadAxA_{s,u}` * `vNNxMM_load_splat` * `v8x16_swizzle` * `v128_andnot` * `iNNxMM_abs` * `iNNxMM_narrow__{u,s}` `iNNxMM_bitmask` - commented out until LLVM is updated to LLVM 11 * `iNNxMM_widen__{u,s}` - commented out until bytecodealliance/wasmtime#1994 lands `iNNxMM_{max,min}_{u,s}` * `iNNxMM_avgr_u` * Some wasm intrinsics have been removed: * `i64x2_trunc_` `f64x2_convert_` `i8x16_mul` * The `v8x16.shuffle` instruction is exposed. This is done through a `macro` (not `macro_rules!`, but `macro`). This is intended to be somewhat experimental and unstable until we decide otherwise. This instruction has 16 immediate-mode expressions and is as a result unsuited to the existing `constify_` logic of this crate. 
I'm hoping that we can game out over time what a macro might look like and/or look for better solutions. For now, though, what's implemented is the first of its kind in this crate (an architecture-specific macro), so some extra scrutiny looking at it would be appreciated. Lots of `assert_instr` annotations have been fixed for wasm. * All wasm simd128 tests are uncommented and passing now. This is still missing tests for new intrinsics and it's also missing tests for various corner cases. I hope to get to those later as the upstream spec itself gets closer to stabilization. In the meantime, however, I went ahead and updated the `hex.rs` example with a wasm implementation using intrinsics. With it I got some very impressive speedups using Wasmtime: test benches::large_default ... bench: 213,961 ns/iter (+/- 5,108) = 4900 MB/s test benches::large_fallback ... bench: 3,108,434 ns/iter (+/- 75,730) = 337 MB/s test benches::small_default ... bench: 52 ns/iter (+/- 0) = 2250 MB/s test benches::small_fallback ... bench: 358 ns/iter (+/- 0) = 326 MB/s or otherwise using Wasmtime hex encoding using SIMD is 15x faster on 1MB chunks or 7x faster on small <128byte chunks. All of these intrinsics are still unstable and will continue to be so presumably until the simd proposal in wasm itself progresses to a later stage. Additionally we'll still want to sync with clang on intrinsic names (or decide not to) at some point in the future. * wasm: Unconditionally expose SIMD functions This commit unconditionally exposes SIMD functions from the `wasm32` module. This is done in such a way that the standard library does not need to be recompiled to access SIMD intrinsics and use them. This, hopefully, is the long-term story for SIMD in WebAssembly in Rust. It's unlikely that all WebAssembly runtimes will end up implementing SIMD so the standard library is unlikely to use SIMD any time soon, but we want to make sure it's easily available to folks! 
This commit enables all this by ensuring that SIMD is available to the standard library, regardless of compilation flags. This'll come with the same caveats as x86 support, where it doesn't make sense to call these functions unless you're enabling simd support one way or another locally. Additionally, as with x86, if you don't call these functions then the instructions won't show up in your binary. While I was here I went ahead and expanded the WebAssembly-specific documentation for the wasm32 module as well, ensuring that the current state of SIMD/Atomics are documented.	2020-07-18 13:32:52 +01:00
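The x86 caveat mentioned in the commit above (intrinsics are always exposed, but should only be called when the feature is actually enabled) can be illustrated with a small sketch. This is not code from the repository: the functions `sum`, `sum_avx2`, and `sum_fallback` are hypothetical names, and the example assumes a `std` build so that the `is_x86_feature_detected!` macro is available.

```rust
/// Portable fallback: wrapping sum using only baseline instructions.
fn sum_fallback(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Same loop, but compiled with AVX2 enabled for this one function only.
/// It is `unsafe` to call because the caller must guarantee AVX2 support.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Dispatches to the AVX2 build when the CPU reports support for it.
pub fn sum(xs: &[i32]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at run time.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_fallback(xs)
}
```

Calling `sum(&[1, 2, 3, 4])` gives the same result on every target; only the instructions selected for the loop differ. WebAssembly has no comparable run-time check, so there the decision is typically made at compile time instead.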
gnzlbg	ec7697de1b	Disable mips MSA builds - I don't think they can ever work except for the r6 targets	2019-07-14 15:29:19 +02:00
gnzlbg	dffdd66d81	Disable wasm32 simd128 tests	2019-07-14 15:29:19 +02:00
gnzlbg	1253c1daed	Enable warnings globally	2019-07-09 01:37:07 +02:00
gnzlbg	686b813f5d	Update repo name	2019-07-09 01:37:07 +02:00
gnzlbg	127f13f10f	Fix assert_instr tests	2019-07-08 22:58:19 +02:00
Alex Crichton	7215eb4613	Hook tests up to node.js We can even test some of the functions!	2019-04-25 17:19:51 +02:00
gnzlbg	7d9e92335b	Only test on 64-bit ppc	2019-04-17 14:21:15 +02:00
gnzlbg	28e2f594b8	Run build jobs with target-features on mips and ppc	2019-04-17 14:21:15 +02:00
Radovan Birdic	fd4cf83d42	Added msa jobs for mips-gnu targets	2019-04-09 09:43:17 +02:00
gnzlbg	c91584d241	Make core_arch compatible with Rust2015 and Rust2018	2019-02-23 01:14:07 +01:00
gnzlbg	a177055824	Test Rust2018 builds	2019-02-23 01:14:07 +01:00
gnzlbg	e56de7344f	Fix wasm32 build job	2019-02-14 03:45:57 +01:00
gnzlbg	6affc41386	Use builtin nvptx64-nvidia-cuda target	2019-02-13 22:00:20 +01:00
gnzlbg	ff129bff05	Add cargo features to disable usage of file I/O and dlsym in std_detect	2019-02-09 11:47:38 +01:00
Alex Crichton	cf738b0d36	Attempt to fix tests on master (#662 ) * Attempt to fix tests on master * Make all doctests use items from the real `std` rather than this crate, it's just easier * Handle debuginfo weirdness by flagging functions as `no_mangle` that we're looking for instructions within. * Handle double underscores in symbol names	2019-01-30 15:11:35 -08:00
gnzlbg	8bfa74b5e7	Enable passing allow_failure builds (#644 )	2019-01-22 08:57:17 -08:00
gnzlbg	11c624e488	Refactor stdsimd This commit: * renames `coresimd` to `core_arch` and `stdsimd` to `std_detect` * `std_detect` no longer depends on `core_arch` - it is a freestanding `no_std` library that only depends on `core` - it is renamed to `std_detect` * moves the top-level coresimd and stdsimd directories into the appropriate crates/... directories - this simplifies creating crates.io releases of these crates * moves the top-level `coresimd` and `stdsimd` sub-directories into their corresponding crates in `crates/{core_arch, std_detect}`.	2019-01-22 17:04:25 +01:00
Peter Jin	d30c29e926	Add a build libcore-only nvptx64 test (using xargo). This also disables the "integer_atomics" feature on nvptx/nvptx64.	2018-12-29 12:02:16 +01:00
Alex Crichton	24b3977f6a	Run multithreaded quiet tests (#622 ) We historically have run single-threaded verbose tests because we were faulting all over the place due to bugs in rustc itself, primarily around calling conventions and passing values around. Those bugs have all since been fixed so we should be clear to run multithreaded tests quietly on CI nowadays! Closes #621	2018-12-14 13:28:23 -06:00
Alex Crichton	cb921381c4	Rewrite simd128 and wasm support (#620 ) * Update representation of `v128` * Rename everything with new naming convention of underscores and no modules/impls * Remove no longer necessary `wasm_simd128` feature * Remove `#[target_feature]` attributes (use `#[cfg]` instead) * Update `assert_instr` tests * Update some implementations as LLVM has evolved * Allow some more esoteric syntax in `#[assert_instr]` * Adjust the safety of APIs where appropriate * Remove macros in favor of hand-coded implementations * Comment out the tests for now as there's no known runtime for these yet	2018-12-13 20:17:30 -06:00
gnzlbg	b1782e71ef	travis linux VM do not all support avx2	2018-11-11 12:37:44 +01:00
gnzlbg	eee3d5e6f0	fix clippy and shellcheck issues	2018-11-11 12:37:44 +01:00
gnzlbg	51d9585ece	cleanup travis and run.sh scripts	2018-11-11 12:37:44 +01:00
Kaz Wesley	7fda54f9bc	fix _mm_castsi128_pd and _mm_castpd_si128 impls (#581 ) * fix _mm_castsi128_pd and _mm_castpd_si128 impls The _mm_castX_Y SSE intrinsics are "reinterpreting" casts; LLVM's simd_cast is a "converting" cast. Replace simd_cast with mem::transmute. Fixes #55249 * Temporarily pin CI * Fix i686 segfaults * Fix wasm CI Output of `wasm2wat` has changed! * Fix AppVeyor with an older nightly	2018-10-23 18:10:54 +02:00
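The distinction drawn in the commit above between a "reinterpreting" cast and a "converting" cast can be shown without any SIMD intrinsics. The sketch below is an illustration only: it uses plain two-element arrays as stand-ins for the 128-bit vector types, and the function names `convert_lanes` and `reinterpret_lanes` are hypothetical.

```rust
use std::mem;

/// Value-converting cast: each i64 lane becomes the f64 with the same numeric value.
fn convert_lanes(v: [i64; 2]) -> [f64; 2] {
    [v[0] as f64, v[1] as f64]
}

/// Bit-reinterpreting cast: the 128 bits are kept unchanged and relabelled as two f64 lanes.
fn reinterpret_lanes(v: [i64; 2]) -> [f64; 2] {
    // SAFETY: both types are 16 bytes wide and every bit pattern is a valid f64.
    unsafe { mem::transmute::<[i64; 2], [f64; 2]>(v) }
}
```

The integer `1` converts to `1.0`, but reinterpreting its bits yields a tiny subnormal number; conversely, the bit pattern `0x3FF0_0000_0000_0000` reinterprets to exactly `1.0`. The `_mm_castX_Y` intrinsics are documented as the second kind of operation, which is why a value conversion gives wrong results there.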
gnzlbg	3daebfbc0b	Add wasm32 simd128 intrinsics (#549 ) * Add wasm32 simd128 intrinsics * test wasm32 simd128 instructions * Run wasm tests like all other tests * use modules instead of types to access wasm simd128 interpretations * generate docs for wasm32-unknown-unknown * fix typo * Enable #[assert_instr] on wasm32 * Shell out to Node's `execSync` to execute `wasm2wat` over our wasm file * Parse the wasm file line-by-line, looking for various function markers and such * Use the `elem` section to build a function pointer table, allowing us to map exactly from function pointer to a function * Avoid losing debug info (the names section) in release mode by stripping `--strip-debug` from `rust-lld`. * remove exclude list from Cargo.toml * fix assert_instr for non-wasm targets * re-format assert-instr changes * add crate that uses assert_instr * Fix instructions having extra quotes * Add assert_instr for wasm memory intrinsics * Remove hacks for git wasm-bindgen * add wasm_simd128 feature * make wasm32 build correctly * run simd128 tests on ci * remove wasm-assert-instr-tests	2018-08-15 09:20:33 -07:00
Alex Crichton	f1e4ebd8de	Fix compile of stdsimd on powerpc with no flags (#531 ) We're running into issues updating with rust-lang/rust#52535, so we need to get this working without `RUSTFLAGS` enabling the `altivec` feature	2018-07-20 11:54:33 -05:00
gnzlbg	e70ae5558f	add CI for Android	2018-06-23 16:09:27 +02:00
Luca Barbato	3d618b3cd6	Do not run the altivec tests for powerpc64 The big endian variant will be supported properly later.	2018-05-23 18:16:14 +02:00
Luca Barbato	9888c6ce82	Update proc macro2 (#455 ) * Update to proc_macro2 0.4 and related * Update to proc_macro2 0.4 and related * Update to proc_macro2 0.4 and related * Add proc_macro_gen feature * Update to the new rustfmt cli * A few proc-macro2 stylistic updates * Disable RUST_BACKTRACE by default * Allow rustfmt failure for now * Disable proc-macro2 nightly feature in verify-x86 Currently this causes bugs on nightly due to upstream rustc bugs, this should be temporary * Attempt to thwart mergefunc * Use static relocation model on i686	2018-05-21 13:37:41 -05:00
gnzlbg	8ea9bc53f1	Initial PowerPC altivec and VSX support (#447 ) * add some powerpc/powerpc64 altivec/vsx intrinsics * temporarily make IntoBits/FromBits inline(always) * include powerpc64 module; use inline(always) from/into_bits only on powerpc	2018-05-16 12:10:19 -05:00
gnzlbg	c0bf5d9c42	Workarounds for all/any mask reductions on x86, armv7, and aarch64 (#425 ) * Work arounds for LLVM6 code-gen bugs in all/any reductions This commit adds workarounds for the mask reductions: `all` and `any`. 64-bit wide mask types (`m8x8`, `m16x4`, `m32x2`) `x86_64` with `MMX` enabled ```asm all_8x8: push rbp mov rbp, rsp movzx eax, byte, ptr, [rdi, +, 7] movd xmm0, eax movzx eax, byte, ptr, [rdi, +, 6] movd xmm1, eax punpcklwd xmm1, xmm0 movzx eax, byte, ptr, [rdi, +, 5] movd xmm0, eax movzx eax, byte, ptr, [rdi, +, 4] movd xmm2, eax punpcklwd xmm2, xmm0 punpckldq xmm2, xmm1 movzx eax, byte, ptr, [rdi, +, 3] movd xmm0, eax movzx eax, byte, ptr, [rdi, +, 2] movd xmm1, eax punpcklwd xmm1, xmm0 movzx eax, byte, ptr, [rdi, +, 1] movd xmm0, eax movzx eax, byte, ptr, [rdi] movd xmm3, eax punpcklwd xmm3, xmm0 punpckldq xmm3, xmm1 punpcklqdq xmm3, xmm2 movdqa xmm0, xmmword, ptr, [rip, +, LCPI9_0] pand xmm3, xmm0 pcmpeqw xmm3, xmm0 pshufd xmm0, xmm3, 78 pand xmm0, xmm3 pshufd xmm1, xmm0, 229 pand xmm1, xmm0 movdqa xmm0, xmm1 psrld xmm0, 16 pand xmm0, xmm1 movd eax, xmm0 and al, 1 pop rbp ret any_8x8: push rbp mov rbp, rsp movzx eax, byte, ptr, [rdi, +, 7] movd xmm0, eax movzx eax, byte, ptr, [rdi, +, 6] movd xmm1, eax punpcklwd xmm1, xmm0 movzx eax, byte, ptr, [rdi, +, 5] movd xmm0, eax movzx eax, byte, ptr, [rdi, +, 4] movd xmm2, eax punpcklwd xmm2, xmm0 punpckldq xmm2, xmm1 movzx eax, byte, ptr, [rdi, +, 3] movd xmm0, eax movzx eax, byte, ptr, [rdi, +, 2] movd xmm1, eax punpcklwd xmm1, xmm0 movzx eax, byte, ptr, [rdi, +, 1] movd xmm0, eax movzx eax, byte, ptr, [rdi] movd xmm3, eax punpcklwd xmm3, xmm0 punpckldq xmm3, xmm1 punpcklqdq xmm3, xmm2 movdqa xmm0, xmmword, ptr, [rip, +, LCPI8_0] pand xmm3, xmm0 pcmpeqw xmm3, xmm0 pshufd xmm0, xmm3, 78 por xmm0, xmm3 pshufd xmm1, xmm0, 229 por xmm1, xmm0 movdqa xmm0, xmm1 psrld xmm0, 16 por xmm0, xmm1 movd eax, xmm0 and al, 1 pop rbp ret ``` After this PR for `m8x8`, `m16x4`, `m32x2`: ```asm all_8x8: push rbp mov 
rbp, rsp movq mm0, qword, ptr, [rdi] pmovmskb eax, mm0 cmp eax, 255 sete al pop rbp ret any_8x8: push rbp mov rbp, rsp movq mm0, qword, ptr, [rdi] pmovmskb eax, mm0 test eax, eax setne al pop rbp ret ``` x86` with `MMX` enabled Before this PR: ```asm all_8x8: call L9$pb L9$pb: pop eax mov ecx, dword, ptr, [esp, +, 4] movzx edx, byte, ptr, [ecx, +, 7] movd xmm0, edx movzx edx, byte, ptr, [ecx, +, 6] movd xmm1, edx punpcklwd xmm1, xmm0 movzx edx, byte, ptr, [ecx, +, 5] movd xmm0, edx movzx edx, byte, ptr, [ecx, +, 4] movd xmm2, edx punpcklwd xmm2, xmm0 punpckldq xmm2, xmm1 movzx edx, byte, ptr, [ecx, +, 3] movd xmm0, edx movzx edx, byte, ptr, [ecx, +, 2] movd xmm1, edx punpcklwd xmm1, xmm0 movzx edx, byte, ptr, [ecx, +, 1] movd xmm0, edx movzx ecx, byte, ptr, [ecx] movd xmm3, ecx punpcklwd xmm3, xmm0 punpckldq xmm3, xmm1 punpcklqdq xmm3, xmm2 movdqa xmm0, xmmword, ptr, [eax, +, LCPI9_0-L9$pb] pand xmm3, xmm0 pcmpeqw xmm3, xmm0 pshufd xmm0, xmm3, 78 pand xmm0, xmm3 pshufd xmm1, xmm0, 229 pand xmm1, xmm0 movdqa xmm0, xmm1 psrld xmm0, 16 pand xmm0, xmm1 movd eax, xmm0 and al, 1 ret any_8x8: call L8$pb L8$pb: pop eax mov ecx, dword, ptr, [esp, +, 4] movzx edx, byte, ptr, [ecx, +, 7] movd xmm0, edx movzx edx, byte, ptr, [ecx, +, 6] movd xmm1, edx punpcklwd xmm1, xmm0 movzx edx, byte, ptr, [ecx, +, 5] movd xmm0, edx movzx edx, byte, ptr, [ecx, +, 4] movd xmm2, edx punpcklwd xmm2, xmm0 punpckldq xmm2, xmm1 movzx edx, byte, ptr, [ecx, +, 3] movd xmm0, edx movzx edx, byte, ptr, [ecx, +, 2] movd xmm1, edx punpcklwd xmm1, xmm0 movzx edx, byte, ptr, [ecx, +, 1] movd xmm0, edx movzx ecx, byte, ptr, [ecx] movd xmm3, ecx punpcklwd xmm3, xmm0 punpckldq xmm3, xmm1 punpcklqdq xmm3, xmm2 movdqa xmm0, xmmword, ptr, [eax, +, LCPI8_0-L8$pb] pand xmm3, xmm0 pcmpeqw xmm3, xmm0 pshufd xmm0, xmm3, 78 por xmm0, xmm3 pshufd xmm1, xmm0, 229 por xmm1, xmm0 movdqa xmm0, xmm1 psrld xmm0, 16 por xmm0, xmm1 movd eax, xmm0 and al, 1 ret ``` After this PR: ```asm all_8x8: mov eax, dword, ptr, [esp, +, 
4] movq mm0, qword, ptr, [eax] pmovmskb eax, mm0 cmp eax, 255 sete al ret any_8x8: mov eax, dword, ptr, [esp, +, 4] movq mm0, qword, ptr, [eax] pmovmskb eax, mm0 test eax, eax setne al ret ``` `aarch64` Before this PR: ```asm all_8x8: ldr d0, [x0] umov w8, v0.b[0] umov w9, v0.b[1] tst w8, #0xff umov w10, v0.b[2] cset w8, ne tst w9, #0xff cset w9, ne tst w10, #0xff umov w10, v0.b[3] and w8, w8, w9 cset w9, ne tst w10, #0xff umov w10, v0.b[4] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[5] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[6] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[7] and w8, w9, w8 cset w9, ne tst w10, #0xff and w8, w9, w8 cset w9, ne and w0, w9, w8 ret any_8x8: ldr d0, [x0] umov w8, v0.b[0] umov w9, v0.b[1] orr w8, w8, w9 umov w9, v0.b[2] orr w8, w8, w9 umov w9, v0.b[3] orr w8, w8, w9 umov w9, v0.b[4] orr w8, w8, w9 umov w9, v0.b[5] orr w8, w8, w9 umov w9, v0.b[6] orr w8, w8, w9 umov w9, v0.b[7] orr w8, w8, w9 tst w8, #0xff cset w0, ne ret ``` After this PR: ```asm all_8x8: ldr d0, [x0] mov v0.d[1], v0.d[0] uminv b0, v0.16b fmov w8, s0 tst w8, #0xff cset w0, ne ret any_8x8: ldr d0, [x0] mov v0.d[1], v0.d[0] umaxv b0, v0.16b fmov w8, s0 tst w8, #0xff cset w0, ne ret ``` `ARMv7` + `neon` Before this PR: ```asm all_8x8: vmov.i8 d0, #0x1 vldr d1, [r0] vtst.8 d0, d1, d0 vext.8 d1, d0, d0, #4 vand d0, d0, d1 vext.8 d1, d0, d0, #2 vand d0, d0, d1 vdup.8 d1, d0[1] vand d0, d0, d1 vmov.u8 r0, d0[0] and r0, r0, #1 bx lr any_8x8: vmov.i8 d0, #0x1 vldr d1, [r0] vtst.8 d0, d1, d0 vext.8 d1, d0, d0, #4 vorr d0, d0, d1 vext.8 d1, d0, d0, #2 vorr d0, d0, d1 vdup.8 d1, d0[1] vorr d0, d0, d1 vmov.u8 r0, d0[0] and r0, r0, #1 bx lr ``` After this PR: ```asm all_8x8: vldr d0, [r0] b <m8x8 as All>::all <m8x8 as All>::all: vpmin.u8 d16, d0, d16 vpmin.u8 d16, d16, d16 vpmin.u8 d0, d16, d16 b m8x8::extract any_8x8: vldr d0, [r0] b <m8x8 as Any>::any <m8x8 as Any>::any: vpmax.u8 d16, d0, d16 vpmax.u8 d16, d16, d16 vpmax.u8 d0, d16, d16 b 
m8x8::extract ``` (note: inlining does not work properly on ARMv7) 128-bit wide mask types (`m8x16`, `m16x8`, `m32x4`, `m64x2`) `x86_64` with SSE2 enabled Before this PR: ```asm all_8x16: push rbp mov rbp, rsp movdqa xmm0, xmmword, ptr, [rip, +, LCPI9_0] movdqa xmm1, xmmword, ptr, [rdi] pand xmm1, xmm0 pcmpeqb xmm1, xmm0 pmovmskb eax, xmm1 xor ecx, ecx cmp eax, 65535 mov eax, -1 cmovne eax, ecx and al, 1 pop rbp ret any_8x16: push rbp mov rbp, rsp movdqa xmm0, xmmword, ptr, [rip, +, LCPI8_0] movdqa xmm1, xmmword, ptr, [rdi] pand xmm1, xmm0 pcmpeqb xmm1, xmm0 pmovmskb eax, xmm1 neg eax sbb eax, eax and al, 1 pop rbp ret ``` After this PR: ```asm all_8x16: push rbp mov rbp, rsp movdqa xmm0, xmmword, ptr, [rdi] pmovmskb eax, xmm0 cmp eax, 65535 sete al pop rbp ret any_8x16: push rbp mov rbp, rsp movdqa xmm0, xmmword, ptr, [rdi] pmovmskb eax, xmm0 test eax, eax setne al pop rbp ret ``` `aarch64` Before this PR: ```asm all_8x16: ldr q0, [x0] umov w8, v0.b[0] umov w9, v0.b[1] tst w8, #0xff umov w10, v0.b[2] cset w8, ne tst w9, #0xff cset w9, ne tst w10, #0xff umov w10, v0.b[3] and w8, w8, w9 cset w9, ne tst w10, #0xff umov w10, v0.b[4] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[5] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[6] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[7] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[8] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[9] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[10] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[11] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[12] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[13] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[14] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[15] and w8, w9, w8 cset w9, ne tst w10, #0xff and w8, w9, w8 cset w9, ne and w0, w9, w8 ret any_8x16: ldr q0, [x0] umov w8, v0.b[0] umov w9, v0.b[1] orr w8, w8, w9 umov w9, v0.b[2] orr w8, 
w8, w9 umov w9, v0.b[3] orr w8, w8, w9 umov w9, v0.b[4] orr w8, w8, w9 umov w9, v0.b[5] orr w8, w8, w9 umov w9, v0.b[6] orr w8, w8, w9 umov w9, v0.b[7] orr w8, w8, w9 umov w9, v0.b[8] orr w8, w8, w9 umov w9, v0.b[9] orr w8, w8, w9 umov w9, v0.b[10] orr w8, w8, w9 umov w9, v0.b[11] orr w8, w8, w9 umov w9, v0.b[12] orr w8, w8, w9 umov w9, v0.b[13] orr w8, w8, w9 umov w9, v0.b[14] orr w8, w8, w9 umov w9, v0.b[15] orr w8, w8, w9 tst w8, #0xff cset w0, ne ret ``` After this PR: ```asm all_8x16: ldr q0, [x0] uminv b0, v0.16b fmov w8, s0 tst w8, #0xff cset w0, ne ret any_8x16: ldr q0, [x0] umaxv b0, v0.16b fmov w8, s0 tst w8, #0xff cset w0, ne ret ``` `ARMv7` + `neon` Before this PR: ```asm all_8x16: vmov.i8 q0, #0x1 vld1.64 {d2, d3}, [r0] vtst.8 q0, q1, q0 vext.8 q1, q0, q0, #8 vand q0, q0, q1 vext.8 q1, q0, q0, #4 vand q0, q0, q1 vext.8 q1, q0, q0, #2 vand q0, q0, q1 vdup.8 q1, d0[1] vand q0, q0, q1 vmov.u8 r0, d0[0] and r0, r0, #1 bx lr any_8x16: vmov.i8 q0, #0x1 vld1.64 {d2, d3}, [r0] vtst.8 q0, q1, q0 vext.8 q1, q0, q0, #8 vorr q0, q0, q1 vext.8 q1, q0, q0, #4 vorr q0, q0, q1 vext.8 q1, q0, q0, #2 vorr q0, q0, q1 vdup.8 q1, d0[1] vorr q0, q0, q1 vmov.u8 r0, d0[0] and r0, r0, #1 bx lr ``` After this PR: ```asm all_8x16: vld1.64 {d0, d1}, [r0] b <m8x16 as All>::all <m8x16 as All>::all: vpmin.u8 d0, d0, d b <m8x8 as All>::all any_8x16: vld1.64 {d0, d1}, [r0] b <m8x16 as Any>::any <m8x16 as Any>::any: vpmax.u8 d0, d0, d1 b <m8x8 as Any>::any ``` The inlining problems are pretty bad on ARMv7 + NEON. 
256-bit wide mask types (`m8x32`, `m16x16`, `m32x8`, `m64x4`) With SSE2 enabled Before this PR: ```asm all_8x32: push rbp mov rbp, rsp movdqa xmm0, xmmword, ptr, [rip, +, LCPI17_0] movdqa xmm1, xmmword, ptr, [rdi] pand xmm1, xmm0 movdqa xmm2, xmmword, ptr, [rdi, +, 16] pand xmm2, xmm0 pcmpeqb xmm2, xmm0 pcmpeqb xmm1, xmm0 pand xmm1, xmm2 pmovmskb eax, xmm1 xor ecx, ecx cmp eax, 65535 mov eax, -1 cmovne eax, ecx and al, 1 pop rbp ret any_8x32: push rbp mov rbp, rsp movdqa xmm0, xmmword, ptr, [rdi] por xmm0, xmmword, ptr, [rdi, +, 16] movdqa xmm1, xmmword, ptr, [rip, +, LCPI16_0] pand xmm0, xmm1 pcmpeqb xmm0, xmm1 pmovmskb eax, xmm0 neg eax sbb eax, eax and al, 1 pop rbp ret ``` After this PR: ```asm all_8x32: push rbp mov rbp, rsp movdqa xmm0, xmmword, ptr, [rdi] pmovmskb eax, xmm0 cmp eax, 65535 jne LBB17_1 movdqa xmm0, xmmword, ptr, [rdi, +, 16] pmovmskb ecx, xmm0 mov al, 1 cmp ecx, 65535 je LBB17_3 LBB17_1: xor eax, eax LBB17_3: pop rbp ret any_8x32: push rbp mov rbp, rsp movdqa xmm0, xmmword, ptr, [rdi] pmovmskb ecx, xmm0 mov al, 1 test ecx, ecx je LBB16_1 pop rbp ret LBB16_1: movdqa xmm0, xmmword, ptr, [rdi, +, 16] pmovmskb eax, xmm0 test eax, eax setne al pop rbp ret ``` With AVX enabled Before this PR: ```asm all_8x32: push rbp mov rbp, rsp vmovaps ymm0, ymmword, ptr, [rdi] vandps ymm0, ymm0, ymmword, ptr, [rip, +, LCPI25_0] vextractf128 xmm1, ymm0, 1 vpxor xmm2, xmm2, xmm2 vpcmpeqb xmm1, xmm1, xmm2 vpcmpeqd xmm3, xmm3, xmm3 vpxor xmm1, xmm1, xmm3 vpcmpeqb xmm0, xmm0, xmm2 vpxor xmm0, xmm0, xmm3 vinsertf128 ymm0, ymm0, xmm1, 1 vandps ymm0, ymm0, ymm1 vpermilps xmm1, xmm0, 78 vandps ymm0, ymm0, ymm1 vpermilps xmm1, xmm0, 229 vandps ymm0, ymm0, ymm1 vpsrld xmm1, xmm0, 16 vandps ymm0, ymm0, ymm1 vpsrlw xmm1, xmm0, 8 vandps ymm0, ymm0, ymm1 vpextrb eax, xmm0, 0 and al, 1 pop rbp vzeroupper ret any_8x32: push rbp mov rbp, rsp vmovaps ymm0, ymmword, ptr, [rdi] vandps ymm0, ymm0, ymmword, ptr, [rip, +, LCPI24_0] vextractf128 xmm1, ymm0, 1 vpxor xmm2, xmm2, xmm2 
vpcmpeqb xmm1, xmm1, xmm2 vpcmpeqd xmm3, xmm3, xmm3 vpxor xmm1, xmm1, xmm3 vpcmpeqb xmm0, xmm0, xmm2 vpxor xmm0, xmm0, xmm3 vinsertf128 ymm0, ymm0, xmm1, 1 vorps ymm0, ymm0, ymm1 vpermilps xmm1, xmm0, 78 vorps ymm0, ymm0, ymm1 vpermilps xmm1, xmm0, 229 vorps ymm0, ymm0, ymm1 vpsrld xmm1, xmm0, 16 vorps ymm0, ymm0, ymm1 vpsrlw xmm1, xmm0, 8 vorps ymm0, ymm0, ymm1 vpextrb eax, xmm0, 0 and al, 1 pop rbp vzeroupper ret ``` After this PR: ```asm all_8x32: push rbp mov rbp, rsp vmovdqa ymm0, ymmword, ptr, [rdi] vxorps xmm1, xmm1, xmm1 vcmptrueps ymm1, ymm1, ymm1 vptest ymm0, ymm1 setb al pop rbp vzeroupper ret any_8x32: push rbp mov rbp, rsp vmovdqa ymm0, ymmword, ptr, [rdi] vptest ymm0, ymm0 setne al pop rbp vzeroupper ret ``` --- Closes #362 . * test avx on all x86 targets * disable assert_instr on avx test * enable all appropriate features * disable assert_instr on x86+avx * the fn_must_use is stable * fix nbody example on armv7 * fixup * fixup * enable 64-bit wide mask MMX optimizations on x86_64 only * remove coresimd dependency on cfg_if * allow wasm to fail * use an env variable to disable assert_instr tests * disable m32x2 mask MMX optimization on macos * move cfg_if to coresimd/macros.rs	2018-05-04 16:03:45 -05:00
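The "after" assembly for the 128-bit masks on `x86_64` in the commit above boils down to one `pmovmskb` followed by a comparison. A minimal sketch of that idea using the stable `std::arch` intrinsics follows. It is not the code from the pull request: the `M8x16` alias and the `all`/`any` free functions are hypothetical, and the sketch assumes every lane is either `0x00` (false) or `0xFF` (true).

```rust
/// Stand-in for a 128-bit mask of sixteen byte lanes (0xFF = true, 0x00 = false).
type M8x16 = [u8; 16];

#[cfg(target_arch = "x86_64")]
fn movemask(mask: &M8x16) -> i32 {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};
    // SSE2 is part of the x86_64 baseline, so these intrinsics are always available here.
    // SAFETY: the pointer refers to 16 readable bytes and the load is unaligned-tolerant.
    unsafe {
        let v = _mm_loadu_si128(mask.as_ptr() as *const __m128i);
        // Collects the top bit of each byte lane into the low 16 bits of the result.
        _mm_movemask_epi8(v)
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn movemask(mask: &M8x16) -> i32 {
    // Scalar fallback with the same meaning: bit i is the top bit of lane i.
    mask.iter()
        .enumerate()
        .fold(0, |acc, (i, &b)| acc | (((b >> 7) as i32) << i))
}

/// True when every lane is set (compare against 65535, as in the listing).
fn all(mask: &M8x16) -> bool {
    movemask(mask) == 0xFFFF
}

/// True when at least one lane is set (test for non-zero, as in the listing).
fn any(mask: &M8x16) -> bool {
    movemask(mask) != 0
}
```

Because the reduction reads only the top bit of each lane, it matches the lane-wise definition only under the all-ones/all-zeros assumption stated above.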


70 Commits