1350 Commits

Author SHA1 Message Date
sayantn
c878b773d5 AVX512FP16 Part 0: Types 2024-07-26 12:20:06 +01:00
daxpedda
a1ad6bf8be Move Wasm's relaxed SIMD to Rust v1.82 2024-07-25 16:38:08 +01:00
sayantn
74f53212a0 Stabilize simd_x86_updates 2024-07-25 16:07:35 +01:00
Yuri Astrakhan
dd87060bf3 Minor lints for stdarch-gen-arm/src/main.rs
Just a few minor cleanups
2024-07-25 15:41:21 +01:00
Kajetan Puchalski
351ec5744c std_detect: Update aarch64 feature dependencies to LLVM upstream
Feature dependencies for newer aarch64 features differ between LLVM 18
in the Rust tree and upstream LLVM 19.
This commit updates those dependencies to reflect new LLVM upstream
changes.
2024-07-25 15:18:37 +01:00
Kajetan Puchalski
41dc17d3e5 std_detect: Sort aarch64 features
Alphabetically sort the list of aarch64 features.
The list was getting a bit too chaotic so it was worth properly
sorting.
2024-07-25 15:18:37 +01:00
Kajetan Puchalski
ef538bc614 std_detect: Add aarch64/linux/LLVM SME features
Add detection for SME features supported by LLVM and the Linux Kernel.
Include commented-out hwcap fields for features supported by Linux but not by LLVM.

This commit adds feature detection for the following features:

- FEAT_SME
- FEAT_SME_F16F16
- FEAT_SME_F64F64
- FEAT_SME_F8F16
- FEAT_SME_F8F32
- FEAT_SME_FA64
- FEAT_SME_I16I64
- FEAT_SME_LUTv2
- FEAT_SME2
- FEAT_SME2p1
- FEAT_SSVE_FP8DOT2
- FEAT_SSVE_FP8DOT4
- FEAT_SSVE_FP8FMA

Linux features: https://github.com/torvalds/linux/blob/master/arch/arm64/include/uapi/asm/hwcap.h
LLVM features: llvm-project/llvm/lib/Target/AArch64/AArch64.td
2024-07-25 15:18:37 +01:00
Kajetan Puchalski
dfc5dfc8ef std_detect: Add aarch64/linux/LLVM features
Add detection for various aarch64 CPU features already supported by LLVM and Linux.

This commit adds feature detection for the following features:

- FEAT_CSSC
- FEAT_ECV
- FEAT_FAMINMAX
- FEAT_FLAGM2
- FEAT_FP8
- FEAT_FP8DOT2
- FEAT_FP8DOT4
- FEAT_FP8FMA
- FEAT_HBC
- FEAT_LSE128
- FEAT_LUT
- FEAT_MOPS
- FEAT_LRCPC3
- FEAT_SVE_B16B16
- FEAT_SVE2p1
- FEAT_WFxT

It also adds feature detection for FEAT_FPMR. This is a special case:
FPMR exists as a feature only in LLVM 18 and has been removed from
upstream LLVM. The intention is therefore for it to be detectable at
runtime through stdarch without a corresponding compile-time Rust
target feature.

Linux features: https://github.com/torvalds/linux/blob/master/arch/arm64/include/uapi/asm/hwcap.h
LLVM features: llvm-project/llvm/lib/Target/AArch64/AArch64.td
2024-07-25 15:18:37 +01:00
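A rough sketch of how this kind of runtime detection is consumed from Rust: the function below queries a few aarch64 features with `std::arch::is_aarch64_feature_detected!`. It deliberately uses names already accepted on stable Rust (`neon`, `lse`, `sve2`). Whether the newer names added in these commits (for example the SME ones) need a nightly feature gate depends on the Rust version, so treat that as an assumption to check. `detected_aarch64_features` is a hypothetical helper; on non-aarch64 targets it returns an empty list.

```rust
/// Hypothetical helper: lists a few aarch64 features detected at runtime.
/// On any other architecture the cfg-gated block is compiled out and the
/// list stays empty.
#[allow(unused_mut)]
pub fn detected_aarch64_features() -> Vec<&'static str> {
    let mut found: Vec<&'static str> = Vec::new();
    #[cfg(target_arch = "aarch64")]
    {
        // Runtime checks: these reflect what the running CPU/OS reports
        // (e.g. Linux hwcaps), not what the binary was compiled for.
        if std::arch::is_aarch64_feature_detected!("neon") {
            found.push("neon");
        }
        if std::arch::is_aarch64_feature_detected!("lse") {
            found.push("lse");
        }
        if std::arch::is_aarch64_feature_detected!("sve2") {
            found.push("sve2");
        }
    }
    found
}
```

A runtime-only feature such as FPMR fits the same pattern: it is queried at runtime even though there is no matching compile-time target feature to build against.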
sayantn
aa84427fd4 Use LLVM intrinsics for masked load/stores, expand-loads and fp-class
Also, remove some redundant sse target-features from avx intrinsics
2024-07-14 20:26:09 +01:00
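The behavior the LLVM masked-load intrinsics must preserve can be illustrated with a scalar model of a 16-lane, 32-bit masked load, in the spirit of `_mm512_mask_loadu_epi32`. This is an illustrative sketch, not stdarch's implementation: lane `i` is read from memory only when bit `i` of the mask is set, so masked-off lanes never touch memory. That is why a full-width load followed by a blend is not an equivalent lowering. `mask_loadu_epi32` here is a hypothetical name.

```rust
/// Hypothetical scalar model of a masked 16 x i32 load.
/// Lane i comes from `mem[i]` when bit i of `k` is set, otherwise from `src`.
/// Masked-off lanes are never indexed, so `mem` may be shorter than 16
/// as long as every selected lane is in bounds (mirroring fault suppression).
pub fn mask_loadu_epi32(src: [i32; 16], k: u16, mem: &[i32]) -> [i32; 16] {
    let mut out = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            out[i] = mem[i];
        }
    }
    out
}
```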
daxpedda
ba9e8be05e Revert "wasm32: Add simd128 to enabled features for relaxed intrinsics" 2024-07-14 12:00:23 +02:00
sayantn
aa001c3f3e Some small refactorings
Use llvm intrinsics for `vfpclassss` and `vfpclasssd`
Use `simd_insert` for `x86_polyfill`
2024-07-12 18:12:30 +02:00
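For reference, the `imm8` operand of `vfpclassss`/`vfpclasssd` selects which categories to test, one per bit: QNaN, +0, -0, +Inf, -Inf, denormal, negative finite, SNaN (bits 0 to 7, following Intel's documentation). A scalar sketch for `f64` is below; it ignores the effect of `MXCSR.DAZ`, and `fp_class_f64` is a hypothetical name.

```rust
/// Hypothetical scalar model of the VFPCLASS test for one f64 value.
/// Returns true if the value falls in any category selected by `imm8`.
pub fn fp_class_f64(x: f64, imm8: u8) -> bool {
    let bits = x.to_bits();
    let neg = bits >> 63 == 1;
    let exp = (bits >> 52) & 0x7ff;
    let mant = bits & ((1u64 << 52) - 1);
    let exp_ones = exp == 0x7ff;
    let exp_zeros = exp == 0;
    let mant_zeros = mant == 0;
    let quiet_bit = (mant >> 51) & 1 == 1;
    let zero = exp_zeros && mant_zeros;
    let classes = [
        exp_ones && !mant_zeros && quiet_bit,  // bit 0: QNaN
        !neg && zero,                          // bit 1: +0
        neg && zero,                           // bit 2: -0
        !neg && exp_ones && mant_zeros,        // bit 3: +Inf
        neg && exp_ones && mant_zeros,         // bit 4: -Inf
        exp_zeros && !mant_zeros,              // bit 5: denormal
        neg && !exp_ones && !zero,             // bit 6: negative finite
        exp_ones && !mant_zeros && !quiet_bit, // bit 7: SNaN
    ];
    classes
        .iter()
        .enumerate()
        .any(|(i, &c)| c && (imm8 >> i) & 1 == 1)
}
```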
Alex Crichton
bb2b4293b9 wasm32: Add simd128 to enabled features for relaxed intrinsics
It looks like LLVM requires `simd128` to be active to use these
intrinsics, and `relaxed-simd` doesn't implicitly enable it. This is
probably something to fix at the LLVM layer as well, but for now enable
both the `simd128` and `relaxed-simd` features to fix things on our
side.
2024-07-11 17:26:52 +02:00
sayantn
f101974941 Added verification for doc comments 2024-07-08 00:32:43 +02:00
sayantn
1e8a22c374 Fix Documentation 2024-07-08 00:32:43 +02:00
sayantn
1da646fcab Implement missing in SSE4a and TBM
Add `extracti`, `inserti` and `bextri` intrinsics. Refactor TBM into 2 modules
2024-07-07 19:55:04 +02:00
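`bextri` extracts a bit field using an immediate control word: bits 7:0 give the start position and bits 15:8 the length. A scalar sketch of the 32-bit form, written from the documented BEXTR semantics rather than from stdarch's source, is shown below; `bextr_u32` is a hypothetical name.

```rust
/// Hypothetical scalar model of BEXTR/BEXTRI on a 32-bit operand.
/// control[7:0] = start bit, control[15:8] = field length.
pub fn bextr_u32(src: u32, control: u32) -> u32 {
    let start = control & 0xff;
    let len = (control >> 8) & 0xff;
    if start >= 32 {
        // The field starts past the operand, so only zero bits remain.
        return 0;
    }
    // Widen to u64 so shifts by up to 63 are well defined.
    let shifted = (src as u64) >> start;
    let mask = if len >= 64 { u64::MAX } else { (1u64 << len) - 1 };
    (shifted & mask) as u32
}
```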
Tobias Decking
7378b35fd0 Use generic simd in wasm intrinsics 2024-07-07 19:21:10 +02:00
sayantn
94153c46e9 Implemented runtime detection of xop target-feature 2024-07-06 18:55:26 +02:00
sayantn
d67ca1fe09 Added runtime detection
Cannot do a `cupid` test because they don't support `amx`.
2024-07-06 18:28:25 +02:00
Tobias Decking
bbb2ba5424 Refactor avx512bw: reduction operations 2024-07-06 12:07:29 +02:00
Tobias Decking
fe0a378499 Refactor avx512bw: mask operations 2024-07-06 12:07:29 +02:00
Tobias Decking
198a91e5db Refactor avx512bw: integer comparison 2024-07-06 12:07:29 +02:00
Tobias Decking
f1a1ec2921 Refactor avx512bw: max/min 2024-07-06 12:07:29 +02:00
Tobias Decking
9ad2a62245 Refactor avx512bw: saturating arithmetic 2024-07-06 12:07:29 +02:00
Tobias Decking
13063410dd Refactor avx512bw: avg + mulhi + abs 2024-07-06 12:07:29 +02:00
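Several of these avx512bw operations reduce to simple per-lane arithmetic, which is what makes expressing them with generic SIMD operations feasible. The sketch below models one lane of a few of them, following Intel's descriptions of `_mm512_avg_epu8`, `_mm512_mulhi_epi16`, `_mm512_adds_epu8` and `_mm512_adds_epi8`; the function names are hypothetical and this is not stdarch's code.

```rust
/// One lane of `_mm512_avg_epu8`: rounded-up average, computed in u16
/// so the intermediate sum cannot overflow.
pub fn avg_epu8(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// One lane of `_mm512_mulhi_epi16`: high 16 bits of the 32-bit signed product.
pub fn mulhi_epi16(a: i16, b: i16) -> i16 {
    ((a as i32 * b as i32) >> 16) as i16
}

/// One lane of `_mm512_adds_epu8`: unsigned saturating add.
pub fn adds_epu8(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}

/// One lane of `_mm512_adds_epi8`: signed saturating add.
pub fn adds_epi8(a: i8, b: i8) -> i8 {
    a.saturating_add(b)
}
```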
sayantn
268ac7fe92 Add detection for SHA512, SM3 and SM4
Cannot cross-verify with `cupid` because they do not have these features yet.
2024-07-06 11:29:28 +02:00
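Runtime detection is typically paired with a `#[target_feature]` function and a dispatch check. The sketch below uses `avx2`, which `is_x86_feature_detected!` accepts on stable Rust. Whether the newly detected names (`sha512`, `sm3`, `sm4`, `xop`) are usable on stable depends on the Rust version, so treat that as an assumption. `sum`, `sum_avx2` and `sum_scalar` are hypothetical.

```rust
/// AVX2-enabled variant; the compiler may auto-vectorize the loop.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Portable fallback used when AVX2 is unavailable or on other architectures.
fn sum_scalar(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Picks an implementation at runtime based on detected CPU features.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}
```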
sayantn
c862e4e487 Added a bf16 type 2024-07-06 11:00:34 +02:00
sayantn
70fbc2e97c Implemented some missing functions
These cannot be linked to LLVM intrinsics because Rust lacks `bfloat16` and `i1` types, so inline asm was the only option.
2024-07-06 11:00:34 +02:00
sayantn
3de8e86491 Implemented the missing AVX512BF16 intrinsics 2024-07-06 11:00:34 +02:00
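A `bf16` value is the upper 16 bits of an `f32`, so conversion in the f32 to bf16 direction is mostly a rounding question. The sketch below uses generic round-to-nearest-even with NaNs kept quiet; it is an assumption-labeled illustration, and it does not model the denormal flushing that the AVX512BF16 conversion instructions perform. `f32_to_bf16_rne` and `bf16_to_f32` are hypothetical names.

```rust
/// Hypothetical helper: f32 -> bf16 bits with round-to-nearest-even.
/// Does not model the denormal flushing of the hardware instructions.
pub fn f32_to_bf16_rne(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep sign and upper payload, force the quiet bit so it stays a NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7fff plus the lowest kept bit, then truncate: ties go to even.
    let rounding_bias = 0x7fff + ((bits >> 16) & 1);
    ((bits + rounding_bias) >> 16) as u16
}

/// bf16 -> f32 is exact: the bf16 bits become the upper half of an f32.
pub fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}
```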
sayantn
f22fab559e Implemented VEX versions
Modified stdarch-test to accept VEX versions
2024-07-06 11:00:34 +02:00
sayantn
775dcaabde Implemented missing gather-scatters 2024-07-06 11:00:34 +02:00
sayantn
1c3b3b80c0 Fix the stream intrinsics
They should use platform-specific address management.
2024-07-06 11:00:34 +02:00
Tobias Decking
1f3264848f Fix incorrect reduction operations in avx512f 2024-07-02 12:19:20 +02:00
sayantn
ed1df99f03 Added support for AMD verification
Added a custom cpuid file for sde, which enables SSE4a, XOP, TBM and VP2INTERSECT. Fixed `xsave` tests
2024-06-30 21:45:56 +02:00
sayantn
fd948ee99d Updates SDE
Updated SDE to v9.33.0
Disabled `assert-instr` in emulated run
2024-06-30 21:45:56 +02:00
Tobias Decking
fcee4d8b16 Define remaining IFMA intrinsics 2024-06-30 15:47:18 +02:00
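The IFMA intrinsics multiply the low 52 bits of two 64-bit lanes into a 104-bit product and add either its low or high 52 bits to an accumulator. A per-lane scalar sketch, following Intel's description of `_mm512_madd52lo_epu64` and `_mm512_madd52hi_epu64` (operands `a` accumulator, `b` and `c` multiplicands), is below; the function names are hypothetical.

```rust
const MASK52: u64 = (1 << 52) - 1;

/// One lane of madd52lo: a + low 52 bits of (b[51:0] * c[51:0]).
pub fn madd52lo(a: u64, b: u64, c: u64) -> u64 {
    let prod = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((prod as u64) & MASK52)
}

/// One lane of madd52hi: a + bits 103:52 of (b[51:0] * c[51:0]).
pub fn madd52hi(a: u64, b: u64, c: u64) -> u64 {
    let prod = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((prod >> 52) as u64)
}
```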
Tobias Decking
a56cc86a23 Use generic simd for avx512 leading zeros 2024-06-30 15:17:50 +02:00
Tobias Decking
d1004e0abd Refactor avx512f: mask operations 2024-06-30 14:55:25 +02:00
Tobias Decking
9f96670b7c Refactor avx512f: element extraction 2024-06-30 14:55:25 +02:00
Tobias Decking
9a1d758f03 Refactor avx512f: floating point abs 2024-06-30 14:55:25 +02:00
Tobias Decking
2c81a7ae33 Refactor avx512f: zeroing primitives 2024-06-30 14:55:25 +02:00
Tobias Decking
f5219be7ee Refactor avx512f: integer comparison 2024-06-30 14:55:25 +02:00
Tobias Decking
883cedc230 Refactor avx512f: integers 2024-06-30 14:55:25 +02:00
Tobias Decking
0d9520dfd4 Refactor avx512f: sqrt + rounding fix 2024-06-30 14:55:25 +02:00
Tobias Decking
53ca30a4c8 Refactor avx512f: rounding fma 2024-06-30 14:55:25 +02:00
Tobias Decking
128866c97b Refactor avx512f: fma 2024-06-30 14:55:25 +02:00
Jubilee Young
8b77e779cb Remove has_cpuid 2024-06-29 19:38:42 +02:00
sayantn
d7ea407a28 Fixing CI
Fixed x86_64-apple-darwin freezing.
Bump all Docker images to Ubuntu 24.04 (except for emulated and armv7)
2024-06-29 19:16:48 +02:00
sayantn
818df2f7d0 Some fixes as asked by @Amanieu 2024-06-29 19:16:48 +02:00
sayantn
95d273aaf9 Fixed _mm512_kunpackb, reduce-max and reduce-min
`_mm512_kunpackb` was implemented incorrectly. `simd_reduce_max` uses `maxnum` for comparison, which follows IEEE 754, but Intel specifies that these instructions do NOT follow IEEE 754 for NaNs, so using `maxnum` can give wrong results.
2024-06-29 19:16:48 +02:00
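Both fixes can be illustrated with small scalar models. `_mm512_kunpackb` places the low byte of `b` in bits 7:0 and the low byte of `a` in bits 15:8. Intel's MAX operation returns the second operand whenever `a > b` is false, which includes every NaN case, whereas IEEE `maxnum` (Rust's `f32::max`) ignores a NaN operand. The sequential left-to-right reduction order below is an assumption for illustration; the real intrinsic's reduction order may differ, and all function names are hypothetical.

```rust
/// Model of `_mm512_kunpackb`: k[7:0] = b[7:0], k[15:8] = a[7:0].
pub fn kunpackb(a: u16, b: u16) -> u16 {
    ((a & 0xff) << 8) | (b & 0xff)
}

/// MAXPS-style maximum: returns `b` unless `a > b`, so a NaN in either
/// position yields `b`.
pub fn intel_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Left-to-right reduction using Intel-style max.
pub fn reduce_max_intel(xs: &[f32]) -> f32 {
    xs.iter().copied().reduce(intel_max).unwrap_or(f32::NEG_INFINITY)
}

/// Left-to-right reduction using IEEE maxnum semantics (NaNs are skipped).
pub fn reduce_max_ieee(xs: &[f32]) -> f32 {
    xs.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}
```

With input `[1.0, 3.0, NaN]`, the Intel-style reduction ends with `intel_max(3.0, NaN)` and returns NaN, while the `maxnum` reduction returns `3.0`.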
sayantn
fa22a9aeda Add the missing BMI1, SSE2, SSE4.1 and AVX2 intrinsics 2024-06-29 19:16:48 +02:00