Feature dependencies for newer aarch64 fetaures differ between LLVM 18
in the Rust tree and upstream LLVM 19.
This commit updates those dependencies to reflect new LLVM upstream
changes.
Add detection for SME features supported by LLVM and the Linux Kernel.
Include commented-out hwcap fields for features supported by Linux but not by LLVM.
This commit adds feature detection for the following features:
- FEAT_SME
- FEAT_SME_F16F16
- FEAT_SME_F64F64
- FEAT_SME_F8F16
- FEAT_SME_F8F32
- FEAT_SME_FA64
- FEAT_SME_I16I64
- FEAT_SME_LUTv2
- FEAT_SME2
- FEAT_SME2p1
- FEAT_SSVE_FP8DOT2
- FEAT_SSVE_FP8DOT4
- FEAT_SSVE_FP8FMA
Linux features: https://github.com/torvalds/linux/blob/master/arch/arm64/include/uapi/asm/hwcap.h
LLVM features: llvm-project/llvm/lib/Target/AArch64/AArch64.td
Add detection for various aarch64 CPU features already supported by LLVM and Linux.
This commit adds feature detection for the following features:
- FEAT_CSSC
- FEAT_ECV
- FEAT_FAMINMAX
- FEAT_FLAGM2
- FEAT_FP8
- FEAT_FP8DOT2
- FEAT_FP8DOT4
- FEAT_FP8FMA
- FEAT_HBC
- FEAT_LSE128
- FEAT_LUT
- FEAT_MOPS
- FEAT_LRCPC3
- FEAT_SVE_B16B16
- FEAT_SVE2p1
- FEAT_WFxT
It also adds feature detection for FEAT_FPMR. It is somewhat of a
special case because FPMR only exists as a feature in LLVM 18, it has
been removed from the LLVM upstream. On that account the intention is
for it to be detectable at runtime through stdarch but not have a
corresponding compile-time Rust target feature.
Linux features: https://github.com/torvalds/linux/blob/master/arch/arm64/include/uapi/asm/hwcap.h
LLVM features: llvm-project/llvm/lib/Target/AArch64/AArch64.td
It looks like LLVM requires that `simd128` is active to use these
intrinsics and `relaxed-simd` isn't implicitly enabling them. This is
probably something to fix at the LLVM layer as well but for now enable
both the `simd128` feature as well as the `relaxed-simd` feature to fix
things on our side.
`_mm512_kunpackb` was implemented wrong, and `simd_reduce_max` uses `maxnum` for comparison, which adheres to IEEE754, but Intel specifically says that they do NOT adhere to IEEE754 for NaNs, which can give wrong results