sayantn
28cf2d1a6c
Fix xsave segfaults
2025-10-05 05:39:29 +05:30
Sayantan Chakraborty
7e850c5f1e
Merge pull request #1932 from sayantn/fmaddsub
Use SIMD intrinsics for `vfmaddsubph` and `vfmsubaddph`
2025-10-04 00:43:02 +00:00
Amanieu d'Antras
14b888574f
Merge pull request #1931 from sayantn/use-intrinsics
Fix mistake in #1928
2025-10-03 13:10:34 +00:00
sayantn
f90d9ec8b2
Use SIMD intrinsics for vfmaddsubph and vfmsubaddph
2025-10-03 05:33:13 +05:30
sayantn
37605b03c5
Ensure simd_funnel_sh{l,r} always gets passed shift amounts in range
2025-10-03 03:51:34 +05:30
sayantn
018f9927b2
Revert uses of SIMD intrinsics for shifts
2025-10-03 03:30:50 +05:30
Madhav Madhusoodanan
6b99d5fb56
fix: update the implementation of _kshiftri_mask16 and _kshiftli_mask16
to zero out when the amount of shift exceeds 16.
2025-10-03 02:33:11 +05:30
Madhav Madhusoodanan
0138b95620
fix: update the implementation of _kshiftri_mask8 and _kshiftli_mask8 to
zero out when the amount of shift exceeds the bit length of the input
argument.
2025-10-03 02:27:15 +05:30
Madhav Madhusoodanan
8b25ddeea3
fix: update the implementation of _kshiftri_mask32, _kshiftri_mask64,
_kshiftli_mask32 and _kshiftli_mask64 to zero out when the amount of
shift exceeds the bit length of the input argument.
2025-10-03 02:20:50 +05:30
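The three `_kshift*` fixes above share one idea: an out-of-range shift count must produce zero instead of whatever a plain shift does. A minimal sketch of that behavior for the 16-bit case follows. The function names are hypothetical models, not the stdarch implementation. Treating a count equal to the bit width as out of range is an assumption based on Intel's published pseudocode, since the commit messages only say "exceeds".

```rust
/// Hypothetical model of a 16-bit mask left shift (not the stdarch code).
/// Assumption: any count of 16 or more yields zero.
fn kshiftli_mask16_model(a: u16, count: u32) -> u16 {
    // `checked_shl` returns `None` when `count >= 16`. A plain `<<` with
    // such a count panics in debug builds and masks the count in release
    // builds, which is the behavior a fix of this kind has to avoid.
    a.checked_shl(count).unwrap_or(0)
}

/// Hypothetical model of the matching right shift.
fn kshiftri_mask16_model(a: u16, count: u32) -> u16 {
    a.checked_shr(count).unwrap_or(0)
}
```

The same pattern applies unchanged to `u8`, `u32` and `u64` masks, because `checked_shl` and `checked_shr` compare the count against the bit width of the type they are called on.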
sayantn
851c32abb2
Use SIMD intrinsics for test{z,c} intrinsics
2025-10-01 12:33:41 +05:30
sayantn
4c94e6bba9
Use SIMD intrinsics for vperm2 intrinsics
2025-10-01 10:26:59 +05:30
sayantn
d23dbbec31
Use SIMD intrinsics for cvtsi{,64}_{ss,sd} intrinsics
2025-10-01 07:23:43 +05:30
sayantn
6460b35798
Use SIMD intrinsics for f16 intrinsics
2025-10-01 07:23:10 +05:30
sayantn
3f91ced840
Use SIMD intrinsics for shift and rotate intrinsics
2025-10-01 07:22:12 +05:30
sayantn
1819ae0c1f
Use SIMD intrinsics for madd, hadd and hsub intrinsics
2025-10-01 07:20:30 +05:30
sayantn
b55b085535
Remove uses of deprecated llvm.x86.addcarryx.u{32,64} intrinsics
- Correct mistake in x86_64/adx.rs where it was not testing `_addcarryx` at all
2025-10-01 07:16:44 +05:30
usamoi
00c8866c57
pick changes from https://github.com/rust-lang/rust/pull/146683
2025-09-23 10:17:54 +08:00
usamoi
3b09522c34
Revert "Remove big-endian swizzles from vreinterpret"
This reverts commit 24f89ca53d3374ed8d3e0cbadc1dc89eea41acba.
2025-09-23 10:05:32 +08:00
Sayantan Chakraborty
c1242fab74
Merge pull request #1921 from a4lg/riscv-inline-asm-general-improvements
RISC-V: Improvements of inline assembly uses
2025-09-15 18:39:49 +00:00
Folkert de Vries
5dd0fdcd67
Merge pull request #1919 from sayantn/fix-vreinterpret
Remove big-endian swizzles from `vreinterpret`
2025-09-15 08:18:20 +00:00
Tsukasa OI
8df078a3f0
RISC-V: Improvements of inline assembly uses
This commit makes several improvements to RISC-V inline assembly uses:
better register allocation, less register clobbering in the worst case,
and better readability.
Note that it does not change the `p` module (which defines the draft "P"
extension instructions, which are very likely to change).
1. Use `lateout` where possible.
Unlike an `out(reg)` and `in(reg)` pair, `lateout(reg)` and `in(reg)`
can share the same register, because `lateout` states that the output
register is written only after all inputs have been read.
This can improve register allocation.
2. Add the `preserves_flags` option where possible.
While RISC-V doesn't have _regular_ condition codes, Rust inline
assembly on RISC-V assumes by default that some registers
(mainly vector state registers) may be overwritten.
Adding `preserves_flags` to intrinsics whose instructions do not
overwrite those registers minimizes register clobbering in the
worst case.
3. Use a trailing semicolon.
Since `asm!` declares an action and doesn't return a value by
itself, a trailing semicolon makes it clear that an `asm!` call is
effectively a statement.
4. Make most `asm!` calls multi-line.
`rustfmt` splits some simple (yet long) `asm!` calls across lines but
does not format complex `asm!` calls with inputs and/or outputs.
For consistency, this commit makes most `asm!` calls multi-line.
2025-09-14 05:08:19 +00:00
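The four points in the RISC-V commit above can be seen together in one small `asm!` call. This is a sketch under stated assumptions, not code from the commit: `add_one` is a hypothetical function, `addi` is chosen only because it writes no vector or floating-point state, and the non-RISC-V fallback exists only so the example builds on other targets.

```rust
/// Hypothetical example (not from stdarch): adds 1 with a RISC-V `addi`.
#[cfg(any(target_arch = "riscv32", target_arch = "riscv64"))]
fn add_one(x: usize) -> usize {
    let out: usize;
    // SAFETY: `addi` only reads `inp` and writes `out`.
    unsafe {
        // Multi-line form with a trailing semicolon (points 3 and 4).
        core::arch::asm!(
            "addi {out}, {inp}, 1",
            // `lateout` lets `out` reuse the register of `inp` (point 1).
            out = lateout(reg) out,
            inp = in(reg) x,
            // `preserves_flags` tells the compiler that the flag-like
            // state it tracks on RISC-V is left untouched (point 2).
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    out
}

/// Portable fallback so the sketch compiles on non-RISC-V targets.
#[cfg(not(any(target_arch = "riscv32", target_arch = "riscv64")))]
fn add_one(x: usize) -> usize {
    x.wrapping_add(1)
}
```

With `out(reg)` instead of `lateout(reg)`, the compiler would have to assume the output may be written before the input is read, and so would keep the two operands in different registers.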
Sayantan Chakraborty
269cecc91c
Merge pull request #1918 from a4lg/riscv-aes64im-lower-requirements
RISC-V: "Lower" requirements of `aes64im`
2025-09-11 19:59:18 +00:00
sayantn
bb31725e67
Remove big-endian swizzles from vreinterpret
2025-09-12 01:20:34 +05:30
Tsukasa OI
e54cc43867
RISC-V: "Lower" requirements of aes64im
This instruction was incorrectly placed in the same category as
`aes64ks1i` and `aes64ks2` (which should require `zkne || zknd` but
currently require `zkne && zknd`), whereas `aes64im` only requires
the Zknd extension.
This commit fixes the category of this intrinsic, lowering the
requirements from the Rust perspective without actually lowering
them from the RISC-V perspective.
2025-09-11 06:42:10 +00:00
WANG Rui
614dab3ed2
loongarch: Align intrinsic signatures with LLVM
2025-09-10 23:10:19 +08:00
Folkert de Vries
93101b5783
s390x: use the new u128::funnel_shl
2025-09-06 14:32:36 +02:00
Folkert de Vries
e1a3b8bdc1
Merge pull request #1911 from nikic/remove-hack
Remove some llvm workarounds
2025-09-03 13:16:03 +00:00
Tsukasa OI
4679533732
RISC-V: Lower requirements of clmul and clmulh
They don't need the full "Zbc" extension, only its subset, the "Zbkc"
extension. Since the compiler implies `zbkc` from `zbc`, it's safe to
use `#[target_feature(enable = "zbkc")]`.
2025-09-03 02:13:35 +00:00
Nikita Popov
18fa6d917c
Remove some llvm workarounds
2025-09-02 10:48:42 +02:00
Folkert de Vries
ae648be783
use llvm.roundeven on arm
2025-08-29 12:15:41 +02:00
Amanieu d'Antras
b2189b8ff6
Merge pull request #1903 from folkertdev/s390x-llvm-21-fixes
`s390x` llvm 21 improvements
2025-08-21 20:31:06 +00:00
Folkert de Vries
98bd1d7445
use simd_saturating_{add, sub} on neon
2025-08-21 10:25:00 +02:00
Amanieu d'Antras
0b0c42478f
Merge pull request #1901 from folkertdev/wasm-read-unaligned
wasm: use `{read, write}_unaligned` methods
2025-08-20 20:44:05 +00:00
Folkert de Vries
6d74280ae4
Merge pull request #1899 from dpaoliello/arm64ec
Add testing for Arm64EC Windows
2025-08-20 20:42:51 +00:00
Folkert de Vries
45af206618
s390x: link to a missed optimization
2025-08-20 22:20:30 +02:00
Folkert de Vries
e9162f221a
s390x: implement vec_sld using fshl
2025-08-20 22:20:30 +02:00
Folkert de Vries
dfa95c6fa4
s390x: implement vec_subc_u128 using overflowing_sub
2025-08-20 22:20:29 +02:00
Folkert de Vries
e1a1b1ded2
s390x: implement vec_mulo using core::intrinsics::simd
2025-08-20 22:20:28 +02:00
Folkert de Vries
d5cb1c49fa
wasm: use {read, write}_unaligned methods
2025-08-20 22:11:32 +02:00
Folkert de Vries
1cda88aca1
s390x: implement vec_mule using core::intrinsics::simd
2025-08-20 22:11:16 +02:00
Folkert de Vries
97d64665b9
s390x: add assert_instr for vec_extend
2025-08-20 22:11:16 +02:00
Folkert de Vries
c5ec0960f0
s390x: add assert_instr for vec_round
2025-08-20 22:11:16 +02:00
Folkert de Vries
fa163a1fca
s390x: define unpack_low using core::intrinsics::simd
2025-08-20 22:11:15 +02:00
Nikita Popov
3302e3e09a
Adjust immediate for vrndscalepd tests
The immediate here encodes both the rounding mode (in the low bits)
and the scale (in the high bits). Make sure the scale is non-zero.
2025-08-20 11:23:46 +02:00
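The immediate layout described in the `vrndscalepd` commit above can be illustrated with a small software model. This is a sketch, not the stdarch test code: both function names are hypothetical, and the 4-bit/4-bit split (scale in bits 7:4, rounding control in bits 3:0) is an assumption taken from Intel's documentation, not from the commit message. The model only covers round-to-nearest-even, i.e. a zero low nibble.

```rust
/// Hypothetical helper: splits a roundscale immediate into
/// (scale, rounding control). Assumes scale = bits 7:4, control = bits 3:0.
fn split_roundscale_imm(imm8: u8) -> (u8, u8) {
    (imm8 >> 4, imm8 & 0x0F)
}

/// Hypothetical scalar model for rounding mode 0 (nearest, ties to even):
/// rounds `x` to `scale` fractional bits, i.e. 2^-M * round(x * 2^M).
fn roundscale_nearest_model(x: f64, imm8: u8) -> f64 {
    let (scale, _control) = split_roundscale_imm(imm8);
    let factor = f64::powi(2.0, i32::from(scale));
    (x * factor).round_ties_even() / factor
}
```

An immediate of `0x10` keeps one fractional bit, so inputs are rounded to multiples of 0.5, while `0x00` degenerates to rounding to an integer. This is why a test that only uses a zero scale does not exercise the scaling part of the instruction.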
Nikita Popov
92f6310890
Work around selection failure without avx512vl
2025-08-20 11:23:46 +02:00
Nikita Popov
4a8b8231b1
Add missing avx512vl target features
2025-08-20 11:23:46 +02:00
Nikita Popov
135de7c8df
Use intrinsics for some s390x operations
2025-08-20 11:23:30 +02:00
Nikita Popov
f9bc63d78f
Drop no longer needed feature gates
2025-08-20 11:23:30 +02:00
Daniel Paoliello
f2c0c3dd44
Add testing for Arm64EC Windows
2025-08-10 13:19:06 -07:00
The rustc-josh-sync Cronjob Bot
49aa0ecc7b
Merge ref '32e7a4b92b10' from rust-lang/rust
Pull recent changes from https://github.com/rust-lang/rust via Josh.
Upstream ref: 32e7a4b92b109c24e9822c862a7c74436b50e564
Filtered ref: 56d8aa13f54944edb711f3bdd7013b082dbaa65b
This merge was created using https://github.com/rust-lang/josh-sync .
2025-07-31 04:20:38 +00:00