1953 Commits

Author SHA1 Message Date
Madhav Madhusoodanan
08dda1502d fix: update arch flags being sent to the x86 compilation command 2025-10-26 17:48:20 +05:30
Madhav Madhusoodanan
6264634a73 feat: implement print_result_c for Intrinsic<X86IntrinsicType> 2025-10-26 17:47:48 +05:30
Madhav Madhusoodanan
d54464ab87 feat: implemented compare_outputs of x86 module 2025-10-26 17:47:48 +05:30
Madhav Madhusoodanan
962dcfd7b1 feat: implemented build_rust_file of x86 module 2025-10-26 17:47:48 +05:30
Madhav Madhusoodanan
8deed38593 chore: added Regex crate, updated the structure of X86IntrinsicType
struct
2025-10-26 17:47:48 +05:30
Madhav Madhusoodanan
e6d4838de7 fix: code cleanup 2025-10-26 17:47:48 +05:30
Madhav Madhusoodanan
9e8b542723 feat: update building C code for x86 architecture.
Notes: 1. chunk_info has been moved to `common/mod.rs` since it will be
needed for all architectures
2025-10-26 17:47:48 +05:30
Madhav Madhusoodanan
9eb0ff4296 feat: updated intrinsics creation 2025-10-26 17:47:48 +05:30
Madhav Madhusoodanan
1f9a2e7d46 feat: added the XML intrinsic parser for x86 2025-10-26 17:47:48 +05:30
Madhav Madhusoodanan
f44a98a59d feat: added the skeleton structure of the x86 module 2025-10-26 17:47:48 +05:30
Folkert de Vries
cf1cf2e94d
remove a use of core::intrinsics::size_of
use of the intrinsic, rather than the stable function, is probably an accident.
2025-10-25 23:57:17 +02:00
Amanieu d'Antras
d64b23c061
Merge pull request #1945 from folkertdev/gfni-cleanup
use `byte_add` in gfni tests
2025-10-25 14:17:49 +00:00
Folkert de Vries
9ebee4853d
use byte_add in gfni tests 2025-10-25 01:55:37 +02:00
Folkert de Vries
8dff65f010
Merge pull request #1938 from linkmauve/fjcvtzs
Implement fjcvtzs under the name __jcvt like the C intrinsic
2025-10-10 14:13:13 +00:00
Emmanuel Gil Peyrot
6039ddea09 Implement fjcvtzs under the name __jcvt like the C intrinsic
This instruction is only available when the jsconv target_feature is available,
so on ARMv8.3 or higher.

It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by
any JS engine probably.

I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the
FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of
the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or
stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely
arbitrary and I guess it could be argued one way or another.  I wouldn’t expect
it to stay unstable for too long, so ultimately this shouldn’t matter much.

This feature is now tracked in this issue[1].

[0] https://github.com/ruffle-rs/ruffle/pull/21780
[1] https://github.com/rust-lang/rust/issues/147555
2025-10-10 13:29:42 +00:00
Sayantan Chakraborty
01dc34d709
Merge pull request #1939 from folkertdev/crc-remove-not-arm
crc32: remove `#[cfg(not(target_arch = "arm"))]` from aarch64 crc functions
2025-10-09 17:37:09 +00:00
Folkert de Vries
4fcf3f86c4
crc32: remove #[cfg(not(target_arch = "arm"))] from crc functions
They are defined in the aarch64 module, so this cfg is pointless.

Note that these instructions do exist for arm, but the aarch64 ones are
already stable, so this would need some additional work to implement
them for arm.
2025-10-09 19:20:20 +02:00
Folkert de Vries
27866a7f06
Merge pull request #1937 from sayantn/intrinsic-fixes
use simd intrinsics for `vec_max` and `vec_min`
2025-10-08 11:17:58 +00:00
sayantn
40ce617b2a
use simd intrinsics for vec_max and vec_min 2025-10-08 16:01:08 +05:30
Tsukasa OI
af91b45726 RISC-V: Use symbolic instructions on inline assembly (part 1)
While many intrinsics use `.insn` to generate raw machine code from
numbers, all ratified instructions can be symbolic
using `.option` directives.

By saving the assembler environment with `.option push` then modifying
the architecture with `.option arch`, we can temporarily enable certain
extensions (as we use `.option pop` immediately after the target
instruction, surrounding environment is completely intact in this
commit; *almost* completely intact in general).

This commit modifies the `pause` *hint* intrinsic to use symbolic
*instruction* because we want to expose it even if the Zihintpause
extension is unavailable on the target.
2025-10-06 01:08:42 +00:00
Amanieu d'Antras
09c43ef6d3
Merge pull request #1929 from sayantn/non-temporal
Fixes for non-temporal intrinsics
2025-10-05 22:44:09 +00:00
sayantn
c0e41518d1
Add comments in NT asm blocks for future reference 2025-10-05 07:04:36 +05:30
sayantn
5bf53654c5
Add _mm_sfence to all non-temporal intrinsic tests 2025-10-05 06:56:49 +05:30
sayantn
b29308c167
Use Inline ASM for SSE4a nontemporal stores 2025-10-05 06:56:46 +05:30
sayantn
28cf2d1a6c
Fix xsave segfaults 2025-10-05 05:39:29 +05:30
Sayantan Chakraborty
7e850c5f1e
Merge pull request #1932 from sayantn/fmaddsub
Use SIMD intrinsics for `vfmaddsubph` and `vfmsubaddph`
2025-10-04 00:43:02 +00:00
Amanieu d'Antras
14b888574f
Merge pull request #1931 from sayantn/use-intrinsics
Fix mistake in #1928
2025-10-03 13:10:34 +00:00
sayantn
f90d9ec8b2
Use SIMD intrinsics for vfmaddsubph and vfmsubaddph 2025-10-03 05:33:13 +05:30
sayantn
37605b03c5
Ensure simd_funnel_sh{l,r} always gets passed shift amounts in range 2025-10-03 03:51:34 +05:30
sayantn
018f9927b2
Revert uses of SIMD intrinsics for shifts 2025-10-03 03:30:50 +05:30
Madhav Madhusoodanan
6b99d5fb56 fix: update the implementation of _kshiftri_mask16 and _kshiftli_mask16
to zero out when the amount of shift exceeds 16.
2025-10-03 02:33:11 +05:30
Madhav Madhusoodanan
0138b95620 fix: update the implementation of _kshiftri_mask8 and _kshiftli_mask8 to
zero out when the amount of shift exceeds the bit length of the input
argument.
2025-10-03 02:27:15 +05:30
Madhav Madhusoodanan
8b25ddeea3 fix: update the implementation of _kshiftri_mask32, _kshiftri_mask64,
_kshiftli_mask32 and _kshiftli_mask64 to zero out when the amount of
shift exceeds the bit length of the input argument.
2025-10-03 02:20:50 +05:30
sayantn
851c32abb2
Use SIMD intrinsics for test{z,c} intrinsics 2025-10-01 12:33:41 +05:30
sayantn
4c94e6bba9
Use SIMD intrinsics for vperm2 intrinsics 2025-10-01 10:26:59 +05:30
sayantn
d23dbbec31
Use SIMD intrinsics for cvtsi{,64}_{ss,sd} intrinsics 2025-10-01 07:23:43 +05:30
sayantn
6460b35798
Use SIMD intrinsics for f16 intrinsics 2025-10-01 07:23:10 +05:30
sayantn
3f91ced840
Use SIMD intrinsics for shift and rotate intrinsics 2025-10-01 07:22:12 +05:30
sayantn
1819ae0c1f
Use SIMD intrinsics for madd, hadd and hsub intrinsics 2025-10-01 07:20:30 +05:30
sayantn
b55b085535
Remove uses of deprecated llvm.x86.addcarryx.u{32,64} intrinsics
- Correct mistake in x86_64/adx.rs where it was not testing `_addcarryx` at all
2025-10-01 07:16:44 +05:30
usamoi
00c8866c57 pick changes from https://github.com/rust-lang/rust/pull/146683 2025-09-23 10:17:54 +08:00
usamoi
3b09522c34 Revert "Remove big-endian swizzles from vreinterpret"
This reverts commit 24f89ca53d3374ed8d3e0cbadc1dc89eea41acba.
2025-09-23 10:05:32 +08:00
usamoi
39b2e433e6 intrinsic-test: test intrinsics with patched core_arch 2025-09-20 20:13:24 +08:00
Sayantan Chakraborty
c1242fab74
Merge pull request #1921 from a4lg/riscv-inline-asm-general-improvements
RISC-V: Improvements of inline assembly uses
2025-09-15 18:39:49 +00:00
Folkert de Vries
5dd0fdcd67
Merge pull request #1919 from sayantn/fix-vreinterpret
Remove big-endian swizzles from `vreinterpret`
2025-09-15 08:18:20 +00:00
Tsukasa OI
8df078a3f0 RISC-V: Improvements of inline assembly uses
This commit performs various improvements (better register allocation,
less register clobbering on the worst case and better readability) of
RISC-V inline assembly use cases.

Note that it does not change the `p` module (which defines the "P"
extension draft instructions but very likely to change).

1.  Use `lateout` as possible.
    Unlike `out(reg)` and `in(reg)` pair, `lateout(reg)` and `in(reg)`
    can share the same register because they state that the late-output
    register is written after all the reads are performed.
    It can improve register allocation.
2.  Add `preserves_flags` option as possible.
    While RISC-V doesn't have _regular_ condition codes, RISC-V inline
    assembly in the Rust language assumes that some registers
    (mainly vector state registers) may be overwritten by default.
    By adding `preserves_flags` to the intrinsics corresponding
    instructions without overwriting them, it can minimize register
    clobbering on the worst case.
3.  Use trailing semicolon.
    As `asm!` declares an action and it doesn't return a value by
    itself, it would be better to have trailing semicolon to denote that
    an `asm!` call is effectively a statement.
4.  Make most of `asm!` calls multi-lined.
    `rustfmt` makes some simple (yet long) `asm!` calls multi-lined but
    it does not perform formatting of complex `asm!` calls with inputs
    and/or outputs.  To keep consistency, it makes most of the `asm!`
    calls multi-lined.
2025-09-14 05:08:19 +00:00
Tsukasa OI
a3b7aad20f stdarch-gen-arm: Make Clippy happy 2025-09-12 11:50:51 +00:00
Tsukasa OI
221eb1f0d5 intrinsic-test: Make Clippy happy 2025-09-12 11:50:25 +00:00
Sayantan Chakraborty
269cecc91c
Merge pull request #1918 from a4lg/riscv-aes64im-lower-requirements
RISC-V: "Lower" requirements of `aes64im`
2025-09-11 19:59:18 +00:00
sayantn
bb31725e67
Remove big-endian swizzles from vreinterpret 2025-09-12 01:20:34 +05:30