85 Commits

Author SHA1 Message Date
Eduardo Sánchez Muñoz
c808ba4722 Remove unneeded transmutes in ARM code, except generated tests 2023-10-31 17:58:01 +01:00
Amanieu d'Antras
17daea9747 Update instruction tests for LLVM 17 2023-08-29 15:21:34 +02:00
Jacob Bramley
31e17e39c2 Add AArch64 vrnd*_f64 Neon intrinsics.
The LLVM intrinsic doesn't support float64x1_t, but the required
instruction is a scalar form (e.g. `frint32x <Dd>, <Dn>`), so we can
implement these using the scalar intrinsic.

Note that Clang does not support these intrinsics, so they aren't
covered by intrinsic-test. Additional validation is included in this
patch to ensure that we're selecting an instruction with the same
behaviour as the corresponding vector form (which all have
intrinsic-tests).
2023-06-21 18:52:21 +02:00
Jacob Bramley
0459405ea9 Add more AArch64 vrnd intrinsics.
LLVM can't select float64x1_t variants, but float64x2_t variants work.
2023-06-21 18:52:21 +02:00
Jacob Bramley
a9fecd8456 Support AArch32 Neon dotprod intrinsics.
Note that the feature detection requires a recent Linux kernel (v6.2).
2023-06-21 18:52:21 +02:00
Jacob Bramley
1e15fa3f0a Add support for AArch64 i8mm *dot intrinsics.
This includes vsudot and vusdot, which perform mixed-signedness dot
product operations.
2023-06-21 18:52:21 +02:00
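
As an illustration of the mixed-signedness dot product mentioned in the commit above, here is a minimal scalar sketch in plain Rust. It is not the stdarch implementation and does not call any Neon intrinsic. The function name `usdot_model`, the array-based lane layout, and the wrapping accumulation are assumptions made for illustration: each 32-bit lane accumulates the sum of four products of an unsigned byte with a signed byte.

```rust
/// Hypothetical scalar model of an unsigned-by-signed dot product
/// over two 32-bit accumulator lanes (four byte pairs per lane).
/// This mirrors the idea behind the `*dot` intrinsics; it is a sketch only.
pub fn usdot_model(acc: [i32; 2], a: [u8; 8], b: [i8; 8]) -> [i32; 2] {
    let mut out = acc;
    for lane in 0..2 {
        for k in 0..4 {
            let i = lane * 4 + k;
            // Widen both operands to i32 before multiplying, so the unsigned
            // byte is zero-extended and the signed byte is sign-extended.
            let product = (a[i] as i32) * (b[i] as i32);
            // Assumption: accumulation wraps on overflow.
            out[lane] = out[lane].wrapping_add(product);
        }
    }
    out
}
```

The point of the mixed form is that `255u8` stays 255 rather than being reinterpreted as -1, while the signed operand keeps its sign.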
Kisaragi Marine
7b6c185ae3 arm(neon): avoid snippets which trigger the unused_parens lint 2023-06-20 00:47:34 +02:00
Jeroen Van Der Donckt
9778c39444 docs: fix unfinished vcgt documentation 2023-04-15 08:12:01 -07:00
Amanieu d'Antras
fbed7945aa Update intrinsic tests for LLVM 16 2023-03-30 15:43:03 +01:00
bwmf2
1c18225f32 Fix typo 2023-02-18 20:02:17 +01:00
Ralf Jung
5afa869e0a use inline const for last simd_shuffle argument 2023-01-10 00:23:14 +00:00
Amanieu d'Antras
e79701c56e Properly fix vext intrinsic tests
This was previously done as part of #1326, but it modified generated
code without fixing the root issue in neon.spec.
2022-08-22 23:46:15 +02:00
Jamie Cunliffe
e75d75e292
Add the rdm target feature to the sqrdmlsh intrinsic. (#1285) 2022-04-08 19:29:11 +01:00
Amanieu d'Antras
b25548658a Updates for LLVM 14 on nightly 2022-02-19 20:44:04 +00:00
Frank Steffahn
df24e2a0f8 Fix a bunch of typos 2021-12-14 10:17:43 -08:00
Amanieu d'Antras
937978eeef
Update the intrinsic checker tool (#1258) 2021-12-04 13:03:30 +00:00
Amanieu d'Antras
ca1f7cc1a6
Add missing vtst_p16 and vtstq_p16 intrinsics (#1257) 2021-11-20 20:51:37 +00:00
Sparrow Li
7c3bd04537
complete armv8 instructions (#1256) 2021-11-19 01:24:36 +00:00
Sparrow Li
be5e1be224
Add remaining instructions (#1250)
* add vmmla vusmmla vsm4e vsm3 vrax1 vxar vsha512 vbcax veor3 neon instructions

* update runtime feature detect

* correct tests

* add `vrnd32x` `vrnd64x`

* add MISSING.md
2021-11-10 15:19:59 +00:00
Jamie Cunliffe
8d6f3f36b3
Correct the vqrdmlah intrinsics. (#1246) 2021-11-04 14:16:26 +00:00
Jamie Cunliffe
813530237d
Do not emit undefined lshr/ashr for Neon shifts (#1238) 2021-10-22 20:24:54 +01:00
Sparrow Li
9df48f1e57
Complete the remaining neon instructions (#1230) 2021-10-21 10:52:05 +01:00
Sparrow Li
68e35d306f
Complete vld* and vst* neon instructions (#1224) 2021-09-29 04:28:10 +01:00
Sparrow Li
bdea403c54
Complete vst1 neon instructions (#1221) 2021-09-24 13:26:29 +01:00
Hans Kratz
26cce19427
Make dedup guard optional (#1215) 2021-09-20 17:19:05 +01:00
Hans Kratz
504b0cf68b
Arm Fused Multiply-Add fixes (#1219) 2021-09-20 17:18:20 +01:00
Sparrow Li
328553ef64
Complete vld1 instructions with some corrections (#1216) 2021-09-18 14:13:24 +01:00
Sparrow Li
9e34c6d4c8
Add vst neon instructions (#1205)
* add vst neon instructions

* modify the instruction limit
2021-08-31 21:35:30 +01:00
Sparrow Li
4baf95fddd
add vldx neon instructions (#1200) 2021-08-24 19:51:30 +01:00
Jamie Cunliffe
0285e513e0 Update arm vcvt intrinsics to use llvm.fpto(su)i.sat
Those intrinsics have the correct semantics for the desired fcvtz instruction,
without any undefined behaviour. The previous simd_cast was undefined for
infinite and NaN which could cause issues.
2021-08-11 13:13:19 +01:00
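
The saturating semantics described in the commit above can be modelled in plain Rust, whose float-to-integer `as` casts have saturated since Rust 1.45: out-of-range values clamp to the integer bounds and NaN becomes 0. The following is a scalar sketch of that behaviour, not the stdarch code. The function names are hypothetical, and the four-lane array stands in for a vector type.

```rust
/// Hypothetical scalar model of a saturating, round-toward-zero
/// float-to-signed-integer conversion. Rust's `as` cast saturates:
/// +inf -> i32::MAX, -inf -> i32::MIN, NaN -> 0.
pub fn to_i32_saturating(x: f32) -> i32 {
    x as i32
}

/// Unsigned counterpart: negative inputs clamp to 0,
/// values that are too large clamp to u32::MAX, NaN -> 0.
pub fn to_u32_saturating(x: f32) -> u32 {
    x as u32
}

/// Lane-wise application over a four-element array, standing in
/// for a 128-bit vector of f32 lanes (an illustrative assumption).
pub fn convert_lanes(v: [f32; 4]) -> [i32; 4] {
    v.map(to_i32_saturating)
}
```

Because every input, including infinities and NaN, has a defined result, a conversion with these semantics has no undefined cases for the compiler to exploit.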
Adam Gemmell
8cb8cd2142 Replace the crypto feature with aes in generated intrinsics for aarch64
This allows us to deprecate the crypto target_feature in favour of its
subfeatures.

We cannot do this yet for ARM targets as LLVM requires the crypto
feature. This was fixed in
b8baa2a913
2021-08-02 23:38:57 +01:00
Sparrow Li
10f7ebc387
Add vfma and vfms neon instructions (#1169) 2021-05-21 12:26:21 +01:00
Sparrow Li
15749b0ed3
Modify the implementation of d_s64 suffix instructions (#1167) 2021-05-19 03:43:53 +01:00
Sparrow Li
09a05e02f4
Add vmull_p64 and vmull_high_p64 for aarch64 (#1157) 2021-05-15 21:58:23 +01:00
Sparrow Li
4a21f4db0e
Add vqmovn neon instructions (#1163) 2021-05-14 12:32:58 +01:00
Ralf Jung
a34883b5d3
manually const-ify shuffle arguments (#1160) 2021-05-11 21:11:52 +01:00
SparrowLii
7516a80c31 Add vset neon instructions 2021-05-11 13:38:16 +01:00
SparrowLii
8a2936b9a2 Completion of vcvt neon instruction 2021-05-07 23:02:39 +01:00
SparrowLii
911ace84b2 Add vqrdmulh, vqrdmlah, vqrdmlsh neon instructions 2021-05-06 15:44:54 +01:00
Sparrow Li
fd29f9602c
Add vmul_n, vmul_lane, vmulx neon instructions (#1147) 2021-04-30 21:09:41 +01:00
Sparrow Li
07f1d0cae3
Add vmla_n, vmla_lane, vmls_n, vmls_lane neon instructions (#1145) 2021-04-28 22:59:41 +01:00
Sparrow Li
8852d07441
add vcopy neon instructions (#1139) 2021-04-24 01:49:11 +01:00
Christopher Serr
a43f92a181
Add vrndn neon instructions (#1086)
This adds the neon instructions for lane-wise rounding without actually
converting the lanes to integers.
2021-04-22 06:08:40 +01:00
Sparrow Li
de3e8f72c5
Add vqdmul* neon instructions (#1130) 2021-04-21 15:27:08 +01:00
surechen
20c0120362
add neon instruction vaddlv_* (#1129) 2021-04-20 15:19:04 +01:00
Sparrow Li
6354de5993
Add vrshl, vrshr, vrshrn, vrsra, vsra neon instructions (#1127) 2021-04-19 17:49:44 +01:00
surechen
d46e0086e4
add neon instruction vfma_n_* (#1122) 2021-04-17 17:45:54 +01:00
Sparrow Li
e792dfd02c
add vqshl, vqshrn, vqshrun neon instructions (#1120) 2021-04-16 13:22:39 +01:00
Sparrow Li
23f45cc955
Add vqrsh* neon instructions (#1119) 2021-04-15 12:29:04 +01:00
liushuyu
33afae1df7
aarch64: add uzp1, uzp2 instructions (#1118) 2021-04-15 12:21:31 +01:00