Tony Sifkarovski
60c2608cce
[avx2] add _mm256_cvtepu{8,16,32}_epi{16,32,64} ( #192 )
2017-11-17 09:22:18 +01:00
crypto-universe
1842e36d00
[x86][sse4.1] Add phminposuw & pmul* instructions
...
pmulld is implemented via multiplication.
2017-11-16 07:12:14 -05:00
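The note that pmulld is "implemented via multiplication" can be illustrated with a scalar model. This is only a sketch of the lane-wise semantics of `pmulld` (`_mm_mullo_epi32`), not the code from the commit; the function name `mullo_epi32_model` and the use of plain arrays instead of SIMD vector types are assumptions made for illustration.

```rust
/// Scalar model of `pmulld`: multiply four 32-bit lanes pairwise and keep
/// only the low 32 bits of each 64-bit product.
/// (Hypothetical helper; the name is not taken from the crate.)
fn mullo_epi32_model(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        // wrapping_mul discards the high half of the product, which is the
        // "low" part that the instruction stores.
        out[i] = a[i].wrapping_mul(b[i]);
    }
    out
}
```

Because each lane wraps independently, an ordinary vector multiply has the same result as the dedicated instruction, which is presumably why a plain multiplication suffices.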
gnzlbg
955fd849ff
implement missing std::ops
2017-11-13 06:42:49 -05:00
gnzlbg
6ed424a848
syn API breaking change ( #189 )
2017-11-11 23:35:00 +01:00
crypto-universe
bdaea04f2b
[x86][sse4.1] Add pmin* instructions ( #186 )
2017-11-08 23:05:27 -06:00
Caio
545a2a8e2a
Add _mm_unpackhi_pd and _mm_unpacklo_pd ( #184 )
...
* Add _mm_unpackhi_pd and _mm_unpacklo_pd
2017-11-08 11:22:21 +01:00
gnzlbg
20324666f5
[ci] fix formatting and clippy ( #182 )
2017-11-07 09:00:55 -06:00
Malo Jaffré
664395e25e
Fix a confusing typo in a cast name. ( #179 )
2017-11-06 12:45:31 -06:00
André Oliveira
a05fb1b292
Add the necessary SIMD types for sign extend intrinsics
2017-11-06 07:17:27 -05:00
André Oliveira
bab1c7b16a
Avoid using simd_cast directly
2017-11-06 07:17:27 -05:00
André Oliveira
866596cd53
Add _mm_cvtepi16_epi32 and _mm_cvtepi16_epi64 (commented)
2017-11-06 07:17:27 -05:00
André Oliveira
fa240f2477
Add commented implementation of _mm_cvtepi8_epi64
2017-11-06 07:17:27 -05:00
André Oliveira
37396f3471
Add _mm_cvtepi8_epi32
...
- This might be wrong since the cast and the shuffle needed to be inverted
2017-11-06 07:17:27 -05:00
André Oliveira
f9caf376b2
Add _mm_cvtepi8_epi16
2017-11-06 07:17:27 -05:00
André Oliveira
d6c990967b
Add _mm_packus_epi32 and _mm_cmpeq_epi64 intrinsics
2017-11-06 07:17:27 -05:00
Adam Niederer
a6d9d0c100
Fix mm256_round_epi* return types ( #173 )
...
From the Intel intrinsics manual (emphasis mine):
> Compute the absolute value of packed 16-bit integers in a, and store the
> *unsigned* results in dst.
2017-11-05 20:56:07 -06:00
gwenn
6d4ea09a21
Avx ( #172 )
...
* avx: _mm256_load_pd, _mm256_store_pd, _mm256_load_ps, _mm256_store_ps
* avx: _mm256_load_si256, _mm256_store_si256
2017-11-05 20:55:32 -06:00
Malo Jaffré
74870635e5
Add SSE2 trivial aliases and conversions. ( #165 )
...
`_mm_cvtsd_f64`, `_mm_cvtsd_si64x` and `_mm_cvttsd_si64x`.
See #40 .
2017-11-02 14:10:50 -04:00
gnzlbg
542aac988a
[ci] enable clippy ( #62 )
...
* [ci] enable clippy
* [clippy] fix clippy issues
2017-11-02 13:43:12 -04:00
gwenn
96111d548e
Avx ( #163 )
...
* avx: _mm256_testnzc_si256
* avx: _mm256_shuffle_ps
8 levels of macro expansion takes too long to compile.
* avx: remove useless 0 in tests
* avx: _mm256_shuffle_ps
Macro expansion can be reduced to four levels
* avx: _mm256_blend_ps
Copy/paste from avx2::_mm256_blend_epi32
2017-11-01 08:47:40 -05:00
Alex Crichton
5cb3986530
Bump to 0.0.3
2017-10-30 15:53:07 -07:00
gnzlbg
d6aefaabea
[aarch64] refactor AArch64 intrinsics into its own architecture module ( #162 )
2017-10-29 11:37:43 -05:00
gnzlbg
7f35e50563
[runtime-detection-x86] detect avx and avx2 only if osxsave is true ( #154 )
2017-10-28 16:34:09 -04:00
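The reason for gating AVX detection on osxsave is that the CPU can advertise AVX while the operating system has not enabled saving of the wider registers. A minimal sketch of that decision follows, assuming the usual procedure of also checking XCR0 bits 1 and 2; the `FeatureBits` struct and `avx_usable` function are hypothetical names, and the inputs are passed in as plain values rather than read from CPUID/XGETBV.

```rust
/// Hypothetical snapshot of the relevant bits. In real code `osxsave` and
/// `avx` would come from CPUID leaf 1 (ECX) and `xcr0` from XGETBV, which
/// may only be executed when `osxsave` is set.
#[derive(Clone, Copy)]
struct FeatureBits {
    osxsave: bool,
    avx: bool,
    xcr0: u64,
}

/// Report AVX as usable only if the OS has enabled XSAVE and has turned on
/// state saving for both the XMM (bit 1) and YMM (bit 2) registers.
fn avx_usable(bits: FeatureBits) -> bool {
    const XMM_AND_YMM_STATE: u64 = 0b110;
    bits.osxsave
        && bits.avx
        && (bits.xcr0 & XMM_AND_YMM_STATE) == XMM_AND_YMM_STATE
}
```

With this shape, a CPU that reports AVX under an OS without XSAVE support is treated as not having AVX, which matches the intent stated in the commit title.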
Mrowqa
0c9ac36595
x86: implemented roundings for SSE4.1 ( #158 )
...
* x86: implemented roundings for SSE4.1
* x86: sse41 roundings - added docs and fixed assert__* tests
2017-10-28 16:32:14 -04:00
gnzlbg
46c6e9beb6
[fmt] use cargo fmt --all ( #161 )
2017-10-28 16:29:52 -04:00
gnzlbg
69d2ad85f3
[ci] check formatting ( #64 )
...
* [ci] check formatting
* [rustfmt] reformat the whole library
2017-10-27 11:55:29 -04:00
Mrowqa
5869eca3e9
x86: implemented _mm{,256}_maskstore_epi{32,64} ( #155 )
...
* x86: implemented maskloads for avx2
* x86: added docs and tests for avx2 maskloads
* x86: refactor - changed `a` to `mem_addr` in avx2 mask loads for consistency
* x86: implemented _mm{,256}_maskstore_epi{32,64}
2017-10-27 11:40:48 -04:00
Henry de Valence
1c67fc00e7
avx2: add _mm256_shuffle_epi32 reusing _mm_shuffle_epi32 code ( #156 )
2017-10-27 11:10:11 -04:00
gnzlbg
ad48780fca
[arm] vadd, vaddd, vaddq, vaddl
2017-10-26 10:18:00 -04:00
Mrowqa
ae0688c7fa
x86: fixed testing equality of floating point numbers ( #150 )
...
* x86: fixed testing equality of floating point numbers
* x86: removed unused macro branch
* x86: marked assert_approx_eq as used only in tests
2017-10-25 09:57:35 -04:00
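For context on why exact equality is a poor fit for floating point tests, here is a minimal sketch of a tolerance-based comparison. It is not the crate's `assert_approx_eq` macro; the function `approx_eq` and the choice of an absolute tolerance are assumptions made for illustration.

```rust
/// Hypothetical helper: true when `a` and `b` differ by less than `eps`.
/// An absolute tolerance is the simplest choice; it is not suitable for
/// values of very large or very small magnitude.
fn approx_eq(a: f64, b: f64, eps: f64) -> bool {
    (a - b).abs() < eps
}
```

For example, `0.1 + 0.2 == 0.3` is false for `f64`, while `approx_eq(0.1 + 0.2, 0.3, 1e-9)` is true.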
gwenn
ea51cbcf25
avx: fix *si256 methods ( #145 )
...
* avx: fix *si256 methods
* avx: _mm256_setr_m128
* avx: _mm256_setr_m128d
* avx: _mm256_setr_m128i
* avx: _mm256_loadu2_m128
* avx: _mm256_loadu2_m128d
* avx: _mm256_loadu2_m128i
* avx: _mm256_storeu2_m128
* sse2: _mm_storeu_pd
* avx: _mm256_storeu2_m128d
* sse2: _mm_undefined_si128
* avx: _mm256_storeu2_m128i
* Try to fix i586 build
2017-10-25 01:26:19 -04:00
Henry de Valence
0f33ca5518
avx2: add _mm256_unpack{hi,lo}_epi{8,16,32,64} ( #147 )
2017-10-24 20:12:23 -04:00
gnzlbg
3e1e52f413
update readme and crates.io badges, categories, etc. ( #141 )
...
* [readme] badges
* [crates.io] add badges, categories, etc.
2017-10-23 08:37:41 -05:00
Steven Fackler
6f134c3dfa
Make vector constructors const functions ( #137 )
2017-10-23 08:35:43 -05:00
Thomas Schilling
8b6f5d183e
Add some SSE _mm_cvt* instructions ( #136 )
...
* Add single output _mm_cvt[t]ss_* variants
The *_pi variants are currently blocked by
https://github.com/rust-lang-nursery/stdsimd/issues/74
* Add _mm_cvtsi*_ss
The _mm_cvtpi*_ps intrinsics are blocked by
https://github.com/rust-lang-nursery/stdsimd/issues/74
* Fix Linux builds
Also the si64 variants are only available on x86_64
2017-10-23 08:35:28 -05:00
Steven Fackler
76d9b89ab2
Implement _mm256_permute4x64_epi64 ( #144 )
2017-10-23 08:35:03 -05:00
gnzlbg
1f44e3166e
Deny all warnings and fix errors ( #135 )
...
* [travis-ci] deny warnings
* fix all warnings
2017-10-22 12:30:26 -05:00
gnzlbg
8fa5e7bcf5
[travis-ci] allow testing on all branches ( #134 )
2017-10-22 07:43:48 -05:00
jneem
192c4ac4fd
avx2: signed extensions ( #132 )
...
_mm256_cvtepi8_epi16
_mm256_cvtepi8_epi32
_mm256_cvtepi8_epi64
_mm256_cvtepi16_epi32
_mm256_cvtepi16_epi64
_mm256_cvtepi32_epi64
2017-10-21 15:00:13 -05:00
Steven Fackler
5fb563aabc
Add _mm256_shuffle_epi8 and _mm256_permutevar8x32_epi32 ( #133 )
...
* Add _mm256_shuffle_epi8
* Add _mm256_permutevar8x32_epi32
2017-10-21 14:59:37 -05:00
pythoneer
d5fd2b09a7
sse2 ( #131 )
...
* added missing doc _mm_cvtps_pd
added missing doc & test _mm_load_pd
added missing doc & test _mm_store_pd
added _mm_store1_pd
added _mm_store_pd1
added _mm_storer_pd
added _mm_load_pd1
added _mm_loadr_pd
added _mm_loadu_pd
* correct alignments
2017-10-21 10:46:55 -05:00
jneem
3ec870078a
avx2: _mm256_blend_epi32 and _mm256_blend_epi16. ( #130 )
2017-10-18 17:29:23 -05:00
gnzlbg
a3a703d83e
[example] nbody ( #117 )
2017-10-18 17:19:19 -05:00
Dan Robertson
4782ffadee
[x86] Implement avx2 broadcast intrinsics ( #97 )
...
Implement
- _mm_broadcastb_epi8
- _mm256_broadcastb_epi8
- _mm_broadcastd_epi32
- _mm256_broadcastd_epi32
- _mm_broadcastq_epi64
- _mm256_broadcastq_epi64
- _mm_broadcastsd_pd
- _mm256_broadcastsd_pd
- _mm256_broadcastsi128_si256
- _mm_broadcastss_ps
- _mm256_broadcastss_ps
- _mm_broadcastw_epi16
- _mm256_broadcastw_epi16
2017-10-18 14:36:17 -05:00
Alex Crichton
7b249298c0
Uncomment _mm256_mpsadbw_epu8 ( #128 )
...
Just needed some `constify_imm8!` treatment
Closes #59
2017-10-18 13:17:09 -05:00
gnzlbg
2dc965b69a
[neon] reciprocal square-root estimate ( #121 )
2017-10-18 13:16:34 -05:00
Alex Crichton
13bc6b8517
Add CI in Intel's instruction emulator ( #113 )
...
This commit adds a new builder on CI for running tests in Intel's own emulator
and also adds an assertion that on this emulator no tests are skipped due to
missing CPU features by accident.
Closes #92
2017-10-18 11:35:11 -04:00
André Oliveira
02c89b24ba
sse4.1 instructions ( #98 )
...
* sse4.1: _mm_blendv_ps and _mm_blendv_pd
* sse4.1: _mm_blend_ps and _mm_blend_pd
- HACK warning: messing with the constify macros
- Selecting only one buffer gets optimized away and tests need to take this into account
* sse4.1: _mm_blend_epi16
* sse4.1: _mm_extract_ps
* sse4.1: _mm_extract_epi8
* sse4.1: _mm_extract_epi32
* sse4.1: _mm_extract_epi64
* sse4.1: _mm_insert_ps
* sse4.1: _mm_insert_epi8
* sse4.1: _mm_insert_epi32 and _mm_insert_epi64
* Formatting
* sse4.1: _mm_max_epi8, _mm_max_epu16, _mm_max_epi32 and _mm_max_epu32
* Fix wrong compiler flag
- avx -> sse4.1
* Fix intrinsics that only work with x86-64
* sse4.1: use appropriate types
* Revert '_mm_extract_ps' to return i32
* sse4.1: Use the v128 types for consistency
* Try fix for windows
* Try "vectorcall" calling convention
* Revert "Try "vectorcall" calling convention"
This reverts commit 12936e9976bc6b0e4e538d82f55f0ee2d87a7f25.
* Revert "Try fix for windows"
This reverts commit 9c473808d334acedd46060b32ceea116662bf6a3.
* Change tests for windows
* Remove useless Windows test
2017-10-18 11:34:51 -04:00
jneem
acf919f960
avx2: _mm_blend_epi32 ( #127 )
2017-10-17 10:16:15 -04:00
Thomas Schilling
64c7f7ec56
Add SSE _mm_store* intrinsics and _mm_move_ss ( #115 )
...
* Add _mm_store* intrinsics and _mm_move_ss
* Fix Win64 & Linux i586 failures
* Make i586 codegen happy without breaking x86_64
2017-10-17 10:15:37 -04:00