bors c1132470a6 Auto merge of #130733 - okaneco:is_ascii, r=scottmcm
Optimize `is_ascii` for `str` and `[u8]` further

Replace the existing optimized function with one that enables auto-vectorization.

This is especially beneficial on x86-64 as `pmovmskb` can be emitted with careful structuring of the code. The instruction can detect non-ASCII characters one vector register width at a time instead of the current `usize` at a time check.

The resulting implementation is completely safe.

`case00_libcore` is the current implementation, `case04_while_loop` is this PR.
```
benchmarks:
    ascii::is_ascii_slice::long::case00_libcore                             22.25/iter  +/- 1.09
    ascii::is_ascii_slice::long::case04_while_loop                           6.78/iter  +/- 0.92
    ascii::is_ascii_slice::medium::case00_libcore                            2.81/iter  +/- 0.39
    ascii::is_ascii_slice::medium::case04_while_loop                         1.56/iter  +/- 0.78
    ascii::is_ascii_slice::short::case00_libcore                             5.55/iter  +/- 0.85
    ascii::is_ascii_slice::short::case04_while_loop                          3.75/iter  +/- 0.22
    ascii::is_ascii_slice::unaligned_both_long::case00_libcore              26.59/iter  +/- 0.66
    ascii::is_ascii_slice::unaligned_both_long::case04_while_loop            5.78/iter  +/- 0.16
    ascii::is_ascii_slice::unaligned_both_medium::case00_libcore             2.97/iter  +/- 0.32
    ascii::is_ascii_slice::unaligned_both_medium::case04_while_loop          2.41/iter  +/- 0.10
    ascii::is_ascii_slice::unaligned_head_long::case00_libcore              23.71/iter  +/- 0.79
    ascii::is_ascii_slice::unaligned_head_long::case04_while_loop            7.83/iter  +/- 1.31
    ascii::is_ascii_slice::unaligned_head_medium::case00_libcore             3.69/iter  +/- 0.54
    ascii::is_ascii_slice::unaligned_head_medium::case04_while_loop          7.05/iter  +/- 0.32
    ascii::is_ascii_slice::unaligned_tail_long::case00_libcore              24.44/iter  +/- 1.41
    ascii::is_ascii_slice::unaligned_tail_long::case04_while_loop            5.12/iter  +/- 0.18
    ascii::is_ascii_slice::unaligned_tail_medium::case00_libcore             3.24/iter  +/- 0.40
    ascii::is_ascii_slice::unaligned_tail_medium::case04_while_loop          2.86/iter  +/- 0.14

```

`unaligned_head_medium` is the main regression in the benchmarks. It is a 32 byte string being sliced `bytes[1..]`.

The first commit can be used to run the benchmarks against the current core implementation.

Previous implementation was done in #74066

---

Two potential drawbacks of this implementation are that it increases instruction count and may regress other platforms/architectures. The benches here may also be too artificial to glean much insight from.
https://rust.godbolt.org/z/G9znGfY36
2024-12-22 02:44:13 +00:00
..
2024-05-31 15:56:43 +10:00
2024-10-06 18:12:25 +02:00
2024-05-31 15:56:43 +10:00
2024-11-17 21:49:10 +01:00
2024-05-31 15:56:43 +10:00
2024-09-09 19:39:43 -07:00
2024-05-31 15:56:43 +10:00
2024-05-31 15:56:43 +10:00
2024-05-31 15:56:43 +10:00
2024-09-09 19:39:43 -07:00
2024-05-31 15:56:43 +10:00
2024-05-31 15:56:43 +10:00
2024-05-31 15:56:43 +10:00
2024-05-31 15:56:43 +10:00
2024-04-11 21:42:35 -04:00
2024-04-22 18:48:47 +02:00
2024-04-11 21:42:35 -04:00
2024-05-31 15:56:43 +10:00
2024-08-29 18:12:31 +08:00
2024-11-17 21:49:10 +01:00
2024-05-31 15:56:43 +10:00
2024-04-24 13:12:33 +01:00
2024-09-09 19:39:43 -07:00
2024-05-31 15:56:43 +10:00
2024-05-31 15:56:43 +10:00
2024-05-31 15:56:43 +10:00
2024-06-16 17:19:25 +08:00
2024-05-31 15:56:43 +10:00
2024-05-03 14:32:08 +02:00
2024-05-31 15:56:43 +10:00
2024-07-14 13:48:29 +03:00
2024-05-31 15:56:43 +10:00
2024-04-22 18:48:47 +02:00
2024-07-14 13:48:29 +03:00
2024-12-10 21:41:05 +01:00
2024-05-31 15:56:43 +10:00
2024-07-14 13:48:29 +03:00
2024-05-31 15:56:43 +10:00
2024-10-22 02:25:38 -07:00
2024-05-31 15:56:43 +10:00
2024-08-07 00:41:48 -04:00
2024-05-31 15:56:43 +10:00
2024-05-31 15:56:43 +10:00
2024-06-19 21:26:48 +01:00
2024-05-31 15:56:43 +10:00
2024-09-09 19:39:43 -07:00

The files here use the LLVM FileCheck framework, documented at https://llvm.org/docs/CommandGuide/FileCheck.html.

One extension worth noting is the use of revisions as custom prefixes for FileCheck. If your codegen test has different behavior based on the chosen target or different compiler flags that you want to exercise, you can use a revisions annotation, like so:

// revisions: aaa bbb
// [bbb] compile-flags: --flags-for-bbb

After specifying those variations, you can write different expected, or explicitly unexpected output by using <prefix>-SAME: and <prefix>-NOT:, like so:

// CHECK: expected code
// aaa-SAME: emitted-only-for-aaa
// aaa-NOT:                        emitted-only-for-bbb
// bbb-NOT:  emitted-only-for-aaa
// bbb-SAME:                       emitted-only-for-bbb