Make things more consistent with other APIs that work with a bitwise
representation of the exponent. That is, use `u32` when working with a
bitwise (biased) representation, and use `i32` when the bitwise
representation has been adjusted for bias and may be negative.
Every place this has been used so far has an `as i32`, so this change
makes things cleaner anyway.
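Here is a minimal sketch of this convention for `f64`. The helper names `exp_biased` and `exp_unbiased` are hypothetical and are not libm's actual API. The raw exponent field stays a `u32`, and only the bias-adjusted value becomes an `i32`.

```rust
/// Number of explicitly stored significand bits in `f64`.
const SIG_BITS: u32 = 52;
/// Exponent bias for `f64`.
const EXP_BIAS: u32 = 1023;

/// Biased exponent exactly as stored in the bits: never negative, so `u32`.
fn exp_biased(x: f64) -> u32 {
    ((x.to_bits() >> SIG_BITS) & 0x7ff) as u32
}

/// Exponent with the bias removed: may be negative, so `i32`.
/// (Zero and subnormals are not special-cased in this sketch.)
fn exp_unbiased(x: f64) -> i32 {
    exp_biased(x) as i32 - EXP_BIAS as i32
}
```

The only signed conversion happens inside `exp_unbiased`, so callers no longer need to write `as i32` themselves.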
Currently our XFAILs are open-ended; we do not check that it actually
fails, so we have no easy way of knowing that a previously-failing test
starts passing. Introduce a new enum that we return from overrides to
give us more flexibility here, including the ability to assert that
expected failures happen.
With the new enum, it is also possible to specify ULP via return value
rather than passing a `&mut u32` parameter.
This includes refactoring of `precision.rs` to be more accurate about
where errors come from, if possible.
Fixes: https://github.com/rust-lang/libm/issues/455
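The sketch below shows how an override's return value could carry both a custom ULP tolerance and an asserted expected failure. `CheckAction` and `apply` are hypothetical names chosen for this illustration, and the variants are assumptions rather than the exact enum added here.

```rust
/// Hypothetical outcome an override can return (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CheckAction {
    /// Check against the default ULP tolerance.
    AssertSuccess,
    /// Check against a custom ULP tolerance, returned instead of a `&mut u32`.
    AssertWithUlp(u32),
    /// The check is expected to fail; unexpectedly passing is an error.
    AssertFailure(&'static str),
    /// Do not check this input at all.
    Skip,
}

/// Apply an override's action to a measured ULP difference.
fn apply(action: CheckAction, ulp_diff: u32, default_ulp: u32) -> Result<(), String> {
    let allowed = match action {
        CheckAction::Skip => return Ok(()),
        CheckAction::AssertWithUlp(ulp) => ulp,
        _ => default_ulp,
    };
    let passed = ulp_diff <= allowed;
    match (action, passed) {
        // An expected failure that did fail is fine.
        (CheckAction::AssertFailure(_), false) => Ok(()),
        // A previously failing case that now passes gets reported.
        (CheckAction::AssertFailure(reason), true) => {
            Err(format!("expected failure ({reason}), but the check passed"))
        }
        (_, true) => Ok(()),
        (_, false) => Err(format!("ULP difference {ulp_diff} exceeds {allowed}")),
    }
}
```

The key difference from an open-ended XFAIL is the second `AssertFailure` arm. A fixed bug shows up as an error instead of passing silently.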
Additionally, make use of this version to implement `ceil` and `ceilf`.
Musl's `ceilf` algorithm seems to work better for all versions of the
functions. Testing with a generic version of musl's `ceil` routine
showed the following regressions:
icount::icount_bench_ceil_group::icount_bench_ceil logspace:setup_ceil()
Performance has regressed: Instructions (14064 > 13171) regressed by +6.78005% (>+5.00000)
Baselines: softfloat|softfloat
Instructions: 14064|13171 (+6.78005%) [+1.06780x]
L1 Hits: 16697|15803 (+5.65715%) [+1.05657x]
L2 Hits: 0|0 (No change)
RAM Hits: 7|8 (-12.5000%) [-1.14286x]
Total read+write: 16704|15811 (+5.64797%) [+1.05648x]
Estimated Cycles: 16942|16083 (+5.34104%) [+1.05341x]
icount::icount_bench_ceilf_group::icount_bench_ceilf logspace:setup_ceilf()
Performance has regressed: Instructions (14732 > 9901) regressed by +48.7931% (>+5.00000)
Baselines: softfloat|softfloat
Instructions: 14732|9901 (+48.7931%) [+1.48793x]
L1 Hits: 17494|12611 (+38.7202%) [+1.38720x]
L2 Hits: 0|0 (No change)
RAM Hits: 6|6 (No change)
Total read+write: 17500|12617 (+38.7018%) [+1.38702x]
Estimated Cycles: 17704|12821 (+38.0860%) [+1.38086x]
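For reference, here is a sketch of musl's `ceilf` bit-manipulation approach ported to Rust for `f32` only. The name `ceilf_sketch` is hypothetical. This is not the generic implementation from this change, and it omits musl's `FORCE_EVAL` used to raise the inexact exception.

```rust
/// Sketch of musl's `ceilf` algorithm, operating on the bit representation.
fn ceilf_sketch(x: f32) -> f32 {
    let mut bits = x.to_bits();
    // Unbiased exponent; may be negative, hence `i32`.
    let e = ((bits >> 23) & 0xff) as i32 - 0x7f;

    // Already integral (|x| >= 2^23), or infinity/NaN.
    if e >= 23 {
        return x;
    }

    if e >= 0 {
        // Mask of the mantissa bits that hold the fractional part.
        let m = 0x007f_ffff_u32 >> e;
        if (bits & m) == 0 {
            return x; // no fractional part
        }
        // Positive values round up: adding `m` carries into the integer part.
        if (bits >> 31) == 0 {
            bits += m;
        }
        // Clearing the fractional bits truncates toward zero.
        bits &= !m;
    } else if (bits >> 31) != 0 {
        // -1 < x < 0 (or -0.0) rounds up to -0.0.
        return -0.0;
    } else if (bits << 1) != 0 {
        // 0 < x < 1 rounds up to 1.0; +0.0 falls through unchanged.
        return 1.0;
    }

    f32::from_bits(bits)
}
```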
`exp` does not perform any form of unbiasing, so there isn't any reason
it should be signed. Change this.
Additionally, add `EPSILON` to the `Float` trait.
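Here is a minimal sketch of a float trait with an `EPSILON` associated constant and an unsigned biased-exponent accessor. `Float`, `biased_exp`, and `eps_exp` are hypothetical stand-ins, not libm's actual trait. The example uses the fact that `EPSILON` equals 2^-(significand bits), so its biased exponent is the bias minus the number of significand bits.

```rust
/// Hypothetical minimal float trait (not libm's actual `Float`).
trait Float: Copy {
    /// Difference between 1.0 and the next representable value.
    const EPSILON: Self;
    /// Raw (biased) exponent field; never unbiased, so unsigned.
    fn biased_exp(self) -> u32;
}

impl Float for f32 {
    const EPSILON: Self = f32::EPSILON;
    fn biased_exp(self) -> u32 {
        (self.to_bits() >> 23) & 0xff
    }
}

impl Float for f64 {
    const EPSILON: Self = f64::EPSILON;
    fn biased_exp(self) -> u32 {
        ((self.to_bits() >> 52) & 0x7ff) as u32
    }
}

/// Generic code can now reach `EPSILON` through the trait.
fn eps_exp<F: Float>() -> u32 {
    F::EPSILON.biased_exp()
}
```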
Musl commit 97e9b73d59 ("math: new software sqrt") adds a new algorithm
using Goldschmidt division. Port this algorithm to Rust and make it
generic, which shows a notable performance improvement over the existing
algorithm.
This also allows adding square root routines for `f16` and `f128`.
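To show the general idea of a Goldschmidt square root, here is an illustrative version in plain `f64` arithmetic. It is not musl's fixed-point implementation. The name `goldschmidt_sqrt` is hypothetical, the initial estimate uses the well-known "fast inverse square root" bit trick in place of a lookup table, and the result is only accurate to a few ULP, not guaranteed to be correctly rounded.

```rust
/// Illustrative Goldschmidt square root.
/// Assumes `x` is positive, finite, and normal.
fn goldschmidt_sqrt(x: f64) -> f64 {
    // Rough estimate of 1/sqrt(x) (relative error of a few percent).
    let y0 = f64::from_bits(0x5FE6_EB50_C7B5_37A9 - (x.to_bits() >> 1));
    // g converges to sqrt(x); h converges to 1 / (2 * sqrt(x)).
    let mut g = x * y0;
    let mut h = 0.5 * y0;
    // Each step roughly squares the relative error, so five steps
    // take a ~3% estimate well below f64 precision.
    for _ in 0..5 {
        let r = 0.5 - g * h;
        g += g * r;
        h += h * r;
    }
    g
}
```

The iteration uses only multiplications and additions, so `g` and `h` can be updated independently. The same structure carries over to other float widths by changing the types and the number of steps.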
Any architecture-specific float operations are likely to consist of only
a few instructions, but the softfloat implementations are much more
complex. Ensure that the softfloat implementations are what get tested.
`cc` automatically reads the optimization level from Cargo's `OPT_LEVEL`
variable, so we don't need to set it explicitly. Remove the explicit
setting so that running in a debugger makes more sense.
Introduce a way to ignore the results of icount regression tests, by
specifying `allow-regressions` in the pull request body. This should
apply to both pull requests and the merges based on them, since `gh pr
view` automatically handles both.
These benchmarks are fast to run, so the time cost here is pretty
minimal. Running softfloat benchmarks just ensures that we don't e.g.
test the performance of `_mm_sqrt_ss` rather than our implementation,
and running without softfloat gives us a way to see the effect of arch
intrinsics.
`--limit=1` seems to apply before `jq` filtering, meaning our
`WORKFLOW_NAME` ("CI") workflow may not appear in the input to the jq
query. Removing `--limit` returns the default number of results for jq
to filter from, so this works better.
This failed a couple of times recently in CI, once on i686 and once on
aarch64-apple:
thread 'main' panicked at crates/libm-test/benches/random.rs:76:65:
called `Result::unwrap()` on an `Err` value: ynf
Caused by:
0:
input: (681, 509.90924) (0x000002a9, 0x43fef462)
expected: -3.2161271e38 0xff71f45b
actual: -inf 0xff800000
1: mismatched infinities
thread 'main' panicked at crates/libm-test/benches/random.rs:76:65:
called `Result::unwrap()` on an `Err` value: ynf
Caused by:
0:
input: (132, 50.46604) (0x00000084, 0x4249dd3a)
expected: -3.3364996e38 0xff7b02a5
actual: -inf 0xff800000
1: mismatched infinities
Add a new override to account for this.
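One possible shape for such an override is a predicate that accepts `-inf` only when the expected result is a finite negative value close to `f32::MIN`. Both failures above fall into this category. The function name and the cutoff (half of `f32::MIN`) are hypothetical and chosen only for illustration.

```rust
/// Hypothetical override predicate for `ynf`: accept `-inf` when the
/// expected value is finite but so close to `f32::MIN` that overflow
/// is plausible. The cutoff `f32::MIN / 2.0` is an arbitrary assumption.
fn ynf_overflow_allowed(expected: f32, actual: f32) -> bool {
    actual == f32::NEG_INFINITY && expected.is_finite() && expected <= f32::MIN / 2.0
}
```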
The icount benchmarks are what we will be relying on in CI more than the
existing benchmarks. There isn't much reason to keep these around, but
there isn't much point in dropping them either. So, just reduce the
runtime.
Add support in `ci-util.py` for finding the most recent baseline and
downloading it, which new tests can then be compared against.
Arbitrarily select nightly-2025-01-16 as the rustc version to pin to in
benchmarks.
Running walltime benchmarks in CI is notoriously unstable. Introduce
benchmarks that instead use instruction count and other more
reproducible metrics, using `iai-callgrind` [1], which we are able to
run in CI with a high degree of reproducibility.
Inputs to this benchmark are a logspace sweep, which gives an
approximation for real-world use, but may fail to indicate outlier
cases.
[1]: https://github.com/iai-callgrind/iai-callgrind
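One simple way to build an approximately logarithmic sweep for positive floats is to step evenly through their IEEE 754 bit patterns. Positive floats are ordered by their bits, and each binade contains the same number of values. The helper `logspace_bits` is hypothetical and is not the generator used by these benchmarks. The last point may fall slightly short of `end` because the integer step is truncated.

```rust
/// Hypothetical helper: `n` points from `start` toward `end` (both positive
/// and finite), spaced evenly over the bit patterns, which approximates
/// logarithmic spacing in value.
fn logspace_bits(start: f32, end: f32, n: u32) -> Vec<f32> {
    assert!(n >= 2 && start > 0.0 && start <= end && end.is_finite());
    let (lo, hi) = (start.to_bits(), end.to_bits());
    // Integer step between consecutive bit patterns (truncated).
    let step = (hi - lo) / (n - 1);
    (0..n).map(|i| f32::from_bits(lo + i * step)).collect()
}
```

A sweep like this covers every order of magnitude evenly, which fits typical inputs. It can also step over isolated outliers, as noted above.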
This also allows reusing the same generator logic between logspace tests
and extensive tests, so comes with a nice bit of cleanup.
Changes:
* Make the generator part of `CheckCtx` since a `Generator` and
`CheckCtx` are almost always passed together.
* Rename `domain_logspace` to `spaced` since this no longer only
operates within a domain and we may want to handle integer spacing.
* Domain is now calculated at runtime rather than using traits, which is
much easier to work with.
* With the above, domains for multidimensional functions are added.
* The extensive test generator code has been combined with the
`domain_logspace` generator code. With this, the domain tests have just
become a subset of extensive tests. These were renamed to "quickspace"
since, technically, the extensive tests are also "domain" or "domain
logspace" tests.
* Edge case generators now handle functions with multiple inputs.
* The test runners can be significantly cleaned up and deduplicated.
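The sketch below shows what a runtime domain lookup might look like, with one domain per input so that multi-argument functions are covered. `Domain` and `domains` are hypothetical names, and the few entries shown are only examples of mathematically valid input ranges, not the repository's actual table.

```rust
/// Hypothetical inclusive input range for one argument.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Domain {
    start: f64,
    end: f64,
}

/// Look up one domain per argument at runtime, keyed by function name.
fn domains(name: &str) -> Option<Vec<Domain>> {
    let unbounded = Domain { start: f64::NEG_INFINITY, end: f64::INFINITY };
    let non_negative = Domain { start: 0.0, end: f64::INFINITY };
    match name {
        "sqrt" | "log" => Some(vec![non_negative]),
        "sin" | "exp" => Some(vec![unbounded]),
        "pow" => Some(vec![unbounded, unbounded]),
        "fma" => Some(vec![unbounded; 3]),
        _ => None,
    }
}
```

Because the result is a plain value, a generator can iterate over `domains(name)` for any number of arguments without a separate trait implementation per function.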
The test suite for this repo has quite a lot of tests, and it is
difficult to tell which contribute the most to the long CI runtime.
libtest does have an unstable flag to report test times, but that is
inconvenient to use because it needs to be passed only to libtest
binaries.
Switch to cargo-nextest [1] which provides time reporting and, overall,
a better test UI. It may also improve test runtime, though this seems
unlikely since we have a few large test binaries with many small tests
(nextest benefits the most when there are many binaries that can be
run in parallel).
For anyone running locally without `cargo-nextest` installed, `run.sh`
should still fall back to `cargo test`.
This diff includes some cleanup and consistency changes to other
CI-related files.
[1]: https://nexte.st