659 Commits

Author SHA1 Message Date
Trevor Gross
f94df5d524 Add a generic version of fmin and fmax
These can be used for `fmin`, `fminf`, `fmax`, and `fmaxf`. No changes
to the implementation are made, so [1] is not fixed.

[1]: https://github.com/rust-lang/libm/issues/439
2025-01-24 02:58:00 +00:00
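A generic `fmin`/`fmax` pair of the kind described in the commit above might look like the following sketch. The `Float` trait here is a hypothetical stand-in, not the crate's own trait, and the NaN handling (return the non-NaN operand) is an assumption based on the usual C semantics. The sketch makes no attempt to order `-0.0` and `+0.0`.

```rust
/// Hypothetical minimal float abstraction, for illustration only.
pub trait Float: Copy + PartialOrd {
    fn is_nan(self) -> bool;
}

impl Float for f32 {
    fn is_nan(self) -> bool {
        f32::is_nan(self)
    }
}

impl Float for f64 {
    fn is_nan(self) -> bool {
        f64::is_nan(self)
    }
}

/// Return the smaller operand, preferring a non-NaN value if one exists.
pub fn fmin<F: Float>(x: F, y: F) -> F {
    // Pick `y` if `x` is NaN or `y` is strictly smaller. If only `y` is NaN,
    // `y < x` is false and `x` is returned.
    if x.is_nan() || y < x { y } else { x }
}

/// Return the larger operand, preferring a non-NaN value if one exists.
pub fn fmax<F: Float>(x: F, y: F) -> F {
    if x.is_nan() || x < y { y } else { x }
}
```

Because the functions are generic over the trait, the same body serves both `f32` and `f64`, which is the point of replacing the per-type copies.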
Trevor Gross
357ee34abb Remove an outdated note about precision 2025-01-24 01:59:10 +00:00
Trevor Gross
d20a5e82a5 Add roundf16 and roundf128 2025-01-24 01:59:10 +00:00
Trevor Gross
cdbe65b503 Add a generic version of round
This replaces `round` and `roundf`.
2025-01-24 01:49:23 +00:00
Trevor Gross
3aa2d1cfc2 Add a generic version of scalbn
This replaces the `f32` and `f64` versions of `scalbn` and `ldexp`.
2025-01-23 16:59:44 -06:00
Trevor Gross
b8da1919f9 Change from_parts to take a u32 exponent rather than i32
Make things more consistent with other APIs that work with a bitwise
representation of the exponent. That is, use `u32` when working with a
bitwise (biased) representation, use `i32` when the bitwise
representation has been adjusted for bias and may be negative.

Every place this has been used so far has an `as i32`, so this change
makes things cleaner anyway.
2025-01-23 03:46:46 -06:00
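The biased/unbiased convention from the commit above can be illustrated for `f32` as follows. This is a minimal sketch: the free functions and their signatures are assumptions made for illustration, not the crate's actual `Float` trait methods.

```rust
const SIG_BITS: u32 = 23; // explicit significand bits in an f32
const EXP_MAX: u32 = 0xff; // all-ones biased exponent (8 bits)
const EXP_BIAS: i32 = 127;

/// Build a float from its raw fields. `exponent` is the biased (bitwise)
/// representation, so it is unsigned.
pub fn from_parts(negative: bool, exponent: u32, significand: u32) -> f32 {
    let sign = (negative as u32) << 31;
    let exp = (exponent & EXP_MAX) << SIG_BITS;
    let sig = significand & ((1 << SIG_BITS) - 1);
    f32::from_bits(sign | exp | sig)
}

/// Biased exponent exactly as stored in the bit pattern: `u32`.
pub fn exp_biased(x: f32) -> u32 {
    (x.to_bits() >> SIG_BITS) & EXP_MAX
}

/// Exponent with the bias removed, which may be negative: `i32`.
pub fn exp_unbiased(x: f32) -> i32 {
    exp_biased(x) as i32 - EXP_BIAS
}
```

For example, `from_parts(false, 127, 0)` yields `1.0`, while `exp_unbiased(0.5)` is `-1`.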
Trevor Gross
ca8dccc5b6 Introduce XFAILs that assert failure
Currently our XFAILs are open ended; we do not check that they actually
fail, so we have no easy way of knowing when a previously-failing test
starts passing. Introduce a new enum that we return from overrides to
give us more flexibility here, including the ability to assert that
expected failures happen.

With the new enum, it is also possible to specify ULP via return value
rather than passing a `&mut u32` parameter.

This includes refactoring of `precision.rs` to be more accurate about
where errors come from, if possible.

Fixes: https://github.com/rust-lang/libm/issues/455
2025-01-23 01:15:21 -06:00
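The idea of returning an enum from overrides, as described in the commit above, could be sketched like this. The enum name, its variants, and the `evaluate` helper are all hypothetical; they only illustrate how an expected failure can be asserted and how an ULP tolerance can travel in the return value instead of through a `&mut u32`.

```rust
/// Hypothetical action returned by a test override.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    /// Check with the default tolerance.
    AssertSuccess,
    /// The check is expected to fail; passing is reported as an error.
    AssertFailure(&'static str),
    /// Check with a custom ULP tolerance.
    AssertWithUlp(u32),
    /// Do not check at all.
    Skip,
}

fn check(observed_ulp: u32, allowed_ulp: u32) -> Result<(), String> {
    if observed_ulp <= allowed_ulp {
        Ok(())
    } else {
        Err(format!("ULP error {observed_ulp} exceeds allowed {allowed_ulp}"))
    }
}

/// Apply an override action to an observed ULP error.
pub fn evaluate(action: Action, default_ulp: u32, observed_ulp: u32) -> Result<(), String> {
    match action {
        Action::Skip => Ok(()),
        Action::AssertSuccess => check(observed_ulp, default_ulp),
        Action::AssertWithUlp(allowed) => check(observed_ulp, allowed),
        // Invert the result: an unexpected pass is the error case.
        Action::AssertFailure(reason) => match check(observed_ulp, default_ulp) {
            Ok(()) => Err(format!("expected failure ({reason}) but the check passed")),
            Err(_) => Ok(()),
        },
    }
}
```

With this shape, a test that starts passing after a fix surfaces as an error from the `AssertFailure` arm, prompting removal of the stale XFAIL.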
Trevor Gross
8dc4ef6f0f Add hf16! and hf128!
Expand the existing hex float functions and macros with versions that
work with `f16` and `f128`.
2025-01-22 21:57:23 -06:00
Trevor Gross
42e22132b4 Fix the parsing of three-item tuples in util 2025-01-22 18:28:57 -05:00
Trevor Gross
c788ced502 Add the ability to parse hex, binary, and float hex with util 2025-01-22 17:03:34 -05:00
Trevor Gross
b22398d658 Add rintf16 and rintf128
Use the generic algorithms to provide implementations for these
routines.
2025-01-22 11:04:39 +00:00
Trevor Gross
bf2d171b1a Add a generic version of rint
Use this to implement `rint` and `rintf`.
2025-01-22 11:04:36 +00:00
Trevor Gross
3ae70a4a6c Adjust ceil style to be more similar to floor 2025-01-22 08:54:21 +00:00
Trevor Gross
6a8bb0fa80 Add floorf16 and floorf128
Use the generic algorithms to provide implementations for these
routines.
2025-01-22 08:50:06 +00:00
Trevor Gross
42fce292ab Add a generic version of floor
Additionally, make use of this version to implement `floor` and
`floorf`.

Similar to `ceil`, musl's `ceilf` routine seems to work better for all
float widths than the `ceil` algorithm. Trying with the `ceil` (`f64`)
algorithm produced the following regressions:

    icount::icount_bench_floor_group::icount_bench_floor logspace:setup_floor()
    Performance has regressed: Instructions (14064 > 13171) regressed by +6.78005% (>+5.00000)
      Baselines:                      softfloat|softfloat
      Instructions:                       14064|13171                (+6.78005%) [+1.06780x]
      L1 Hits:                            16821|15802                (+6.44855%) [+1.06449x]
      L2 Hits:                                0|0                    (No change)
      RAM Hits:                               8|9                    (-11.1111%) [-1.12500x]
      Total read+write:                   16829|15811                (+6.43856%) [+1.06439x]
      Estimated Cycles:                   17101|16117                (+6.10535%) [+1.06105x]
    icount::icount_bench_floorf128_group::icount_bench_floorf128 logspace:setup_floorf128()
      Baselines:                      softfloat|softfloat
      Instructions:                      166868|N/A                  (*********)
      L1 Hits:                           221429|N/A                  (*********)
      L2 Hits:                                1|N/A                  (*********)
      RAM Hits:                              34|N/A                  (*********)
      Total read+write:                  221464|N/A                  (*********)
      Estimated Cycles:                  222624|N/A                  (*********)
    icount::icount_bench_floorf16_group::icount_bench_floorf16 logspace:setup_floorf16()
      Baselines:                      softfloat|softfloat
      Instructions:                      143029|N/A                  (*********)
      L1 Hits:                           176517|N/A                  (*********)
      L2 Hits:                                1|N/A                  (*********)
      RAM Hits:                              13|N/A                  (*********)
      Total read+write:                  176531|N/A                  (*********)
      Estimated Cycles:                  176977|N/A                  (*********)
    icount::icount_bench_floorf_group::icount_bench_floorf logspace:setup_floorf()
    Performance has regressed: Instructions (14732 > 10441) regressed by +41.0976% (>+5.00000)
      Baselines:                      softfloat|softfloat
      Instructions:                       14732|10441                (+41.0976%) [+1.41098x]
      L1 Hits:                            17616|13027                (+35.2268%) [+1.35227x]
      L2 Hits:                                0|0                    (No change)
      RAM Hits:                               8|6                    (+33.3333%) [+1.33333x]
      Total read+write:                   17624|13033                (+35.2260%) [+1.35226x]
      Estimated Cycles:                   17896|13237                (+35.1968%) [+1.35197x]
2025-01-22 08:48:11 +00:00
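For context, the single-precision bit-masking approach that the commit above refers to can be sketched as below. This follows the general shape of musl's `floorf`, written for `f32` only rather than generically, and unlike the C original it makes no attempt to raise floating-point exceptions. The function name is an assumption for illustration.

```rust
/// Sketch of a bit-masking `floor` for `f32`.
pub fn floor_f32(x: f32) -> f32 {
    let bits = x.to_bits();
    // Unbiased exponent.
    let e = ((bits >> 23) & 0xff) as i32 - 127;

    if e >= 23 {
        // No fractional bits remain; this also covers infinities and NaN.
        return x;
    }

    if e >= 0 {
        // Mask selecting the fractional part of the significand.
        let m = 0x007f_ffff_u32 >> e;
        if bits & m == 0 {
            return x; // already an integer
        }
        let mut b = bits;
        if b >> 31 != 0 {
            // Negative: bump the magnitude so truncation rounds toward -inf.
            b += m;
        }
        f32::from_bits(b & !m)
    } else if bits >> 31 == 0 {
        0.0 // 0 <= x < 1
    } else if bits << 1 != 0 {
        -1.0 // -1 < x < 0
    } else {
        x // -0.0 is preserved
    }
}
```

The same structure carries over to other widths by changing the significand width, exponent mask, and bias, which is what makes a generic version practical.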
Trevor Gross
9064c42abe Add ceilf16 and ceilf128
Use the generic algorithms to provide implementations for these
routines.
2025-01-22 07:22:32 +00:00
Trevor Gross
c00f119166 Add a generic version of ceil
Additionally, make use of this version to implement `ceil` and `ceilf`.

Musl's `ceilf` algorithm seems to work better for all versions of the
functions. Testing with a generic version of musl's `ceil` routine
showed the following regressions:

    icount::icount_bench_ceil_group::icount_bench_ceil logspace:setup_ceil()
    Performance has regressed: Instructions (14064 > 13171) regressed by +6.78005% (>+5.00000)
      Baselines:                      softfloat|softfloat
      Instructions:                       14064|13171                (+6.78005%) [+1.06780x]
      L1 Hits:                            16697|15803                (+5.65715%) [+1.05657x]
      L2 Hits:                                0|0                    (No change)
      RAM Hits:                               7|8                    (-12.5000%) [-1.14286x]
      Total read+write:                   16704|15811                (+5.64797%) [+1.05648x]
      Estimated Cycles:                   16942|16083                (+5.34104%) [+1.05341x]
    icount::icount_bench_ceilf_group::icount_bench_ceilf logspace:setup_ceilf()
    Performance has regressed: Instructions (14732 > 9901) regressed by +48.7931% (>+5.00000)
      Baselines:                      softfloat|softfloat
      Instructions:                       14732|9901                 (+48.7931%) [+1.48793x]
      L1 Hits:                            17494|12611                (+38.7202%) [+1.38720x]
      L2 Hits:                                0|0                    (No change)
      RAM Hits:                               6|6                    (No change)
      Total read+write:                   17500|12617                (+38.7018%) [+1.38702x]
      Estimated Cycles:                   17704|12821                (+38.0860%) [+1.38086x]
2025-01-22 07:22:32 +00:00
Trevor Gross
a7cd13b6a3 Make Float::exp return an unsigned integer
`exp` does not perform any form of unbiasing, so there isn't any reason
it should be signed. Change this.

Additionally, add `EPSILON` to the `Float` trait.
2025-01-22 07:15:39 +00:00
Trevor Gross
5ac2f99954 Shift then mask, rather than mask then shift
This may allow for small optimizations with larger float types since
`u32` math can be used after shifting. LLVM may be already getting this
anyway.
2025-01-22 07:09:37 +00:00
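The difference between the two orderings in the commit above can be seen in a small `f64` sketch. The function names and constants are illustrative assumptions; both functions return the same biased exponent, but the second narrows to `u32` before masking so that the mask is applied with 32-bit math.

```rust
const SIG_BITS: u32 = 52;
const EXP_MASK: u64 = 0x7ff0_0000_0000_0000;

/// Mask in the full-width integer, then shift down.
pub fn exp_mask_then_shift(x: f64) -> u32 {
    ((x.to_bits() & EXP_MASK) >> SIG_BITS) as u32
}

/// Shift down first, narrow to `u32`, then mask with a small constant.
pub fn exp_shift_then_mask(x: f64) -> u32 {
    ((x.to_bits() >> SIG_BITS) as u32) & 0x7ff
}
```

For `f64` the gain is likely negligible, but for wider types such as `f128` the second form avoids a wide mask constant.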
Trevor Gross
186eac9227 Add sqrtf16 and sqrtf128
Use the generic algorithms to provide implementations for these
routines.
2025-01-22 05:31:13 +00:00
Trevor Gross
03041a0371 Copy the u256 implementation from compiler_builtins 2025-01-22 05:31:13 +00:00
Trevor Gross
573ded2ee8 Port the most recent version of Musl's sqrt as a generic algorithm
Musl commit 97e9b73d59 ("math: new software sqrt") adds a new algorithm
using Goldschmidt division. Port this algorithm to Rust and make it
generic, which shows a notable performance improvement over the existing
algorithm.

This also allows adding square root routines for `f16` and `f128`.
2025-01-22 05:31:13 +00:00
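As a rough illustration of the Goldschmidt iteration named in the commit above, the sketch below computes a square root in floating point for inputs in `[1, 4)`. It is not the ported algorithm: the fixed seed `0.7`, the iteration count, the restricted input range, and the use of `f64` arithmetic are all simplifying assumptions chosen to keep the example short.

```rust
/// Goldschmidt-style square root for `s` in `[1, 4)`.
///
/// Maintains `x ~ sqrt(s)` and `h ~ 1 / (2 * sqrt(s))`, refining both with
/// the shared correction factor `r = 0.5 - x * h`.
pub fn goldschmidt_sqrt(s: f64) -> f64 {
    assert!((1.0..4.0).contains(&s), "sketch only handles inputs in [1, 4)");

    // Crude seed for 1/sqrt(s); the true value lies in (0.5, 1].
    let y0 = 0.7;
    let mut x = s * y0;
    let mut h = 0.5 * y0;

    // The error term roughly squares on each pass, so a handful of
    // iterations is enough to reach f64 precision from this seed.
    for _ in 0..8 {
        let r = 0.5 - x * h;
        x += x * r;
        h += h * r;
    }
    x
}
```

The iteration uses only multiplications and additions, with no division, which is part of what makes this family of algorithms attractive for software implementations.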
Trevor Gross
8927014e91 Enable force-soft-floats for extensive tests
Any architecture-specific float operations are likely to consist of only
a few instructions, but the softfloat implementations are much more
complex. Ensure this is what gets tested.
2025-01-22 05:31:13 +00:00
Trevor Gross
9c98c46147 Don't set opt_level in the musl build script
`cc` automatically reads this from Cargo's `OPT_LEVEL` variable so we
don't need to set it explicitly. Remove this so running in a debugger
makes more sense.
2025-01-22 05:22:12 +00:00
Trevor Gross
6ac9c14933 Add a retry to the musl download
This download has occasionally been failing in CI recently. Add a retry
so this is less likely to cause the workflow to fail.
2025-01-21 22:02:48 -05:00
Trevor Gross
b3d57f8c28 Remove trailing whitespace in scripts, run JuliaFormatter 2025-01-21 20:30:11 -05:00
Trevor Gross
e21618c73e Ignore files relevant to benchmarking 2025-01-21 07:58:05 +00:00
Trevor Gross
d3328a0dab Add a way to ignore benchmark regression checks
Introduce a way to ignore the results of icount regression tests, by
specifying `allow-regressions` in the pull request body. This should
apply to both pull requests and the merges based on them, since `gh pr
view` automatically handles both.
2025-01-21 07:58:05 +00:00
Trevor Gross
c5dc1b8ca0 Run wall time benchmarks with --features force-soft-floats
Similar to changes for `icount` benchmarks, this ensures we aren't
testing the throughput of architecture instructions.
2025-01-21 07:58:05 +00:00
Trevor Gross
ba0cfe58dd Run icount benchmarks once with softfloat and once with hardfloat
These benchmarks are fast to run, so the time cost here is pretty
minimal. Running softfloat benchmarks just ensures that we don't e.g.
test the performance of `_mm_sqrt_ss` rather than our implementation,
and running without softfloat gives us a way to see the effect of arch
intrinsics.
2025-01-21 07:58:05 +00:00
Trevor Gross
f9041943f1 Switch to the arm-linux runner and enable MPFR
The free arm64 Linux runners are now available [1]. Switch to using this
image in CI, and enable tests against MPFR since this is now a native
platform.

[1]: https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/
2025-01-20 16:28:39 -05:00
Trevor Gross
f39af6cb97 Remove the limit for querying a baseline
`--limit=1` seems to apply before `jq` filtering, meaning our
`WORKFLOW_NAME` ("CI") workflow may not appear in the input to the jq
query. Removing `--limit` provides a default number of inputs that jq
can then filter from, so this works better.
2025-01-16 15:24:37 -06:00
Trevor Gross
3986206ce0 Add an xfail for recent ynf failures
This failed a couple of times recently in CI, once on i686 and once on
aarch64-apple:

    thread 'main' panicked at crates/libm-test/benches/random.rs:76:65:
    called `Result::unwrap()` on an `Err` value: ynf

    Caused by:
        0:
               input:    (681, 509.90924) (0x000002a9, 0x43fef462)
               expected: -3.2161271e38          0xff71f45b
               actual:   -inf                   0xff800000
        1: mismatched infinities

    thread 'main' panicked at crates/libm-test/benches/random.rs:76:65:
    called `Result::unwrap()` on an `Err` value: ynf

    Caused by:
        0:
               input:    (132, 50.46604) (0x00000084, 0x4249dd3a)
               expected: -3.3364996e38          0xff7b02a5
               actual:   -inf                   0xff800000
        1: mismatched infinities

Add a new override to account for this.
2025-01-16 09:47:00 +00:00
Trevor Gross
5139ba6f46 Reduce the warm up and measurement time for short-benchmarks
The icount benchmarks are what we will be relying on in CI more than the
existing benchmarks. There isn't much reason to keep these around, but
there isn't much point in dropping them either. So, just reduce the
runtime.
2025-01-16 09:07:46 +00:00
Trevor Gross
cdb1e680e0 Run iai-callgrind benchmarks in CI
Add support in `ci-util.py` for finding the most recent baseline and
downloading it, which new tests can then be compared against.

Arbitrarily select nightly-2025-01-16 as the rustc version to pin to in
benchmarks.
2025-01-16 09:07:46 +00:00
Trevor Gross
490ebbb187 Add benchmarks using iai-callgrind
Running walltime benchmarks in CI is notoriously unstable. Introduce
benchmarks that instead use instruction count and other more
reproducible metrics, using `iai-callgrind` [1], which we are able to
run in CI with a high degree of reproducibility.

Inputs to this benchmark are a logspace sweep, which gives an
approximation for real-world use, but may fail to indicate outlier
cases.

[1]: https://github.com/iai-callgrind/iai-callgrind
2025-01-16 09:07:19 +00:00
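A logspace sweep of the kind mentioned in the commit above can be sketched as follows. This version spaces points evenly in the logarithm of the value and is only an illustration; the function name and signature are assumptions, and the crate's own generator may be built differently.

```rust
/// Return `steps` values from `start` to `end` (inclusive), evenly spaced
/// on a logarithmic scale. Requires `0 < start < end` and `steps >= 2`.
pub fn logspace(start: f64, end: f64, steps: usize) -> Vec<f64> {
    assert!(start > 0.0 && end > start && steps >= 2);
    let ln_start = start.ln();
    let ln_end = end.ln();
    let step = (ln_end - ln_start) / (steps - 1) as f64;
    (0..steps)
        .map(|i| (ln_start + step * i as f64).exp())
        .collect()
}
```

Such a sweep covers many orders of magnitude with few points, which is why it approximates typical inputs reasonably well yet can miss isolated worst-case values.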
Trevor Gross
f56b41dbbd Provide a way to override iteration count
Benchmarks need a way to limit how many iterations get run. Introduce a
way to inject this information here.
2025-01-16 08:53:50 +00:00
Trevor Gross
17c86e4e7f Increase the CI timeout 2025-01-16 01:10:26 +00:00
Trevor Gross
ecca4879a2 Adjust precision and add xfails based on new tests 2025-01-16 01:10:26 +00:00
Trevor Gross
2d857e1c21 Replace HasDomain to enable multi-argument edge case and domain tests
This also allows reusing the same generator logic between logspace tests
and extensive tests, so comes with a nice bit of cleanup.

Changes:

* Make the generator part of `CheckCtx` since a `Generator` and
  `CheckCtx` are almost always passed together.
* Rename `domain_logspace` to `spaced` since this no longer only
  operates within a domain and we may want to handle integer spacing.
* Domain is now calculated at runtime rather than using traits, which is
  much easier to work with.
* With the above, domains for multidimensional functions are added.
* The extensive test generator code has been combined with the
  domain_logspace generator code. With this, the domain tests have just
  become a subset of extensive tests. These were renamed to "quickspace"
  since, technically, the extensive tests are also "domain" or "domain
  logspace" tests.
* Edge case generators now handle functions with multiple inputs.
* The test runners can be significantly cleaned up and deduplicated.
2025-01-16 01:10:26 +00:00
Trevor Gross
45e3b98165 Add an override for a recent failure
Failed on i686:

    ──── STDERR:             libm-test::bench/random y1f/crate

    thread 'main' panicked at crates/libm-test/benches/random.rs:76:65:
    called `Result::unwrap()` on an `Err` value: ynf

    Caused by:
        0:
               input:    (213, 109.15641) (0x000000d5, 0x42da5015)
               expected: -3.3049217e38          0xff78a27a
               actual:   -inf                   0xff800000
        1: mismatched infinities
2025-01-15 01:05:38 +00:00
Trevor Gross
b251f74843 Pass --max-fail to nextest so it doesn't fail fast 2025-01-15 00:57:23 +00:00
Trevor Gross
f63ef37218 Slightly restructure ci/calculate-exhaustive-matrix.py
Change this script into a generic CI utility that we will be able to
expand in the future.
2025-01-15 00:57:23 +00:00
Trevor Gross
5e65179a39 Change .yml files to the canonical extension .yaml 2025-01-14 03:24:14 +00:00
Trevor Gross
26df5d6689 Use cargo-nextest for running tests in CI
The test suite for this repo has quite a lot of tests, and it is
difficult to tell which contribute the most to the long CI runtime.
libtest does have an unstable flag to report test times, but that is
inconvenient to use because it needs to be passed only to libtest
binaries.

Switch to cargo-nextest [1] which provides time reporting and, overall,
a better test UI. It may also improve test runtime, though this seems
unlikely since we have larger test binaries with many small tests
(nextest benefits the most when there are larger binaries that can be
run in parallel).

For anyone running locally without it, `run.sh` should still fall back to
`cargo test` if `cargo-nextest` is not available.

This diff includes some cleanup and consistency changes to other
CI-related files.

[1]: https://nexte.st
2025-01-13 21:32:54 -05:00
quaternic
bfbe919adf Simplify and optimize fdim (#442)
The cases with NaN arguments can be handled by the same x - y
expression, and this generates much better code: https://godbolt.org/z/f3rnT8jx4.
2025-01-14 01:55:26 +00:00
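The simplification described in the commit above can be illustrated with a short `f64` sketch. It is written from the commit message rather than copied from the crate, so treat the exact form as an assumption. If either argument is NaN, the comparison is false and `x - y` produces NaN, so no separate NaN branch is needed.

```rust
/// Positive difference: `x - y` if `x > y`, otherwise `+0.0`.
/// NaN inputs fall through to `x - y`, which is NaN.
pub fn fdim(x: f64, y: f64) -> f64 {
    if x <= y { 0.0 } else { x - y }
}
```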
Trevor Gross
bcd9d8a5c3 Reduce indentation in run.sh using early return 2025-01-13 23:01:48 +00:00
Trevor Gross
fd7a45f7f6 Don't set codegen-units=1 by default in CI
We can set this only for the release profile, there isn't any reason to
have it set for debug tests.
2025-01-13 23:01:44 +00:00
Trevor Gross
13b5bf3959 Add fdimf16 and fdimf128
Use the generic algorithms to provide implementations for these
routines.
2025-01-13 14:04:54 +00:00
Trevor Gross
0f285df716 Add a generic version of fdim 2025-01-13 13:49:46 +00:00