Add alignment parameter to `simd_masked_{load,store}`
This PR adds an alignment parameter in `simd_masked_load` and `simd_masked_store`, in the form of a const-generic enum `core::intrinsics::simd::SimdAlign`. This represents the alignment of the `ptr` argument in these intrinsics as follows
- `SimdAlign::Unaligned` - `ptr` is unaligned/1-byte aligned
- `SimdAlign::Element` - `ptr` is aligned to the element type of the SIMD vector (default behavior in the old signature)
- `SimdAlign::Vector` - `ptr` is aligned to the SIMD vector type
The main motive for this is stdarch - most vector loads are either fully aligned (to the vector size) or unaligned (byte-aligned), so the previous signature doesn't cut it.
Now, stdarch will mostly use `SimdAlign::Unaligned` and `SimdAlign::Vector`, whereas portable-simd will use `SimdAlign::Element`.
- [x] `cg_llvm`
- [x] `cg_clif`
- [x] `miri`/`const_eval`
## Alternatives
Using a const-generic/"const" `u32` parameter as alignment (and we error during codegen if this argument is not a power of two). This, although more flexible than this, has a few drawbacks
- If we use an const-generic argument, then portable-simd somehow needs to pass `align_of::<T>()` as the alignment, which isn't possible without GCE
- "const" function parameters are just an ugly hack, and a pain to deal with in non-LLVM backends
We can remedy the problem with the const-generic `u32` parameter by adding a special rule for the element alignment case (e.g. `0` can mean "use the alignment of the element type), but I feel like this is not as expressive as the enum approach, although I am open to suggestions
cc `@workingjubilee` `@RalfJung` `@BoxyUwU`
Stabilize -Zno-jump-tables into -Cjump-tables=bool
I propose stabilizing the -Zno-jump-tables option into -Cjump-tables=<bool>.
# `-Zno-jump-tables` stabilization report
## What is the RFC for this feature and what changes have occurred to the user-facing design since the RFC was finalized?
No RFC was created for this option. This was a narrowly scoped option introduced in rust-lang/rust#105812 to support code generation requirements of the x86-64 linux kernel, and eventually other targets as Rust For Linux grows.
The tracking is rust-lang/rust#116592.
## What behavior are we committing to that has been controversial? Summarize the major arguments pro/con.
The behavior of this flag is well defined, and mimics the existing `-fno-jump-tables` option currently available with LLVM and GCC with some caveats:
* Unlike clang or gcc, this option may be ignored by the code generation backend. Rust can support multiple code-generation backends. For stabilization, only the LLVM backend honors this option.
* The usage of this option will not guarantee a library or binary is free of jump tables. To ensure a jump-table free binary, all crates in the build graph must be compiled with this option. This includes implicitly linked crates such as std or core.
* This option only enforces the crate being compiled is free of jump tables.
* No verification is done to ensure other crates are compiled with this option. Enforcing code generation options are applied across the crate graph is out of scope for this option.
What should the flag name be?
* As introduced, this option was named `-Zno-jump-tables`. However, other major toolchains allow both positive and negative variants of this option to toggle this feature. Renaming the option to `-Cjump-tables=<bool>` makes this option consistent, and if for some reason, expandable to other arguments in the future. Notably, many LLVM targets have a configurable and different thresholds for when to lower into a jump table.
## Are there extensions to this feature that remain unstable? How do we know that we are not accidentally committing to those.
No. This option is used exclusively to gate a very specific class of optimization.
## Summarize the major parts of the implementation and provide links into the code (or to PRs)
* The original PR rust-lang/rust#105812 by ```@ojeda```
* The stabilized CLI option is parsed as a bool:
68bfda9025/compiler/rustc_session/src/options.rs (L2025-L2026)
* This options adds an attribute to each llvm function via:
68bfda9025/compiler/rustc_codegen_llvm/src/attributes.rs (L210-L215)
* Finally, the rustc book is updated with the new option:
68bfda9025/src/doc/rustc/src/codegen-options/index.md (L212-L223)
## Has a call-for-testing period been conducted? If so, what feedback was received?
No. The option has originally created is being used by Rust For Linux to build the x86-64 kernel without issue.
## What outstanding bugs in the issue tracker involve this feature? Are they stabilization-blocking?
There are no outstanding issues.
## Summarize contributors to the feature by name for recognition and assuredness that people involved in the feature agree with stabilization
* ```@ojeda``` implemented this feature in rust-lang/rust#105815 as `-Zno-jump-tables`.
* ```@tgross35``` created and maintained the tracking issue rust-lang/rust#116592, and provided feedback about the naming of the cli option.
## What FIXMEs are still in the code for that feature and why is it ok to leave them there?
There are none.
## What static checks are done that are needed to prevent undefined behavior?
This option cannot cause undefined behavior. It is a boolean option with well defined behavior in both cases.
## In what way does this feature interact with the reference/specification, and are those edits prepared?
This adds a new cli option to `rustc`. The documentation is updated, and the unstable documentation cleaned up in this PR.
## Does this feature introduce new expressions and can they produce temporaries? What are the lifetimes of those temporaries?
No.
## What other unstable features may be exposed by this feature?
None.
## What is tooling support like for this feature, w.r.t rustdoc, clippy, rust-analzyer, rustfmt, etc.?
No support is required from other rust tooling.
## Open Items
- [x] Are there objections renaming `-Zno-jump-tables` to `-Cjump-tables=<bool>`? The consensus is no.
- [x] Is it desirable to keep `-Zno-jump-tables` for a period of time? The consensus is no.
---
Closesrust-lang/rust#116592
Add LLVM range attributes to slice length parameters
It'll take a bunch more work to do this in layout -- because of cycles in `struct Foo<'a>(&'a Foo<'a>);` -- so until we figure out how to do that well, just look for slices specifically and add the proper range for the length.
Rollup of 5 pull requests
Successful merges:
- rust-lang/rust#144194 (Provide additional context to errors involving const traits)
- rust-lang/rust#148232 (ci: add runners for vanilla LLVM 21)
- rust-lang/rust#148240 (rustc_codegen: fix musttail returns for cast/indirect ABIs)
- rust-lang/rust#148247 (Remove a special case and move another one out of reachable_non_generics)
- rust-lang/rust#148370 (Point at inner item when it uses generic type param from outer item or `Self`)
r? `@ghost`
`@rustbot` modify labels: rollup
Remove a special case and move another one out of reachable_non_generics
One is no longer necessary, the other is best placed in cg_llvm as it works around a cg_llvm bug.
rustc_codegen: fix musttail returns for cast/indirect ABIs
Fixesrust-lang/rust#148239https://github.com/rust-lang/rust/issues/144986
Explicit tail calls trigger `bug!` for any callee whose ABI returns via `PassMode::Cast`, and we forgot to to forward the hidden `sret` out-pointer when the ABI requested an indirect return. The former causes ICE, the latter produced malformed IR (wrong codegen) if the return value is large enough to need `sret`.
Updated the musttail helper to accept cast-mode returns, made it so that we pass the return pointer through the tail-call path.
Added two UI tests to demonstrate each case.
This is my first time contributing, please do check if I did it right.
r? theemathas
cg_llvm: Pass `debuginfo_compression` through FFI as an enum
There are only three possible values, making an enum more appropriate.
This avoids string allocation on the Rust side, and avoids ad-hoc `!strcmp` to convert back to an enum on the C++ side.
Extend attribute deduction to determine whether parameters using
indirect pass mode might have their address captured. Similarly to
the deduction of `readonly` attribute this information facilitates
memcpy optimizations.
This was a bit more invasive than I had kind of hoped. An alternate
approach would be to add an extra call_intrinsic_with_attrs() that would
have the new-in-this-change signature for call_intrinsic, but this felt
about equivalent and made it a little easier to audit the relevant
callsites of call_intrinsic().
Offload host2
r? `@oli-obk`
A follow-up to my previous gpu host PR. With this, I can (in theory) run a sufficiently simple Rust function on GPUs. I tested it on AMD, where the amdgcn tartget of rustc causes issues due to Addressspace castings, which might not be valid. If I (manually) fix them, I can run the generated IR on an AMD GPU. This should conceptually also work on NVIDIA or Intel. I updated the dev-guide acordingly: https://rustc-dev-guide.rust-lang.org/offload/usage.html
I am unhappy with the amount of standalone functions in my offload code, so in my second commit I bundled some of the code around two structs which are Rust versions of the LLVM/Offload structs which they represent. The structs themselves only have doc comments. Since I directly lower everything to llvm-ir I didn't saw a big value in modelling the struct member variables.
Fix ICE on offsetted ZST pointer
I'm not sure this is the *right* fix, but it's simple enough and does roughly what I'd expect. Like with the previous optimization to codegen usize rather than a zero-sized static, there's no guarantee that we continue returning a particular value from the offsetting.
A grep for `const_usize.*align` found the same code copied to rustc_codegen_gcc and cranelift but a quick skim didn't find other cases of similar 'optimization'. That said, I'm not convinced I caught everything, it's not trivial to search for this.
Closesrust-lang/rust#147516
Add vsx register support for ppc inline asm, and implement preserves_flag option
This should address the last(?) missing pieces of inline asm for ppc:
* Explicit VSX register support. ISA 2.06 (POWER7) added a 64x128b register overlay extending the fpr's to 128b, and unifies them with the vmx (altivec) registers. Implementations details within gcc/llvm percolate up, and require using the `x` template modifier. I have updated the inline asm to implicitly include this for vsx arguments which do not specify it. ~~Support for the gcc codegen backend is still a todo.~~
* Implement the `preserves_flags` option. All ABI's, and all ISAs store their flags in `cr`, and the carry bit lives inside `xer`. The other status registers hold sticky bits or control bits which do not affect branch instructions.
There is some interest in the e500 (powerpcspe) port. Architecturally, it has a very different FP ISA, and includes a simd extension called SPR (which is not IBM's cell SPE). Notably, it does not have altivec/fpr/vsx registers. It also has an SPE accumulator register which its ABI marks as volatile, but I am not sure if the compiler uses it.
Move computation of allocator shim contents to cg_ssa
In the future this should make it easier to use weak symbols for the allocator shim on platforms that properly support weak symbols. And it would allow reusing the allocator shim code for handling default implementations of the upcoming externally implementable items feature on platforms that don't properly support weak symbols.
In addition to make this possible, the alloc error handler is now handled in a way such that it is possible to avoid using the allocator shim when liballoc is compiled without `no_global_oom_handling` if you use `#[alloc_error_handler]`. Previously this was only possible if you avoided liballoc entirely or compiled it with `no_global_oom_handling`. You still need to avoid libstd and to define the symbol that indicates that avoiding the allocator shim is unstable.
Where supported, VSX is a 64x128b register set which encompasses
both the floating point and vector registers.
In the type tests, xvsqrtdp is used as it is the only two-argument
vsx opcode supported by all targets on llvm. If you need to copy
a vsx register, the preferred way is "xxlor xt, xa, xa".
cg_llvm: Use `LLVMDIBuilderCreateGlobalVariableExpression`
- Part of rust-lang/rust#134001
- Follow-up to rust-lang/rust#146763
---
This PR dismantles the somewhat complicated `LLVMRustDIBuilderCreateStaticVariable` function, and replaces it with equivalent calls to `LLVMDIBuilderCreateGlobalVariableExpression` and `LLVMGlobalSetMetadata`.
A key difference is that the new code does not replicate the attempted downcast of `InitVal`. As far as I can tell, those downcasts were actually dead, because `llvm::ConstantInt` and `llvm::ConstantFP` are not subclasses of `llvm::GlobalVariable`. I tried replacing those code paths with fatal errors, and was unable to induce failure in any of the relevant test suites I ran.
I have also confirmed that if the calls to `create_static_variable` are commented out, debuginfo tests will fail, demonstrating some amount of relevant test coverage.
The new `DIBuilder` methods have been added via an extension trait, not as inherent methods, to avoid impeding rust-lang/rust#142897.
Note that the code in `LLVMRustDIBuilderCreateStaticVariable` that tried to
downcast `InitVal` appears to have been dead, because `llvm::ConstantInt` and
`llvm::ConstantFP` are not subclasses of `llvm::GlobalVariable`.
In the future this should make it easier to use weak symbols for the
allocator shim on platforms that properly support weak symbols. And it
would allow reusing the allocator shim code for handling default
implementations of the upcoming externally implementable items feature
on platforms that don't properly support weak symbols.
Currently it is possible to avoid linking the allocator shim when
__rust_no_alloc_shim_is_unstable_v2 is defined when linking rlibs
directly as some build systems need. However this requires liballoc to
be compiled with --cfg no_global_oom_handling, which places huge
restrictions on what functions you can call and makes it impossible to
use libstd. Or alternatively you have to define
__rust_alloc_error_handler and (when using libstd)
__rust_alloc_error_handler_should_panic
using #[rustc_std_internal_symbol]. With this commit you can either use
libstd and define __rust_alloc_error_handler_should_panic or not use
libstd and use #[alloc_error_handler] instead. Both options are still
unstable though.
Eventually the alloc_error_handler may either be removed entirely
(though the PR for that has been stale for years now) or we may start
using weak symbols for it instead. For the latter case this commit is a
prerequisite anyway.
refactor: Remove `LLVMRustInsertPrivateGlobal` and `define_private_global`
Since it can easily be implemented using the existing LLVM C API in
terms of `LLVMAddGlobal` and `LLVMSetLinkage` and `define_private_global`
was only used in one place.
Work towards https://github.com/rust-lang/rust/issues/46437