Autodiff batching
Enzyme supports batching, which is especially known from the ML side when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image), and a target vector. Based on how close we are with our prediction we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is passing in a batch of 8/16/.. images and targets, and compute the gradients for those all at once, allowing better optimizations.
Enzyme supports batching in two ways, the first one (which I implemented here) just accepts a Batch size,
and then each Dual/Duplicated argument has not one, but N shadow arguments. So instead of
```rs
for i in 0..100 {
df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in 0..100.step_by(4) {
df(x[i+0],x[i+1],x[i+2],x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.
There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next days.
I will also add more tests for both modes.
For any one preferring some more interactive explanation, here's a video of Tim's llvm dev talk, where he presents his work. https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
Rename `is_like_osx` to `is_like_darwin`
Replace `is_like_osx` with `is_like_darwin`, which more closely describes reality (OS X is the pre-2016 name for macOS, and is by now quite outdated; Darwin is the overall name for the OS underlying Apple's macOS, iOS, etc.).
``@rustbot`` label O-apple
r? compiler
The embedded bitcode should always be prepared for LTO/ThinLTO
Fixes#115344. Fixes#117220.
There are currently two methods for generating bitcode that used for LTO. One method involves using `-C linker-plugin-lto` to emit object files as bitcode, which is the typical setting used by cargo. The other method is through `-C embed-bitcode=yes`.
When using with `-C embed-bitcode=yes -C lto=no`, we run a complete non-LTO LLVM pipeline to obtain bitcode, then the bitcode is used for LTO. We run the Call Graph Profile Pass twice on the same module.
This PR is doing something similar to LLVM's `buildFatLTODefaultPipeline`, obtaining the bitcode for embedding after running `buildThinLTOPreLinkDefaultPipeline`.
r? nikic
`rustc_codegen_llvm` relied on `Deref` impls where `Deref::Target` was
or contained an extern type - in my experimental implementation of
rust-lang/rfcs#3729, this isn't possible as the `Target` associated
type's `?Sized` bound cannot be relaxed backwards compatibly (unless we
come up with some way of doing this).
In later pull requests with the rust-lang/rfcs#3729 implementation,
breakage like this could only occur for nightly users relying on the
`extern_types` feature.
Upstreaming this to avoid needing to keep carrying this patch locally,
and I think it'll necessarily need to change eventually.
Document some safety constraints and use more safe wrappers
Lots of unsafe codegen_llvm code has safe wrappers already, so I used some of them and added some where applicable. I stopped here because this diff is large enough and should probably be reviewed independently of other changes.
cg_llvm: Reduce visibility of some items outside the `llvm` module
Next piece of #135502
This reduces the visibility of items (other than those in the `llvm` module) so that dead code analysis will correctly identify unused items.
See llvm/llvm-project#121851
For LLVM 20+, this function (`renameModuleForThinLTO`) has no return
value. For prior versions of LLVM, this never failed, but had a
signature which allowed an error value people were handling.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Pass end position of span through inline ASM cookie
Before this PR, only the start position of the span was passed though the inline ASM cookie to diagnostics. LLVM 19 has full support for 64-bit inline ASM cookies; this PR uses that to pass the end position of the span in the upper 32 bits, meaning inline ASM diagnostics now point at the entire line the error occurred on, not just the first character of it.
The target name can be anything with custom target specs. Matching on
fields inside the target spec is much more robust than matching on the
target name.
Move versioned Apple LLVM targets from `rustc_target` to `rustc_codegen_ssa`
Fully specified LLVM targets contain the OS version on macOS/iOS/tvOS/watchOS/visionOS, and this version depends on the deployment target environment variables like `MACOSX_DEPLOYMENT_TARGET`, `IPHONEOS_DEPLOYMENT_TARGET` etc.
We would like to move this to later in the compilation pipeline, both because it feels impure to access environment variables when fetching target information, but mostly because we need access to more information from https://github.com/rust-lang/rust/pull/130883 to do https://github.com/rust-lang/rust/issues/118204. See also https://github.com/rust-lang/rust/pull/129342#issuecomment-2335156119 for some discussion.
The first and second commit does the actual refactor, it should be a non-functional change, the third commit adds diagnostics for invalid deployment targets, which are now possible to do because we have access to the session.
Tested with the same commands as in https://github.com/rust-lang/rust/pull/130435.
r? ``````@petrochenkov``````
The OS version depends on the deployment target environment variables,
the access of which we want to move to later in the compilation pipeline
that has access to more information, for example `env_depinfo`.
cg_llvm: Clean up FFI calls for setting module flags
This is a combination of several inter-related changes to how module flags are set:
- Remove some unnecessary code for setting an `"LTOPostLink"` flag, which has been obsolete since LLVM 17.
- Define our own enum instead of relying on enum values defined by LLVM's unstable C++ API.
- Use safe wrapper functions to set module flags, instead of direct `unsafe` calls.
- Consistently pass pointer/length strings instead of C strings.
- Remove or shrink some `unsafe` blocks.