This stabilizes a subset of the `-Clink-self-contained` components on x64 linux:
the rust-lld opt-out.
The opt-in is not stabilized, as interactions with other stable flags require
more internal work, but are not needed for stabilizing using rust-lld by default.
Similarly, since we only switch to rust-lld on x64 linux, the opt-out is
only stabilized there. Other targets still require `-Zunstable-options`
to use it.
This stabilizes a subset of the `-Clinker-features` components on x64 linux:
the lld opt-out.
The opt-in is not stabilized, as interactions with other stable flags require
more internal work, but are not needed for stabilizing using rust-lld by default.
Similarly, since we only switch to rust-lld on x64 linux, the opt-out is
only stabilized there. Other targets still require `-Zunstable-options`
to use it.
Allow custom default address spaces and parse `p-` specifications in the datalayout string
Some targets, such as CHERI, use as default an address space different from the "normal" default address space `0` (in the case of CHERI, [200 is used](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-877.pdf)). Currently, `rustc` does not allow to specify custom address spaces and does not take into consideration [`p-` specifications in the datalayout string](https://llvm.org/docs/LangRef.html#langref-datalayout).
This patch tries to mitigate these problems by allowing targets to define a custom default address space (while keeping the default value to address space `0`) and adding the code to parse the `p-` specifications in `rustc_abi`. The main changes are that `TargetDataLayout` now uses functions to refer to pointer-related informations, instead of having specific fields for the size and alignment of pointers in the default address space; furthermore, the two `pointer_size` and `pointer_align` fields in `TargetDataLayout` are replaced with an `FxHashMap` that holds info for all the possible address spaces, as parsed by the `p-` specifications.
The potential performance drawbacks of not having ad-hoc fields for the default address space will be tested in this PR's CI run.
r? workingjubilee
There is one comment at a call site and one comment in the function
definition that are mostly saying the same thing. Fold the call site
comment into the function definition comment to reduce duplication.
There are actually some inaccuracies in the comments but let's
deduplicate before we address the inaccuracies.
Make __rust_alloc_error_handler_should_panic a function
Fixesrust-lang/rust#143253
`__rust_alloc_error_handler_should_panic` is a static but was being exported as a function.
For most targets this doesn't matter, but Arm64EC Windows uses different decorations for exported variables vs functions, hence it fails to link when `-Z oom=abort` is enabled.
We've had issues in the past with statics like this (see rust-lang/rust#141061) but the tldr; is that Arm64EC needs symbols correctly exported as either a function or data, and data MUST and MUST ONLY be marked `dllimport` when the symbol is being imported from another binary, which is non-trivial to calculate for these compiler-generated statics.
So, instead, the easiest thing to do is to make `__rust_alloc_error_handler_should_panic` a function instead.
Since `__rust_alloc_error_handler_should_panic` isn't involved in any linking shenanigans, I've marked it as `AlwaysInline` with the hopes that the various backends will see that it is just returning a constant and perform the same optimizations as the previous implementation.
r? `@bjorn3`
Add PrintTAFn flag for targeted type analysis printing
## Summary
This PR adds a new `PrintTAFn` flag to the `-Z autodiff` option that allows printing type analysis information for a specific function, rather than all functions.
## Changes
### New Flag
- Added `PrintTAFn=<function_name>` option to `-Z autodiff`
- Usage: `-Z autodiff=Enable,PrintTAFn=my_function_name`
### Implementation Details
- **Rust side**: Added `PrintTAFn(String)` variant to `AutoDiff` enum
- **Parser**: Updated `parse_autodiff` to handle `PrintTAFn=<function_name>` syntax with proper error handling
- **FFI**: Added `set_print_type_fun` function to interface with Enzyme's `FunctionToAnalyze` command line option
- **Documentation**: Updated help text and documentation for the new flag
### Files Modified
- `compiler/rustc_session/src/config.rs`: Added `PrintTAFn(String)` variant
- `compiler/rustc_session/src/options.rs`: Updated parser and help text (now shows `PrintTAFn` in the list)
- `compiler/rustc_codegen_llvm/src/llvm/enzyme_ffi.rs`: Added FFI function and static variable
- `compiler/rustc_codegen_llvm/src/back/lto.rs`: Added handling for new flag
- `src/doc/rustc-dev-guide/src/autodiff/flags.md`: Updated documentation
- `src/doc/unstable-book/src/compiler-flags/autodiff.md`: Updated documentation
## Testing
The flag can be tested with:
```bash
rustc +enzyme -Z autodiff=Enable,PrintTAFn=square test.rs
```
This will print type analysis information only for the function named "square" instead of all functions.
## Error Handling
The parser includes proper error handling:
- Missing argument: `PrintTAFn` without `=<function_name>` will show an error
- Unknown options: Invalid autodiff options will be reported
r? ```@ZuseZ4```
Refactor Translator
My main motivation was to simplify the usage of `SilentEmitter` for users like rustfmt. A few refactoring opportunities arose along the way.
* Replace `Translate` trait with `Translator` struct
* Replace `Emitter: Translate` with `Emitter::translator`
* Split `SilentEmitter` into `FatalOnlyEmitter` and `SilentEmitter`
Rollup of 6 pull requests
Successful merges:
- rust-lang/rust#135656 (Add `-Z hint-mostly-unused` to tell rustc that most of a crate will go unused)
- rust-lang/rust#138237 (Get rid of `EscapeDebugInner`.)
- rust-lang/rust#141614 (lint direct use of rustc_type_ir )
- rust-lang/rust#142123 (Implement initial support for timing sections (`--json=timings`))
- rust-lang/rust#142377 (Try unremapping compiler sources)
- rust-lang/rust#142674 (remove duplicate crash test)
r? `@ghost`
`@rustbot` modify labels: rollup
Implement initial support for timing sections (`--json=timings`)
This PR implements initial support for emitting high-level compilation section timings. The idea is to provide a very lightweight way of emitting durations of various compilation sections (frontend, backend, linker, or on a more granular level macro expansion, typeck, borrowck, etc.). The ultimate goal is to stabilize this output (in some form), make Cargo pass `--json=timings` and then display this information in the HTML output of `cargo build --timings`, to make it easier to quickly profile "what takes so long" during the compilation of a Cargo project. I would personally also like if Cargo printed some of this information in the interactive `cargo build` output, but the `build --timings` use-case is the main one.
Now, this information is already available with several other sources, but I don't think that we can just use them as they are, which is why I proposed a new way of outputting this data (`--json=timings`):
- This data is available under `-Zself-profile`, but that is very expensive and forever unstable. It's just a too big of a hammer to tell us the duration it took to run the linker.
- It could also be extracted with `-Ztime-passes`. That is pretty much "for free" in terms of performance, and it can be emitted in a structured form to JSON via `-Ztime-passes-format=json`. I guess that one alternative might be to stabilize this flag in some form, but that form might just be `--json=timings`? I guess what we could do in theory is take the already emitted time passes and reuse them for `--json=timings`. Happy to hear suggestions!
I'm sending this PR mostly for a vibeck, to see if the way I implemented it is passable. There are some things to figure out:
- How do we represent the sections? Originally I wanted to output `{ section, duration }`, but then I realized that it might be more useful to actually emit `start` and `end` events. Both because it enables to see the output incrementally (in case compilation takes a long time and you read the outputs directly, or Cargo decides to show this data in `cargo build` some day in the future), and because it makes it simpler to represent hierarchy (see below). The timestamps currently emit microseconds elapsed from a predetermined point in time (~start of rustc), but otherwise they are fully opaque, and should be only ever used to calculate the duration using `end - start`. We could also precompute the duration for the user in the `end` event, but that would require doing more work in rustc, which I would ideally like to avoid :P
- Do we want to have some form of hierarchy? I think that it would be nice to show some more granular sections rather than just frontend/backend/linker (e.g. macro expansion, typeck and borrowck as a part of the frontend). But for that we would need some way of representing hierarchy. A simple way would be something like `{ parent: "frontend" }`, but I realized that with start/end timestamps we get the hierarchy "for free", only the client will need to reconstruct it from the order of start/end events (e.g. `start A`, `start B` means that `B` is a child of `A`).
- What exactly do we want to stabilize? This is probably a question for later. I think that we should definitely stabilize the format of the emitted JSON objects, and *maybe* some specific section names (but we should also make it clear that they can be missing, e.g. you don't link everytime you invoke `rustc`).
The PR be tested e.g. with `rustc +stage1 src/main.rs --json=timings --error-format=json -Zunstable-options` on a crate without dependencies (it is not easy to use `--json` with stock Cargo, because it also passes this flag to `rustc`, so this will later need Cargo integration to be usable with it).
Zulip discussions: [#t-compiler > Outputting time spent in various compiler sections](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Outputting.20time.20spent.20in.20various.20compiler.20sections/with/518850162)
MCP: https://github.com/rust-lang/compiler-team/issues/873
r? ``@nnethercote``
Add `-Z hint-mostly-unused` to tell rustc that most of a crate will go unused
This hint allows the compiler to optimize its operation based on this assumption, in order to compile faster. This is a hint, and does not guarantee any particular behavior.
This option can substantially speed up compilation if applied to a large dependency where the majority of the dependency does not get used. This flag may slow down compilation in other cases.
Currently, this option makes the compiler defer as much code generation as possible from functions in the crate, until later crates invoke those functions. Functions that never get invoked will never have code generated for them. For instance, if a crate provides thousands of functions, but only a few of them will get called, this flag will result in the compiler only doing code generation for the called functions. (This uses the same mechanisms as cross-crate inlining of functions.) This does not affect `extern` functions, or functions marked as `#[inline(never)]`.
This option has already existed in nightly as `-Zcross-crate-inline-threshold=always` for some time, and has gotten testing in that form. However, this option is still unstable, to give an opportunity for wider testing in this form.
Some performance numbers, based on a crate with many dependencies having just *one* large dependency set to `-Z hint-mostly-unused` (using Cargo's `profile-rustflags` option):
A release build went from 4m07s to 2m04s.
A non-release build went from 2m26s to 1m28s.
Move metadata object generation for dylibs to the linker code
This deduplicates some code between codegen backends and may in the future allow adding extra metadata that is only known at link time.
Prerequisite of https://github.com/rust-lang/rust/issues/96708.
refactor `AttributeGate` and `rustc_attr!` to emit notes during feature checking
First commit changes the following:
- `AttributeGate ` from an enum with (four) tuple fields to (five) named fields
- adds a `notes` fields that is emitted as notes in the `PostExpansionVisitor` pass
- removes the `this compiler was built on YYYY-MM-DD; consider upgrading it if it is out of date` note if the feature gate is `rustc_attrs`.
- various phrasing changes and touchups
- and finally, the reason why I went down this path to begin with: tell people they can use the diagnostic namespace when they hit the rustc_on_unimplemented feature gate 🙈
Second commit removes unused machinery for deprecated attributes
It collects data about macro expansions and prints them in a table after
expansion finishes. It's very useful for detecting macro bloat,
especially for proc macros.
Details:
- It measures code snippets by pretty-printing them and then measuring
lines and bytes. This required a bunch of additional pretty-printing
plumbing, in `rustc_ast_pretty` and `rustc_expand`.
- The measurement is done in `MacroExpander::expand_invoc`.
- The measurements are stored in `ExtCtxt::macro_stats`.
This hint allows the compiler to optimize its operation based on this
assumption, in order to compile faster. This is a hint, and does not
guarantee any particular behavior.
This option can substantially speed up compilation if applied to a large
dependency where the majority of the dependency does not get used. This flag
may slow down compilation in other cases.
Currently, this option makes the compiler defer as much code generation as
possible from functions in the crate, until later crates invoke those
functions. Functions that never get invoked will never have code generated for
them. For instance, if a crate provides thousands of functions, but only a few
of them will get called, this flag will result in the compiler only doing code
generation for the called functions. (This uses the same mechanisms as
cross-crate inlining of functions.) This does not affect `extern` functions, or
functions marked as `#[inline(never)]`.
Some performance numbers, based on a crate with many dependencies having
just *one* large dependency set to `-Z hint-mostly-unused` (using
Cargo's `profile-rustflags` option):
A release build went from 4m07s to 2m04s.
A non-release build went from 2m26s to 1m28s.
Before this change we had two different ways to attempt to locate the
sysroot which are inconsistently used:
* get_or_default_sysroot which tries to locate based on the 0th cli
argument and if that doesn't work falls back to locating it using the
librustc_driver.so location and returns a single path.,
* sysroot_candidates which takes the former and additionally does
another attempt at locating using librustc_driver.so except without
linux multiarch handling and then returns both paths.,
The latter was originally introduced to be able to locate the codegen
backend back when cg_llvm was dynamically linked even for a custom
driver when the --sysroot passed in does not contain a copy of cg_llvm.
Back then get_or_default_sysroot did not attempt to locate the sysroot
based on the location of librustc_driver.so yet. Because that is now
done, the only case where removing sysroot_candidates can break things
is if you have a custom driver inside what looks like a sysroot
including the lib/rustlib directory, but which is missing some parts of
the full sysroot like eg rust-lld.
Optionally don't steal the THIR
The THIR being stolen is a recurrent pain for authors of rustc drivers. This makes it optional, so that the `thir_body` query can still be used after analysis of the crate has completed.
Add unimplemented `current_dll_path()` for WASI
This is the only change needed to Rust to allow compiling rustfmt for WASI (rustfmt uses some internal rustc crates).
dladdr cannot leave dli_fname to be null
There are two places in the repo calling `dladdr`, and they are inconsistent wrt their assumption of whether the `dli_fname` field can be null. Let's make them consistent. I see nothing in the docs that allows it to be null, but just to be on the safe side let's make this an assertion so hopefully we get a report if that ever happens.
coverage: Detect unused local file IDs to avoid an LLVM assertion
Each function's coverage metadata contains a *local file table* that maps local file IDs (used by the function's mapping regions) to global file IDs (shared by all functions in the same CGU).
LLVM requires all local file IDs to have at least one mapping region, and has an assertion that will fail if it detects a local file ID with no regions. To make sure that assertion doesn't fire, we need to detect and skip functions whose metadata would trigger it.
(This can't actually happen yet, because currently all of a function's spans must belong to the same file and expansion. But this will be an important edge case when adding expansion region support.)