Refactor lint buffering to avoid requiring a giant enum
Lint buffering currently relies on a giant enum `BuiltinLintDiag` containing all the lints that might potentially get buffered. In addition to being an unwieldy enum in a central crate, this also makes `rustc_lint_defs` a build bottleneck: it depends on various types from various crates (with a steady pressure to add more), and many crates depend on it.
Having all of these variants in a separate crate also prevents detecting when a variant becomes unused, which we can do with a dedicated type defined and used in the same crate.
Refactor this to use a dyn trait, to allow using `LintDiagnostic` types directly.
Because the existing `BuiltinLintDiag` requires some additional types in order to decorate some variants, which are only available later in `rustc_lint`, use an enum `DecorateDiagCompat` to handle both the `dyn LintDiagnostic` case and the `BuiltinLintDiag` case.
---
With the infrastructure in place, use it to migrate three of the enum variants to use `LintDiagnostic` directly, as a proof of concept and to demonstrate that the net result is a reduction in code size and a removal of a boilerplate-heavy layer of indirection.
Also remove an unused `BuiltinLintDiag` variant.
Lint buffering currently relies on a giant enum `BuiltinLintDiag`
containing all the lints that might potentially get buffered. In
addition to being an unwieldy enum in a central crate, this also makes
`rustc_lint_defs` a build bottleneck: it depends on various types from
various crates (with a steady pressure to add more), and many crates
depend on it.
Having all of these variants in a separate crate also prevents detecting
when a variant becomes unused, which we can do with a dedicated type
defined and used in the same crate.
Refactor this to use a dyn trait, to allow using `LintDiagnostic` types
directly.
This requires boxing, but all of this is already on the slow path
(emitting an error).
Because the existing `BuiltinLintDiag` requires some additional types in
order to decorate some variants, which are only available later in
`rustc_lint`, use an enum `DecorateDiagCompat` to handle both the `dyn
LintDiagnostic` case and the `BuiltinLintDiag` case.
Add `-Zindirect-branch-cs-prefix`
Cc: ``@azhogin`` ``@Darksonn``
This goes on top of https://github.com/rust-lang/rust/pull/135927, i.e. please skip the first commit here. Please feel free to inherit it there.
In fact, I am not sure if there is any use case for the flag without `-Zretpoline*`. GCC and Clang allow it, though.
There is a `FIXME` for two `ignore`s in the test that I took from another test I did in the past -- they may be needed or not here since I didn't run the full CI. Either way, it is not critical.
Tracking issue: https://github.com/rust-lang/rust/issues/116852.
MCP: https://github.com/rust-lang/compiler-team/issues/868.
Fix adjacent code
Fix duplicate warning; merge test into `tests/ui-fulldeps/internal-lints`
Use `rustc_middle::ty::FnSig::inputs`
Address two review comments
- https://github.com/rust-lang/rust/pull/139345#discussion_r2109006991
- https://github.com/rust-lang/rust/pull/139345#discussion_r2109058588
Use `Instance::try_resolve`
Import `rustc_middle::ty::Ty` as `Ty` rather than `MiddleTy`
Simplify predicate handling
Add more `#[allow(rustc::potential_query_instability)]` following rebase
Remove two `#[allow(rustc::potential_query_instability)]` following rebase
Address review comment
Update compiler/rustc_lint/src/internal.rs
Co-authored-by: lcnr <rust@lcnr.de>
gpu offload host code generation
r? ghost
This will generate most of the host side code to use llvm's offload feature.
The first PR will only handle automatic mem-transfers to and from the device.
So if a user calls a kernel, we will copy inputs back and forth, but we won't do the actual kernel launch.
Before merging, we will use LLVM's Info infrastructure to verify that the memcopies match what openmp offloa generates in C++. `LIBOMPTARGET_INFO=-1 ./my_rust_binary` should print that a memcpy to and later from the device is happening.
A follow-up PR will generate the actual device-side kernel which will then do computations on the GPU.
A third PR will implement manual host2device and device2host functionality, but the goal is to minimize cases where a user has to overwrite our default handling due to performance issues.
I'm trying to get a full MVP out first, so this just recognizes GPU functions based on magic names. The final frontend will obviously move this over to use proper macros, like I'm already doing it for the autodiff work.
This work will also be compatible with std::autodiff, so one can differentiate GPU kernels.
Tracking:
- https://github.com/rust-lang/rust/issues/131513
Handle build scripts better in `-Zmacro-stats` output.
Currently all build scripts are listed as `build_script_build` in the stats header. This commit uses `CARGO_PKG_NAME` to improve that.
I tried it on Bevy, it works well, giving output like this on the build script:
```
MACRO EXPANSION STATS: serde build script
```
and this on the crate itself:
```
MACRO EXPANSION STATS: serde
```
r? `@Kobzol`
Currently all build scripts are listed as `build_script_build` in the
stats header. This commit uses `CARGO_PKG_NAME` to improve that.
I tried it on Bevy, it works well, giving output like this on the build
script:
```
MACRO EXPANSION STATS: serde build script
```
and this on the crate itself:
```
MACRO EXPANSION STATS: serde
```
Tweak `-Zmacro-stats` measurement.
It currently reports net size, i.e. size(output) - size(input). After some use I think this is sub-optimal, and it's better to just report size(output). Because for derive macros the input size is always 1, and for attribute macros it's almost always 1.
r? ```@petrochenkov```
Rollup of 9 pull requests
Successful merges:
- rust-lang/rust#142645 (Also emit suggestions for usages in the `non_upper_case_globals` lint)
- rust-lang/rust#142657 (mbe: Clean up code with non-optional `NonterminalKind`)
- rust-lang/rust#142799 (rustc_session: Add a structure for keeping both explicit and default sysroots)
- rust-lang/rust#142805 (Emit a single error when importing a path with `_`)
- rust-lang/rust#142882 (Lazy init diagnostics-only local_names in borrowck)
- rust-lang/rust#142883 (Add impl_trait_in_bindings tests from rust-lang/rust#61773)
- rust-lang/rust#142943 (Don't include current rustc version string in feature removed help)
- rust-lang/rust#142965 ([RTE-497] Ignore `c-link-to-rust-va-list-fn` test on SGX platform)
- rust-lang/rust#142972 (Add a missing mailmap entry)
r? `@ghost`
`@rustbot` modify labels: rollup
To make it match `-Zmacro-stats`, and work better if you have enabled it
for multiple crates.
- Print each crate's name.
- Print a `===` banner at the start and end for separation.
Taking inspiration from `-Zmacro-stats`:
- Use "{prefix}" consistently.
- Use names for column widths.
- Write output in a single `eprint!` call, in an attempt to minimize
interleaving of output from different rustc processes.
- Use `repeat` for the long `---` banners.
It currently reports net size, i.e. size(output) - size(input). After
some use I think this is sub-optimal, and it's better to just report
size(output). Because for derive macros the input size is always 1, and
for attribute macros it's almost always 1.
Add codegen timing section
And since we now start and end the sections also using separate functions, also add some light checking if we're generating the sections correctly.
I'm integrating `--timings` into Cargo, and I realized that the codegen timings would be quite useful for that. Frontend can be computed simply as `[start of compilation, start of codegen]` for now.
r? `@nnethercote`
Refactor Translator
My main motivation was to simplify the usage of `SilentEmitter` for users like rustfmt. A few refactoring opportunities arose along the way.
* Replace `Translate` trait with `Translator` struct
* Replace `Emitter: Translate` with `Emitter::translator`
* Split `SilentEmitter` into `FatalOnlyEmitter` and `SilentEmitter`
Reduce uses of `hir_crate`.
I tried rebasing my old incremental-HIR branch. This is a by-product, which is required if we want to get rid of `hir_crate` entirely.
The second commit is a drive-by cleanup. It can be pulled into its own PR.
r? ````@oli-obk````
Implement initial support for timing sections (`--json=timings`)
This PR implements initial support for emitting high-level compilation section timings. The idea is to provide a very lightweight way of emitting durations of various compilation sections (frontend, backend, linker, or on a more granular level macro expansion, typeck, borrowck, etc.). The ultimate goal is to stabilize this output (in some form), make Cargo pass `--json=timings` and then display this information in the HTML output of `cargo build --timings`, to make it easier to quickly profile "what takes so long" during the compilation of a Cargo project. I would personally also like if Cargo printed some of this information in the interactive `cargo build` output, but the `build --timings` use-case is the main one.
Now, this information is already available with several other sources, but I don't think that we can just use them as they are, which is why I proposed a new way of outputting this data (`--json=timings`):
- This data is available under `-Zself-profile`, but that is very expensive and forever unstable. It's just a too big of a hammer to tell us the duration it took to run the linker.
- It could also be extracted with `-Ztime-passes`. That is pretty much "for free" in terms of performance, and it can be emitted in a structured form to JSON via `-Ztime-passes-format=json`. I guess that one alternative might be to stabilize this flag in some form, but that form might just be `--json=timings`? I guess what we could do in theory is take the already emitted time passes and reuse them for `--json=timings`. Happy to hear suggestions!
I'm sending this PR mostly for a vibeck, to see if the way I implemented it is passable. There are some things to figure out:
- How do we represent the sections? Originally I wanted to output `{ section, duration }`, but then I realized that it might be more useful to actually emit `start` and `end` events. Both because it enables to see the output incrementally (in case compilation takes a long time and you read the outputs directly, or Cargo decides to show this data in `cargo build` some day in the future), and because it makes it simpler to represent hierarchy (see below). The timestamps currently emit microseconds elapsed from a predetermined point in time (~start of rustc), but otherwise they are fully opaque, and should be only ever used to calculate the duration using `end - start`. We could also precompute the duration for the user in the `end` event, but that would require doing more work in rustc, which I would ideally like to avoid :P
- Do we want to have some form of hierarchy? I think that it would be nice to show some more granular sections rather than just frontend/backend/linker (e.g. macro expansion, typeck and borrowck as a part of the frontend). But for that we would need some way of representing hierarchy. A simple way would be something like `{ parent: "frontend" }`, but I realized that with start/end timestamps we get the hierarchy "for free", only the client will need to reconstruct it from the order of start/end events (e.g. `start A`, `start B` means that `B` is a child of `A`).
- What exactly do we want to stabilize? This is probably a question for later. I think that we should definitely stabilize the format of the emitted JSON objects, and *maybe* some specific section names (but we should also make it clear that they can be missing, e.g. you don't link everytime you invoke `rustc`).
The PR be tested e.g. with `rustc +stage1 src/main.rs --json=timings --error-format=json -Zunstable-options` on a crate without dependencies (it is not easy to use `--json` with stock Cargo, because it also passes this flag to `rustc`, so this will later need Cargo integration to be usable with it).
Zulip discussions: [#t-compiler > Outputting time spent in various compiler sections](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Outputting.20time.20spent.20in.20various.20compiler.20sections/with/518850162)
MCP: https://github.com/rust-lang/compiler-team/issues/873
r? ``@nnethercote``
Add `-Z hint-mostly-unused` to tell rustc that most of a crate will go unused
This hint allows the compiler to optimize its operation based on this assumption, in order to compile faster. This is a hint, and does not guarantee any particular behavior.
This option can substantially speed up compilation if applied to a large dependency where the majority of the dependency does not get used. This flag may slow down compilation in other cases.
Currently, this option makes the compiler defer as much code generation as possible from functions in the crate, until later crates invoke those functions. Functions that never get invoked will never have code generated for them. For instance, if a crate provides thousands of functions, but only a few of them will get called, this flag will result in the compiler only doing code generation for the called functions. (This uses the same mechanisms as cross-crate inlining of functions.) This does not affect `extern` functions, or functions marked as `#[inline(never)]`.
This option has already existed in nightly as `-Zcross-crate-inline-threshold=always` for some time, and has gotten testing in that form. However, this option is still unstable, to give an opportunity for wider testing in this form.
Some performance numbers, based on a crate with many dependencies having just *one* large dependency set to `-Z hint-mostly-unused` (using Cargo's `profile-rustflags` option):
A release build went from 4m07s to 2m04s.
A non-release build went from 2m26s to 1m28s.