152 Commits

Author SHA1 Message Date
Stuart Cook
56213a553e
Rollup merge of #146106 - epage:whitespace, r=fee1-dead
fix(lexer): Only allow horizontal whitespace in frontmatter

In writing up the reference for frontmatter, I realized that we probably
shouldn't be accepting Unicode Line Ending characters between the code
fence and infostring or trailing after the infostring or a code fence.

In digging into the unicode specification we use for Whitespace, it
divides it up into categories, so I'm deferring to what it says for
horizontal whitespace for what should be used within a line.

Note, I am leaving out support for Unicode Default Ignorable characters.
I figure that can be discussed outside of this change within the
reference and tracking issue.

Fixes rust-lang/rust#145971

Frontmatter tracking issue: rust-lang/rust#136889
2025-09-03 23:08:10 +10:00
Nicholas Nethercote
301655eafe Revert introduction of [workspace.dependencies].
This was done in #145740 and #145947. It is causing problems for people
using r-a on anything that uses the rustc-dev rustup package, e.g. Miri,
clippy.

This repository has lots of submodules and subtrees and various
different projects are carved out of pieces of it. It seems like
`[workspace.dependencies]` will just be more trouble than it's worth.
2025-09-02 19:12:54 +10:00
Ed Page
6f0da976c5 fix(lexer): Only allow horizontal whitespace in frontmatter
In writing up the reference for frontmatter, I realized that we probably
shouldn't be accepting Unicode Line Ending characters between the code
fence and infostring or trailing after the infostring or a code fence.

In digging into the unicode specification we use for Whitespace, it
divides it up into categories, so I'm deferring to what it says for
horizontal whitespace for what should be used within a line.

Note, I am leaving out support for Unicde Default Ignorable characters.
I figure that can be discussed outside of this change within the
reference and tracking issue.
2025-09-01 20:51:39 -05:00
Ed Page
428e413414 docs(lexer): Organize and document whitespace by Pattern_White_Space 2025-09-01 20:51:39 -05:00
Ed Page
142e25e356 fix(lexer): Don't require frontmatters to be escaped with indented fences
The RFC only limits hyphens at the beginning of lines and not if they
are indented or embedded in other content.

Sticking to that approach was confirmed by the T-lang liason at
https://github.com/rust-lang/rust/issues/141367#issuecomment-3202217544

There is a regression in error message quality which I'm leaving for
someone if they feel this needs improving.
2025-08-28 14:08:33 -05:00
Nicholas Nethercote
b4c8fe2b4b Remove unnecessary [dependencies.unicode-properties] entries.
The Cargo style guide says to put dependencies on a single line if they
fit.
2025-08-28 08:08:40 +10:00
Nicholas Nethercote
dfa748e910 Add memchr to [workspace.dependencies]. 2025-08-27 13:59:32 +10:00
Ed Page
f43f974b9e fix(lexer): Allow '-' in the infostring continue set
This more closely matches the RFC and what our T-lang contact has asked
for, see https://github.com/rust-lang/rust/issues/136889#issuecomment-3212715312
2025-08-22 09:26:19 -05:00
Ed Page
df53b3dc04 test(lexer): Add frontmatter unit test 2025-07-10 10:25:29 -05:00
Ed Page
45a1e492b1 feat(lexer): Allow including frontmatter with 'tokenize' 2025-07-09 16:42:27 -05:00
klensy
c76d032f01 setup CI and tidy to use typos for spellchecking and fix few typos 2025-07-03 10:51:06 +03:00
Marijn Schouten
2a5225a369 rustc_lexer: typo fix + small cleanups 2025-06-06 13:08:16 +00:00
Matthew Jasper
55f59fb0e3 Fix parsing of frontmatters with inner hyphens 2025-06-04 15:51:36 +00:00
Deadbeef
662182637e Implement RFC 3503: frontmatters
Supercedes #137193
2025-05-05 23:10:08 +08:00
Guillaume Gomez
aff2bc7a88 Replace rustc_lexer/unescape with rustc-literal-escaper crate 2025-04-04 14:44:45 +02:00
Ralf Jung
20d04d8a40 Revert "Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu"
This reverts commit 08dfbf49e30d917c89e49eb14cb3f1e8b8a1c9ef, reversing
changes made to 10bcdad7df0de3cfb95c7bdb7b16908e73cafc09.
2025-03-18 13:28:56 +01:00
Jacob Pratt
08dfbf49e3
Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu
Add `*_value` methods to proc_macro lib

This is the implementation of https://github.com/rust-lang/libs-team/issues/459.

It allows to get the actual value (unescaped) of the different string literals.

Part of https://github.com/rust-lang/rust/issues/136652.

r? libs-api
2025-03-17 05:47:48 -04:00
Nicholas Nethercote
ff0a5fe975 Remove #![warn(unreachable_pub)] from all compiler/ crates.
It's no longer necessary now that `-Wunreachable_pub` is being passed.
2025-03-11 13:14:21 +11:00
许杰友 Jieyou Xu (Joe)
063ef18fdc Revert "Use workspace lints for crates in compiler/ #138084"
Revert <https://github.com/rust-lang/rust/pull/138084> to buy time to
consider options that avoids breaking downstream usages of cargo on
distributed `rustc-src` artifacts, where such cargo invocations fail due
to inability to inherit `lints` from workspace root manifest's
`workspace.lints` (this is only valid for the source rust-lang/rust
workspace, but not really the distributed `rustc-src` artifacts).

This breakage was reported in
<https://github.com/rust-lang/rust/issues/138304>.

This reverts commit 48caf81484b50dca5a5cebb614899a3df81ca898, reversing
changes made to c6662879b27f5161e95f39395e3c9513a7b97028.
2025-03-10 18:12:47 +08:00
Nicholas Nethercote
8a3e03392e Remove #![warn(unreachable_pub)] from all compiler/ crates.
(Except for `rustc_codegen_cranelift`.)

It's no longer necessary now that `unreachable_pub` is in the workspace
lints.
2025-03-08 08:41:43 +11:00
Nicholas Nethercote
beba32cebb Specify rust lints for compiler/ crates via Cargo.
By naming them in `[workspace.lints.rust]` in the top-level
`Cargo.toml`, and then making all `compiler/` crates inherit them with
`[lints] workspace = true`. (I omitted `rustc_codegen_{cranelift,gcc}`,
because they're a bit different.)

The advantages of this over the current approach:
- It uses a standard Cargo feature, rather than special handling in
  bootstrap. So, easier to understand, and less likely to get
  accidentally broken in the future.
- It works for proc macro crates.

It's a shame it doesn't work for rustc-specific lints, as the comments
explain.
2025-03-08 08:41:09 +11:00
Michael Goulet
12e3911d81 Greatly simplify lifetime captures in edition 2024 2025-02-22 22:24:52 +00:00
Michael Goulet
76d341fa09 Upgrade the compiler to edition 2024 2025-02-22 00:01:48 +00:00
Guillaume Gomez
94f0f2b603 Reexport literal-escaper from rustc_lexer to allow rust-analyzer to compile 2025-02-10 10:38:22 +01:00
Guillaume Gomez
49d2d5a116 Extract unescape from rustc_lexer into its own crate 2025-02-10 10:38:22 +01:00
bjorn3
1fcae03369 Rustfmt 2025-02-08 22:12:13 +00:00
gvozdvmozgu
9f469eb600 implement eat_until leveraging memchr in lexer 2025-02-05 07:03:53 -08:00
Eric Huss
a97404eee3 Add test to check unicode identifier version 2024-12-09 06:23:59 -08:00
Michael Goulet
b87e935407 Revert "Reject raw lifetime followed by \' as well"
This reverts commit 1990f1560801ca3f9e6a3286e58204aa329ee037.
2024-12-01 05:22:16 +00:00
Nicholas Nethercote
4cd2840f00 Clean up c_or_byte_string.
- Rename a misleading local `mk_kind` as `single_quoted`.
- Use `fn` for all three arguments, for consistency.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
11c96cfd94 Improve strip_shebang testing.
It's currently a bit ad hoc. This commit makes it more methodical, with
pairs of match/no-match tests for all the relevant cases.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
e9a0c3c98c Remove TokenKind::InvalidPrefix.
It was added in #123752 to handle some cases involving emoji, but it
isn't necessary because it's always treated the same as
`TokenKind::InvalidIdent`. This commit removes it, which makes things a
little simpler.
2024-11-19 18:06:22 +11:00
Nicholas Nethercote
2c7c3697db Improve TokenKind comments.
- Improve wording.
- Use backticks consistently for examples.
2024-11-19 18:04:01 +11:00
Nicholas Nethercote
df29f9b0c3 Improve fake_ident_or_unknown_prefix.
- Rename it as `invalid_ident_or_prefix`, which matches the possible
  outputs (`InvalidIdent` or `InvalidPrefix`).
- Use the local wrapper for `is_xid_continue`, for consistency.
- Make it clear what `\u{200d}` means.
2024-11-19 18:01:43 +11:00
Michael Goulet
1990f15608 Reject raw lifetime followed by \' as well 2024-10-30 01:13:18 +00:00
Peter Jaszkowiak
321a5db7d4 Reserve guarded string literals (RFC 3593) 2024-10-08 18:21:16 -06:00
Michael Goulet
c682aa162b Reformat using the new identifier sorting from rustfmt 2024-09-22 19:11:29 -04:00
Michael Goulet
97910580aa Add initial support for raw lifetimes 2024-09-06 10:32:48 -04:00
Michael Goulet
3b3e43a386 Format lexer 2024-09-06 10:32:48 -04:00
Michael Goulet
9aaf873396 Reserve prefix lifetimes too 2024-09-06 10:32:48 -04:00
Nicholas Nethercote
6c84c55c9f Add warn(unreachable_pub) to rustc_lexer. 2024-08-27 15:12:46 +10:00
Nicholas Nethercote
84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00
Nicholas Nethercote
75b164d836 Use tidy to sort crate attributes for all compiler crates.
We already do this for a number of crates, e.g. `rustc_middle`,
`rustc_span`, `rustc_metadata`, `rustc_span`, `rustc_errors`.

For the ones we don't, in many cases the attributes are a mess.
- There is no consistency about order of attribute kinds (e.g.
  `allow`/`deny`/`feature`).
- Within attribute kind groups (e.g. the `feature` attributes),
  sometimes the order is alphabetical, and sometimes there is no
  particular order.
- Sometimes the attributes of a particular kind aren't even grouped
  all together, e.g. there might be a `feature`, then an `allow`, then
  another `feature`.

This commit extends the existing sorting to all compiler crates,
increasing consistency. If any new attribute line is added there is now
only one place it can go -- no need for arbitrary decisions.

Exceptions:
- `rustc_log`, `rustc_next_trait_solver` and `rustc_type_ir_macros`,
  because they have no crate attributes.
- `rustc_codegen_gcc`, because it's quasi-external to rustc (e.g. it's
  ignored in `rustfmt.toml`).
2024-06-12 15:49:10 +10:00
Michael Scholten
3c5e88c7d1 Improved the compiler code with clippy 2024-04-24 09:41:44 +02:00
Esteban Küber
19821ad234 Properly handle emojis as literal prefix in macros
Do not accept the following

```rust
macro_rules! lexes {($($_:tt)*) => {}}
lexes!(🐛"foo");
```

Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro expansion of literal prefixes.

Fix #123696.
2024-04-10 23:19:27 +00:00
Esteban Küber
ea1883d7b2 Silence redundant error on char literal that was meant to be a string in 2021 edition 2024-03-17 23:35:19 +00:00
Esteban Küber
982918f493 Handle str literals written with ' lexed as lifetime
Given `'hello world'` and `'1 str', provide a structured suggestion for a valid string literal:

```
error[E0762]: unterminated character literal
  --> $DIR/lex-bad-str-literal-as-char-3.rs:2:26
   |
LL |     println!('hello world');
   |                          ^^^^
   |
help: if you meant to write a `str` literal, use double quotes
   |
LL |     println!("hello world");
   |              ~           ~
```
```
error[E0762]: unterminated character literal
  --> $DIR/lex-bad-str-literal-as-char-1.rs:2:20
   |
LL |     println!('1 + 1');
   |                    ^^^^
   |
help: if you meant to write a `str` literal, use double quotes
   |
LL |     println!("1 + 1");
   |              ~     ~
```

Fix #119685.
2024-03-17 23:35:18 +00:00
Nicholas Nethercote
0ac1195ee0 Invert diagnostic lints.
That is, change `diagnostic_outside_of_impl` and
`untranslatable_diagnostic` from `allow` to `deny`, because more than
half of the compiler has be converted to use translated diagnostics.

This commit removes more `deny` attributes than it adds `allow`
attributes, which proves that this change is warranted.
2024-02-06 13:12:33 +11:00
Nicholas Nethercote
6be2e5623c Use unescape_unicode for raw C string literals.
They can't contain `\x` escapes, which means they can't contain high
bytes, which means we can used `unescape_unicode` instead of
`unescape_mixed` to unescape them. This avoids unnecessary used of
`MixedUnit`.
2024-01-25 12:28:11 +11:00
Nicholas Nethercote
86f371ed59 Rename the unescaping functions.
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string`
becomes `unescape_mixed`. Because rfc3349 will mean that C string
literals will no longer be the only mixed utf8 literals.
2024-01-25 12:28:11 +11:00