265 Commits

Author SHA1 Message Date
León Orell Valerian Liehr
7a66925a81
Strip frontmatter in fewer places 2025-09-09 19:49:40 +02:00
Urgau
d224d3a8fa Disallow shebang in --cfg and --check-cfg arguments 2025-09-06 00:21:04 +02:00
Ed Page
6f0da976c5 fix(lexer): Only allow horizontal whitespace in frontmatter
In writing up the reference for frontmatter, I realized that we probably
shouldn't be accepting Unicode Line Ending characters between the code
fence and infostring or trailing after the infostring or a code fence.

In digging into the unicode specification we use for Whitespace, it
divides it up into categories, so I'm deferring to what it says for
horizontal whitespace for what should be used within a line.

Note, I am leaving out support for Unicde Default Ignorable characters.
I figure that can be discussed outside of this change within the
reference and tracking issue.
2025-09-01 20:51:39 -05:00
Josh Triplett
52fadd8f96 Migrate BuiltinLintDiag::HiddenUnicodeCodepoints to use LintDiagnostic directly 2025-08-22 03:01:35 -07:00
Deadbeef
c335d5781d ignore frontmatters in TokenStream::new 2025-08-18 20:28:29 +08:00
xizheyin
2832517ba1
Clean code for rustc_parse/src/lexer
1. Rename `make_unclosed_delims_error` and return `Vec<Diag>`
2. change magic number `unclosed_delimiter_show_limit` to const
3. move `eof_err` below parsing logic
4. Add `calculate_spacing` for `bump` and `bump_minimal`

Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-07-22 18:38:33 +08:00
xizheyin
181c1bda0e
Deduplicate unmatched_delims in rustc_parse to reduce confusion
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-07-18 20:34:58 +08:00
Marijn Schouten
707a6f5463 update to literal-escaper 0.0.4 for better API without unreachable and faster string parsing 2025-06-23 06:36:22 +00:00
Matthew Jasper
65bdb31a97 Report text_direction_codepoint_in_literal when parsing
- The lint is now reported in code that gets removed/modified/duplicated
  by macro expansion.
- Spans are more accurate
- Fixes #140281
2025-05-27 15:57:41 +00:00
Deadbeef
662182637e Implement RFC 3503: frontmatters
Supercedes #137193
2025-05-05 23:10:08 +08:00
Chris Denton
ecb9775438
Rollup merge of #140175 - Kivooeo:new-fix-one, r=compiler-errors
`rc""` more clear error message

here is small fix that provides better error message when user is trying to use `rc""` the same way it was made for `rb""`

example of it's work

```rust
  |
2 |     rc"\n";
  |     ^^ unknown prefix
  |
  = note: prefixed identifiers and literals are reserved since Rust 2021
help: use `cr` for a raw C-string
  |
2 -     rc"\n";
2 +     cr"\n";
  |
```

**related issue**
fixes #140170

cc `@cyrgani` (issue author)
2025-04-23 00:43:08 +00:00
Kivooeo
44b19e5fe7 rc and cr more clear error message 2025-04-23 03:15:43 +05:00
xizheyin
dce5d99ce8 Rename open_brace to open_delimiters
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-04-22 14:37:26 +08:00
xizheyin
e827b17ddb Move make_unclosed_delims_error to lexer/diagonostics.rs
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-04-22 14:37:26 +08:00
Nicholas Nethercote
bf8ce32558 Remove token::{Open,Close}Delim.
By replacing them with `{Open,Close}{Param,Brace,Bracket,Invisible}`.

PR #137902 made `ast::TokenKind` more like `lexer::TokenKind` by
replacing the compound `BinOp{,Eq}(BinOpToken)` variants with fieldless
variants `Plus`, `Minus`, `Star`, etc. This commit does a similar thing
with delimiters. It also makes `ast::TokenKind` more similar to
`parser::TokenType`.

This requires a few new methods:
- `TokenKind::is_{,open_,close_}delim()` replace various kinds of
  pattern matches.
- `Delimiter::as_{open,close}_token_kind` are used to convert
  `Delimiter` values to `TokenKind`.

Despite these additions, it's a net reduction in lines of code. This is
because e.g. `token::OpenParen` is so much shorter than
`token::OpenDelim(Delimiter::Parenthesis)` that many multi-line forms
reduce to single line forms. And many places where the number of lines
doesn't change are still easier to read, just because the names are
shorter, e.g.:
```
-   } else if self.token != token::CloseDelim(Delimiter::Brace) {
+   } else if self.token != token::CloseBrace {
```
2025-04-21 07:35:56 +10:00
bors
f836ae4e66 Auto merge of #124141 - nnethercote:rm-Nonterminal-and-TokenKind-Interpolated, r=petrochenkov
Remove `Nonterminal` and `TokenKind::Interpolated`

A third attempt at this; the first attempt was #96724 and the second was #114647.

r? `@ghost`
2025-04-14 03:56:55 +00:00
Guillaume Gomez
aff2bc7a88 Replace rustc_lexer/unescape with rustc-literal-escaper crate 2025-04-04 14:44:45 +02:00
Nicholas Nethercote
4d8f7577b5 Impl Copy for Token and TokenKind. 2025-04-02 16:16:49 +11:00
Ralf Jung
20d04d8a40 Revert "Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu"
This reverts commit 08dfbf49e30d917c89e49eb14cb3f1e8b8a1c9ef, reversing
changes made to 10bcdad7df0de3cfb95c7bdb7b16908e73cafc09.
2025-03-18 13:28:56 +01:00
Jacob Pratt
08dfbf49e3
Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu
Add `*_value` methods to proc_macro lib

This is the implementation of https://github.com/rust-lang/libs-team/issues/459.

It allows to get the actual value (unescaped) of the different string literals.

Part of https://github.com/rust-lang/rust/issues/136652.

r? libs-api
2025-03-17 05:47:48 -04:00
Nicholas Nethercote
53167c0b7f Rename ast::TokenKind::Not as ast::TokenKind::Bang.
For consistency with `rustc_lexer::TokenKind::Bang`, and because other
`ast::TokenKind` variants generally have syntactic names instead of
semantic names (e.g. `Star` and `DotDot` instead of `Mul` and `Range`).
2025-03-03 09:26:13 +11:00
Nicholas Nethercote
2a1e2e9632 Replace ast::TokenKind::BinOp{,Eq} and remove BinOpToken.
`BinOpToken` is badly named, because it only covers the assignable
binary ops and excludes comparisons and `&&`/`||`. Its use in
`ast::TokenKind` does allow a small amount of code sharing, but it's a
clumsy factoring.

This commit removes `ast::TokenKind::BinOp{,Eq}`, replacing each one
with 10 individual variants. This makes `ast::TokenKind` more similar to
`rustc_lexer::TokenKind`, which has individual variants for all
operators.

Although the number of lines of code increases, the number of chars
decreases due to the frequent use of shorter names like `token::Plus`
instead of `token::BinOp(BinOpToken::Plus)`.
2025-03-03 09:26:11 +11:00
Guillaume Gomez
49d2d5a116 Extract unescape from rustc_lexer into its own crate 2025-02-10 10:38:22 +01:00
klensy
dc62b8fd11 replaces few consts with statics to reduce readonly section 2025-01-28 17:38:22 +03:00
Nicholas Nethercote
2620eb42d7 Re-export more rustc_span::symbol things from rustc_span.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.

This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
2024-12-18 13:38:53 +11:00
Nicholas Nethercote
2e412fef75 Remove Lexer's dependency on Parser.
Lexing precedes parsing, as you'd expect: `Lexer` creates a
`TokenStream` and `Parser` then parses that `TokenStream`.

But, in a horrendous violation of layering abstractions and common
sense, `Lexer` depends on `Parser`! The `Lexer::unclosed_delim_err`
method does some error recovery that relies on creating a `Parser` to do
some post-processing of the `TokenStream` that the `Lexer` just created.

This commit just removes `unclosed_delim_err`. This change removes
`Lexer`'s dependency on `Parser`, and also means that `lex_token_tree`'s
return value can have a more typical form.

The cost is slightly worse error messages in two obscure cases, as shown
in these tests:
- tests/ui/parser/brace-in-let-chain.rs: there is slightly less
  explanation in this case involving an extra `{`.
- tests/ui/parser/diff-markers/unclosed-delims{,-in-macro}.rs: the diff
  marker detection is no longer supported (because that detection is
  implemented in the parser).

In my opinion this cost is outweighed by the magnitude of the code
cleanup.
2024-12-13 07:10:20 +11:00
Nicholas Nethercote
40c964510c Remove PErr.
It's just a synonym for `Diag` that adds no value and is only used in a
few places.
2024-12-12 11:31:55 +11:00
Esteban Küber
5404cbb996 Fix typo in RFC mention 3598 -> 3593
https://github.com/rust-lang/rfcs/blob/master/text/3593-unprefixed-guarded-strings.md
2024-12-09 17:16:14 +00:00
Michael Goulet
d878fd8877 Only error raw lifetime followed by \' in edition 2021+ 2024-12-01 05:23:16 +00:00
Guillaume Gomez
ca71c8fe5e
Rollup merge of #133487 - pitaj:reserve-guarded-strings, r=fee1-dead
fix confusing diagnostic for reserved `##`

Closes #131615
2024-11-28 12:06:04 +01:00
Peter Jaszkowiak
44f4f67f46 fix confusing diagnostic for reserved ## 2024-11-25 22:29:14 -07:00
Nicholas Nethercote
16a39bb7ca Streamline lex_token_trees error handling.
- Use iterators instead of `for` loops.
- Use `if`/`else` instead of `match`.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
ba1a1ddc3f Fix some formatting.
Must be one of those cases where the function is too long and rustfmt
bails out.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
593cf680aa Split Lexer::bump.
It has two different ways of being called.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
98777b4c49 Merge TokenTreesReader into StringReader.
There is a not-very-useful layering in the lexer, where
`TokenTreesReader` contains a `StringReader`. This commit combines them
and names the result `Lexer`, which is a more obvious name for it.

The methods of `Lexer` are now split across `mod.rs` and `tokentrees.rs`
which isn't ideal, but it doesn't seem worth moving a bunch of code to
avoid it.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
cee88f7a3f Prepare for invisible delimiters.
Current places where `Interpolated` is used are going to change to
instead use invisible delimiters. This prepares for that.
- It adds invisible delimiter cases to the `can_begin_*`/`may_be_*`
  methods and the `failed_to_match_macro` that are equivalent to the
  existing `Interpolated` cases.
- It adds panics/asserts in some places where invisible delimiters
  should never occur.
- In `Parser::parse_struct_fields` it excludes an ident + invisible
  delimiter from special consideration in an error message, because
  that's quite different to an ident + paren/brace/bracket.
2024-11-21 08:22:11 +11:00
Nicholas Nethercote
e9a0c3c98c Remove TokenKind::InvalidPrefix.
It was added in #123752 to handle some cases involving emoji, but it
isn't necessary because it's always treated the same as
`TokenKind::InvalidIdent`. This commit removes it, which makes things a
little simpler.
2024-11-19 18:06:22 +11:00
Michael Goulet
9785c7cf94 Enforce that raw lifetime identifiers must be valid raw identifiers 2024-10-30 14:45:22 +00:00
Josh Triplett
ecdc2441b6 "innermost", "outermost", "leftmost", and "rightmost" don't need hyphens
These are all standard dictionary words and don't require hyphenation.
2024-10-23 02:45:24 -07:00
Peter Jaszkowiak
321a5db7d4 Reserve guarded string literals (RFC 3593) 2024-10-08 18:21:16 -06:00
Michael Goulet
c682aa162b Reformat using the new identifier sorting from rustfmt 2024-09-22 19:11:29 -04:00
Michael Goulet
5de89bb011 Store raw ident span for raw lifetime 2024-09-17 16:43:18 -04:00
Eduardo Sánchez Muñoz
0b20ffcb63 Remove needless returns detected by clippy in the compiler 2024-09-09 13:32:22 +02:00
Michael Goulet
afa24f0180 Add some more tests 2024-09-06 10:32:48 -04:00
Michael Goulet
97910580aa Add initial support for raw lifetimes 2024-09-06 10:32:48 -04:00
Michael Goulet
3b3e43a386 Format lexer 2024-09-06 10:32:48 -04:00
Michael Goulet
9aaf873396 Reserve prefix lifetimes too 2024-09-06 10:32:48 -04:00
Nicholas Nethercote
7923b20dd9 Use impl PartialEq<TokenKind> for Token more.
This lets us compare a `Token` with a `TokenKind`. It's used a lot, but
can be used even more, avoiding the need for some `.kind` uses.
2024-08-14 16:37:09 +10:00
bors
595316b400 Auto merge of #127955 - chenyukang:yukang-fix-mismatched-delimiter-issue-12786, r=nnethercote
Add limit for unclosed delimiters in lexer diagnostic

Fixes #127868

The first commit shows the original diagnostic, and the second commit shows the changes.
2024-07-30 13:02:16 +00:00
Nicholas Nethercote
84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00