In writing up the reference for frontmatter, I realized that we probably
shouldn't be accepting Unicode Line Ending characters between the code
fence and infostring or trailing after the infostring or a code fence.
In digging into the unicode specification we use for Whitespace, it
divides it up into categories, so I'm deferring to what it says for
horizontal whitespace for what should be used within a line.
Note, I am leaving out support for Unicde Default Ignorable characters.
I figure that can be discussed outside of this change within the
reference and tracking issue.
`rc""` more clear error message
here is small fix that provides better error message when user is trying to use `rc""` the same way it was made for `rb""`
example of it's work
```rust
|
2 | rc"\n";
| ^^ unknown prefix
|
= note: prefixed identifiers and literals are reserved since Rust 2021
help: use `cr` for a raw C-string
|
2 - rc"\n";
2 + cr"\n";
|
```
**related issue**
fixes#140170
cc `@cyrgani` (issue author)
By replacing them with `{Open,Close}{Param,Brace,Bracket,Invisible}`.
PR #137902 made `ast::TokenKind` more like `lexer::TokenKind` by
replacing the compound `BinOp{,Eq}(BinOpToken)` variants with fieldless
variants `Plus`, `Minus`, `Star`, etc. This commit does a similar thing
with delimiters. It also makes `ast::TokenKind` more similar to
`parser::TokenType`.
This requires a few new methods:
- `TokenKind::is_{,open_,close_}delim()` replace various kinds of
pattern matches.
- `Delimiter::as_{open,close}_token_kind` are used to convert
`Delimiter` values to `TokenKind`.
Despite these additions, it's a net reduction in lines of code. This is
because e.g. `token::OpenParen` is so much shorter than
`token::OpenDelim(Delimiter::Parenthesis)` that many multi-line forms
reduce to single line forms. And many places where the number of lines
doesn't change are still easier to read, just because the names are
shorter, e.g.:
```
- } else if self.token != token::CloseDelim(Delimiter::Brace) {
+ } else if self.token != token::CloseBrace {
```
For consistency with `rustc_lexer::TokenKind::Bang`, and because other
`ast::TokenKind` variants generally have syntactic names instead of
semantic names (e.g. `Star` and `DotDot` instead of `Mul` and `Range`).
`BinOpToken` is badly named, because it only covers the assignable
binary ops and excludes comparisons and `&&`/`||`. Its use in
`ast::TokenKind` does allow a small amount of code sharing, but it's a
clumsy factoring.
This commit removes `ast::TokenKind::BinOp{,Eq}`, replacing each one
with 10 individual variants. This makes `ast::TokenKind` more similar to
`rustc_lexer::TokenKind`, which has individual variants for all
operators.
Although the number of lines of code increases, the number of chars
decreases due to the frequent use of shorter names like `token::Plus`
instead of `token::BinOp(BinOpToken::Plus)`.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Lexing precedes parsing, as you'd expect: `Lexer` creates a
`TokenStream` and `Parser` then parses that `TokenStream`.
But, in a horrendous violation of layering abstractions and common
sense, `Lexer` depends on `Parser`! The `Lexer::unclosed_delim_err`
method does some error recovery that relies on creating a `Parser` to do
some post-processing of the `TokenStream` that the `Lexer` just created.
This commit just removes `unclosed_delim_err`. This change removes
`Lexer`'s dependency on `Parser`, and also means that `lex_token_tree`'s
return value can have a more typical form.
The cost is slightly worse error messages in two obscure cases, as shown
in these tests:
- tests/ui/parser/brace-in-let-chain.rs: there is slightly less
explanation in this case involving an extra `{`.
- tests/ui/parser/diff-markers/unclosed-delims{,-in-macro}.rs: the diff
marker detection is no longer supported (because that detection is
implemented in the parser).
In my opinion this cost is outweighed by the magnitude of the code
cleanup.
There is a not-very-useful layering in the lexer, where
`TokenTreesReader` contains a `StringReader`. This commit combines them
and names the result `Lexer`, which is a more obvious name for it.
The methods of `Lexer` are now split across `mod.rs` and `tokentrees.rs`
which isn't ideal, but it doesn't seem worth moving a bunch of code to
avoid it.
Current places where `Interpolated` is used are going to change to
instead use invisible delimiters. This prepares for that.
- It adds invisible delimiter cases to the `can_begin_*`/`may_be_*`
methods and the `failed_to_match_macro` that are equivalent to the
existing `Interpolated` cases.
- It adds panics/asserts in some places where invisible delimiters
should never occur.
- In `Parser::parse_struct_fields` it excludes an ident + invisible
delimiter from special consideration in an error message, because
that's quite different to an ident + paren/brace/bracket.
It was added in #123752 to handle some cases involving emoji, but it
isn't necessary because it's always treated the same as
`TokenKind::InvalidIdent`. This commit removes it, which makes things a
little simpler.
Add limit for unclosed delimiters in lexer diagnostic
Fixes#127868
The first commit shows the original diagnostic, and the second commit shows the changes.