83 Commits

Author SHA1 Message Date
Alex
2d18c886f5 Fix a crash/mislex when more than one frontmatter closing possibility is considered 2025-09-22 15:10:41 -04:00
Ed Page
6f0da976c5 fix(lexer): Only allow horizontal whitespace in frontmatter
In writing up the reference for frontmatter, I realized that we probably
shouldn't be accepting Unicode Line Ending characters between the code
fence and infostring or trailing after the infostring or a code fence.

In digging into the unicode specification we use for Whitespace, it
divides it up into categories, so I'm deferring to what it says for
horizontal whitespace for what should be used within a line.

Note, I am leaving out support for Unicde Default Ignorable characters.
I figure that can be discussed outside of this change within the
reference and tracking issue.
2025-09-01 20:51:39 -05:00
Ed Page
428e413414 docs(lexer): Organize and document whitespace by Pattern_White_Space 2025-09-01 20:51:39 -05:00
Ed Page
142e25e356 fix(lexer): Don't require frontmatters to be escaped with indented fences
The RFC only limits hyphens at the beginning of lines and not if they
are indented or embedded in other content.

Sticking to that approach was confirmed by the T-lang liason at
https://github.com/rust-lang/rust/issues/141367#issuecomment-3202217544

There is a regression in error message quality which I'm leaving for
someone if they feel this needs improving.
2025-08-28 14:08:33 -05:00
Ed Page
f43f974b9e fix(lexer): Allow '-' in the infostring continue set
This more closely matches the RFC and what our T-lang contact has asked
for, see https://github.com/rust-lang/rust/issues/136889#issuecomment-3212715312
2025-08-22 09:26:19 -05:00
Ed Page
45a1e492b1 feat(lexer): Allow including frontmatter with 'tokenize' 2025-07-09 16:42:27 -05:00
klensy
c76d032f01 setup CI and tidy to use typos for spellchecking and fix few typos 2025-07-03 10:51:06 +03:00
Marijn Schouten
2a5225a369 rustc_lexer: typo fix + small cleanups 2025-06-06 13:08:16 +00:00
Matthew Jasper
55f59fb0e3 Fix parsing of frontmatters with inner hyphens 2025-06-04 15:51:36 +00:00
Deadbeef
662182637e Implement RFC 3503: frontmatters
Supercedes #137193
2025-05-05 23:10:08 +08:00
Guillaume Gomez
aff2bc7a88 Replace rustc_lexer/unescape with rustc-literal-escaper crate 2025-04-04 14:44:45 +02:00
Ralf Jung
20d04d8a40 Revert "Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu"
This reverts commit 08dfbf49e30d917c89e49eb14cb3f1e8b8a1c9ef, reversing
changes made to 10bcdad7df0de3cfb95c7bdb7b16908e73cafc09.
2025-03-18 13:28:56 +01:00
Jacob Pratt
08dfbf49e3
Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu
Add `*_value` methods to proc_macro lib

This is the implementation of https://github.com/rust-lang/libs-team/issues/459.

It allows to get the actual value (unescaped) of the different string literals.

Part of https://github.com/rust-lang/rust/issues/136652.

r? libs-api
2025-03-17 05:47:48 -04:00
Nicholas Nethercote
ff0a5fe975 Remove #![warn(unreachable_pub)] from all compiler/ crates.
It's no longer necessary now that `-Wunreachable_pub` is being passed.
2025-03-11 13:14:21 +11:00
许杰友 Jieyou Xu (Joe)
063ef18fdc Revert "Use workspace lints for crates in compiler/ #138084"
Revert <https://github.com/rust-lang/rust/pull/138084> to buy time to
consider options that avoids breaking downstream usages of cargo on
distributed `rustc-src` artifacts, where such cargo invocations fail due
to inability to inherit `lints` from workspace root manifest's
`workspace.lints` (this is only valid for the source rust-lang/rust
workspace, but not really the distributed `rustc-src` artifacts).

This breakage was reported in
<https://github.com/rust-lang/rust/issues/138304>.

This reverts commit 48caf81484b50dca5a5cebb614899a3df81ca898, reversing
changes made to c6662879b27f5161e95f39395e3c9513a7b97028.
2025-03-10 18:12:47 +08:00
Nicholas Nethercote
8a3e03392e Remove #![warn(unreachable_pub)] from all compiler/ crates.
(Except for `rustc_codegen_cranelift`.)

It's no longer necessary now that `unreachable_pub` is in the workspace
lints.
2025-03-08 08:41:43 +11:00
Michael Goulet
12e3911d81 Greatly simplify lifetime captures in edition 2024 2025-02-22 22:24:52 +00:00
Guillaume Gomez
94f0f2b603 Reexport literal-escaper from rustc_lexer to allow rust-analyzer to compile 2025-02-10 10:38:22 +01:00
Guillaume Gomez
49d2d5a116 Extract unescape from rustc_lexer into its own crate 2025-02-10 10:38:22 +01:00
gvozdvmozgu
9f469eb600 implement eat_until leveraging memchr in lexer 2025-02-05 07:03:53 -08:00
Eric Huss
a97404eee3 Add test to check unicode identifier version 2024-12-09 06:23:59 -08:00
Michael Goulet
b87e935407 Revert "Reject raw lifetime followed by \' as well"
This reverts commit 1990f1560801ca3f9e6a3286e58204aa329ee037.
2024-12-01 05:22:16 +00:00
Nicholas Nethercote
4cd2840f00 Clean up c_or_byte_string.
- Rename a misleading local `mk_kind` as `single_quoted`.
- Use `fn` for all three arguments, for consistency.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
e9a0c3c98c Remove TokenKind::InvalidPrefix.
It was added in #123752 to handle some cases involving emoji, but it
isn't necessary because it's always treated the same as
`TokenKind::InvalidIdent`. This commit removes it, which makes things a
little simpler.
2024-11-19 18:06:22 +11:00
Nicholas Nethercote
2c7c3697db Improve TokenKind comments.
- Improve wording.
- Use backticks consistently for examples.
2024-11-19 18:04:01 +11:00
Nicholas Nethercote
df29f9b0c3 Improve fake_ident_or_unknown_prefix.
- Rename it as `invalid_ident_or_prefix`, which matches the possible
  outputs (`InvalidIdent` or `InvalidPrefix`).
- Use the local wrapper for `is_xid_continue`, for consistency.
- Make it clear what `\u{200d}` means.
2024-11-19 18:01:43 +11:00
Michael Goulet
1990f15608 Reject raw lifetime followed by \' as well 2024-10-30 01:13:18 +00:00
Peter Jaszkowiak
321a5db7d4 Reserve guarded string literals (RFC 3593) 2024-10-08 18:21:16 -06:00
Michael Goulet
97910580aa Add initial support for raw lifetimes 2024-09-06 10:32:48 -04:00
Michael Goulet
3b3e43a386 Format lexer 2024-09-06 10:32:48 -04:00
Michael Goulet
9aaf873396 Reserve prefix lifetimes too 2024-09-06 10:32:48 -04:00
Nicholas Nethercote
6c84c55c9f Add warn(unreachable_pub) to rustc_lexer. 2024-08-27 15:12:46 +10:00
Nicholas Nethercote
84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00
Nicholas Nethercote
75b164d836 Use tidy to sort crate attributes for all compiler crates.
We already do this for a number of crates, e.g. `rustc_middle`,
`rustc_span`, `rustc_metadata`, `rustc_span`, `rustc_errors`.

For the ones we don't, in many cases the attributes are a mess.
- There is no consistency about order of attribute kinds (e.g.
  `allow`/`deny`/`feature`).
- Within attribute kind groups (e.g. the `feature` attributes),
  sometimes the order is alphabetical, and sometimes there is no
  particular order.
- Sometimes the attributes of a particular kind aren't even grouped
  all together, e.g. there might be a `feature`, then an `allow`, then
  another `feature`.

This commit extends the existing sorting to all compiler crates,
increasing consistency. If any new attribute line is added there is now
only one place it can go -- no need for arbitrary decisions.

Exceptions:
- `rustc_log`, `rustc_next_trait_solver` and `rustc_type_ir_macros`,
  because they have no crate attributes.
- `rustc_codegen_gcc`, because it's quasi-external to rustc (e.g. it's
  ignored in `rustfmt.toml`).
2024-06-12 15:49:10 +10:00
Michael Scholten
3c5e88c7d1 Improved the compiler code with clippy 2024-04-24 09:41:44 +02:00
Esteban Küber
19821ad234 Properly handle emojis as literal prefix in macros
Do not accept the following

```rust
macro_rules! lexes {($($_:tt)*) => {}}
lexes!(🐛"foo");
```

Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro expansion of literal prefixes.

Fix #123696.
2024-04-10 23:19:27 +00:00
Nicholas Nethercote
0ac1195ee0 Invert diagnostic lints.
That is, change `diagnostic_outside_of_impl` and
`untranslatable_diagnostic` from `allow` to `deny`, because more than
half of the compiler has be converted to use translated diagnostics.

This commit removes more `deny` attributes than it adds `allow`
attributes, which proves that this change is warranted.
2024-02-06 13:12:33 +11:00
León Orell Valerian Liehr
c0a9f722c4
Undeprecate and use lint unstable_features 2023-12-20 18:16:28 +01:00
Charles Lew
bca79a26d8 Update lexer emoji diagnostics to Unicode 15.0 2023-07-29 08:47:21 +08:00
Deadbeef
df9bd80d74 reimplement C string literals 2023-07-23 06:54:07 +00:00
León Orell Valerian Liehr
c6643b50ea
Revert the lexing of c_str_literals 2023-07-05 13:11:17 +02:00
Nicholas Nethercote
e52794decd Don't try to eat non-existent decimal digits.
After seeing a `0`, if it's followed by any of `[0-9]`, `_`, `.`, `e`,
or `E`, we consume all the digits. But in the `.`, `e` and `E` cases
this is pointless because we know there aren't any digits.
2023-05-15 18:33:12 +10:00
Nicholas Nethercote
19967c5890 Make Cursor::number less DRY.
A tiny bit of repetition makes this easier to read, and avoids a test on
the "Not a base prefix" match arm.
2023-05-15 18:30:26 +10:00
Deadbeef
78e3455d37 address comments 2023-05-02 10:32:07 +00:00
Deadbeef
a49570fd20 fix TODO comments 2023-05-02 10:32:07 +00:00
Deadbeef
8ff3903643 initial step towards implementing C string literals 2023-05-02 10:30:09 +00:00
Michael Goulet
a047064d6b Revert "Don't recover lifetimes/labels containing emojis as character literals"
Reverts PR #108031
Fixes (doesnt close until beta backported) #109746

This reverts commit e3f9db5fc319c6d8eee5d47d216ea6a426070c41.
This reverts commit 98b82aedba3f3f581e89df54352914b27f42c6f7.
This reverts commit 380fa264132ad481e73cbbf0f3a0feefd99a1d78.
2023-04-10 06:52:41 +00:00
est31
5a02105fff Rustdoc-ify LiteralKind note 2023-03-03 08:39:36 +01:00
许杰友 Jieyou Xu (Joe)
380fa26413
Don't recover lifetimes/labels containing emojis as character literals
Note that at the time of this commit, `unic-emoji-char` seems to have
data tables only up to Unicode 5.0, but Unicode is already newer than
this.

A newer emoji such as `🥺` will not be recognized as an emoji
but older emojis such as `🐱` will.
2023-02-14 17:31:58 +08:00
Maybe Waffle
6a28fb42a8 Remove double spaces after dots in comments 2023-01-17 08:09:33 +00:00