simplify SourceID Hash (#14800)

### What does this PR try to resolve?

Despite being interned `SourceId::Eq` is not a `ptr::eq`. Which in turn
is because `SourceId`s concept of identity is a complete mess. The code
relies on having to IDs that are `Eq` but do not have the same values
for their fields.

As one measure of this `SourceId` has an `impl Hash` which does
something different from `fn full_hash` and `fn stable_hash`. Separately
`SourceIdInner` has a different implementation. Similar levels of
complexity exist for `Eq`. Every one of these `impl`s was added due to a
real bug/issue we've had that needs to stay fixed. Not all of witch are
reproducible enough to have made it into our test suite.

I [have some
ideas](https://github.com/rust-lang/cargo/pull/14665#issuecomment-2412077472)
for how to reorganize the types so that this is easier to reason about
and faster. But given the history and the complexity I want to move
extremely carefully.

### How should we test and review this PR?

The test pass, and it's a one line change, but this still needs careful
review.

### Additional information

r? @ehuss I remember you and Alex working very hard to track down most
of these bugs.
This commit is contained in:
Weihang Lo 2024-12-23 18:36:17 +00:00 committed by GitHub
commit 0c157e0c48
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -532,7 +532,10 @@ impl SourceId {
url == CRATES_IO_INDEX || url == CRATES_IO_HTTP_INDEX || is_overridden_crates_io_url(url)
}
/// Hashes `self`.
/// Hashes `self` to be used in the name of some Cargo folders, so shouldn't vary.
///
/// For git and url, `as_str` gives the serialisation of a url (which has a spec) and so
/// insulates against possible changes in how the url crate does hashing.
///
/// For paths, remove the workspace prefix so the same source will give the
/// same hash in different locations, helping reproducible builds.
@ -550,7 +553,11 @@ impl SourceId {
return;
}
}
self.hash(into)
self.inner.kind.hash(into);
match self.inner.kind {
SourceKind::Git(_) => (&self).inner.canonical_url.hash(into),
_ => (&self).inner.url.as_str().hash(into),
}
}
pub fn full_eq(self, other: SourceId) -> bool {
@ -665,16 +672,10 @@ impl fmt::Display for SourceId {
}
}
/// The hash of `SourceId` is used in the name of some Cargo folders, so shouldn't
/// vary. `as_str` gives the serialisation of a url (which has a spec) and so
/// insulates against possible changes in how the url crate does hashing.
impl Hash for SourceId {
fn hash<S: hash::Hasher>(&self, into: &mut S) {
self.inner.kind.hash(into);
match self.inner.kind {
SourceKind::Git(_) => self.inner.canonical_url.hash(into),
_ => self.inner.url.as_str().hash(into),
}
self.inner.canonical_url.hash(into);
}
}