Git only assumes a submodule URL is a relative path if it starts with `./`
or `../` [^1]. To fetch the correct repo, we need to construct an aboslute
submodule URL.
At this moment it comes with some limitations:
* GitHub doesn't accept non-normalized URLs wth relative paths.
(`ssh://git@github.com/rust-lang/cargo.git/relative/..` is invalid)
* `url` crate cannot parse SCP-like URLs.
(`git@github.com:rust-lang/cargo.git` is not a valid WHATWG URL)
To overcome these, this patch always tries `Url::parse` first to normalize
the path. If it couldn't, append the relative path as the last resort and
pray the remote git service supports non-normalized URLs.
See also rust-lang/cargo#12404 and rust-lang/cargo#12295.
[^1]: <https://git-scm.com/docs/git-submodule>
`parent_remote_url` used to be `&str` before #12244. However, we changed
the type to `Url` and it started failing to parse scp-like URLs since
they are not compliant with WHATWG URL spec.
In this commit, we change it back to `&str` and construct the URL
manually. This should be safe since Cargo already checks if it is a
relative URL for that if branch.
Doing so seems cleaner as there should be no logical difference between
shallow or not-shallow when fetching. We want a specific object, and should
get it with the refspec. `git` will assure we see all objects we need,
handling shallow-ness for us.
Note that one test needed adjustments due to the different mechanism used
when fetching local repositories, requiring more changes to properly 'break'
the submodule repo when `gitoxide` is used.
This is an improvement over the previous version which would use unshallowing that effectively
makes a shallow repo *not* shallow.
Furthermore, we will now only fetch a single commit, each time we fetch, which should be faster
for the server as well as for the client.
We also make it possible to fetch individual commits that would be specified via Cargo.lock.
A couple of test expectations are adjusted accordingly.
Is this desirable behaviour? Unfortunately, there is no alternative
as adding shallow to an existing index most definitely breaks backwards
compatibility.
The implementation hinges on passing information about the kind of clone
and fetch to the `fetch()` method, which then configures the fetch accordingly.
Note that it doesn't differentiate between initial clones and fetches as
the shallow-ness of the repository is maintained nonetheless.
This allows to use `gitoxide` for all fetch operations, boosting performance
for fetching the `crates.io` index by a factor of 2.2x, while being consistent
on all platforms.
For trying it, nightly builds of `cargo` can specify `-Zgitoxide=fetch`.
It's also possible to set the `__CARGO_USE_GITOXIDE_INSTEAD_OF_GIT2=1` environment
variable (value matters), which is when `-Zgitoxide=none` can be used
to use `git2` instead.
Limitations
-----------
Note that what follows are current shortcomings that will be addressed in future PRs.
- it's likely that authentication around the `ssh` protocol will work differently in practice
as it uses the `ssh` program.
- clones from `file://` based crates indices will need the `git` binary to serve the locatl repository.
- the progress bar shown when fetching doesn't work like the orgiinal, but should already feel 'faster'.
Git lets users define the default update/checkout strategy for a submodule
by setting the `submodule.<name>.update` key in `.gitmodules` file.
If the update strategy is `none`, the submodule will be skipped during
update. It will not be fetched and checked out:
1. *foo* is a big git repo
```
/tmp $ git init foo
Initialized empty Git repository in /tmp/foo/.git/
/tmp $ dd if=/dev/zero of=foo/big bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.482087 s, 2.2 GB/s
/tmp $ git -C foo add big
/tmp $ git -C foo commit -m 'I am big'
[main (root-commit) 84fb533] I am big
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 big
```
2. *bar* is a repo with a big submodule with `update=none`
```
/tmp $ git init bar
Initialized empty Git repository in /tmp/bar/.git/
/tmp $ git -C bar submodule add file:///tmp/foo foo
Cloning into '/tmp/bar/foo'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 1 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), 995.50 KiB | 338.00 KiB/s, done.
/tmp $ git -C bar config --file .gitmodules submodule.foo.update none
/tmp $ cat bar/.gitmodules
[submodule "foo"]
path = foo
url = file:///tmp/foo
update = none
/tmp $ git -C bar commit --all -m 'I have a big submodule with update=none'
[main (root-commit) 6c355ea] I have a big submodule not updated by default
2 files changed, 4 insertions(+)
create mode 100644 .gitmodules
create mode 160000 foo
```
3. *baz* is a clone of *bar*, notice *foo* submodule gets skipped
```
/tmp $ git clone --recurse-submodules file:///tmp/bar baz
Cloning into 'baz'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.
Submodule 'foo' (file:///tmp/foo) registered for path 'foo'
Skipping submodule 'foo'
/tmp $ git -C baz submodule update --init
Skipping submodule 'foo'
/tmp $
```
Cargo, on the other hand, ignores the submodule update strategy set in
`.gitmodules` properties when updating dependencies. Such behavior can
be considered against the wish of the crate publisher.
4. *bar* is now a lib with a big submodule with update disabled
```
/tmp $ cargo init --lib bar
Created library package
/tmp $ git -C bar add .
/tmp $ git -C bar commit -m 'I am a lib with a big submodule but update=none'
[main eb07cf7] I am a lib with a big submodule but update=none
3 files changed, 18 insertions(+)
create mode 100644 .gitignore
create mode 100644 Cargo.toml
create mode 100644 src/lib.rs
/tmp $
```
5. *qux* depends on *bar*, notice *bar*'s submodules are fetched
```
/tmp $ cargo init qux && cd qux
Created binary (application) package
/tmp/qux $ echo -e '[dependencies.bar]\ngit = "file:///tmp/bar"' >> Cargo.toml
/tmp/qux $ time cargo update
Updating git repository `file:///tmp/bar`
Updating git submodule `file:///tmp/foo`
real 0m22.182s
user 0m20.402s
sys 0m1.714s
/tmp/qux $
```
Fix it by checking if a Git repository submodule should be updated when
cargo processes dependencies.
6. With the change applied, submodules with `update=none` are skipped
```
/tmp/qux $ cargo cache -a > /dev/null
/tmp/qux $ time ~/src/cargo/target/debug/cargo update
Updating git repository `file:///tmp/bar`
Skipping git submodule `file:///tmp/foo`
real 0m0.029s
user 0m0.021s
sys 0m0.008s
/tmp/qux $
```
Fixes#4247.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
In addition to `foo:1.2.3`, we now support `foo@1.2.3` for pkgids. We
are also making it the default way of rendering pkgid's for the user.
With cargo-add in #10472, we've decided to only use `@` in it and to add
it as an alternative to `:` in the rest of cargo. `cargo-add`
originally used `@`. When preparing it for merge, I switched to `:` to
be consistent with pkgids. When discussing this, it was felt `@` has
precedence in too many tools to switch to `:` but that we should instead
switch pkgid's to use `@`, in a backwards compatible way.
See also https://internals.rust-lang.org/t/feedback-on-cargo-add-before-its-merged/16024/26?u=epage
Benefits:
- A TOML 1.0 compliant parser
- Unblock future work
- Have `cargo init` add the current crate to the workspace, rather
than error
- #5586: Upstream `cargo-add`
After the rust_version field was stabilized in #9732 this adds the
rust_version as output to the `cargo metadata` command, so tools like
Clippy can read and use it as well.
This commit follows through with work started in #8522 to change the
default behavior of `git` dependencies where if not branch/tag/etc is
listed then `HEAD` is used instead of the `master` branch. This involves
also changing the default lock file format, now including a `version`
marker at the top of the file notably as well as changing the encoding
of `branch=master` directives in `Cargo.toml`.
If we did all our work correctly then this will be a seamless change.
First released on stable in 1.47.0 (2020-10-08) Cargo has been emitting
warnings about situations which may break in the future. This means that
if you don't specify `branch = 'master'` but your HEAD branch isn't
`master`, you've been getting a warning. Similarly if your dependency
graph used both `branch = 'master'` as well as specifying nothing, you
were receiving warnings as well. These two situations are broken by this
commit, but it's hoped that by giving enough times with warnings we
don't actually break anyone in practice.
This commit implements a simple warning to indicate when `DefaultBranch`
is unified with `Branch("master")`, meaning `Cargo.toml` inconsistently
lists `branch = "master"` and not. The intention here is to ensure that
all projects uniformly use one or the other to ensure we can smoothly
transition to the new lock file format.
This commit implements a warning in Cargo for when a dependency
directive is using `DefaultBranch` but the default branch of the remote
repository is not actually `master`. We will eventually break this
dependency directive and the warning indicates the fix, which is to
write down `branch = "master"`.
This commit is intended to be an effective but not literal revert
of #8364. Internally Cargo will still distinguish between
`DefaultBranch` and `Branch("master")` when reading `Cargo.toml` files,
but for almost all purposes the two are equivalent. This will namely fix
the issue we have with lock file encodings where both are encoded with
no `branch` (and without a branch it's parsed from a lock file as
`DefaultBranch`).
This will preserve the change that `cargo vendor` will not print out
`branch = "master"` annotations but that desugars to match the lock file
on the other end, so it should continue to work.
Tests have been added in this commit for the regressions found on #8468.
This commit improves Cargo's support for git repositories whose "main
branch" is not called `master`. Cargo currently pretty liberally assumes
that if nothing else about a git repository is specified then `master`
is the branch name to use. Instead now Cargo has a fourth option as the
desired reference of a repository named `DefaultBranch`. Cargo doesn't
know anything about the actual name of the default branch, it just
updates how git references are fetched internally.
This commit is motivated by news that GitHub is likely to switch away
from the default branch being named `master` in the near future. It
would be a bit of a bummer if from now on everyone had to type
`branch = '...'`, so this tries to improve that!
This commit refactors various logic of the git source internals to
ensure that if we have a locked revision that we plumb the desired
branch/tag all the way through to the `fetch`. Previously we'd switch to
`Rev` very early on, but the fetching logic for `Rev` is very eager and
fetches too much, so instead we only resolve the locked revision later
on.
Internally this does some various refactoring to try to make various
bits and pieces of logic a bit easyer to grok, although it's still
perhaps not the cleanest implementation.