This commit fixes a nasty bug where the root path given to walkdir was
always reported as a symlink, even when 'follow_links' was enabled. This
appears to be a regression introduced by commit 6f72fce as part of
fixing BurntSushi/ripgrep#984.
The central problem was that since root paths should always be followed,
we were creating a DirEntry whose internal file type was always resolved
by following a symlink, but whose 'metadata' method still returned the
metadata of the symlink and not the target. This was problematic and
inconsistent both with and without 'follow_links' enabled.
We also fix the documentation. In particular, we make the docs of 'new'
less ambiguous, since they could previously be read as contradicting the
docs on 'DirEntry'. Specifically, 'WalkDir::new' says:
    If root is a symlink, then it is always followed.
But the docs for 'DirEntry::metadata' say:
    This always calls std::fs::symlink_metadata.
    If this entry is a symbolic link and follow_links is enabled, then
    std::fs::metadata is called instead.
Similarly, 'DirEntry::file_type' said:
    If this is a symbolic link and follow_links is true, then this
    returns the type of the target.
That is, if 'root' is a symlink and 'follow_links' is NOT enabled,
then the previous incorrect behavior resulted in 'DirEntry::file_type'
behaving as if 'follow_links' was enabled. If 'follow_links'
was enabled, then the previous incorrect behavior resulted in
'DirEntry::metadata' reporting the metadata of the symlink itself.
We fix this by correctly constructing the DirEntry in the first place,
and then adding special case logic to path traversal that will always
attempt to follow the root path if it's a symlink and 'follow_links'
was not enabled. We also tweak the docs on 'WalkDir::new' to be more
precise.
Fixes #115
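A minimal sketch of the idea behind this fix, assuming a simplified model: the entry keeps the file type it would honestly report, and the "always follow the root" rule lives only in the traversal decision. `Kind` and `should_descend` are hypothetical names, not walkdir's API, and `target_is_dir` stands in for a `std::fs::metadata` call that follows the link.

```rust
/// Hypothetical entry classification as reported to the user. When links
/// are not being followed, a symlinked root stays `Kind::Symlink`, so its
/// file type and metadata both describe the link itself.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    File,
    Dir,
    Symlink,
}

/// Decides whether a walker should try to descend into an entry at `depth`.
///
/// `target_is_dir` is only invoked for symlinks that are actually being
/// followed.
fn should_descend<F: FnOnce() -> bool>(
    kind: Kind,
    depth: usize,
    follow_links: bool,
    target_is_dir: F,
) -> bool {
    match kind {
        Kind::Dir => true,
        Kind::File => false,
        // Special case: the root (depth 0) is always followed, while other
        // symlinks are followed only when `follow_links` is enabled.
        Kind::Symlink => (depth == 0 || follow_links) && target_is_dir(),
    }
}
```

Keeping the special case in traversal, rather than in how the entry is built, is what lets `file_type` and `metadata` agree with each other in this model.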
This commit adds a new method, `same_file_system`, which, when enabled,
causes walkdir to descend only into directories that are on the same
file system as the root path.
Closes #8, Closes #107
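A minimal sketch of the check that `same_file_system` implies, assuming a Unix platform where `std::os::unix::fs::MetadataExt::dev` identifies the device a file lives on. `device_num` and `is_same_file_system` are hypothetical helpers; other platforms would need a different identifier (on Windows, for example, something like a volume serial number).

```rust
use std::io;
use std::path::Path;

/// Returns an identifier for the device (file system) that `path` is on.
/// Unix-only in this sketch.
#[cfg(unix)]
fn device_num(path: &Path) -> io::Result<u64> {
    use std::os::unix::fs::MetadataExt;
    std::fs::metadata(path).map(|md| md.dev())
}

/// Hypothetical check a walker could run before descending into `dir`:
/// only descend if `dir` lives on the same device as the root.
#[cfg(unix)]
fn is_same_file_system(root_dev: u64, dir: &Path) -> io::Result<bool> {
    Ok(device_num(dir)? == root_dev)
}
```

The root's device number would be computed once, up front, so each directory costs one metadata lookup before the walker decides whether to enter it.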
This commit fixes a performance regression introduced in commit 0f4441,
which aimed to fix OneDrive traversals. In particular, we added an
additional stat call to every directory entry, which can be quite
disastrous for performance. We fix this by being more fastidious about
reusing the Metadata that comes from fs::DirEntry, which is, conveniently,
cheap to acquire specifically on Windows.
The performance regression was reported against ripgrep:
https://github.com/BurntSushi/ripgrep/issues/820
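A sketch of the "reuse the Metadata from fs::DirEntry" pattern using only the standard library. Per the std docs, `DirEntry::metadata` is cheap on Windows because the data comes from the directory listing, while on Unix it performs the equivalent of `symlink_metadata`. The function `list_with_metadata` is hypothetical and only illustrates fetching the metadata once per entry and reusing it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists the entries of `dir`, calling `DirEntry::metadata` exactly once per
/// entry and returning it alongside the path so callers can reuse it.
/// Like `DirEntry::metadata`, this does not follow symlinks.
fn list_with_metadata(dir: &Path) -> io::Result<Vec<(PathBuf, fs::Metadata)>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // One lookup per entry; callers use this value instead of calling
        // `fs::metadata(entry.path())` a second time.
        let md = entry.metadata()?;
        out.push((entry.path(), md));
    }
    Ok(out)
}
```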
In some places, we were relying on things like "not(unix)" to mean "windows"
or "not(windows)" to mean "unix". Instead, we should split this into three
cases: unix, windows, or not(unix or windows).
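In Rust `cfg` syntax, the three-way split looks like the sketch below. The function `platform` is a hypothetical stand-in for any platform-specific code path; the point is that a target that is neither Unix nor Windows gets its own explicit branch instead of silently inheriting one of the others.

```rust
#[cfg(unix)]
fn platform() -> &'static str {
    "unix"
}

#[cfg(windows)]
fn platform() -> &'static str {
    "windows"
}

// Neither Unix nor Windows: handled explicitly rather than assumed.
#[cfg(not(any(unix, windows)))]
fn platform() -> &'static str {
    "other"
}
```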
A big potential question on the reader's mind when reviewing these
docs is "what will the paths returned by the iterator be relative
to?" This is the one example on the page that shows output which
could answer that question, and showing only filenames there is
needlessly discouraging.
This fixes a bug where a symlink was followed even if the user did not
request it. Namely, on Windows, a symlink can be interpreted as both a
symlink and a directory, given our new is_dir checking.
This commit fixes a bug on Windows where walkdir refused to traverse
directories that resided on OneDrive via its "file on demand" strategy.
The specific bug is that Rust's standard library treats a reparse point
(which is what OneDrive uses) as distinct from a file or directory, which
wreaks havoc on any code that uses FileType::{is_file, is_dir}. We fix
this by deciding whether a file is a directory based only on whether its
directory bit is set.
This bug was originally reported in ripgrep:
https://github.com/BurntSushi/ripgrep/issues/705
It has also been filed upstream:
https://github.com/rust-lang/rust/issues/46484
And has a pending fix:
https://github.com/rust-lang/rust/pull/47956
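The directory-bit check above, together with the earlier symlink fix, can be sketched as a pure function over raw Win32 attribute bits. `FILE_ATTRIBUTE_DIRECTORY` (0x10) and `FILE_ATTRIBUTE_REPARSE_POINT` (0x400) are the Windows SDK values; on Windows the raw bits are available via `std::os::windows::fs::MetadataExt::file_attributes`. `EntryKind`, `classify`, and the separate `is_symlink` input (e.g. from `FileType::is_symlink`) are hypothetical.

```rust
// Win32 file attribute bits, as defined in the Windows SDK.
const FILE_ATTRIBUTE_DIRECTORY: u32 = 0x10;
const FILE_ATTRIBUTE_REPARSE_POINT: u32 = 0x400;

/// Hypothetical classification used when deciding what to descend into.
#[derive(Debug, PartialEq)]
enum EntryKind {
    Symlink,
    Dir,
    Other,
}

/// Classifies an entry from its raw attribute bits plus a symlink flag.
fn classify(attrs: u32, is_symlink: bool) -> EntryKind {
    if is_symlink {
        // Checked first: a directory symlink also has its directory bit
        // set, and must not be treated as a plain directory (and thus
        // followed) unless the caller asked for that.
        EntryKind::Symlink
    } else if (attrs & FILE_ATTRIBUTE_DIRECTORY) != 0 {
        // Only the directory bit matters, so a directory that is also a
        // reparse point (like a OneDrive placeholder) still counts.
        EntryKind::Dir
    } else {
        EntryKind::Other
    }
}
```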
This fixes a bug in walkdir that happened on Windows when following
symlinks. It was triggered when opening a handle to the symlink failed.
In particular, this resulted in the two stacks in the walkdir iterator
getting out of sync. At some point, this tripped a panic when popping
from one stack would be fine but popping from the other failed because
it was empty.
We fix this by pushing to the two stacks only if both pushes would
succeed.
This bug was found via ripgrep. See:
https://github.com/BurntSushi/ripgrep/issues/633#issuecomment-339076246
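A minimal sketch of the all-or-nothing push, assuming a simplified walker that keeps one stack of open directory listings and one of ancestor handles. `Stacks`, `push`, and `pop` are hypothetical; the closures stand in for the fallible "open a listing" and "open a handle" steps.

```rust
use std::io;

/// Hypothetical walker state whose two stacks must always have equal length.
struct Stacks<L, H> {
    listings: Vec<L>,
    ancestors: Vec<H>,
}

impl<L, H> Stacks<L, H> {
    /// Runs both fallible steps before modifying either stack, so an error
    /// (e.g. failing to open a handle to a symlink) leaves them in sync.
    fn push<FL, FH>(&mut self, open_listing: FL, open_handle: FH) -> io::Result<()>
    where
        FL: FnOnce() -> io::Result<L>,
        FH: FnOnce() -> io::Result<H>,
    {
        let handle = open_handle()?;
        let listing = open_listing()?;
        self.listings.push(listing);
        self.ancestors.push(handle);
        Ok(())
    }

    /// Pops from both stacks together; since pushes are all-or-nothing,
    /// the two pops always agree.
    fn pop(&mut self) -> Option<(L, H)> {
        match (self.listings.pop(), self.ancestors.pop()) {
            (Some(l), Some(h)) => Some((l, h)),
            (None, None) => None,
            _ => unreachable!("stacks out of sync"),
        }
    }
}
```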
This commit tweaks the `From<walkdir::Error> for io::Error`
implementation to always retain the current context when
constructing the `io::Error`. This differs from the previous
implementation in that the original raw I/O error is no longer
returned.
To compensate, a new method, `into_io_error`, has been
added which returns the original I/O error, if one exists.
We do not consider this a breaking change because the
documentation for the `From` impl always stated that it
existed for ergonomic reasons. Arguably, the implementation
in this commit is a more faithful reflection of that
documentation.
This commit also clears up the public documentation
surrounding the aforementioned methods.
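A sketch of the conversion behavior described above, using a hypothetical `WalkError` rather than walkdir's actual type: `From` wraps the whole error (so the path context survives, with the original `ErrorKind` copied over), while `into_io_error` hands back the raw I/O error if there is one.

```rust
use std::error;
use std::fmt;
use std::io;
use std::path::PathBuf;

/// Hypothetical walker error: the path being processed, plus the underlying
/// I/O error when there is one (a loop error, for example, has none).
#[derive(Debug)]
struct WalkError {
    path: PathBuf,
    err: Option<io::Error>,
}

impl fmt::Display for WalkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.err {
            Some(err) => write!(f, "I/O error at {}: {}", self.path.display(), err),
            None => write!(f, "error at {}", self.path.display()),
        }
    }
}

impl error::Error for WalkError {
    fn source(&self) -> Option<&(dyn error::Error + 'static)> {
        self.err.as_ref().map(|e| e as &(dyn error::Error + 'static))
    }
}

impl WalkError {
    /// Returns the original I/O error, if one exists, dropping the path.
    fn into_io_error(self) -> Option<io::Error> {
        self.err
    }
}

impl From<WalkError> for io::Error {
    /// Wraps the whole `WalkError` so the path context is retained.
    fn from(err: WalkError) -> io::Error {
        let kind = err.err.as_ref().map_or(io::ErrorKind::Other, |e| e.kind());
        io::Error::new(kind, err)
    }
}
```

With this split, `?` into an `io::Error` keeps the path in the message, and callers who need the untouched OS error opt in explicitly.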
Doc strings on public items should always start with a short,
one-sentence description. This helps readability and also makes the
display reasonable in rustdoc.
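An illustration of the convention on a hypothetical helper, `depth`, which is not part of walkdir: the opening line is a short summary sentence, which rustdoc uses next to the item in module listings, and the details follow in a separate paragraph.

```rust
use std::path::Path;

/// Returns how many path components `path` lies below `root`.
///
/// Returns `None` when `path` is not under `root`, and `Some(0)` when the
/// two are equal.
pub fn depth(path: &Path, root: &Path) -> Option<usize> {
    path.strip_prefix(root).ok().map(|rel| rel.components().count())
}
```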
Broadly speaking, this commit is an attempt to fix this issue:
https://github.com/BurntSushi/ripgrep/issues/633
It was reported that symlink checking was taking a long time, and that
one possible way to fix this was to reduce the number of times a file
descriptor is opened. In this commit, we amortize opening file
descriptors by keeping a file handle open for each ancestor in the
directory tree. We also open a handle for the candidate file path at
most once, instead of once every iteration.
Note that we only perform this optimization on Windows, where opening a
file handle seems inordinately expensive. As a consequence, walkdir may
now open more file descriptors than the limit set by the user, although
this only happens when following symbolic links. We document this
behavior.
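A sketch of the amortized loop check, assuming each ancestor directory keeps a handle that was opened once when the walker entered it, and that handles can be compared for file identity (for example, a handle type from the same-file crate). `Ancestor` and `find_loop` are hypothetical, and the sketch ignores the Windows-only restriction described above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical ancestor record: the directory's path plus a handle that
/// was opened once, when the walker descended into it.
struct Ancestor<H> {
    path: PathBuf,
    handle: H,
}

/// Returns the ancestor that `candidate` refers back to, if any.
///
/// `open` stands in for opening a comparable handle to the candidate. It
/// is called at most once, before scanning the ancestors, rather than once
/// per ancestor; the ancestors' handles are reused as-is.
fn find_loop<'a, H, E, F>(
    ancestors: &'a [Ancestor<H>],
    candidate: &Path,
    open: F,
) -> Result<Option<&'a Path>, E>
where
    H: PartialEq,
    F: FnOnce(&Path) -> Result<H, E>,
{
    if ancestors.is_empty() {
        return Ok(None);
    }
    let cand = open(candidate)?;
    Ok(ancestors
        .iter()
        .find(|a| a.handle == cand)
        .map(|a| a.path.as_path()))
}
```

The trade-off is visible in the data structure: every `Ancestor` holds an open handle for as long as the walker is below it, which is why the number of open descriptors can exceed the user's limit when following symlinks.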