Andrew Gallant 83b7a24302
walkdir: fix root symlink bug
This commit fixes a nasty bug where the root path given to walkdir was
always reported as a symlink, even when 'follow_links' was enabled. This
appears to be a regression introduced by commit 6f72fce as part of
fixing BurntSushi/ripgrep#984.

The central problem was that since root paths should always be followed,
we were creating a DirEntry whose internal file type was always resolved
by following a symlink, but whose 'metadata' method still returned the
metadata of the symlink and not the target. This was problematic and
inconsistent both with and without 'follow_links' enabled.

We also fix the documentation. In particular, we make the docs of 'new'
more unambiguous, where it previously could have been interpreted as
contradictory to the docs on 'DirEntry'. Specifically, 'WalkDir::new'
says:

    If root is a symlink, then it is always followed.

But the docs for 'DirEntry::metadata' say

    This always calls std::fs::symlink_metadata.

    If this entry is a symbolic link and follow_links is enabled, then
    std::fs::metadata is called instead.

Similarly, 'DirEntry::file_type' said

    If this is a symbolic link and follow_links is true, then this
    returns the type of the target.

That is, if 'root' is a symlink and 'follow_links' is NOT enabled,
then the previous incorrect behavior resulted in 'DirEntry::file_type'
behaving as if 'follow_links' was enabled. If 'follow_links'
was enabled, then the previous incorrect behavior resulted in
'DirEntry::metadata' reporting the metadata of the symlink itself.

We fix this by correctly constructing the DirEntry in the first place,
and then adding special case logic to path traversal that will always
attempt to follow the root path if it's a symlink and 'follow_links'
was not enabled. We also tweak the docs on 'WalkDir::new' to be more
precise.

Fixes #115
2018-11-11 09:32:29 -05:00
2018-11-07 07:17:55 -05:00
2018-11-11 09:32:29 -05:00
2018-02-01 22:49:07 -05:00
2018-08-25 10:46:33 -04:00
2018-08-24 18:55:24 -04:00
2018-10-29 06:59:16 -04:00
2015-09-17 18:40:28 -04:00
2015-09-17 18:40:28 -04:00
2015-09-17 18:40:28 -04:00

walkdir

A cross platform Rust library for efficiently walking a directory recursively. Comes with support for following symbolic links, controlling the number of open file descriptors and efficient mechanisms for pruning the entries in the directory tree.

Linux build status Windows build status

Dual-licensed under MIT or the UNLICENSE.

Documentation

docs.rs/walkdir

Usage

To use this crate, add walkdir as a dependency to your project's Cargo.toml:

[dependencies]
walkdir = "2"

Example

The following code recursively iterates over the directory given and prints the path for each entry:

use walkdir::WalkDir;

for entry in WalkDir::new("foo") {
    let entry = entry.unwrap();
    println!("{}", entry.path().display());
}

Or, if you'd like to iterate over all entries and ignore any errors that may arise, use filter_map. (e.g., This code below will silently skip directories that the owner of the running process does not have permission to access.)

use walkdir::WalkDir;

for entry in WalkDir::new("foo").into_iter().filter_map(|e| e.ok()) {
    println!("{}", entry.path().display());
}

The same code as above, except follow_links is enabled:

use walkdir::WalkDir;

for entry in WalkDir::new("foo").follow_links(true) {
    let entry = entry.unwrap();
    println!("{}", entry.path().display());
}

Example: skip hidden files and directories efficiently on unix

This uses the filter_entry iterator adapter to avoid yielding hidden files and directories efficiently:

use walkdir::{DirEntry, WalkDir};

fn is_hidden(entry: &DirEntry) -> bool {
    entry.file_name()
         .to_str()
         .map(|s| s.starts_with("."))
         .unwrap_or(false)
}

let walker = WalkDir::new("foo").into_iter();
for entry in walker.filter_entry(|e| !is_hidden(e)) {
    let entry = entry.unwrap();
    println!("{}", entry.path().display());
}

Motivation

std::fs has an unstable walk_dir implementation that needed some design work. I started off on that task, but it quickly became apparent that walking a directory recursively is quite complex and may not be a good fit for std right away.

This should at least resolve most or all of the issues reported here (and then some):

Performance

The short story is that performance is comparable with find and glibc's nftw on both a warm and cold file cache. In fact, I cannot observe any performance difference after running find /, walkdir / and nftw / on my local file system (SSD, ~3 million entries). More precisely, I am reasonably confident that this crate makes as few system calls and close to as few allocations as possible.

I haven't recorded any benchmarks, but here are some things you can try with a local checkout of walkdir:

# The directory you want to recursively walk:
DIR=$HOME

# If you want to observe perf on a cold file cache, run this before *each*
# command:
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

# To warm the caches
find $DIR

# Test speed of `find` on warm cache:
time find $DIR

# Compile and test speed of `walkdir` crate:
cargo build --release --example walkdir
time ./target/release/examples/walkdir $DIR

# Compile and test speed of glibc's `nftw`:
gcc -O3 -o nftw ./compare/nftw.c
time ./nftw $DIR

# For shits and giggles, test speed of Python's (2 or 3) os.walk:
time python ./compare/walk.py $DIR

On my system, the performance of walkdir, find and nftw is comparable.

Description
Rust library for walking directories recursively.
Readme 701 KiB
Languages
Rust 99.3%
C 0.5%
Python 0.2%