Andrew Gallant e2b898329f
traversal: more robust error handling
This fixes a bug in walkdir that happened on Windows when following
symlinks. It was triggered when opening a handle to the symlink failed.
In particular, this resulted in the two stacks in the walkdir iterator
getting out of sync. At some point, this tripped a panic when popping
from one stack would be fine but popping from the other failed because
it was empty.

We fix this by pushing to both stacks if and only if both pushes
would succeed.
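
As a rough illustration of that fix, here is a minimal sketch using only the standard library. Every name in it (`Walker`, `open_handle`, `read_listing`) is hypothetical and invented for this example; it is not walkdir's actual internals. The only point is that all fallible steps run before either stack is modified, so the two stacks cannot drift apart.

```rust
/// Hypothetical stand-in for an iterator that tracks two parallel stacks.
/// This is NOT walkdir's real implementation, only an illustration.
struct Walker {
    /// One (pretend) open handle per directory currently being walked.
    handles: Vec<String>,
    /// The pending entries of each of those directories.
    listings: Vec<Vec<String>>,
}

impl Walker {
    fn new() -> Walker {
        Walker { handles: Vec::new(), listings: Vec::new() }
    }

    /// Push to both stacks, or to neither.
    fn push(&mut self, path: &str) -> Result<(), String> {
        // Run every fallible step first. `?` returns early on failure,
        // before either stack has been touched.
        let handle = open_handle(path)?;
        let listing = read_listing(path)?;
        self.handles.push(handle);
        self.listings.push(listing);
        Ok(())
    }

    /// Pop one item from each stack.
    fn pop(&mut self) -> Option<(String, Vec<String>)> {
        // Pushes are all-or-nothing, so both stacks have the same length
        // and either both pops succeed or both stacks are empty.
        let handle = self.handles.pop()?;
        let listing = self.listings.pop()?;
        Some((handle, listing))
    }
}

/// Hypothetical fallible operation: fails for paths containing "broken".
fn open_handle(path: &str) -> Result<String, String> {
    if path.contains("broken") {
        Err(format!("cannot open {}", path))
    } else {
        Ok(format!("handle:{}", path))
    }
}

/// Hypothetical fallible operation: fails for paths containing "unreadable".
fn read_listing(path: &str) -> Result<Vec<String>, String> {
    if path.contains("unreadable") {
        Err(format!("cannot list {}", path))
    } else {
        Ok(vec![format!("{}/child", path)])
    }
}
```

If the first push were performed before the second fallible step ran, a failure in that second step would leave one stack longer than the other, which is the kind of mismatch the commit message describes.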

This bug was found via ripgrep. See:
https://github.com/BurntSushi/ripgrep/issues/633#issuecomment-339076246
2018-02-01 17:01:08 -05:00

walkdir

A cross-platform Rust library for efficiently walking a directory recursively. It comes with support for following symbolic links, controlling the number of open file descriptors, and efficient mechanisms for pruning entries in the directory tree.


Dual-licensed under MIT or the UNLICENSE.

Documentation

docs.rs/walkdir

Usage

To use this crate, add walkdir as a dependency to your project's Cargo.toml:

[dependencies]
walkdir = "2"

Example

The following code recursively iterates over the directory given and prints the path for each entry:

use walkdir::WalkDir;

fn main() {
    // Each item is a `Result`; `unwrap` panics on the first error.
    for entry in WalkDir::new("foo") {
        let entry = entry.unwrap();
        println!("{}", entry.path().display());
    }
}

Or, if you'd like to iterate over all entries and ignore any errors that may arise, use filter_map. For example, the code below silently skips directories that the owner of the running process does not have permission to access:

use walkdir::WalkDir;

fn main() {
    // `Result::ok` turns each `Err` into `None`, which `filter_map` drops.
    for entry in WalkDir::new("foo").into_iter().filter_map(|e| e.ok()) {
        println!("{}", entry.path().display());
    }
}
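
To see why `filter_map(|e| e.ok())` drops failed entries, here is a small sketch that uses only the standard library. The `Result<&str, String>` items are hypothetical stand-ins for the `Result` values the directory iterator yields; no walkdir types are involved, and both function names are made up for this example.

```rust
/// Keep only the successful items, discarding every error without a trace.
fn keep_ok<'a>(entries: Vec<Result<&'a str, String>>) -> Vec<&'a str> {
    entries
        .into_iter()
        // `Result::ok` maps `Ok(v)` to `Some(v)` and `Err(_)` to `None`;
        // `filter_map` then keeps only the `Some` values.
        .filter_map(|e| e.ok())
        .collect()
}

/// Alternative: keep the successes, but also collect the errors so they
/// can be reported instead of silently ignored.
fn split_results<'a>(
    entries: Vec<Result<&'a str, String>>,
) -> (Vec<&'a str>, Vec<String>) {
    let mut ok = Vec::new();
    let mut errors = Vec::new();
    for entry in entries {
        match entry {
            Ok(path) => ok.push(path),
            Err(err) => errors.push(err),
        }
    }
    (ok, errors)
}
```

The first form is convenient for quick scripts; the second is worth considering when a skipped directory would otherwise go unnoticed.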

The same code as above, except follow_links is enabled:

use walkdir::WalkDir;

fn main() {
    // With `follow_links(true)`, symbolic links are followed during the walk.
    for entry in WalkDir::new("foo").follow_links(true) {
        let entry = entry.unwrap();
        println!("{}", entry.path().display());
    }
}

Example: efficiently skip hidden files and directories on Unix

This uses the filter_entry iterator adapter to skip hidden files and directories efficiently; a hidden directory is not descended into at all:

use walkdir::{DirEntry, WalkDir};

// An entry counts as hidden if its file name starts with a dot.
// Names that are not valid UTF-8 are treated as not hidden.
fn is_hidden(entry: &DirEntry) -> bool {
    entry.file_name()
         .to_str()
         .map(|s| s.starts_with("."))
         .unwrap_or(false)
}

fn main() {
    let walker = WalkDir::new("foo").into_iter();
    for entry in walker.filter_entry(|e| !is_hidden(e)) {
        let entry = entry.unwrap();
        println!("{}", entry.path().display());
    }
}
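
The efficiency comes from pruning: when the predicate rejects a directory, the walk never opens it, whereas a plain `filter` would still read its contents and throw them away afterwards. The sketch below illustrates that idea with only the standard library. `walk_pruned` is a hypothetical helper written for this example and is not part of walkdir; it does not follow symbolic links and stops at the first I/O error.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if the final path component starts with a dot.
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map(|s| s.starts_with('.'))
        .unwrap_or(false)
}

/// Hypothetical helper: recursively collect every path under `dir` for
/// which `keep` returns true. A rejected directory is skipped *before* it
/// is opened, so none of its contents are ever read.
fn walk_pruned<F>(dir: &Path, keep: &F, out: &mut Vec<PathBuf>) -> io::Result<()>
where
    F: Fn(&Path) -> bool,
{
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if !keep(&path) {
            // Prune: skip this entry and, if it is a directory,
            // everything beneath it.
            continue;
        }
        // `DirEntry::file_type` does not follow symbolic links.
        let is_dir = entry.file_type()?.is_dir();
        out.push(path.clone());
        if is_dir {
            walk_pruned(&path, keep, out)?;
        }
    }
    Ok(())
}
```

Calling `walk_pruned(Path::new("foo"), &|p: &Path| !is_hidden(p), &mut found)` on a tree containing a `.git` directory would leave `.git` and everything beneath it out of `found` without ever listing its contents.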

Motivation

std::fs has an unstable walk_dir implementation that needed some design work. I started off on that task, but it quickly became apparent that walking a directory recursively is quite complex and may not be a good fit for std right away.

This should at least resolve most or all of the issues reported here (and then some):

Performance

The short story is that performance is comparable with find and glibc's nftw on both a warm and cold file cache. In fact, I cannot observe any performance difference after running find /, walkdir / and nftw / on my local file system (SSD, ~3 million entries). More precisely, I am reasonably confident that this crate makes as few system calls and close to as few allocations as possible.

I haven't recorded any benchmarks, but here are some things you can try with a local checkout of walkdir:

# The directory you want to recursively walk:
DIR=$HOME

# If you want to observe perf on a cold file cache, run this before *each*
# command:
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

# To warm the caches
find "$DIR"

# Test speed of `find` on warm cache:
time find "$DIR"

# Compile and test speed of `walkdir` crate:
cargo build --release --example walkdir
time ./target/release/examples/walkdir "$DIR"

# Compile and test speed of glibc's `nftw`:
gcc -O3 -o nftw ./compare/nftw.c
time ./nftw "$DIR"

# For shits and giggles, test speed of Python's (2 or 3) os.walk:
time python ./compare/walk.py "$DIR"

On my system, the performance of walkdir, find and nftw is comparable.
