November 15, 2025
8 min

Argus: Fast File Integrity Monitoring in Rust

You've deployed code to production. How do you know it hasn't been tampered with? A rootkit could modify binaries, a supply chain attack could swap libraries, or an insider threat could alter configuration files. By the time you notice, the damage is done.

File integrity monitoring (FIM) is a fundamental security control, but most tools have at least one of these problems:

  • Too slow for large directories (minutes to scan)
  • Too complex to configure and deploy
  • Too resource-intensive to run continuously

I built Argus to solve this: a lightweight, blazing-fast FIM tool that can scan thousands of files per second and detect changes in real time.

The Problem: Traditional FIM is Slow

Most FIM tools are built for enterprise environments with complex policies, compliance reporting, and extensive configuration. Developers and security researchers often just need answers to a few simple questions:

  • "Has anything changed in this directory?"
  • "What files were modified since my last scan?"
  • "Alert me when critical files are altered"

Existing tools are overkill.

And they're slow. Scanning a large codebase with tools like AIDE or Tripwire can take minutes because they:

  • Run on a single thread
  • Perform unnecessary operations (ACL checks, extended attributes)
  • Use inefficient file I/O patterns
  • Generate verbose logs

My Approach: Parallel SHA-256 at Scale

Argus is built on three principles:

1. Parallel Everything

Modern machines have multiple cores - use them. Argus parallelizes:

  • Directory traversal
  • File reads
  • Checksum calculation
  • Output generation

2. Minimal Overhead

Only compute what you need:

  • SHA-256 checksums (industry standard)
  • File size
  • Modification timestamp
  • Path

No unnecessary metadata, no complex policies.

3. Structured Output

NDJSON (Newline Delimited JSON) for easy parsing:

{"path":"./src/main.rs","checksum":"a3f5...","size":2048,"timestamp":"2025-12-10T15:30:00Z"}
{"path":"./src/lib.rs","checksum":"b2e1...","size":4096,"timestamp":"2025-12-10T15:31:00Z"}

Perfect for scripting, monitoring systems, or feeding into SIEMs.
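To make the output format concrete, here is a minimal std-only sketch of rendering one record as an NDJSON line. The `FileRecord` layout, `json_escape`, and `to_ndjson_line` are illustrative names assumed for this example, not Argus's actual code. A real implementation would more likely lean on a JSON library such as serde_json.

```rust
use std::fmt::Write;

/// Hypothetical record type mirroring the four fields in the sample output.
pub struct FileRecord {
    pub path: String,
    pub checksum: String,
    pub size: u64,
    pub timestamp: String, // RFC 3339 text, already formatted by the caller
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                // Remaining control characters use the \u00XX form.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Render one record as a single NDJSON line (no trailing newline).
pub fn to_ndjson_line(record: &FileRecord) -> String {
    format!(
        "{{\"path\":\"{}\",\"checksum\":\"{}\",\"size\":{},\"timestamp\":\"{}\"}}",
        json_escape(&record.path),
        json_escape(&record.checksum),
        record.size,
        json_escape(&record.timestamp),
    )
}
```

Because every record is a complete JSON object on its own line, a consumer can process the stream line by line without loading the whole scan into memory.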

Technical Deep Dive

Parallel File Processing

The core of Argus is a work-stealing thread pool:

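use std::fs::File;
use std::io::Read;
use std::path::{Path, PathBuf};

use anyhow::Result; // assumption: the post does not say which `Result` alias it uses
use rayon::prelude::*;
use sha2::{Digest, Sha256};
use walkdir::WalkDir;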

pub fn scan_directory(path: &Path, threads: usize) -> Result<Vec<FileRecord>> {
    // Configure thread pool
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(threads)
        .build()?;

    pool.install(|| {
        // Collect all file paths
        let files: Vec<PathBuf> = WalkDir::new(path)
            .into_iter()
            .filter_map(|e| e.ok())
            .filter(|e| e.file_type().is_file())
            .map(|e| e.path().to_owned())
            .collect();

        // Process in parallel
        files.par_iter()
            .map(|file_path| compute_checksum(file_path))
            .collect()
    })
}

fn compute_checksum(path: &Path) -> Result<FileRecord> {
    let mut file = File::open(path)?;
    let mut hasher = Sha256::new();
    let mut buffer = vec![0u8; 8192]; // 8KB buffer

    loop {
        let bytes_read = file.read(&mut buffer)?;
        if bytes_read == 0 { break; }
        hasher.update(&buffer[..bytes_read]);
    }

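    let metadata = file.metadata()?; // query metadata once instead of twice

    Ok(FileRecord {
        path: path.display().to_string(), // `Path` has no `to_string()` method
        checksum: format!("{:x}", hasher.finalize()),
        size: metadata.len(),
        timestamp: metadata.modified()?,
    })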
}

Why Rayon?

Rayon is a data-parallelism library that makes parallel iteration trivial. Its key idea is work stealing:

  • Each worker thread has its own queue of tasks
  • When a thread runs out of work, it "steals" tasks from another thread's queue
  • Load balancing happens automatically, with no manual scheduling

This means Argus automatically adapts to:

  • Mixed file sizes (small configs + large binaries)
  • I/O latency variations
  • Number of available cores
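The load-balancing effect can be illustrated without Rayon. The sketch below is a simplification and not how Rayon works internally: instead of per-thread queues with stealing, all workers claim the next item from one shared atomic cursor. It still shows the property that matters here, namely that a thread busy with one large item does not hold up the rest. The function name `process_dynamic` is made up for this example, and it needs Rust 1.63 or later for `std::thread::scope`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Process `items` on `workers` threads, returning (item index, result) pairs.
/// Each worker claims the next unprocessed index as soon as it is free.
pub fn process_dynamic<T, R, F>(items: &[T], workers: usize, work: F) -> Vec<(usize, R)>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0); // shared cursor into `items`
    let mut results: Vec<(usize, R)> = thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = (0..workers.max(1))
            .map(|_| {
                scope.spawn(|| {
                    let mut local = Vec::new();
                    loop {
                        // Atomically claim one index; no two workers get the same one.
                        let i = next.fetch_add(1, Ordering::Relaxed);
                        if i >= items.len() {
                            break;
                        }
                        local.push((i, work(&items[i])));
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    results.sort_by_key(|(i, _)| *i); // restore input order
    results
}
```

With `work` set to a checksum function and `items` to a list of paths, slow files simply occupy one worker longer while the others keep draining the list.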

Ignore Pattern Support

Security-focused FIM should respect .gitignore patterns. No one wants to checksum node_modules or .git directories.

Argus supports both .gitignore and .argusignore:

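use std::path::Path;

use anyhow::Result; // assumption: `Result` is anyhow's alias
use ignore::WalkBuilder;
use rayon::prelude::*;

pub fn scan_with_ignores(path: &Path) -> Result<Vec<FileRecord>> {
    // .gitignore files are honored by default; `require_git(false)` applies
    // them even outside a git repository. `add_custom_ignore_filename`
    // registers .argusignore as an additional per-directory ignore file.
    let walker = WalkBuilder::new(path)
        .require_git(false)
        .add_custom_ignore_filename(".argusignore")
        .build();

    // The walker skips ignored paths automatically
    walker
        .filter_map(|e| e.ok())
        // `file_type()` returns an Option here, so unwrap it defensively
        .filter(|e| e.file_type().map_or(false, |t| t.is_file()))
        .par_bridge() // Hand entries to Rayon's thread pool
        .map(|entry| compute_checksum(entry.path()))
        .collect()
}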

This dramatically reduces scan time for large projects with many dependencies.
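To show the filtering idea in isolation, here is a deliberately simplified std-only sketch. It only matches whole path components against a list of names, which is far less than real .gitignore semantics (no globs, negation, or anchoring), and the names `is_ignored` and `filter_paths` are invented for the example.

```rust
use std::path::{Component, Path};

/// Return true if any component of `path` exactly matches one of `ignored_names`.
/// Deliberately simplistic: no globs, negation (`!`), or anchoring, all of
/// which real .gitignore files support.
pub fn is_ignored(path: &Path, ignored_names: &[&str]) -> bool {
    path.components().any(|component| match component {
        Component::Normal(name) => ignored_names
            .iter()
            .any(|ignored| name.to_str() == Some(*ignored)),
        _ => false, // root, prefix, `.` and `..` never match a name
    })
}

/// Keep only the paths that are not ignored.
pub fn filter_paths<'a>(paths: &[&'a Path], ignored_names: &[&str]) -> Vec<&'a Path> {
    paths
        .iter()
        .copied()
        .filter(|p| !is_ignored(p, ignored_names))
        .collect()
}
```

Pruning happens before any file is opened, so an ignored dependency tree costs no read or hashing time.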

Real-Time Monitoring

The watch mode uses the notify crate for filesystem event monitoring:

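use std::path::Path;
use std::sync::mpsc::channel;

use anyhow::Result; // assumption: `Result` is anyhow's alias
use notify::{Config, EventKind, RecommendedWatcher, RecursiveMode, Watcher};

pub fn watch_directory(path: &Path, baseline: &[FileRecord]) -> Result<()> {
    let (tx, rx) = channel();
    let mut watcher = RecommendedWatcher::new(tx, Config::default())?;

    watcher.watch(path, RecursiveMode::Recursive)?;

    for event in rx {
        // notify delivers an `Event` struct: `kind` says what happened,
        // `paths` lists the affected files
        let event = event?;
        match event.kind {
            EventKind::Modify(_) | EventKind::Create(_) => {
                // Skip directories; only regular files can be hashed
                for changed in event.paths.iter().filter(|p| p.is_file()) {
                    let new_checksum = compute_checksum(changed)?;
                    // Assumes baseline paths use the same form (absolute vs
                    // relative) as the paths notify reports
                    let baseline_record = baseline.iter()
                        .find(|r| Path::new(&r.path) == changed.as_path());

                    if let Some(old) = baseline_record {
                        if old.checksum != new_checksum.checksum {
                            alert_change(changed, old, &new_checksum);
                        }
                    }
                }
            }
            EventKind::Remove(_) => {
                for removed in &event.paths {
                    alert_deletion(removed);
                }
            }
            _ => {}
        }
    }
    Ok(())
}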

This enables real-time alerting:

argus watch --directory /var/www/html --baseline production.ndjson
# Alerts instantly when files change
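The watch loop above looks up each changed path with a linear search through the baseline, which gets expensive when the baseline is large and events arrive in bursts. One option, sketched here with hypothetical names (`Verdict`, `index_baseline`, `classify`) and plain `(path, checksum)` string pairs standing in for full records, is to index the baseline once and classify each event with a hash lookup:

```rust
use std::collections::HashMap;

/// Outcome of checking one changed path against the baseline.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Unchanged,
    Modified,
    NotInBaseline,
}

/// Build a path -> checksum index once, before the watch loop starts.
pub fn index_baseline<'a>(records: &'a [(String, String)]) -> HashMap<&'a str, &'a str> {
    records
        .iter()
        .map(|(path, checksum)| (path.as_str(), checksum.as_str()))
        .collect()
}

/// Classify a freshly computed checksum against the indexed baseline.
pub fn classify(index: &HashMap<&str, &str>, path: &str, new_checksum: &str) -> Verdict {
    match index.get(path) {
        Some(old) if *old == new_checksum => Verdict::Unchanged,
        Some(_) => Verdict::Modified,
        None => Verdict::NotInBaseline,
    }
}
```

The `NotInBaseline` case is worth surfacing separately: a file that appears where none existed before is often as interesting as a modified one.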

Comparison Reports

Detecting what changed between two scans is critical for incident response:

# Baseline scan
argus scan --directory /srv/app --output baseline.ndjson

# Later, compare
argus scan --directory /srv/app --compare baseline.ndjson

Output shows:

  • Modified: Files with different checksums
  • Added: New files not in baseline
  • Deleted: Files in baseline but missing now

Implementation:

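use std::collections::{HashMap, HashSet};

// Assumes `ComparisonReport<'a>` stores references into both input slices
pub fn compare_scans<'a>(current: &'a [FileRecord], baseline: &'a [FileRecord])
    -> ComparisonReport<'a>
{
    // Index the baseline by path for O(1) lookups
    let baseline_map: HashMap<&str, &FileRecord> = baseline.iter()
        .map(|r| (r.path.as_str(), r))
        .collect();

    let mut modified = Vec::new();
    let mut added = Vec::new();

    for record in current {
        match baseline_map.get(record.path.as_str()) {
            Some(old) if old.checksum != record.checksum => {
                modified.push((*old, record)); // copy the reference out of the map
            }
            None => {
                added.push(record);
            }
            _ => {} // Unchanged
        }
    }

    // A set of current paths keeps deletion detection linear
    // instead of rescanning `current` for every baseline entry
    let current_paths: HashSet<&str> = current.iter()
        .map(|r| r.path.as_str())
        .collect();
    let deleted = baseline.iter()
        .filter(|r| !current_paths.contains(r.path.as_str()))
        .collect();

    ComparisonReport { modified, added, deleted }
}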

Performance Benchmarks

Tested on a MacBook Pro (M1, 8 cores) scanning a large codebase:

Files     Size     Single Thread    8 Threads (Argus)    Speedup
1,000     50MB     2.3s             0.4s                 5.8x
10,000    500MB    23.1s            3.2s                 7.2x
50,000    2GB      118s             15.7s                7.5x
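The speedup column is the single-thread time divided by the 8-thread time. As a small worked sketch (the helper names are made up for illustration), the same numbers also give parallel efficiency and throughput:

```rust
/// Speedup of the parallel run over the single-threaded run.
pub fn speedup(single_secs: f64, parallel_secs: f64) -> f64 {
    single_secs / parallel_secs
}

/// Fraction of ideal linear scaling achieved with `threads` threads.
pub fn parallel_efficiency(single_secs: f64, parallel_secs: f64, threads: u32) -> f64 {
    speedup(single_secs, parallel_secs) / f64::from(threads)
}

/// Files hashed per second.
pub fn throughput(files: u64, secs: f64) -> f64 {
    files as f64 / secs
}
```

For the 10,000-file row, `speedup(23.1, 3.2)` is about 7.2, `parallel_efficiency(23.1, 3.2, 8)` is about 0.90, and `throughput(10_000, 3.2)` is 3,125 files per second.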

Real-world usage on a production web server (5,000 files):

  • Initial scan: 1.2 seconds
  • Incremental comparison: 0.3 seconds
  • Watch mode overhead: <1% CPU

Real-World Use Cases

Production Monitoring

I run Argus on production servers to detect unauthorized changes:

# Cron job every 5 minutes
*/5 * * * * argus scan --directory /var/www --compare /var/baseline.ndjson && notify_slack

If anything changes, I get a Slack alert with a diff of the modified files.

Supply Chain Security

Verify vendor-provided binaries haven't been tampered with:

# Generate checksums from trusted source
argus scan --directory /opt/vendor-software --output trusted.ndjson

# Periodically verify
argus scan --directory /opt/vendor-software --compare trusted.ndjson

Incident Response

After detecting a breach, quickly identify what was modified:

# Compare current state to pre-incident baseline
argus scan --directory /compromised-system --compare pre-incident.ndjson > changes.txt

The report shows exactly which files were modified, added, or deleted since the baseline.

Git Alternative for Non-Code

Track changes in directories that aren't under version control:

# Configuration directories
argus watch --directory /etc --baseline /backups/etc-baseline.ndjson

# Data directories
argus watch --directory /var/lib/important-data --baseline data-baseline.ndjson

Limitations & Future Work

Current Limitations:

  • Large files: 1GB file size limit (configurable)
  • No cryptographic signing: An attacker with root access can rewrite the baseline to match tampered files
  • Basic alerting: No built-in notification system
  • Single machine: Doesn't scale across distributed systems

Planned Features:

  • HMAC-protected baselines, so tampering with the baseline itself is detectable
  • Built-in alerting (email, Slack, webhooks)
  • Distributed scanning for cluster deployments
  • SQLite storage option for faster comparisons
  • Filter by file patterns (only watch *.so files)

Why Rust for FIM?

  • Performance: Parallel processing with zero-cost abstractions
  • Safety: No buffer overflows or data races when reading files concurrently
  • Single Binary: Deploy one executable with no runtime dependencies
  • Cross-Platform: Runs on Linux, macOS, Windows with same codebase

Most FIM tools are written in Python or C. Python is too slow for large scans. C requires careful memory management and is hard to parallelize safely. Rust gives you C-like performance with Python-like ergonomics.

Try It Yourself

Argus is open source and ready to use:

# Install from source
git clone https://github.com/abendrothj/Argus
cd Argus
cargo install --path .

# Or download pre-built binary from releases

# Basic usage
argus scan --directory /path/to/monitor --output baseline.ndjson

# Compare scans
argus scan --directory /path/to/monitor --compare baseline.ndjson

# Watch mode
argus watch --directory /path/to/monitor --baseline baseline.ndjson

# Custom thread count
argus scan --directory /large/directory --threads 16 --output scan.ndjson

For production use, I recommend:

  1. Generate baseline from known-good state
  2. Store baseline in immutable storage (S3, write-once filesystem)
  3. Run comparison scans via cron
  4. Alert on any differences
  5. Regenerate baseline after verified changes

Lessons Learned

Building Argus taught me:

  • Parallel I/O: How to efficiently read files concurrently without thrashing disk
  • Filesystem APIs: Deep dive into metadata, inodes, and platform differences
  • Benchmarking: Profiling parallel code is hard - learned to use cargo flamegraph
  • Rust Performance: Where async helps vs where thread pools are better
  • Security Operations: What practitioners actually need vs what vendors sell

The hardest part was optimizing for both many small files (configs) and few large files (binaries). Different access patterns need different strategies.
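One way to express "different strategies for different sizes" is to pick the read path from the file size. This is a sketch under assumptions, not Argus's actual approach: the thresholds `SMALL_FILE_LIMIT` and `CHUNK_SIZE` are arbitrary placeholder values, and `feed_file` passes bytes to a caller-supplied closure so the example stays std-only (in practice the closure would update a hasher).

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical cutoff: files at or below this size are read in one call.
const SMALL_FILE_LIMIT: u64 = 64 * 1024;
/// Hypothetical chunk size for streaming larger files.
const CHUNK_SIZE: usize = 256 * 1024;

/// Feed the contents of `path` to `consume`, returning the number of bytes read.
pub fn feed_file<F: FnMut(&[u8])>(path: &Path, mut consume: F) -> io::Result<u64> {
    let size = fs::metadata(path)?.len();
    if size <= SMALL_FILE_LIMIT {
        // Small file: a single read into memory keeps call overhead low.
        let bytes = fs::read(path)?;
        consume(&bytes);
        return Ok(bytes.len() as u64);
    }
    // Large file: stream through a reusable buffer to bound memory use.
    let mut file = File::open(path)?;
    let mut buffer = vec![0u8; CHUNK_SIZE];
    let mut total = 0u64;
    loop {
        let n = file.read(&mut buffer)?;
        if n == 0 {
            break;
        }
        consume(&buffer[..n]);
        total += n as u64;
    }
    Ok(total)
}
```

The right cutoff and chunk size depend on the storage and the workload, so they are best chosen by benchmarking rather than taken from this sketch.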

Closing Thoughts

File integrity monitoring shouldn't be complex or slow. Argus proves you can have:

  • Sub-second scan times for small trees (0.4s for 1,000 files in the benchmarks above)
  • Simple deployment (one binary)
  • Minimal configuration (just specify a directory)
  • Structured output for automation

If you need to monitor files for changes - whether for security, compliance, or just peace of mind - give Argus a try.

And if you're working on systems programming in Rust, the codebase is a good example of parallel I/O, filesystem operations, and CLI design.


Security tools should be fast, simple, and transparent. Complex tools don't get deployed. Slow tools don't get run.