Argus: Fast File Integrity Monitoring in Rust
You've deployed code to production. How do you know it hasn't been tampered with? A rootkit could modify binaries, a supply chain attack could swap libraries, or an insider threat could alter configuration files. By the time you notice, the damage is done.
File integrity monitoring (FIM) is a fundamental security control, but most tools fall short in at least one way:
- Too slow for large directories (minutes to scan)
- Too complex to configure and deploy
- Too resource-intensive to run continuously
I built Argus to solve this: a lightweight, blazing-fast FIM tool that can scan thousands of files per second and detect changes in real time.
The Problem: Traditional FIM is Slow
Most FIM tools are built for enterprise environments with complex policies, compliance reporting, and extensive configuration. For developers and security researchers who just need:
- "Has anything changed in this directory?"
- "What files were modified since my last scan?"
- "Alert me when critical files are altered"
Existing tools are overkill.
And they're slow. Scanning a large codebase with tools like AIDE or Tripwire can take minutes because they:
- Run on a single thread
- Perform unnecessary operations (ACL checks, extended attributes)
- Use inefficient file I/O patterns
- Generate verbose logs
My Approach: Parallel SHA-256 at Scale
Argus is built on three principles:
1. Parallel Everything. Modern machines have multiple cores, so use them. Argus parallelizes:
- Directory traversal
- File reads
- Checksum calculation
- Output generation
2. Minimal Overhead. Only compute what you need:
- SHA-256 checksums (industry standard)
- File size
- Modification timestamp
- Path
No unnecessary metadata, no complex policies.
3. Structured Output. NDJSON (newline-delimited JSON) for easy parsing:
{"path":"./src/main.rs","checksum":"a3f5...","size":2048,"timestamp":"2025-12-10T15:30:00Z"}
{"path":"./src/lib.rs","checksum":"b2e1...","size":4096,"timestamp":"2025-12-10T15:31:00Z"}
Perfect for scripting, monitoring systems, or feeding into SIEMs.
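Under the hood, each line is just one serialized record. Here's a rough sketch of what that record could look like; the actual struct in Argus may differ, and this version assumes serde plus the humantime-serde adapter to get the RFC 3339 timestamps shown above:

```rust
use std::time::SystemTime;

use serde::{Deserialize, Serialize};

// Hypothetical shape of the record behind each NDJSON line.
#[derive(Debug, Serialize, Deserialize)]
pub struct FileRecord {
    pub path: String,
    pub checksum: String, // hex-encoded SHA-256 digest
    pub size: u64,        // file size in bytes
    #[serde(with = "humantime_serde")]
    pub timestamp: SystemTime, // serialized as an RFC 3339 string
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let record = FileRecord {
        path: "./src/main.rs".to_string(),
        checksum: "a3f5...".to_string(), // truncated for the example
        size: 2048,
        timestamp: SystemTime::now(),
    };
    // NDJSON is just one JSON object per line.
    println!("{}", serde_json::to_string(&record)?);
    Ok(())
}
```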
Technical Deep Dive
Parallel File Processing
The core of Argus is a work-stealing thread pool:
```rust
use std::fs::File;
use std::io::Read;
use std::path::{Path, PathBuf};

use anyhow::Result; // or a crate-local Result alias
use rayon::prelude::*;
use sha2::{Digest, Sha256};
use walkdir::WalkDir;

pub fn scan_directory(path: &Path, threads: usize) -> Result<Vec<FileRecord>> {
    // Configure a dedicated thread pool for this scan
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(threads)
        .build()?;

    pool.install(|| {
        // Collect all file paths first
        let files: Vec<PathBuf> = WalkDir::new(path)
            .into_iter()
            .filter_map(|e| e.ok())
            .filter(|e| e.file_type().is_file())
            .map(|e| e.path().to_owned())
            .collect();

        // Hash in parallel; collect() propagates the first error, if any
        files.par_iter()
            .map(|file_path| compute_checksum(file_path))
            .collect()
    })
}

fn compute_checksum(path: &Path) -> Result<FileRecord> {
    let mut file = File::open(path)?;
    let mut hasher = Sha256::new();
    let mut buffer = vec![0u8; 8192]; // 8 KB read buffer

    // Stream the file through the hasher instead of reading it all into memory
    loop {
        let bytes_read = file.read(&mut buffer)?;
        if bytes_read == 0 {
            break;
        }
        hasher.update(&buffer[..bytes_read]);
    }

    let metadata = file.metadata()?;
    Ok(FileRecord {
        path: path.display().to_string(),
        checksum: format!("{:x}", hasher.finalize()),
        size: metadata.len(),
        timestamp: metadata.modified()?,
    })
}
```
Why Rayon?
Rayon is a data-parallelism library that makes parallel iteration trivial. The key idea is work stealing:
- Each thread has a queue of tasks
- When a thread finishes, it "steals" work from another thread
- Automatic load balancing with no manual scheduling
This means Argus automatically adapts to:
- Mixed file sizes (small configs + large binaries)
- I/O latency variations
- Number of available cores
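To see what that buys you without writing any scheduling code, here's a toy example (not from the Argus codebase) with wildly uneven per-item costs; rayon keeps every core busy anyway:

```rust
use rayon::prelude::*;
use std::time::Instant;

fn main() {
    // Every 100th item is thousands of times more expensive than the rest.
    let costs: Vec<u64> = (0..10_000)
        .map(|i| if i % 100 == 0 { 5_000_000 } else { 1_000 })
        .collect();

    let start = Instant::now();
    let total: u64 = costs
        .par_iter()
        .map(|&n| (0..n).fold(0u64, |acc, x| acc.wrapping_add(x))) // simulated work
        .sum();
    println!("sum = {total}, elapsed = {:?}", start.elapsed());
}
```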
Ignore Pattern Support
Security-focused FIM should respect .gitignore patterns. No one wants to checksum node_modules or .git directories.
Argus supports both .gitignore and .argusignore:
```rust
use ignore::WalkBuilder;

pub fn scan_with_ignores(path: &Path) -> Result<Vec<FileRecord>> {
    // .gitignore files are honored by default; register .argusignore as an
    // additional custom ignore file
    let walker = WalkBuilder::new(path)
        .add_custom_ignore_filename(".argusignore")
        .build();

    // The walk respects ignore patterns automatically; collect the surviving
    // paths, then hash them in parallel
    let files: Vec<PathBuf> = walker
        .filter_map(|e| e.ok())
        .filter(|e| e.file_type().map_or(false, |ft| ft.is_file()))
        .map(|e| e.into_path())
        .collect();

    files.par_iter()
        .map(|file_path| compute_checksum(file_path))
        .collect()
}
```
This dramatically reduces scan time for large projects with many dependencies.
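As a hypothetical example, an .argusignore for a typical project might contain patterns like these (it uses the same glob syntax as .gitignore):

```
# .argusignore -- hypothetical example, .gitignore glob syntax
node_modules/
target/
.cache/
*.log
*.tmp
```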
Real-Time Monitoring
The watch mode uses the notify crate for filesystem event monitoring:
```rust
use std::sync::mpsc::channel;

use notify::{Config, Event, EventKind, RecommendedWatcher, RecursiveMode, Watcher};

pub fn watch_directory(path: &Path, baseline: &[FileRecord]) -> Result<()> {
    // The watcher delivers Result<Event, notify::Error> values over the channel
    let (tx, rx) = channel();
    let mut watcher = RecommendedWatcher::new(tx, Config::default())?;
    watcher.watch(path, RecursiveMode::Recursive)?;

    for event in rx {
        let event: Event = event?;
        match event.kind {
            EventKind::Modify(_) | EventKind::Create(_) => {
                for changed in &event.paths {
                    let new_record = compute_checksum(changed)?;
                    let baseline_record = baseline
                        .iter()
                        .find(|r| Path::new(&r.path) == changed.as_path());
                    if let Some(old) = baseline_record {
                        if old.checksum != new_record.checksum {
                            alert_change(changed, old, &new_record);
                        }
                    }
                }
            }
            EventKind::Remove(_) => {
                for removed in &event.paths {
                    alert_deletion(removed);
                }
            }
            _ => {}
        }
    }
    Ok(())
}
```
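The alert_change and alert_deletion helpers aren't shown above. A minimal sketch, assuming alerts are just structured lines on stderr that you can pipe into whatever notification machinery you already have:

```rust
use std::path::Path;

// Hypothetical helpers: one line per alert on stderr.
fn alert_change(path: &Path, old: &FileRecord, new: &FileRecord) {
    eprintln!(
        "CHANGED {} old={} new={}",
        path.display(),
        old.checksum,
        new.checksum
    );
}

fn alert_deletion(path: &Path) {
    eprintln!("DELETED {}", path.display());
}
```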
This enables real-time alerting:
```bash
argus watch /var/www/html --baseline production.ndjson
# Alerts instantly when files change
```
Comparison Reports
Detecting what changed between two scans is critical for incident response:
```bash
# Baseline scan
argus scan --directory /srv/app --output baseline.ndjson

# Later, compare
argus scan --directory /srv/app --compare baseline.ndjson
```
Output shows:
- Modified: Files with different checksums
- Added: New files not in baseline
- Deleted: Files in baseline but missing now
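One way to model that report is as three lists of borrowed records. This is a sketch based on how the implementation below uses it, not necessarily the exact definition in Argus:

```rust
// Hypothetical shape of the comparison result; the lifetime ties the report
// to the two scans it was built from.
pub struct ComparisonReport<'a> {
    pub modified: Vec<(&'a FileRecord, &'a FileRecord)>, // (baseline, current)
    pub added: Vec<&'a FileRecord>,                       // in current, not in baseline
    pub deleted: Vec<&'a FileRecord>,                     // in baseline, not in current
}
```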
Implementation:
```rust
use std::collections::HashMap;

pub fn compare_scans<'a>(
    current: &'a [FileRecord],
    baseline: &'a [FileRecord],
) -> ComparisonReport<'a> {
    // Index the baseline by path for O(1) lookups
    let baseline_map: HashMap<&str, &FileRecord> = baseline
        .iter()
        .map(|r| (r.path.as_str(), r))
        .collect();

    let mut modified = Vec::new();
    let mut added = Vec::new();

    for record in current {
        match baseline_map.get(record.path.as_str()) {
            Some(&old) if old.checksum != record.checksum => {
                modified.push((old, record));
            }
            None => {
                added.push(record);
            }
            _ => {} // Present in both scans with the same checksum: unchanged
        }
    }

    // Anything in the baseline with no matching current record was deleted
    let deleted: Vec<&FileRecord> = baseline
        .iter()
        .filter(|r| !current.iter().any(|c| c.path == r.path))
        .collect();

    ComparisonReport { modified, added, deleted }
}
```
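The --compare path also needs the baseline read back from disk. A minimal sketch, assuming FileRecord derives serde::Deserialize as in the earlier struct sketch:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

// Parse an NDJSON baseline: one JSON-encoded FileRecord per line.
fn load_baseline(path: &Path) -> anyhow::Result<Vec<FileRecord>> {
    let reader = BufReader::new(File::open(path)?);
    let mut records = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // tolerate trailing blank lines
        }
        records.push(serde_json::from_str(&line)?);
    }
    Ok(records)
}
```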
Performance Benchmarks
Tested on a MacBook Pro (M1, 8 cores) scanning a large codebase:
| Files | Size | Single Thread | 8 Threads (Argus) | Speedup |
|---|---|---|---|---|
| 1,000 | 50MB | 2.3s | 0.4s | 5.8x |
| 10,000 | 500MB | 23.1s | 3.2s | 7.2x |
| 50,000 | 2GB | 118s | 15.7s | 7.5x |
Real-world usage on a production web server (5,000 files):
- Initial scan: 1.2 seconds
- Incremental comparison: 0.3 seconds
- Watch mode overhead: <1% CPU
Real-World Use Cases
Production Monitoring
I run Argus on production servers to detect unauthorized changes:
```bash
# Cron job every 5 minutes
*/5 * * * * argus scan /var/www --compare /var/baseline.ndjson && notify_slack
```
If anything changes, a Slack alert goes out with a diff of the modified files.
Supply Chain Security
Verify vendor-provided binaries haven't been tampered with:
```bash
# Generate checksums from trusted source
argus scan /opt/vendor-software --output trusted.ndjson

# Periodically verify
argus scan /opt/vendor-software --compare trusted.ndjson
```
Incident Response
After detecting a breach, quickly identify what was modified:
```bash
# Compare current state to pre-incident baseline
argus scan /compromised-system --compare pre-incident.ndjson > changes.txt
```
Shows exactly which files attackers modified.
Git Alternative for Non-Code
Track changes in directories that aren't under version control:
```bash
# Configuration directories
argus watch /etc --baseline /backups/etc-baseline.ndjson

# Data directories
argus watch /var/lib/important-data --baseline data-baseline.ndjson
```
Limitations & Future Work
Current Limitations:
- Large files: 1GB file size limit (configurable)
- No cryptographic signing: Checksums can be forged if an attacker has root
- Basic alerting: No built-in notification system
- Single machine: Doesn't scale across distributed systems
Planned Features:
- HMAC signatures for tamper-proof baselines
- Built-in alerting (email, Slack, webhooks)
- Distributed scanning for cluster deployments
- SQLite storage option for faster comparisons
- Filter by file patterns (only watch *.so files, for example)
Why Rust for FIM?
- Performance: Parallel processing with zero-cost abstractions
- Safety: No buffer overflows or data races when reading files concurrently
- Single binary: Deploy one executable with no runtime dependencies
- Cross-platform: Runs on Linux, macOS, and Windows from the same codebase
Most FIM tools are written in Python or C. Python is too slow for large scans. C requires careful memory management and is hard to parallelize safely. Rust gives you C-like performance with Python-like ergonomics.
Try It Yourself
Argus is open source and ready to use:
```bash
# Install from source
git clone https://github.com/abendrothj/Argus
cd Argus
cargo install --path .

# Or download pre-built binary from releases

# Basic usage
argus scan --directory /path/to/monitor --output baseline.ndjson

# Compare scans
argus scan --directory /path/to/monitor --compare baseline.ndjson

# Watch mode
argus watch --directory /path/to/monitor --baseline baseline.ndjson

# Custom thread count
argus scan --directory /large/directory --threads 16 --output scan.ndjson
```
For production use, I recommend:
- Generate baseline from known-good state
- Store baseline in immutable storage (S3, write-once filesystem)
- Run comparison scans via cron
- Alert on any differences
- Regenerate baseline after verified changes
Lessons Learned
Building Argus taught me:
- Parallel I/O: How to efficiently read files concurrently without thrashing the disk
- Filesystem APIs: Deep dive into metadata, inodes, and platform differences
- Benchmarking: Profiling parallel code is hard; I learned to use cargo flamegraph
- Rust Performance: Where async helps vs. where thread pools are better
- Security Operations: What practitioners actually need vs what vendors sell
The hardest part was optimizing for both many small files (configs) and few large files (binaries). Different access patterns need different strategies.
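One illustrative tactic (a sketch of the idea, not necessarily what Argus ships): scale the read buffer with the file's size, so tiny configs avoid large allocations while big binaries need fewer read syscalls.

```rust
// Hypothetical heuristic: pick the hashing buffer size from the file length.
fn buffer_size_for(file_len: u64) -> usize {
    match file_len {
        0..=65_536 => 8 * 1024,           // small configs: 8 KB
        65_537..=16_777_216 => 64 * 1024, // mid-sized sources: 64 KB
        _ => 1024 * 1024,                 // large binaries: 1 MB
    }
}
```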
Closing Thoughts
File integrity monitoring shouldn't be complex or slow. Argus proves you can have:
- Sub-second scan times
- Simple deployment (one binary)
- Minimal configuration (just specify a directory)
- Structured output for automation
If you need to monitor files for changes - whether for security, compliance, or just peace of mind - give Argus a try.
And if you're working on systems programming in Rust, the codebase is a good example of parallel I/O, filesystem operations, and CLI design.
Security tools should be fast, simple, and transparent. Complex tools don't get deployed. Slow tools don't get run.