shoebox

A Rust application providing S3-compatible object storage backed by the local filesystem and SQLite metadata.

A local S3-compatible server for your files. Find duplicates, verify integrity, zero config.

Shoebox webapp — browsing a bucket

Install

Prerequisites: Docker must be installed for the recommended method. Check with docker --version.

# Docker (recommended)
docker pull ghcr.io/deepjoy/shoebox:latest

# Or via Cargo (no Docker needed)
cargo install shoebox

Quick Start

# Point Shoebox at a directory
shoebox ~/Photos

# Or with Docker
docker run -it --rm -p 9000:9000 -v ~/Photos:/photos ghcr.io/deepjoy/shoebox /photos

# Output:
# Serving 1 bucket on http://localhost:9000
#   photos → /home/user/Photos

Files already on disk appear in S3 immediately — no uploading required. Credentials are generated on first run and printed in the startup output. To enable browser access (CORS), follow the on-screen instructions (see the Webapp section below). To talk to the server with the AWS CLI:

# Configure credentials (printed on first run)
aws configure --profile shoebox

# List objects
aws --profile shoebox --endpoint-url http://localhost:9000 s3 ls s3://photos/
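
Writes go through the same path and land directly on disk. A short sketch using standard AWS CLI commands, assuming the photos bucket from the Quick Start (pre-signed URLs and multipart uploads are listed under Features below):

# Upload a file (written straight into ~/Photos on disk;
# the AWS CLI switches to multipart automatically for large files)
aws --profile shoebox --endpoint-url http://localhost:9000 s3 cp ./sunset.jpg s3://photos/sunset.jpg

# Generate a pre-signed URL, valid for one hour
aws --profile shoebox --endpoint-url http://localhost:9000 s3 presign s3://photos/sunset.jpg --expires-in 3600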

Features

  • S3-compatible API — works with AWS CLI, rclone, and any S3 SDK out of the box (rclone sketch after this list)
  • Zero-config startup — just point at directories, no cloud account or configuration needed
  • Duplicate detection — find and merge duplicate files and directories via content hashing
  • Integrity verification — scheduled checks to detect bit rot and data corruption
  • Filesystem sync — background scanning with move detection, real-time file watching
  • Authentication — AWS Signature V4, per-bucket credentials, pre-signed URLs
  • Multipart uploads — full support for large file uploads
  • CORS — browser-based clients work out of the box
  • Webhook notifications — get notified on object events (put, delete, copy)
  • Single binary, ~18MB — no runtime dependencies
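
As a sketch of the rclone route mentioned above: a minimal remote definition, assuming the credentials printed at startup (the remote name shoebox and the paths are illustrative):

# ~/.config/rclone/rclone.conf
[shoebox]
type = s3
provider = Other
access_key_id = <from startup output>
secret_access_key = <from startup output>
endpoint = http://localhost:9000

# Then use it like any S3 remote:
rclone ls shoebox:photos
rclone sync ~/Documents shoebox:photos/documents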

Duplicate Detection

Shoebox hashes every file (SHA-256) in the background. Finding duplicates is a query:

$ shoebox duplicates ~/Photos --format table

Duplicate groups (2 groups, 5 files, 3 duplicates):

  Hash (SHA-256)       Size   Files
  ─────────────────────────────────────────────
  a3f…c8d1             32 B   3 copies
    originals/sunset.txt
    backup/sunset.txt        ← duplicate
    edited/sunset-copy.txt   ← duplicate

  7b2e…f104            26 B   2 copies
    originals/mountain.txt
    backup/mountain.txt      ← duplicate
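
Because the hashes are plain SHA-256 over file contents, any flagged pair can be spot-checked independently of Shoebox with coreutils (paths here are the examples from the table above):

# Matching output confirms the files are byte-for-byte identical
sha256sum ~/Photos/originals/sunset.txt ~/Photos/backup/sunset.txt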

Webapp

A companion browser UI is available at https://deepjoy.github.io/shoebox-webapp/.

Browse buckets, view objects, and see duplicate groups visually — no CLI needed. The webapp talks directly to your local Shoebox server via the S3 API.

CORS setup (required for browser access) — Shoebox prints this command on startup; just copy and run it:

export AWS_ACCESS_KEY_ID='<from startup output>'
export AWS_SECRET_ACCESS_KEY='<from startup output>'
export BUCKET='photos'

curl -X PUT "http://localhost:9000/${BUCKET}?cors" \
  --aws-sigv4 "aws:amz:us-east-1:s3" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '[{"allowed_origins":["*"],"allowed_methods":["GET","PUT","POST","DELETE","HEAD"],"allowed_headers":["*"],"expose_headers":["ETag","x-amz-request-id"],"max_age_seconds":3600}]'
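
To confirm the rule took effect, you can simulate a browser preflight from the shell. This relies only on standard CORS semantics; the exact response headers Shoebox returns are an assumption here:

# A CORS-enabled bucket should answer the preflight with
# Access-Control-Allow-* headers (the exact set may vary)
curl -i -X OPTIONS "http://localhost:9000/${BUCKET}/" \
  -H "Origin: https://deepjoy.github.io" \
  -H "Access-Control-Request-Method: GET"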

Who It's For

  • Developers — test S3 integrations without cloud dependencies, work offline
  • Home users — expose NAS storage to S3-compatible backup tools, find duplicates with a single query
  • Archivists — verify file integrity with content hashes, detect bit rot
  • Privacy-conscious users — keep files local, no account required, no telemetry

Comparison

Concern                 Cloud S3                    MinIO                         SeaweedFS                     Garage                       Shoebox
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Primary strength        Scalability, AWS ecosystem  High performance, enterprise  Small files, high throughput  Simplicity, geo-replication  Existing files, zero config
Best for                Production workloads        AI/ML, large data (TB/PB)     Data lakes, file storage      Edge/distributed, low ops    Local dev, NAS, home lab
Architecture            Managed service             Specialized nodes             Master/volume servers         Homogeneous nodes            Single process
Setup                   Account + IAM               Docker + config               Docker + config               Docker + config              Single command
Data location           Cloud                       MinIO data dir                SeaweedFS volumes             Garage data dir              Your existing files
File visibility         S3 only                     S3 only                       S3, FUSE, WebDAV              S3 only                      Filesystem + S3
Offline use             No                          Yes                           Yes                           Yes                          Yes
Binary size             N/A                         ~100MB                        ~40MB                         ~25MB                        ~18MB
Duplicate detection     No                          No                            No                            No                           Built-in
Integrity checks        Yes (default checksums)     Yes (bitrot healing)          Limited (CRC)                 Yes (scrub)                  Built-in (scheduled)
Max recommended scale   Unlimited                   Petabytes                     Petabytes                     Petabytes                    ~10TB

See docs/why-shoebox.md for the full story.

When Not to Use Shoebox

See docs/when-not-to-use-shoebox.md for an honest assessment of limitations, including:

  • Strong consistency requirements
  • Distributed / multi-node storage
  • >10TB of data
  • Enterprise S3 features (object lock, lifecycle policies, versioning)
  • High-throughput ingestion (thousands of files/second)

Documentation

Guides live in the docs/ directory, including docs/why-shoebox.md and docs/when-not-to-use-shoebox.md.

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Security

See SECURITY.md for the security model and how to report vulnerabilities.

License

MIT

Disclaimer

Shoebox operates directly on your existing files — it does not copy data into a separate storage directory. S3 operations like DeleteObject and PutObject will modify or remove real files on disk. Back up anything irreplaceable before use. This is pre-1.0 software provided "as is" with no warranty. See LICENSE for details. The authors are not liable for any data loss.

Background

I had 2TB of photos across 3 drives — backups of backups, originals I was afraid to delete. I set out to find duplicate photos and accidentally designed a local S3 server. If an object store knows the content hash of every file, duplicates are just a query. This is a personal project built in public — expect breaking changes before 1.0. If you have thoughts on the approach, open an issue or start a discussion.