GitHub - jdefrancesco/dskDitto: Ultra fast and easy duplicate file finder. Awesome TUI/GUI to manage results.

dskDitto

The ultra-fast, parallel duplicate-file detector with interactive menus that make clearing unnecessary duplicates hassle-free!

Features

  • Blazingly fast duplicate scanning — Parallel processing finds duplicates across large disks instantly.
  • Interactive TUI by default — Browse, compare, and manage duplicates with an intuitive terminal interface powered by Bubble Tea.
  • Optional GUI — Use the experimental Raylib GUI for a graphical alternative to the TUI.
  • Safe deletion & symlink conversion — Remove duplicates or replace them with symlinks, with confirmation dialogs to prevent accidents.
  • Smart single-file search — Hash a specific file and instantly find all its duplicates across your filesystem.
  • Flexible hashing — Choose between SHA-256 (default) or BLAKE3 for content verification.
  • Fine-grained filtering — Skip files by size, depth, hidden files, symlinks, and virtual filesystems.
  • Export results — Save findings to CSV, JSON, or plain text for reporting or automation.
  • Unix hard-link aware — Treats hard-linked files intelligently to avoid false duplicates.

Install

Install straight from source using Go 1.22+:

go install github.com/jdefrancesco/dskDitto/cmd/dskDitto@latest

This places the binary at $(go env GOPATH)/bin/dskDitto, which is ~/go/bin/dskDitto with the default GOPATH.

Usage

dskDitto [options] PATH ...

Common flags:

Flag                      Description
--version                 Print the current version and exit
--no-banner               Skip the startup banner
--gui                     Review results in the experimental Raylib GUI instead of the default TUI
--profile <file>          Write a CPU profile to the given file
--time-only               Exit immediately after the scan, printing only the elapsed time
--min-size <bytes>        Ignore files smaller than the provided size
--max-size <bytes>        Skip files larger than the provided size (default 4 GiB)
--hidden                  Include dot files and dot-directories
--exclude <path>          Exclude a path from scanning (repeatable; excludes descendants)
--no-symlinks             Skip symbolic links
--empty                   Include zero-byte files
--include-vfs             Include virtual filesystem directories such as /proc or /dev
--dir-concurrency <int>   Limit concurrent directory reads; values <= 0 use automatic tuning
--no-cache                On supported platforms, ask the OS not to populate the filesystem cache while hashing
--current                 Restrict the scan to only the specified paths (no recursion)
--depth <levels>          Limit recursion to <levels> directories below the starting paths
--dups <count>            Only show groups that contain at least <count> files
--text, --bullet          Render duplicates without launching the TUI
--remove <keep>           Operate on duplicates, keeping the first <keep> entries per group
--link                    With --remove, convert extra duplicates to symlinks instead of deleting them
--file <path>             Only report duplicates of the given file; with --name-only, match by that file's exact name
--name-only               Shallow mode: group files by exact file name, ignoring content and size
--file-shallow <path>     Shallow mode: only report files with the same exact name as <path>
--fuzzy                   Content-based near-duplicate mode (file similarity, not filename similarity)
--fuzzy-threshold <pct>   Minimum similarity percentage in fuzzy mode (default 75)
--fuzzy-same-ext          In fuzzy mode, only compare files that share the same extension
--hash <algo>             Select hash algorithm: sha256 (default) or blake3
--csv-out <file>          Write duplicate groups to CSV
--json-out <file>         Write duplicate groups to JSON
--fs-detect <path>        Print the filesystem type that contains <path>
--color-safe              Use a high-compatibility TUI theme that avoids custom colors (best for problematic terminal themes)
--no-confirm              Skip interactive confirmation codes for TUI/GUI delete and link actions

Press Ctrl+C at any time to abort a scan. When duplicates are removed or converted through the TUI or GUI, a confirmation dialog prevents accidental mass changes unless --no-confirm is set.

Duplicate removal and symlink conversion

dskDitto never deletes or rewrites anything unless you explicitly ask it to, either with --remove on the command line or with a delete or link action in the TUI or GUI.

  • Dry / interactive modes: by default (or with --text / --bullet) the tool only reports duplicates.
  • Delete extras: use --remove <keep> to delete all but <keep> files in each duplicate group.
  • Convert extras to symlinks: combine --remove <keep> --link to replace extra duplicates with symlinks pointing at one kept file per group.
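
The keep/extras split described in the bullets above can be sketched in Go as follows. This is only an illustration: splitKeep is a hypothetical helper, not part of dskDitto, and the order in which dskDitto ranks the files of a group is not specified here, so the sketch simply uses the order of the slice it is given.

```go
package main

import "fmt"

// splitKeep partitions one duplicate group into the entries to keep and the
// extras to delete or link. A keep value below 1 is clamped to 1 so a group
// is never emptied; that clamp is this sketch's choice, not documented
// dskDitto behavior.
func splitKeep(group []string, keep int) (kept, extras []string) {
	if keep < 1 {
		keep = 1
	}
	if keep > len(group) {
		keep = len(group)
	}
	return group[:keep], group[keep:]
}

func main() {
	group := []string{"/a/file.txt", "/b/file.txt", "/c/file.txt"}
	kept, extras := splitKeep(group, 1)
	fmt.Println("keep:  ", kept)
	fmt.Println("extras:", extras)
}
```

With keep set to 1 and a group of three, one path is kept and the other two become candidates for deletion or linking.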

In the TUI you can also convert the currently marked files into symlinks: mark the duplicates you want to replace, then press L and enter the confirmation code. Each group’s symlinks will point at one unmarked file in that group. Power users can pass --no-confirm to skip the confirmation code in the TUI and GUI.

On Unix-like systems, multiple hard links to the same underlying file are treated as a single entry during scanning: dskDitto hashes the content once and does not report those hard-link paths as separate space-wasting duplicates.
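
One way to recognize hard links, shown here as a minimal Go sketch, is os.SameFile, which on Unix-like systems compares device and inode numbers. The helper names uniqueFiles and demoHardLinks are hypothetical, and dskDitto's own detection may work differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// uniqueFiles returns the paths that refer to distinct underlying files,
// keeping the first path seen for each one. The pairwise comparison is
// quadratic, which is fine for a demo but not for a large scan.
func uniqueFiles(paths []string) ([]string, error) {
	var infos []os.FileInfo
	var unique []string
	for _, p := range paths {
		fi, err := os.Stat(p)
		if err != nil {
			return nil, err
		}
		seen := false
		for _, prev := range infos {
			if os.SameFile(prev, fi) {
				seen = true
				break
			}
		}
		if !seen {
			infos = append(infos, fi)
			unique = append(unique, p)
		}
	}
	return unique, nil
}

// demoHardLinks builds a scratch directory holding one file, one hard link
// to it, and one independent copy, then reports how many distinct files
// uniqueFiles finds.
func demoHardLinks() (int, error) {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	orig := filepath.Join(dir, "orig.txt")
	link := filepath.Join(dir, "link.txt")
	copyPath := filepath.Join(dir, "copy.txt")
	if err := os.WriteFile(orig, []byte("same bytes"), 0o644); err != nil {
		return 0, err
	}
	if err := os.Link(orig, link); err != nil {
		return 0, err
	}
	if err := os.WriteFile(copyPath, []byte("same bytes"), 0o644); err != nil {
		return 0, err
	}
	unique, err := uniqueFiles([]string{orig, link, copyPath})
	if err != nil {
		return 0, err
	}
	return len(unique), nil
}

func main() {
	n, err := demoHardLinks()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("distinct files:", n)
}
```

The original and its hard link count once, while the independent copy counts separately even though its bytes are identical.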

When using --link, the on-disk layout after the operation looks like this for a group of 3 identical files and --remove 1 --link:

/path/to/keep/file.txt      # original file kept
/path/to/dup/file-copy.txt  -> /path/to/keep/file.txt  (symlink)
/another/location/file.txt  -> /path/to/keep/file.txt  (symlink)

In the TUI, files that are symlinks are annotated with a [symlink] suffix so you can see which entries were converted.
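
The following Go sketch shows one way a duplicate could be swapped for a symlink without ever leaving the path missing: the link is created under a temporary name and then renamed over the duplicate. It assumes a Unix-like system; replaceWithSymlink, demoReplace, and the ".symlink-tmp" suffix are hypothetical and do not describe dskDitto's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// replaceWithSymlink swaps dup for a symlink that points at keep. If any
// step fails, dup is left untouched.
func replaceWithSymlink(keep, dup string) error {
	target, err := filepath.Abs(keep)
	if err != nil {
		return err
	}
	tmp := dup + ".symlink-tmp" // hypothetical temporary name
	if err := os.Symlink(target, tmp); err != nil {
		return err
	}
	if err := os.Rename(tmp, dup); err != nil {
		os.Remove(tmp) // best-effort cleanup of the temporary link
		return err
	}
	return nil
}

// demoReplace creates two identical files in a scratch directory, converts
// the second into a symlink, and reports whether it now points at the first.
func demoReplace() (bool, error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	keep := filepath.Join(dir, "file.txt")
	dup := filepath.Join(dir, "file-copy.txt")
	for _, p := range []string{keep, dup} {
		if err := os.WriteFile(p, []byte("same bytes"), 0o644); err != nil {
			return false, err
		}
	}
	if err := replaceWithSymlink(keep, dup); err != nil {
		return false, err
	}
	want, err := filepath.Abs(keep)
	if err != nil {
		return false, err
	}
	got, err := os.Readlink(dup)
	if err != nil {
		return false, err
	}
	return got == want, nil
}

func main() {
	ok, err := demoReplace()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("duplicate now links to kept file:", ok)
}
```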

Single-file duplicate search

Use --file /path/to/original.ext to hash a specific file first, then scan the provided directories for other files with identical content. If no duplicates are found in those directories, dskDitto exits cleanly; otherwise, all reporting/removal/export modes are limited to that single duplicate group (with the original file listed first).

Shallow filename duplicate search

Use --name-only to group files by exact final filename without hashing file contents. For example, dir1/text1 and dir2/text1 are considered duplicates even when their contents differ. Combine --name-only --file /path/to/text1, or use --file-shallow /path/to/text1, to limit shallow results to one exact filename. When the shallow target is a dotfile, dskDitto automatically includes hidden files and directories for that scan.
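
Grouping by final filename amounts to keying each path on its base name, as in this minimal Go sketch. groupByName is a hypothetical helper written for illustration, not dskDitto's code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// groupByName buckets paths by their final path element and drops every
// bucket that holds only one path, since a lone file has no duplicate.
func groupByName(paths []string) map[string][]string {
	byName := make(map[string][]string)
	for _, p := range paths {
		name := filepath.Base(p)
		byName[name] = append(byName[name], p)
	}
	for name, group := range byName {
		if len(group) < 2 {
			delete(byName, name)
		}
	}
	return byName
}

func main() {
	groups := groupByName([]string{"dir1/text1", "dir2/text1", "dir2/notes"})
	for name, group := range groups {
		fmt.Println(name, "->", group)
	}
}
```

No file contents are read at any point, which is why two files with the same name but different data still land in the same group.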

Restore backups are not supported for shallow filename matches because same-name files may contain different data. If --backup is combined with --name-only or --file-shallow, dskDitto prints a warning and exits before scanning or changing files.

Fuzzy content matching (near duplicates)

Use --fuzzy to find files with similar content even when they are not byte-for-byte identical. This mode compares file content signatures only; it does not use filename similarity.

By default, fuzzy mode returns groups at >=75% similarity:

dskDitto --fuzzy ~/Downloads

Tune the similarity cutoff when needed:

dskDitto --fuzzy --fuzzy-threshold 90 ~/Downloads

Restrict fuzzy comparisons to matching extensions:

dskDitto --fuzzy --fuzzy-same-ext ~/Downloads

--fuzzy results are review-only near matches. Automatic mutation flows (--remove / --link) are disabled in fuzzy mode.
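
The signature scheme behind --fuzzy is not described here, so the following Go sketch substitutes a different, well-known technique to show what a percentage threshold means: the Jaccard index over sets of k-byte shingles. The names shingles and similarity are hypothetical, and nothing below should be read as dskDitto's actual algorithm.

```go
package main

import "fmt"

// shingles returns the set of all k-byte substrings of data.
func shingles(data []byte, k int) map[string]struct{} {
	set := make(map[string]struct{})
	for i := 0; i+k <= len(data); i++ {
		set[string(data[i:i+k])] = struct{}{}
	}
	return set
}

// similarity returns the Jaccard index of the two shingle sets as a
// percentage: shared shingles divided by all distinct shingles, times 100.
// Two inputs that are both too short to produce a shingle count as identical.
func similarity(a, b []byte, k int) float64 {
	sa, sb := shingles(a, k), shingles(b, k)
	if len(sa) == 0 && len(sb) == 0 {
		return 100
	}
	shared := 0
	for s := range sa {
		if _, ok := sb[s]; ok {
			shared++
		}
	}
	union := len(sa) + len(sb) - shared
	return 100 * float64(shared) / float64(union)
}

func main() {
	a := []byte("hello world")
	b := []byte("hello worle")
	fmt.Printf("similarity: %.1f%%\n", similarity(a, b, 4))
}
```

Under this stand-in measure, a pair would be reported when its score meets or exceeds the chosen threshold, mirroring how --fuzzy-threshold is described above.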

Hash algorithms

By default, dskDitto uses SHA-256 for content hashing:

  • SHA-256 (--hash sha256): conservative, widely supported choice with strong collision resistance.
  • BLAKE3 (--hash blake3): often significantly faster on modern CPUs. On macOS, however, SHA-256 is finely tuned and outperforms BLAKE3 most of the time, so SHA-256 remains the default for now.
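
In Go, a selectable hash algorithm is typically expressed through the standard hash.Hash interface. The sketch below streams content through SHA-256 and shows where another algorithm would plug in. BLAKE3 is not in the Go standard library, so it is left out to keep the example self-contained; newHasher, digest, and digestString are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"strings"
)

// newHasher maps an algorithm name to a hash.Hash. Only sha256 is wired up
// here; a BLAKE3 case would need a third-party module.
func newHasher(algo string) (hash.Hash, error) {
	switch algo {
	case "sha256":
		return sha256.New(), nil
	default:
		return nil, fmt.Errorf("unsupported hash algorithm %q", algo)
	}
}

// digest streams r through the chosen hash and returns the hex-encoded sum.
// io.Copy reads in chunks, so memory use stays flat for large files.
func digest(r io.Reader, algo string) (string, error) {
	h, err := newHasher(algo)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// digestString is a convenience wrapper for in-memory input.
func digestString(s, algo string) (string, error) {
	return digest(strings.NewReader(s), algo)
}

func main() {
	sum, err := digestString("abc", "sha256")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sum)
}
```

To hash a file rather than a string, pass the *os.File returned by os.Open to digest.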

Examples

Scan your home directory and interactively review duplicates:

dskDitto $HOME

Use the experimental Raylib windowed UI:

dskDitto --gui $HOME

Exclude a directory (or file) from scanning:

dskDitto --exclude $HOME/Library/Caches $HOME

Exclude multiple paths in one scan (repeat --exclude):

dskDitto \
  --exclude $HOME/Library/Caches \
  --exclude $HOME/.cache \
  --exclude $HOME/Downloads \
  $HOME
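
The "excludes descendants" behavior can be pictured with a small Go sketch. It assumes an exclusion matches the excluded path itself and anything beneath it, and it compares whole path components so that $HOME/.cache does not also exclude $HOME/.cachefoo. isExcluded is a hypothetical helper, not dskDitto's implementation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isExcluded reports whether path equals one of the excluded paths or lies
// beneath one of them. filepath.Rel yields a result that starts with ".."
// exactly when path is outside the excluded directory.
func isExcluded(path string, excludes []string) bool {
	for _, ex := range excludes {
		rel, err := filepath.Rel(ex, path)
		if err != nil {
			continue // for example, one path is absolute and the other relative
		}
		if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			continue
		}
		return true
	}
	return false
}

func main() {
	excludes := []string{"/home/u/.cache", "/home/u/Downloads"}
	for _, p := range []string{
		"/home/u/.cache/app/blob",
		"/home/u/.cachefoo",
		"/home/u/Documents/a.txt",
	} {
		fmt.Println(p, "excluded:", isExcluded(p, excludes))
	}
}
```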

List duplicates for scripting or grepping, without launching the TUI:

dskDitto --text ~/Pictures ~/Movies | grep "\.jpg$"

Find files that share the same exact filename, ignoring contents:

dskDitto --name-only --text ~/Downloads ~/Documents

Find and safely delete duplicates larger than 100 MiB, keeping one copy per group:

dskDitto --min-size 100MiB --remove 1 /mnt/big-disk

Shrink a media library by converting duplicates into symlinks instead of deleting them:

dskDitto --remove 1 --link ~/Media

Export duplicate information to CSV or JSON for offline analysis:

dskDitto --csv-out dupes.csv  ~/Photos
dskDitto --json-out dupes.json ~/Projects

Recipes

  • Clean a downloads folder but keep one copy of each installer:

    dskDitto --min-size 10MiB --remove 1 ~/Downloads
  • Deduplicate a photo drive while preserving directory layout with symlinks:

    dskDitto --remove 1 --link /Volumes/photo-archive
  • Hunt for big redundant media files only:

    dskDitto --min-size 500MiB --text ~/Movies ~/TV
  • Use BLAKE3:

    NOTE: On macOS, BLAKE3 currently performs worse than SHA-256, so SHA-256 remains the default for the time being. BLAKE3's implementation may improve in the future and could eventually outperform SHA-256.

    dskDitto --hash blake3 --min-size 10MiB --text /mnt/big-disk
  • Feed duplicate groups into another tool via CSV:

    dskDitto --csv-out dupes.csv /data

Result Display Menus

Screenshot: interactive TUI

The TUI is built with Bubble Tea.

GUI Result Display

Screenshot: Raylib GUI duplicate review

The GUI is built with Raylib.

Benchmarks

Benchmark directory traversal on your machine before choosing a fixed concurrency value. On fast APFS SSDs, the best range is usually workload-dependent:

go build -o dskDitto ./cmd/dskDitto
for w in 16 24 32 48 64 96 128; do
  /usr/bin/time -p ./dskDitto --time-only --dir-concurrency "$w" ~
done

Use /usr/bin/time -l ./dskDitto --time-only ~ for a more detailed macOS run. --no-cache is also benchmark-only by default; test it with the same workload before keeping it in your normal command.
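
To make the tuning knob concrete, here is a minimal Go sketch of a directory walk that caps the number of simultaneous directory reads with a buffered channel used as a semaphore. It illustrates the general technique only: countFiles and demoCount are hypothetical names, and dskDitto's scanner may be structured quite differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
)

// countFiles walks root and counts regular files, allowing at most limit
// os.ReadDir calls to be in flight at once.
func countFiles(root string, limit int) int64 {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var total atomic.Int64

	var walk func(dir string)
	walk = func(dir string) {
		defer wg.Done()
		sem <- struct{}{} // take a slot only for the read itself
		entries, err := os.ReadDir(dir)
		<-sem // release before descending so children can never deadlock
		if err != nil {
			return // unreadable directories are skipped in this sketch
		}
		for _, e := range entries {
			switch {
			case e.IsDir():
				wg.Add(1)
				go walk(filepath.Join(dir, e.Name()))
			case e.Type().IsRegular():
				total.Add(1)
			}
		}
	}

	wg.Add(1)
	go walk(root)
	wg.Wait()
	return total.Load()
}

// demoCount builds a small scratch tree with three files and counts them.
func demoCount(limit int) (int64, error) {
	root, err := os.MkdirTemp("", "walk-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)

	sub := filepath.Join(root, "a", "b")
	if err := os.MkdirAll(sub, 0o755); err != nil {
		return 0, err
	}
	files := []string{
		filepath.Join(root, "one.txt"),
		filepath.Join(root, "a", "two.txt"),
		filepath.Join(sub, "three.txt"),
	}
	for _, p := range files {
		if err := os.WriteFile(p, []byte("x"), 0o644); err != nil {
			return 0, err
		}
	}
	return countFiles(root, limit), nil
}

func main() {
	n, err := demoCount(4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("regular files:", n)
}
```

The semaphore bounds concurrent reads, not the number of goroutines. A higher limit helps only while the storage device can serve more parallel reads, which is why measuring several values on your own hardware is worthwhile.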

Build From Source (Development)

Ensure you have:

  • go (1.22+)
  • gosec (install via go install github.com/securego/gosec/v2/cmd/gosec@latest)

git clone https://github.com/jdefrancesco/dskDitto
cd dskDitto
make

The resulting binary lives in bin/dskDitto. Add it to your $PATH or run it from the repo root. To explicitly build and smoke-run the Raylib GUI path:

make build-gui
make run-gui GUI_PATH=$HOME

Install the built binary somewhere on your path (defaults to /usr/local/bin) with:

sudo make install PREFIX=/usr/local/bin

Override PREFIX (for example make install PREFIX=$HOME/.local/bin) if you prefer a user-local install and want to skip sudo.

Other make targets:

make debug         # Create development build
make build-gui     # Build a GUI-capable binary
make run-gui       # Build and launch the Raylib GUI against GUI_PATH (default ".")
make release-check # Print the tag/push/public-install release checklist
make release-install-check # Verify what go install ...@latest currently installs
make test          # go test ./...
make bench         # run benchmarks (adds -benchmem)
make bench-profile # capture cpu.prof and mem.prof into the repo root
make pprof-web     # launch go tool pprof with HTTP UI for the latest profile

Architecture

See DittoDoc

Configuration

  • Log level: set DSKDITTO_LOG_LEVEL to debug, info, warn, etc.
  • Default options: wrap dskDitto in a shell alias or script with your favorite defaults.
  • Profiling: supply --pprof host:port to expose Go's pprof endpoints while the tool runs.

Contributing

Issues and PRs are welcome. Open an issue if you have ideas for improvements, new output modes, or performance tweaks. I only develop this in my spare time, of which there is less and less these days. New contributors are definitely something the project needs.

License

This project is released under the Apache license. See LICENSE for details.