GitHub - jdefrancesco/dskDitto: Super fast duplicate file finder written in Golang.


dskDitto



dskDitto is a fast, parallel duplicate-file detector with a sleek TUI that lets you review, keep, or safely delete redundant files.

Features

  • Concurrent directory walker tuned for large trees and multi-core systems
  • Targeted mode to search for duplicates of a single file
  • Multiple output modes: TUI, bullet lists, or text-friendly dumps
  • Optional automated duplicate removal with confirmation safety rails
  • Profiling toggles and micro-benchmarks for power users
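Parallel duplicate scanning is commonly built as a size-then-hash pipeline. Files are first bucketed by size. Only buckets with more than one member are hashed, on a pool of goroutines. Finally, only digests shared by two or more paths are kept. The Go sketch below illustrates that general technique under stated assumptions: SHA-256, a fixed worker count, and unreadable files silently skipped. findDuplicates and demo are hypothetical names, not dskDitto's actual code or API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

// hashFile streams one file through SHA-256 and returns its hex digest.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// findDuplicates (hypothetical) maps each shared digest to the paths that have it.
func findDuplicates(root string, workers int) (map[string][]string, error) {
	// Pass 1: bucket regular files by size; a file with a unique size has no duplicate.
	bySize := make(map[int64][]string)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || !d.Type().IsRegular() {
			return err
		}
		info, err := d.Info()
		if err == nil {
			bySize[info.Size()] = append(bySize[info.Size()], path)
		}
		return err
	})
	if err != nil {
		return nil, err
	}
	// Pass 2: hash only same-size candidates, fanned out to a pool of goroutines.
	jobs := make(chan string)
	byHash := make(map[string][]string)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				if sum, err := hashFile(p); err == nil { // unreadable files are skipped
					mu.Lock()
					byHash[sum] = append(byHash[sum], p)
					mu.Unlock()
				}
			}
		}()
	}
	for _, group := range bySize {
		if len(group) > 1 {
			for _, p := range group {
				jobs <- p
			}
		}
	}
	close(jobs)
	wg.Wait()
	// Pass 3: drop digests seen only once (same size, different content).
	for sum, group := range byHash {
		if len(group) < 2 {
			delete(byHash, sum)
		}
	}
	return byHash, nil
}

// demo builds a throwaway tree: two identical files plus a same-size impostor.
func demo() (map[string][]string, error) {
	dir, err := os.MkdirTemp("", "dskditto-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	for name, body := range map[string]string{"a.txt": "hello", "b.txt": "hello", "c.txt": "world"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(body), 0o644); err != nil {
			return nil, err
		}
	}
	return findDuplicates(dir, 4)
}

func main() {
	groups, err := demo()
	fmt.Println(len(groups), "duplicate group(s), err:", err)
}
```

Grouping by size first means most files are never read at all. In the demo, c.txt shares a size with the duplicates, so it is hashed, but it is then discarded in the final pass.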

Install

Install straight from source using Go 1.22+:

go install github.com/jdefrancesco/dskDitto/cmd/dskDitto@latest

This drops the binary at $(go env GOPATH)/bin/dskDitto (~/go/bin/dskDitto by default), or in $GOBIN if you have set it.

Build From Source

Ensure you have:

  • go (1.22+)
  • gosec (install via go install github.com/securego/gosec/v2/cmd/gosec@latest)

Then clone and build:

git clone https://github.com/jdefrancesco/dskDitto
cd dskDitto
make

The resulting binary lives in bin/dskDitto. Add it to your $PATH or run it from the repo root.

Install the built binary into a directory on your PATH (/usr/local/bin by default) with:

sudo make install PREFIX=/usr/local/bin

Override PREFIX (for example make install PREFIX=$HOME/.local/bin) if you prefer a user-local install and want to skip sudo.

Usage

dskDitto [options] PATH...

Common flags:

Flag                Description
--version           Print the current version and exit
--no-banner         Skip the startup banner
--profile <file>    Write a CPU profile to the given file
--time-only         Exit immediately after the scan, printing only the elapsed time
--min-size <bytes>  Ignore files smaller than the provided size
--max-size <bytes>  Skip files larger than the provided size (default 4 GiB)
--hidden            Include dot files and dot-directories
--exclude <path>    Exclude a path from scanning (repeatable; excludes descendants)
--no-symlinks       Skip symbolic links
--empty             Include zero-byte files
--include-vfs       Include virtual filesystem directories such as /proc or /dev
--current           Restrict the scan to only the specified paths (no recursion)
--depth <levels>    Limit recursion to <levels> directories below the starting paths
--dups <count>      Only show groups that contain at least <count> files
--text, --bullet    Render duplicates without launching the TUI
--remove <keep>     Operate on duplicates, keeping the first <keep> entries per group
--link              With --remove, convert extra duplicates to symlinks instead of deleting them
--file <path>       Only report duplicates of the given file
--hash <algo>       Select hash algorithm: sha256 (default) or blake3
--csv-out <file>    Write duplicate groups to CSV
--json-out <file>   Write duplicate groups to JSON
--fs-detect <path>  Print the filesystem type that contains <path>
--color-safe        Use a high-compatibility TUI theme that avoids custom colors (best for problematic terminal themes)

Press Ctrl+C at any time to abort a scan. When duplicates are removed or converted, a confirmation dialog prevents accidental mass changes.
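Several of the flags above, including --hidden, --exclude, and --depth, prune the directory walk. This is a minimal sketch that assumes those flags act as filters in a filepath.WalkDir callback, which returns fs.SkipDir to drop a whole subtree. walkOptions, walkFiltered, and the exact depth counting are hypothetical illustrations rather than dskDitto's own code.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// walkOptions is a hypothetical stand-in for a few of the CLI flags.
type walkOptions struct {
	Hidden   bool     // like --hidden: include dot files and dot-directories
	Exclude  []string // like --exclude: skip these paths and everything below them
	MaxDepth int      // like --depth: directory levels below root; negative = unlimited
}

// walkFiltered returns the regular files under root that survive the filters.
func walkFiltered(root string, opt walkOptions) ([]string, error) {
	root = filepath.Clean(root)
	excluded := make(map[string]bool)
	for _, ex := range opt.Exclude {
		excluded[filepath.Clean(ex)] = true
	}
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || path == root {
			return err
		}
		prune := (!opt.Hidden && strings.HasPrefix(d.Name(), ".")) || excluded[path]
		if d.IsDir() {
			rel, relErr := filepath.Rel(root, path)
			if relErr != nil {
				return relErr
			}
			// A direct child of root sits at depth 0, its children at depth 1, etc.
			depth := strings.Count(rel, string(filepath.Separator))
			if prune || (opt.MaxDepth >= 0 && depth >= opt.MaxDepth) {
				return fs.SkipDir // skip this directory and its whole subtree
			}
			return nil
		}
		if !prune && d.Type().IsRegular() {
			files = append(files, path)
		}
		return nil
	})
	sort.Strings(files)
	return files, err
}

// demo lays out a small tree and returns the surviving paths relative to it.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "walk-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	for _, rel := range []string{"a.txt", ".hidden", "sub/b.txt", "sub/deep/c.txt", "skip/d.txt"} {
		p := filepath.Join(dir, filepath.FromSlash(rel))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(p, []byte(rel), 0o644); err != nil {
			return nil, err
		}
	}
	files, err := walkFiltered(dir, walkOptions{Exclude: []string{filepath.Join(dir, "skip")}, MaxDepth: 1})
	var rels []string
	for _, f := range files {
		if r, relErr := filepath.Rel(dir, f); relErr == nil {
			rels = append(rels, filepath.ToSlash(r))
		}
	}
	return rels, err
}

func main() {
	files, err := demo()
	fmt.Println(files, err)
}
```

With MaxDepth set to 1, files directly under the root and one directory below it survive. The hidden file, the excluded skip/ tree, and sub/deep/ are pruned, so the demo prints [a.txt sub/b.txt] <nil>.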

Duplicate removal and symlink conversion

dskDitto never deletes or rewrites anything unless you explicitly ask it to with --remove.

  • Dry / interactive modes: by default (or with --text / --bullet) the tool only reports duplicates.
  • Delete extras: use --remove <keep> to delete all but <keep> files in each duplicate group.
  • Convert extras to symlinks: combine --remove <keep> --link to replace extra duplicates with symlinks pointing at one kept file per group.

In the TUI you can also convert the currently marked files into symlinks: mark the duplicates you want to replace, then press L and enter the confirmation code. Each group’s symlinks will point at one unmarked file in that group.

On Unix-like systems, multiple hard links to the same underlying file are treated as a single entry during scanning: dskDitto hashes the content once and does not report those hard-link paths as separate space-wasting duplicates.
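On Unix-like systems, the usual way to recognize hard links is to key each file by its (device, inode) pair. Two paths with the same pair are the same file on disk, so only one of them needs hashing. The sketch below shows that technique with syscall.Stat_t. It is an assumption that dskDitto works this way, and uniqueFiles is a hypothetical helper. The build constraint limits the sketch to Unix-like platforms.

```go
//go:build unix

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// fileID identifies an inode independently of the path used to reach it.
type fileID struct{ dev, ino uint64 }

// uniqueFiles (hypothetical) keeps the first path seen for each (device, inode)
// pair, so hard links to one file are hashed and reported only once.
func uniqueFiles(paths []string) ([]string, error) {
	seen := make(map[fileID]bool)
	var out []string
	for _, p := range paths {
		info, err := os.Lstat(p)
		if err != nil {
			return nil, err
		}
		st, ok := info.Sys().(*syscall.Stat_t)
		if !ok {
			out = append(out, p) // no inode data available: keep the path
			continue
		}
		id := fileID{dev: uint64(st.Dev), ino: uint64(st.Ino)}
		if !seen[id] {
			seen[id] = true
			out = append(out, p)
		}
	}
	return out, nil
}

// demo creates an original, a hard link to it, and an independent copy.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "hardlink-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	orig := filepath.Join(dir, "orig.txt")
	link := filepath.Join(dir, "hardlink.txt")
	dup := filepath.Join(dir, "copy.txt")
	if err := os.WriteFile(orig, []byte("same bytes"), 0o644); err != nil {
		return nil, err
	}
	if err := os.Link(orig, link); err != nil {
		return nil, err
	}
	if err := os.WriteFile(dup, []byte("same bytes"), 0o644); err != nil {
		return nil, err
	}
	files, err := uniqueFiles([]string{orig, link, dup})
	var names []string
	for _, f := range files {
		names = append(names, filepath.Base(f))
	}
	return names, err
}

func main() {
	names, err := demo()
	fmt.Println(names, err)
}
```

Running it prints [orig.txt copy.txt] <nil>. The hard link collapses into the original, while the independent copy remains and would still be reported as a real duplicate. os.SameFile answers the same question for a single pair, but a map keyed by inode avoids comparing every path against every other.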

For a group of three identical files, running with --remove 1 --link leaves this on-disk layout:

/path/to/keep/file.txt      # original file kept
/path/to/dup/file-copy.txt  -> /path/to/keep/file.txt  (symlink)
/another/location/file.txt  -> /path/to/keep/file.txt  (symlink)

In the TUI, files that are symlinks are annotated with a [symlink] suffix so you can see which entries were converted.

Single-file duplicate search

Use --file /path/to/original.ext to hash a specific file first, then scan the provided directories for other files with identical content. If no duplicates are found in those directories, dskDitto exits cleanly; otherwise, all reporting/removal/export modes are limited to that single duplicate group (with the original file listed first).
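The single-file mode can be sketched in three steps. Hash the reference file once. Walk each directory and hash only files whose size matches. Skip the reference file itself. The version below assumes SHA-256 and uses os.SameFile to recognize the original when it lies inside a scanned directory. findCopiesOf is a hypothetical name, not dskDitto's API.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// sha256File streams a file through SHA-256 and returns the raw digest.
func sha256File(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

// findCopiesOf (hypothetical) returns original first, followed by every file
// under roots whose content is identical to it.
func findCopiesOf(original string, roots []string) ([]string, error) {
	origInfo, err := os.Stat(original)
	if err != nil {
		return nil, err
	}
	want, err := sha256File(original) // hash the reference file exactly once
	if err != nil {
		return nil, err
	}
	group := []string{original}
	for _, root := range roots {
		err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
			if err != nil || !d.Type().IsRegular() {
				return err
			}
			info, err := d.Info()
			if err != nil {
				return err
			}
			// Cheap checks first: a different size, or the original file itself.
			if info.Size() != origInfo.Size() || os.SameFile(info, origInfo) {
				return nil
			}
			got, err := sha256File(path)
			if err == nil && bytes.Equal(got, want) {
				group = append(group, path)
			}
			return err
		})
		if err != nil {
			return nil, err
		}
	}
	return group, nil
}

// demo: the original, one real copy, and a same-size file with different bytes.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "single-file-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	for name, body := range map[string]string{"orig.bin": "abc", "copy.bin": "abc", "other.bin": "abd"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(body), 0o644); err != nil {
			return nil, err
		}
	}
	group, err := findCopiesOf(filepath.Join(dir, "orig.bin"), []string{dir})
	var names []string
	for _, p := range group {
		names = append(names, filepath.Base(p))
	}
	return names, err
}

func main() {
	names, err := demo()
	fmt.Println(names, err)
}
```

The demo prints [orig.bin copy.bin] <nil>. other.bin has the same size, so it is hashed, but its digest differs and it is left out. If only the original comes back, there are no duplicates, which corresponds to the clean exit described above.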

Hash algorithms

By default, dskDitto uses SHA-256 for content hashing:

  • SHA-256 (--hash sha256): a conservative, widely supported choice with strong collision resistance.
  • BLAKE3 (--hash blake3): often significantly faster on modern CPUs. On macOS, however, SHA-256 is heavily optimized and outperforms BLAKE3 most of the time, so SHA-256 remains the default for now.
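SHA-256 and BLAKE3 both produce fixed-size digests over a byte stream, so a --hash switch can reduce to picking a hash.Hash constructor by name. The sketch below assumes that structure. newHasher and the hashers table are hypothetical. Only SHA-256 from Go's standard library is registered, because BLAKE3 is not in the standard library and would have to come from a third-party module.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"strings"
)

// hashers maps a --hash value to a constructor. Only the standard library is
// used here, so BLAKE3 is absent; registering it would require a third-party
// package (assumed, not shown).
var hashers = map[string]func() hash.Hash{
	"sha256": sha256.New,
}

// newHasher (hypothetical) resolves an algorithm name, defaulting to SHA-256.
func newHasher(algo string) (hash.Hash, error) {
	if algo == "" {
		algo = "sha256"
	}
	ctor, ok := hashers[strings.ToLower(algo)]
	if !ok {
		return nil, fmt.Errorf("unsupported hash algorithm %q", algo)
	}
	return ctor(), nil
}

// digest streams r through the selected algorithm and returns a hex string.
func digest(algo string, r io.Reader) (string, error) {
	h, err := newHasher(algo)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// digestString is a small convenience wrapper around digest.
func digestString(algo, s string) (string, error) {
	return digest(algo, strings.NewReader(s))
}

func main() {
	sum, err := digestString("sha256", "abc")
	fmt.Println(sum, err) // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad <nil>
	_, err = digestString("md5", "abc")
	fmt.Println(err) // unsupported hash algorithm "md5"
}
```

Because file contents are streamed through io.Copy, memory use stays flat regardless of file size, and swapping algorithms touches only the constructor table.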

Examples

Scan your home directory and interactively review duplicates:

dskDitto $HOME

Exclude a directory (or file) from scanning:

dskDitto --exclude $HOME/Library/Caches $HOME

Exclude multiple paths in one scan (repeat --exclude):

dskDitto \
  --exclude $HOME/Library/Caches \
  --exclude $HOME/.cache \
  --exclude $HOME/Downloads \
  $HOME

List duplicates for scripting or grepping, without launching the TUI:

dskDitto --text ~/Pictures ~/Movies | grep "\.jpg$"

Find and safely delete duplicates larger than 100 MiB, keeping one copy per group:

dskDitto --min-size 100MiB --remove 1 /mnt/big-disk
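The flag table describes --min-size in bytes, yet these examples pass values such as 100MiB. Assuming binary (IEC) suffixes, 100MiB is 100 × 1024 × 1024 = 104,857,600 bytes. The parser below is a hypothetical sketch of that conversion, not dskDitto's own. The README does not say which suffixes the real flag accepts (for example, decimal MB), so the sketch handles only KiB through TiB and bare byte counts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// iecUnits lists binary suffixes and their multipliers (powers of 1024).
var iecUnits = []struct {
	suffix string
	mult   uint64
}{
	{"KiB", 1 << 10},
	{"MiB", 1 << 20},
	{"GiB", 1 << 30},
	{"TiB", 1 << 40},
}

// parseSize (hypothetical) turns "100MiB", "4GiB", or a bare "512" into bytes.
func parseSize(s string) (uint64, error) {
	num, mult := strings.TrimSpace(s), uint64(1)
	for _, u := range iecUnits {
		if strings.HasSuffix(num, u.suffix) {
			num, mult = strings.TrimSuffix(num, u.suffix), u.mult
			break
		}
	}
	n, err := strconv.ParseUint(strings.TrimSpace(num), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	if n > ^uint64(0)/mult {
		return 0, fmt.Errorf("size %q overflows uint64", s)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"100MiB", "4GiB", "512"} {
		n, err := parseSize(in)
		fmt.Println(in, "=", n, "bytes", err)
	}
}
```

Under the same assumption, 4GiB (the documented --max-size default) works out to 4,294,967,296 bytes.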

Shrink a media library by converting duplicates into symlinks instead of deleting them:

dskDitto --remove 1 --link ~/Media

Export duplicate information to CSV or JSON for offline analysis:

dskDitto --csv-out dupes.csv  ~/Photos
dskDitto --json-out dupes.json ~/Projects
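This README does not document the CSV columns or the JSON schema that --csv-out and --json-out produce. Purely as an illustration, the sketch below assumes a flat layout with one row per file, holding a group number, digest, size, and path (a hypothetical schema). Writing duplicate groups in that layout with Go's encoding/csv might look like this:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// dupGroup is a hypothetical in-memory shape for one set of identical files.
type dupGroup struct {
	Hash  string
	Size  int64
	Paths []string
}

// writeCSV emits a header and one row per file. encoding/csv quotes any field
// containing a comma, quote, or newline, so unusual paths stay parseable.
func writeCSV(w io.Writer, groups []dupGroup) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"group", "hash", "size", "path"}); err != nil {
		return err
	}
	for i, g := range groups {
		for _, p := range g.Paths {
			row := []string{strconv.Itoa(i + 1), g.Hash, strconv.FormatInt(g.Size, 10), p}
			if err := cw.Write(row); err != nil {
				return err
			}
		}
	}
	cw.Flush()
	return cw.Error()
}

// renderCSV returns the CSV as a string, which is handy for tests.
func renderCSV(groups []dupGroup) (string, error) {
	var b strings.Builder
	err := writeCSV(&b, groups)
	return b.String(), err
}

func main() {
	out, err := renderCSV([]dupGroup{
		{Hash: "ab12", Size: 5, Paths: []string{"/a/x.txt", "/b/x, copy.txt"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

In the output, the path containing a comma is wrapped in double quotes ("/b/x, copy.txt"), so any CSV-aware tool reads it back as a single field.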

Recipes

  • Clean a downloads folder but keep one copy of each installer:

    dskDitto --min-size 10MiB --remove 1 ~/Downloads
  • Deduplicate a photo drive while preserving directory layout with symlinks:

    dskDitto --remove 1 --link /Volumes/photo-archive
  • Hunt for big redundant media files only:

    dskDitto --min-size 500MiB --text ~/Movies ~/TV
  • Use BLAKE3:

    NOTE: On macOS, BLAKE3 usually performs worse than SHA-256, so SHA-256 stays the default for the time being. Future BLAKE3 implementations may improve and possibly outperform SHA-256.

    dskDitto --hash blake3 --min-size 10MiB --text /mnt/big-disk
  • Feed duplicate groups into another tool via CSV:

    dskDitto --csv-out dupes.csv /data

Configuration

  • Log level: set DSKDITTO_LOG_LEVEL to debug, info, warn, etc.
  • Default options: wrap dskDitto in a shell alias or script with your favorite defaults.
  • Profiling: supply --pprof host:port to expose Go's pprof endpoints while the tool runs.
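The README names only the DSKDITTO_LOG_LEVEL variable and the --pprof flag, not how they are implemented. Assuming the tool logs through Go's log/slog package, both hooks can be sketched with the standard library. slog.Level's UnmarshalText accepts names like debug, info, warn, and error, case-insensitively. A blank import of net/http/pprof registers the /debug/pprof/ handlers on http.DefaultServeMux. parseLevel, levelFromEnv, and startPprof are hypothetical helpers.

```go
package main

import (
	"log/slog"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
	"os"
)

// parseLevel maps names such as "debug", "INFO", or "warn" to an slog.Level,
// falling back to Info for empty or unknown input.
func parseLevel(s string) slog.Level {
	var lvl slog.Level
	if err := lvl.UnmarshalText([]byte(s)); err != nil {
		return slog.LevelInfo
	}
	return lvl
}

// levelFromEnv reads DSKDITTO_LOG_LEVEL (the variable named in this README).
func levelFromEnv() slog.Level {
	return parseLevel(os.Getenv("DSKDITTO_LOG_LEVEL"))
}

// startPprof (hypothetical) serves the pprof endpoints on addr, for example
// "localhost:6060", for as long as the process keeps running.
func startPprof(addr string) {
	go func() {
		if err := http.ListenAndServe(addr, nil); err != nil {
			slog.Error("pprof server stopped", "err", err)
		}
	}()
}

func main() {
	handler := slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: levelFromEnv()})
	slog.SetDefault(slog.New(handler))
	slog.Debug("shown only when DSKDITTO_LOG_LEVEL=debug")
	slog.Info("scan starting")
}
```

Setting DSKDITTO_LOG_LEVEL=debug makes the Debug line visible. startPprof would be called early in main with the --pprof address. Its endpoints stay reachable only while the process runs, and during that time go tool pprof http://localhost:6060/debug/pprof/profile can fetch a CPU profile.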

Screenshots

dskDitto rendered as a table

Screenshot: pretty table output

TUI for interactively selecting files to remove or keep

Screenshot: interactive TUI

Confirmation window keeps you from deleting the wrong files

Confirmation dialog screenshot


Development

make debug         # Create development build
make test          # go test ./...
make bench         # run benchmarks (adds -benchmem)
make bench-profile # capture cpu.prof and mem.prof into the repo root
make pprof-web     # launch go tool pprof with HTTP UI for the latest profile

Contributing

Issues and PRs are welcome. Open an issue if you have ideas for improvements, new output modes, or performance tweaks.

License

This project is released under the Apache license. See LICENSE for details.