GitHub - kantord/frecenfile: List frecent files in a Git repository

frecenfile

frecenfile computes frecency scores for files in Git repositories. Frecency combines the frequency and recency of events.

This is useful as a heuristic for finding relevant or trending files when all you have to work with is the commit history.
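To make the idea of frecency concrete, here is a minimal sketch of one common way to combine frequency and recency: every event contributes a weight that decays exponentially with its age, and the weights are summed. The exact formula frecenfile uses is not stated here, so the exponential decay, the function name `frecency`, and the 30-day half-life are all assumptions made for illustration.

```python
import math


def frecency(event_ages_days, half_life_days=30.0):
    """Sum one exponentially decayed weight per event.

    Assumed model (not necessarily frecenfile's): an event that is
    `half_life_days` old counts half as much as an event from today.
    """
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * age) for age in event_ages_days)


# Hypothetical commit ages, in days, for two files.
recent_and_frequent = frecency([1, 2, 3, 5])
old_but_frequent = frecency([200, 210, 220, 230, 240])

print(f"{recent_and_frequent:.4f} vs {old_but_frequent:.4f}")
```

Under this model, a file touched a few times this week outranks a file touched more often several months ago, which is the behavior you would want from a "trending files" heuristic.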

Performance

frecenfile is highly scalable, producing sorted output within milliseconds for mid-sized repositories and processing the entire commit history of Linux in under a minute. Processing the last 3000 commits in the Linux repository takes about a second.

For most purposes, the results should be easily cacheable.

Cache

frecenfile stores a per-repo cache in the OS cache directory. You can override the location with FRECENFILE_CACHE_DIR. If the cache directory is not writable, frecenfile falls back to a temporary cache or no-cache mode instead of failing.
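The fallback behavior described above can be sketched roughly as follows. This is not frecenfile's implementation, only an illustration of the pattern: the function name `resolve_cache_dir`, the `frecenfile-cache` temporary directory name, and the exact order of fallbacks are assumptions. Only the `FRECENFILE_CACHE_DIR` variable comes from the description above.

```python
import os
import tempfile
from pathlib import Path


def resolve_cache_dir(default_dir):
    """Return a writable cache directory, or None for no-cache mode.

    `default_dir` stands in for the OS cache directory (hypothetical
    parameter). The environment override takes precedence over it.
    """
    override = os.environ.get("FRECENFILE_CACHE_DIR")
    candidates = [Path(override)] if override else [Path(default_dir)]
    # Assumed fallback location: a directory under the system temp dir.
    candidates.append(Path(tempfile.gettempdir()) / "frecenfile-cache")

    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
            # Probe writability by creating and discarding a scratch file.
            with tempfile.TemporaryFile(dir=candidate):
                pass
            return candidate
        except OSError:
            continue
    # Nothing was writable: the caller should run without a cache.
    return None
```

Returning `None` instead of raising keeps an unwritable cache location from turning into a hard failure; the caller simply recomputes the scores.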

Git history

By default, frecenfile processes the last 3000 commits; this can be changed with the --max-commits flag. Processing an excessive number of commits is usually not useful, as "trending" files are unlikely to be buried deep in the commit history. Processing fewer commits is unlikely to be needed for performance reasons, but might be useful for some use cases.

📦 Installation

🚀 Usage

Score every file in the current repo, highest first

Only list paths, omit scores

Restrict analysis to certain directories

frecenfile --paths src tests

Sort oldest/least-touched files first

Example output

12.9423   src/lib.rs
 9.3310   src/analyze.rs
 2.7815   README.md
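
If you want to consume this output from a script, a small parser is enough. The sketch below assumes each line has the shape shown in the example (a score, whitespace, then a path); the function name `parse_scores` is hypothetical, and the sample text is copied from the example output rather than produced by running the tool.

```python
def parse_scores(output):
    """Parse 'score path' lines into (score, path) tuples.

    Assumes the whitespace-separated layout of the example output.
    Splitting only once keeps paths that contain spaces intact.
    """
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        score, path = line.split(maxsplit=1)
        rows.append((float(score), path))
    return rows


sample = """12.9423   src/lib.rs
 9.3310   src/analyze.rs
 2.7815   README.md
"""

rows = parse_scores(sample)
top_path = rows[0][1]
print(top_path)
```

Because the default output is sorted highest first, the first parsed row is the most "frecent" file.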