GitHub - debarshibasak/assets

Asset - repository for large files

When I started building a game, one of the key problems I ran into with git was large files. The goal is a simple, trunk-based version control system that is cost-efficient in storage and transfers large files efficiently. My use case is game development and asset management, but it could serve other use cases as well. The table below compares it with other version control systems.

Capability	assets (this)	Perforce	SVN	Git	Git LFS	Plastic SCM	DVC
Designed for large binary files	Yes	Yes	Partial	No	Yes	Yes	Yes
Authority model	Centralized	Centralized	Centralized	Distributed	Distributed (LFS server is central)	Centralized + DVCS hybrid	Centralized data, git for metadata
Versioning granularity	Per-file linear (v1, v2, …)	Per-file linear (changelists)	Per-file linear	Tree snapshots (commits)	Tree snapshots + LFS pointers	Per-file + changesets	Per-file (DVC metafiles in git)
Branching / merging	Trunk + short-lived named branches (planned)	Streams / branches	Branches	First-class	Inherits git	First-class	Inherits git
Mandatory lock-on-edit	Yes (TTL + heartbeat)	Yes (`p4 edit`)	Optional (`svn lock`)	No	Optional (file locking add-on)	Yes (`cm lock`)	No
Partial sync / sparse checkout	Yes (by prefix)	Yes (workspace spec, streams)	Yes (sparse checkout)	Partial (sparse-checkout, partial clone)	Inherits git	Yes (cloaked items)	Partial (selective pulls)
Content-addressed dedup	Yes (SHA-256)	No	No	Yes (hash-keyed object store)	Yes (LFS objects)	Partial	Yes (md5/sha256 of blobs)
Working-tree state file	Yes (`.asset/state.json`)	Yes (have list)	Yes (`.svn/`)	Yes (`.git/`)	Inherits git	Yes (workspace DB)	Yes (`.dvc/`)
Audit "who pushed what when"	Yes (`asset history`)	Yes (changelists)	Yes (`svn log`)	Yes (commit log)	Inherits git	Yes	Inherits git
Server in pure OSS	Yes (this repo)	No (free for ≤5 users)	Yes	Yes	Yes (LFS reference server)	No (free for ≤3 users)	Yes (any S3/GS bucket)
Self-hosted (no SaaS dependency)	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Storage backend	Local FS or S3-compatible	Proprietary depot	Native repo format (FSFS)	Native object store (`.git/objects`)	Local or S3/Azure/etc.	Proprietary	Any blob store (S3, GS, Azure, SSH, …)
Diff / merge for text	No	Decent	Decent	Excellent	Excellent	Decent	N/A (data-oriented)
Presigned-URL data plane (client ↔ object store)	Yes (S3 backend)	No	No	No	Yes	No	Yes
Typical setup time	Minutes (single binary or Docker)	Hours (Helix server + admin)	Minutes (`svnserve`)	Seconds (`git init`)	Minutes (LFS server)	Hours (Plastic server + admin)	Minutes (`dvc init` + remote)

Getting started

Environment variables

Every YAML field except repos: has an ASSET_* override; env vars win when set. Useful for containers and 12-factor deployments. Common ones:

Variable	YAML path
`ASSET_CONFIG_PATH`	`-config` flag default
`ASSET_CONFIG_OPTIONAL=1`	tolerate a missing config file (env-only mode)
`ASSET_LISTEN`	`listen`
`ASSET_METADATA_PATH`	`metadata_path`
`ASSET_TLS_ENABLED` / `_CERT_FILE` / `_KEY_FILE` / `_AUTO_SELF_SIGNED` / `_HOSTS`	`tls.*` (`_HOSTS` is comma-separated)
`ASSET_STORAGE_BACKEND`	`storage.backend` (`local` or `s3`)
`ASSET_STORAGE_LOCAL_ROOT`	`storage.local.root`
`ASSET_STORAGE_S3_ENDPOINT` / `_REGION` / `_BUCKET` / `_ACCESS_KEY_ID` / `_SECRET_ACCESS_KEY` / `_USE_SSL` / `_PREFIX`	`storage.s3.*`
`ASSET_STATE_BACKEND`	`state.backend` (`memory` or `redis`)
`ASSET_STATE_REDIS_ADDR` / `_PASSWORD` / `_DB` / `_PREFIX`	`state.redis.*`
`ASSET_AUTH_AUTHORIZED_KEYS_FILE`	`auth.authorized_keys_file`
`ASSET_AUTH_MAX_CLOCK_SKEW_SECONDS`	`auth.max_clock_skew_seconds`
`ASSET_AUTH_ALLOW_ANONYMOUS`	`auth.allow_anonymous`
`ASSET_AUTH_TOKENS`	`auth.tokens` (format: `token1=alice,token2=bob`)
`ASSET_LOCKS_MIN_TTL_SECONDS` / `_MAX_TTL_SECONDS`	`locks.*`
`ASSET_DEFAULTS_KEEP_VERSIONS`	`defaults.keep_versions`

Booleans accept 1/0/true/false/yes/no/on/off. Repos (repos:) are not exposed via env — create them at runtime through POST /v1/repos.
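As a rough illustration of the two formats above, here is a minimal Go sketch of parsing a boolean override and an ASSET_AUTH_TOKENS value. The function names parseBool and parseTokens are hypothetical, and the case-insensitive matching and whitespace trimming are assumptions, not documented behaviour of the server.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBool accepts the spellings 1/0/true/false/yes/no/on/off.
// Case-insensitivity and trimming are assumptions made for this sketch.
func parseBool(s string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "1", "true", "yes", "on":
		return true, nil
	case "0", "false", "no", "off":
		return false, nil
	}
	return false, fmt.Errorf("invalid boolean %q", s)
}

// parseTokens splits "token1=alice,token2=bob" into a token -> user map.
func parseTokens(s string) (map[string]string, error) {
	out := make(map[string]string)
	for _, pair := range strings.Split(s, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // tolerate a trailing comma
		}
		token, user, ok := strings.Cut(pair, "=")
		if !ok || token == "" || user == "" {
			return nil, fmt.Errorf("invalid token entry %q", pair)
		}
		out[token] = user
	}
	return out, nil
}

func main() {
	enabled, _ := parseBool("on")
	tokens, _ := parseTokens("dev-token-123=alice,ci-token=bob")
	fmt.Println(enabled, tokens["dev-token-123"], tokens["ci-token"])
}
```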

Env-only example (no YAML file at all):

ASSET_CONFIG_OPTIONAL=1 \
ASSET_STORAGE_BACKEND=local \
ASSET_STORAGE_LOCAL_ROOT=/var/lib/assets \
ASSET_AUTH_TOKENS=dev-token-123=alice \
./bin/asset-server -config ""

Storage backends

local — files under storage.local.root, content-addressed by SHA-256.
s3 — any S3-compatible endpoint (AWS S3, MinIO, R2, Backblaze B2).

Switch by changing storage.backend in the YAML.

Version retention

keep_versions controls how many historical versions of each path are retained. Set per repo (or via defaults.keep_versions). 0 means unlimited. Pruning happens during upload; the blob is left in storage in case it is still referenced by another path (cheap-storage assumption).
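The retention rule can be pictured with a small Go sketch. It assumes that versions are numbered in ascending order and that the newest ones are the ones retained; the prune function is hypothetical and only models the bookkeeping, not the server's actual code.

```go
package main

import "fmt"

// prune splits an ascending list of version numbers into the versions to
// keep and the versions to drop. keep == 0 means unlimited retention.
func prune(versions []int, keep int) (kept, dropped []int) {
	if keep <= 0 || len(versions) <= keep {
		return versions, nil
	}
	cut := len(versions) - keep
	return versions[cut:], versions[:cut]
}

func main() {
	kept, dropped := prune([]int{1, 2, 3, 4, 5}, 3)
	fmt.Println(kept, dropped) // prints "[3 4 5] [1 2]"
}
```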

Using the CLI

mkdir my-game && cd my-game
asset init my-game --server http://localhost:8080 --token dev-token-123

# Drop some files in
mkdir -p sprites/hero music
cp ~/Downloads/idle.png sprites/hero/
cp ~/Downloads/track.ogg music/

asset status               # see local-added entries
asset push                 # acquires lock, heartbeats, uploads, releases
asset status               # now up-to-date

# Partial sync from another machine:
asset init my-game --server http://localhost:8080 --token dev-token-123
asset pull sprites/hero    # only fetch this prefix

Lock + heartbeat semantics

asset push acquires a lock covering the affected path prefixes (the top-level directory of every changed file). While the upload runs, the client heartbeats the lock at TTL/3. If the heartbeat fails (server lost, network partition), the push aborts.

Multiple non-overlapping prefix locks can coexist within one repo.
Specifying no prefixes locks the whole repo.
TTL is clamped to the server's locks.{min,max}_ttl_seconds.

Default TTL: 60s. Override with asset push --ttl 2m.

Partial sync

Both push and pull accept positional path arguments. Anything not under those prefixes is ignored for the operation. Local state (.asset/state.json) remembers each path's last-synced version so partial syncs from different machines compose cleanly.
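A minimal Go sketch of the prefix filter described above. It assumes that no positional arguments means "the whole repo" and that matching is done on whole path segments; underAny is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// underAny reports whether path falls under at least one of the prefixes.
// An empty prefix list selects every path.
func underAny(path string, prefixes []string) bool {
	if len(prefixes) == 0 {
		return true
	}
	for _, p := range prefixes {
		p = strings.TrimSuffix(p, "/")
		if path == p || strings.HasPrefix(path, p+"/") {
			return true
		}
	}
	return false
}

func main() {
	prefixes := []string{"sprites/hero"}
	for _, path := range []string{"sprites/hero/idle.png", "music/track.ogg"} {
		fmt.Println(path, underAny(path, prefixes))
	}
}
```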

Parallelism

Both push and pull transfer files concurrently to saturate bandwidth. Use -j N (or --parallel N) to override the worker count; 0 (default) auto-sizes to NumCPU clamped to [4, 16].

asset push -j 16          # up to 16 concurrent uploads
asset pull -j 32 sprites/ # 32 concurrent downloads for one prefix

State writes are serialized across workers, and any failure cancels the remaining transfers via errgroup. The lock is acquired once for the whole push and heartbeated for its lifetime regardless of worker count.
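The auto-sizing rule for -j can be written down in a few lines of Go. This is only a sketch: workerCount is a hypothetical name, and it assumes that an explicit value is used as given (as in the -j 32 example) while the [4, 16] clamp applies only to the auto-sized default.

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount returns the number of transfer workers. A positive request
// is honoured as-is; 0 auto-sizes to numCPU clamped to [4, 16].
func workerCount(requested, numCPU int) int {
	if requested > 0 {
		return requested
	}
	n := numCPU
	if n < 4 {
		n = 4
	}
	if n > 16 {
		n = 16
	}
	return n
}

func main() {
	fmt.Println("auto:", workerCount(0, runtime.NumCPU()))
	fmt.Println("explicit:", workerCount(32, runtime.NumCPU()))
}
```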

Authentication

Two schemes are supported and can run side-by-side; the client picks based on its config and the server accepts whichever the request presents.

Bearer tokens (simple, good for CI):

# server.yaml
auth:
  tokens:
    dev-token-123: alice

# .asset/config.yaml
token: dev-token-123

Ed25519 keypair (per-user, signed requests, no shared secret on the wire):

# On the client:
asset key gen --identity alice@laptop --update-config
#  -> writes .asset/keys/alice@laptop.key (0600)
#  -> updates .asset/config.yaml to point at it
#  -> prints an "ed25519 ... alice@laptop" line

# Paste that line into the server's authorized_keys file:
asset key pub >> /path/on/server/authorized_keys

# server.yaml
auth:
  authorized_keys_file: keys/authorized_keys
  max_clock_skew_seconds: 300

Every signed request carries:

Authorization: AssetKey <key_id>:<unix-ts>:<base64-ed25519-sig>
X-Asset-Body-SHA256: <hex>

The signature covers METHOD\nPATH?QUERY\nTIMESTAMP\nBODY_SHA256_HEX, so any tampering with the URL, method, body, or timestamp invalidates the request. For PUT uploads the client sets X-Asset-Body-SHA256 equal to the existing X-Asset-SHA256 (the content hash), so no body buffering is required for large blobs.

asset key show / asset key pub inspect a key file. Public-key auth takes precedence if both token and private_key_file are set in .asset/config.yaml.

Transport: HTTPS + HTTP/2

Set tls.enabled: true and the server speaks HTTPS, with HTTP/2 negotiated automatically via ALPN. For development, tls.auto_self_signed: true generates a cert under data/tls/ and uses it.

# server.yaml
tls:
  enabled: true
  auto_self_signed: true
  # or use a real cert:
  # cert_file: /etc/ssl/asset/server.crt
  # key_file:  /etc/ssl/asset/server.key

Client-side, point .asset/config.yaml at the HTTPS URL and either pin the CA bundle or (only for dev against your own self-signed cert) skip verification:

server: https://my-asset-server:8080
repo: my-game
tls:
  ca_file: /path/to/server.crt
  # insecure_skip_verify: true   # local development only

Presigned-URL data plane (S3 backend)

When storage.backend: s3 (AWS S3, MinIO, R2, Backblaze B2, …), the server gets out of the data path. asset push/pull issues an init call to the server, uploads blobs directly to the object store via a server-signed URL, then issues a commit. Downloads work the same way: a JSON pointer is fetched from the server, then the bytes are pulled from the object store.

Capabilities are advertised at GET /v1/capabilities; the client switches modes automatically. Use --no-presign on push/pull to force the streaming-through-server path (useful for debugging or for clients behind a proxy that can't reach S3).

Content-addressed dedup still works: pushing a file whose SHA-256 matches an existing blob skips the actual transfer entirely ([dedup, no transfer] in the log).

With an S3 backend, presigned PUTs are signed with x-amz-checksum-sha256 so the storage tier itself rejects bodies that don't hash to the expected value. At commit time the server reads the stored checksum back and rejects the upload if it still disagrees.

Multi-server deployment

Set state.backend: redis in server.yaml and point multiple asset-server replicas at the same Redis. Locks, presigned-upload sessions, and the replay-protection nonce cache are then shared across all replicas: a lock acquired on server A correctly blocks a conflicting acquire on server B, a presigned upload session opened on A can be committed via B, and a captured signed request replayed against any replica is rejected.

state:
  backend: redis
  redis:
    addr: redis:6379
    prefix: asset

What this does not solve: the bolt metadata store is still single-writer. Each asset-server replica needs its own bolt file (or you must front the whole fleet with a single metadata writer). A full multi-server story would replace bolt with Postgres (or similar) — flagged as the next gap.

Blob garbage collection

When keep_versions prunes an old version, the asset server checks whether any surviving version still references that blob (content-addressed dedup means several paths/versions can share one blob). If the refcount drops to zero, the blob is deleted from the object store in the background and the event is logged as gc: removed orphan blob <key>.

The check is best-effort and runs outside the metadata transaction — there is a small race window where a concurrent upload could create a new reference to a blob we then delete. The next access for that path would see a missing blob; clients re-upload on demand.
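The reference check can be sketched in Go as a set difference between the blobs of pruned versions and the blobs of surviving versions. The version struct and the orphaned function are hypothetical and ignore the race described above; they only show why a shared blob survives a prune.

```go
package main

import "fmt"

// version is a simplified metadata record: one path version pointing at a blob.
type version struct {
	Path string
	N    int
	Blob string // content hash, used as the blob key
}

// orphaned returns the blobs referenced by pruned versions that no
// surviving version references any more.
func orphaned(surviving, pruned []version) []string {
	live := make(map[string]bool)
	for _, v := range surviving {
		live[v.Blob] = true
	}
	var out []string
	reported := make(map[string]bool)
	for _, v := range pruned {
		if !live[v.Blob] && !reported[v.Blob] {
			reported[v.Blob] = true
			out = append(out, v.Blob)
		}
	}
	return out
}

func main() {
	surviving := []version{{"sprites/a.png", 3, "h3"}, {"sprites/b.png", 1, "h1"}}
	pruned := []version{{"sprites/a.png", 1, "h1"}, {"sprites/a.png", 2, "h2"}}
	for _, key := range orphaned(surviving, pruned) {
		fmt.Println("gc: removed orphan blob", key)
	}
}
```

Here h1 is kept because sprites/b.png still references it, while h2 has no remaining references and is the only blob reported.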

Integrity

Every transfer is SHA-256 verified:

Upload: the client streams the file through a SHA-256 hasher and sends the digest in X-Asset-SHA256; the server recomputes during ingest and rejects the upload on mismatch. Objects are then stored content-addressed by their SHA-256 (so identical content across paths/versions is deduped).
Download (pull / checkout): the client streams the response through a SHA-256 hasher and compares to the server's X-Asset-SHA256. If they disagree, the temp file is discarded and the command exits non-zero with an integrity: error. Use --no-verify to opt out (not recommended).
Release artifacts: make checksums and make dist write SHA256SUMS alongside binaries/tarballs; verify with make verify-checksums.

Versions

asset versions sprites/hero/idle.png
asset checkout sprites/hero/idle.png -v 3

checkout writes the chosen historical version into the working tree at the same path; if you then push that path, the restored content is published as a new version.

Branching

The asset model is trunk-based by default — there is one always-shippable branch (main) and every repo starts there. For most game-dev teams that's the only branch you ever need: short edit cycles, frequent pushes, locks prevent stomping.

When you do need to diverge — a level redesign, an art pass, a risky refactor — the design is to support short-lived named branches rather than the heavyweight release/develop/hotfix model from Git Flow.

Recommended workflow

main is always releasable. Tag releases on main (v1.2.0) rather than cutting a long-lived release/* branch.
Branch off main for any change that takes more than a session or touches risky assets: feature/level-1-art, experiment/new-shader.
Keep branches short — hours to a few days. Long-lived branches cause painful merges (especially for binaries, which don't 3-way-merge).
Merge back into main with an explicit strategy per conflicting file (theirs / ours / manual), since binaries can't auto-merge.
For versioned releases (a shipped build of the game), prefer tags. Only cut a release/x.y branch if you need to hotfix a frozen build while main moves on.
Hide in-progress work behind game-side feature flags where you can — same reasoning as trunk-based dev in source control.

Native branching (design)

Status: design only, tracked in the roadmap. The current implementation behaves as if every repo has a single implicit main branch.

Because blobs are already content-addressed by SHA-256, branches are cheap: a branch is just a named (path → version) map that shares the global blob pool. Creating a branch copies a manifest, not bytes.

CLI surface:

asset branch                              # list branches in this repo
asset branch create feature/level-1-art   # branch from current HEAD
asset branch create feature/x --from=main
asset branch switch feature/level-1-art   # swap working-tree manifest
asset branch delete feature/x
asset merge feature/x --into=main --strategy=theirs

asset status, push, pull, checkout, versions, and history all become branch-scoped; the active branch is recorded in .asset/state.json and sent on every request.

Wire protocol additions (alongside the existing /v1 endpoints):

Method	Path	Purpose
GET	`/v1/repos/:repo/branches`	list branches
POST	`/v1/repos/:repo/branches`	create branch (body: `{name, from}`)
GET	`/v1/repos/:repo/branches/:name`	branch info + head map
DELETE	`/v1/repos/:repo/branches/:name`	delete branch (refuses `main`)
POST	`/v1/repos/:repo/branches/:name/merge`	merge into named target

Existing object/manifest/lock endpoints gain an optional ?branch= query (default main) so old clients keep working unchanged.

Storage / metadata changes:

New branches bucket in bolt: branch_name → {created_at, parent, parent_head_map} (the parent map lets diverged paths fall through to the parent branch's version until the child branch writes its own).
Per-branch head map: (repo, branch, path) → version.
The existing per-path version chain stays global — versions are still immutable and content-addressed, branches just pick which version is HEAD for that branch.
Locks become (repo, branch, prefix)-scoped, so a lock on main:sprites/ does not block feature/x:sprites/. Whole-repo locks still cover all branches (used for repo-level admin operations).
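Since this part is design only, the following Go sketch is just one way the head-map lookup with parent fall-through could look. The branch struct and resolve function are hypothetical, and the field names merely mirror the bucket layout listed above.

```go
package main

import "fmt"

// branch models the proposed metadata: its own head map plus a snapshot
// of the parent's head map taken when the branch was created.
type branch struct {
	Parent      string
	ParentHeads map[string]int // path -> version at branch time
	Heads       map[string]int // path -> version written on this branch
}

// resolve returns the HEAD version of path on the branch, falling through
// to the parent snapshot until the branch writes its own version.
func resolve(b branch, path string) (int, bool) {
	if v, ok := b.Heads[path]; ok {
		return v, true
	}
	v, ok := b.ParentHeads[path]
	return v, ok
}

func main() {
	feature := branch{
		Parent:      "main",
		ParentHeads: map[string]int{"sprites/hero/idle.png": 3, "music/track.ogg": 1},
		Heads:       map[string]int{"sprites/hero/idle.png": 4},
	}
	fmt.Println(resolve(feature, "sprites/hero/idle.png")) // written on the branch
	fmt.Println(resolve(feature, "music/track.ogg"))       // falls through to main
	fmt.Println(resolve(feature, "missing.png"))
}
```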

Merge semantics for binaries:

3-way text merge isn't meaningful for most game assets, so asset merge walks the divergent path set and applies an explicit strategy:

--strategy=theirs — source branch wins on every conflict.
--strategy=ours — target branch wins on every conflict.
--strategy=manual — prompts per conflicting path (or writes a conflict report and exits non-zero in CI).

The merge result is a single new version on the target branch for each chosen file — no merge commits, no octopus. Non-conflicting paths just fast-forward their head pointer to the source branch's version (zero bytes transferred, since the blob already exists).
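A simplified Go sketch of the proposed merge walk, under several assumptions: version numbers start at 1, base is the head map at the branch point, and a conflict is a path changed on both sides. The merge function is hypothetical. It only picks which existing version becomes HEAD, whereas the design above publishes a new version on the target for each resolved conflict.

```go
package main

import (
	"fmt"
	"sort"
)

// merge combines source into target. It returns the resulting head map
// and, for strategy "manual", the sorted list of unresolved paths.
func merge(base, source, target map[string]int, strategy string) (map[string]int, []string) {
	result := make(map[string]int, len(target))
	for p, v := range target {
		result[p] = v
	}
	var conflicts []string
	for p, s := range source {
		t, b := target[p], base[p] // 0 means "absent"
		switch {
		case s == t, s == b:
			// Same version, or source never changed it: keep target.
		case t == b:
			result[p] = s // only source changed: fast-forward
		case strategy == "theirs":
			result[p] = s
		case strategy == "ours":
			// Target wins: nothing to do.
		default:
			conflicts = append(conflicts, p)
		}
	}
	sort.Strings(conflicts)
	return result, conflicts
}

func main() {
	base := map[string]int{"a.png": 1, "b.png": 1, "c.png": 1}
	source := map[string]int{"a.png": 2, "b.png": 1, "c.png": 2, "d.png": 1}
	target := map[string]int{"a.png": 1, "b.png": 2, "c.png": 3}

	heads, _ := merge(base, source, target, "theirs")
	fmt.Println(heads)
	_, conflicts := merge(base, source, target, "manual")
	fmt.Println("conflicts:", conflicts)
}
```

In this example only c.png changed on both sides, so it is the single path the strategy has to decide; a.png and d.png fast-forward and b.png keeps the target's version.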

Migration: existing repos are treated as if they were created on main. No data rewrite, no breaking change for clients that don't pass ?branch=.

Web UI

The asset-server bakes a small HTML admin UI into the same binary. Open http://<server>/ui/ (or the / root, which redirects) and sign in with either a configured bearer token or the contents of an Ed25519 private key file (.asset/keys/<identity>.key). On success the server mints an HttpOnly, SameSite=Lax session cookie — bytes don't leave the server.

What it lets you do:

list repos, browse the manifest by prefix, drill into a path's version history;
download a specific version (the UI redirects to a presigned URL when the backend supports it, otherwise streams through);
inspect active locks per repo and release a stuck one;
create a new repo.

All UI routes live under /ui/. Templates and CSS are embedded via embed.FS so the binary stays single-file. Sessions are kept in-process with a 24h TTL; if you scale out with state.backend: redis, each replica still keeps its own session table, so a UI request that lands on a different replica requires logging in again, while the API keeps working unchanged.

The UI is opt-in by configuration: it only shows usable inputs for auth schemes that are actually configured. If auth.tokens is empty and auth.authorized_keys_file is unset the login page warns that nothing is configured and won't accept credentials.

HTTP API

All endpoints are under /v1. JSON bodies; bearer token in the Authorization header when tokens: is non-empty.

Method	Path	Purpose
GET	`/v1/health`	liveness
GET, POST	`/v1/repos`	list / create repos
GET	`/v1/repos/:repo`	repo info + stats
GET	`/v1/repos/:repo/manifest?prefix=...`	latest visible object per path
GET, POST	`/v1/repos/:repo/locks`	list / acquire
POST	`/v1/repos/:repo/locks/:id/heartbeat`	refresh TTL
DELETE	`/v1/repos/:repo/locks/:id`	release
GET, HEAD	`/v1/repos/:repo/objects/<path>?version=N`	download
PUT	`/v1/repos/:repo/objects/<path>`	upload (requires `X-Asset-Lock`)
DELETE	`/v1/repos/:repo/objects/<path>`	tombstone (requires `X-Asset-Lock`)
GET	`/v1/repos/:repo/versions/<path>`	full version history

Layout

cmd/
  asset-server/        # server entry point
  asset/               # CLI entry point
internal/
  proto/               # wire types shared by client + server
  server/              # HTTP, lock manager, repo wiring
    metadata/          # bolt-backed metadata store
    store/             # object store interface + local + s3
  client/              # client SDK: config, state, sync, leased lock
  web/                 # embedded HTML admin UI (mounted at /ui/)

Roadmap

Desktop app (wraps internal/client).
Chunked/resumable uploads for very large blobs.
Reference-counted blob GC on prune.
Optional CRDT-style merge for diverged text files.
Native branching — short-lived named branches sharing the global content-addressed blob pool. CLI (asset branch, asset merge), branch-scoped locks, and a ?branch= query on existing endpoints. See the Branching section for the full design.