Storage - Hugging Face

3 min read Original article ↗

Hugging Face
Storage Buckets Storage Buckets

Store models, datasets, and artifacts with simple per-TB pricing. Built-in CDN, Xet deduplication, and no git overhead.

Trusted by more
than 10,000 AI teams

Storage

Storage built for AI teams

Store models, datasets, and artifacts with simple per-TB pricing. Xet deduplication. Included CDN. No git overhead.

  • Per-TB pricing with built-in CDN and deduplication speedups.

  • No Git constraints: commit-free sync and fast object updates.

  • Designed for ML workflows: datasets, checkpoints, model artifacts.

Xet Technology

Next-gen large-scale storage for AI

Xet uses content-defined chunking to break files into byte-level chunks and deduplicates across your entire bucket. When you retrain a model and only 5% of weights change, only that 5% is re-uploaded.

  • Raw + processed dataset: stored once, billed once*

  • 4x less data per upload, verified with real-world workloads

*Requires Enterprise or Enterprise Plus plan

Pricing

Transparent, volume-based pricing

Simple per-TB pricing that scales with usage. Egress and CDN are included at no extra cost.

Data Storage

Assemble training data at any scale

Pour raw data from every source into a single bucket: crawls, annotations, synthetic outputs, partner datasets. No git overhead, no commit queues, no file-count limits. When training begins, your data is already there, streamed to GPUs via the included CDN.

  • Immediate availability on upload, no queued commits

  • Batch API processes thousands of files in a single call

  • Raw + processed datasets with dedup = no double billing*

*Requires Enterprise or Enterprise Plus plan

CDN

Built-in CDN for blazing fast access

Every bucket includes a CDN. Warm localized cache close to where you compute for ultra fast streaming and downloads. Egress is included up to a generous 8:1 ratio of your total storage.

  • Pre-warm cache in any cloud region you need

  • Our CDN is deployed inside GCP and AWS networks

  • Egress included up to 8:1 your storage

More providers coming soon

Coding Agents

Give your coding agents persistent storage

Coding agents run in ephemeral environments, but their outputs shouldn't vanish. Checkpoints, benchmark results, generated datasets: one hf sync command in your agent's bash tool is all it takes.

  • Pre-warmed CDN and no git overhead for fast reads and writes

  • Persist artifacts across ephemeral CI runs and terminal sessions

  • Install the official HF CLI skill and your agent knows every command

AES-256 Encryption

End-to-end encryption at rest and in transit

Audit Logs

Full visibility into every access event

SSO & RBAC

Enterprise SSO with role-based access control

US & EU Regions

Choose where your data lives

SOC 2 Type II | GDPR Compliant

Get started with HF Storage Buckets HF Storage Buckets

Start with buckets, sync your AI data, and unlock object storage built for ML workflows.