From Inspiration to Action
When we were still dreaming up lakeFS, one of the projects that inspired us was DVC (Data Version Control). It was one of those moments when you realize – “Ah, others see it too.” We weren’t alone in believing that data should be managed like code.
DVC was built by data scientists for data scientists, solving a real pain: how to bring reproducibility and collaboration to data and model experiments. At the same time, we were approaching the problem from a different angle – from the world of big data systems and large-scale algorithms. Where DVC was empowering individuals and small teams, lakeFS was designed for massive data infrastructure and enterprise-scale workflows.
From the beginning, our projects were different – but our visions were deeply aligned.
Over time, I got to know Dmitry and Ivan, the founders of Iterative.ai. We had so many energizing conversations about how the data ecosystem was evolving. Every chat felt like a mini brainstorming session – two teams with different lenses, both trying to make data more reproducible, reliable, and accessible.
In 2024, Dmitry and Ivan started exploring an exciting new direction for Iterative, centered around unstructured data analytics and processing, which led to the creation of DataChain. As their focus shifted, maintaining DVC became less central to their roadmap. When we talked about it, something clicked.
At lakeFS, we see data version control as foundational – the bedrock of any data architecture that aims to be trustworthy, scalable, and ready for AI. So, when the opportunity arose to take DVC under our wing, the decision felt natural. We knew we wanted to keep DVC thriving and ensure the community around it continues to grow and innovate.
Today, I’m thrilled to share that lakeFS has acquired DVC.
Two Sides of the Same Coin
This partnership feels almost inevitable. DVC and lakeFS have always complemented each other beautifully.
DVC is perfect for small teams and individual data scientists – it’s lightweight, flexible, and powerful for experimentation and research. It helps people organize, version, and reproduce their work in a simple, elegant way.
lakeFS, on the other hand, is all about scale and resilience. It brings those same principles – versioning, branching, reproducibility – to petabyte-scale datasets and enterprise AI infrastructure. By bringing DVC and lakeFS together, we’re able to serve the entire spectrum of data teams – from the first data scientist exploring model ideas to the largest enterprises running mission-critical AI pipelines. Under one roof, the story becomes complete: a seamless path from small data to big data, from exploration to production.
Bringing Communities Together
What excites me most about this acquisition isn’t just the technology – it’s the communities behind it.
The DVC community is full of incredibly creative data scientists who have been experimenting, innovating, and sharing ideas for years. They’ve pioneered ways of thinking about data versioning and reproducibility that have influenced so many in the field.
The lakeFS community is made up of engineers and platform teams operating at scale – people building the foundations for enterprise AI, compliance, and reliability.
Now, we get to bring these two vibrant, complementary communities together.
I can’t wait to see what happens when DVC users – who know how to move fast, explore, and experiment – connect with lakeFS users, who know how to scale, secure, and operationalize data. There’s so much each side can learn from the other.
DVC users can inspire enterprise teams to think more creatively about flexibility and iteration, while lakeFS users can share hard-won lessons about scaling data practices without losing control or governance.
Together, they’ll shape the next chapter of data version control – not as separate projects, but as a unified movement.
A Natural Growth Path for the Future
As more organizations embrace AI, they’re realizing that data version control isn’t optional – it’s essential. Without it, it’s impossible to reproduce experiments, trust results, or manage data responsibly at scale.
We’ve seen many teams begin their journey with DVC. It’s the perfect starting point for exploratory work – lightweight, intuitive, and ideal for early-stage machine learning projects. But as these projects grow, the data grows with them. The need for scalability, compliance, and reliability soon follows.
That’s where lakeFS comes in.
With DVC and lakeFS together, teams now have a natural growth path:
- Start small: learn the principles of data versioning with DVC.
- Then, when your datasets reach terabytes or petabytes and your workflows demand enterprise-grade capabilities, move to lakeFS without losing the foundation you’ve built.
It’s a continuum, from the first commit on a local laptop to full-scale AI infrastructure in the cloud.
Looking Ahead
This acquisition is a celebration of shared purpose. DVC and lakeFS were born from the same idea, shaped in different environments, and are now united to continue that mission together.
I want to thank Dmitry, Ivan, and the entire Iterative.ai team for their incredible work and vision. DVC has empowered thousands of data scientists around the world, and we’re honored to carry that torch forward.
To the DVC community: we’re here to support you, listen to you, and keep the project thriving.
To the lakeFS community: get ready to meet some amazing new collaborators.
This is a moment of convergence – and celebration. The future of data version control is brighter than ever.
