We put 1M files into DVC, Git-LFS, and Oxen.ai
Oxen is awesome. I've been having a lot of fun with your model inference tool. Any timeline on new models, and which models will you be adding next?
Hey all, if you haven't seen the Oxen project yet, we have been building an open-source version control tool for unstructured data.
We were inspired by the idea of making large machine learning datasets living, breathing assets that people can collaborate on, rather than the static artifacts of the past. Lately we have been hard at work optimizing the underlying Merkle trees and data structures within Oxen.ai, and just released v0.19.4, which brings a bunch of performance upgrades and stability improvements to the internal APIs.
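To give a feel for why a Merkle tree helps here, below is a minimal Rust sketch of the content-addressed idea (not Oxen's actual implementation — the node layout, names, and the use of `DefaultHasher` instead of a cryptographic hash are all simplifying assumptions). Each directory's hash is derived from its children's hashes, so editing one file changes only the hashes on the path to the root; unchanged subtrees keep their hashes and never need to be re-scanned or re-uploaded.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified tree node: real systems store blobs and
// tree objects separately, keyed by a cryptographic content hash.
#[derive(Clone)]
enum Node {
    File { name: String, contents: Vec<u8> },
    Dir { name: String, children: Vec<Node> },
}

// A directory's hash depends only on its children's hashes,
// so identical subtrees always hash to the same value.
fn hash_node(node: &Node) -> u64 {
    let mut h = DefaultHasher::new();
    match node {
        Node::File { name, contents } => {
            name.hash(&mut h);
            contents.hash(&mut h);
        }
        Node::Dir { name, children } => {
            name.hash(&mut h);
            for child in children {
                hash_node(child).hash(&mut h);
            }
        }
    }
    h.finish()
}

fn main() {
    let a = Node::File { name: "a.jpg".into(), contents: vec![1, 2, 3] };
    let b = Node::File { name: "b.jpg".into(), contents: vec![4, 5, 6] };
    let root = Node::Dir { name: "train".into(), children: vec![a.clone(), b] };
    let h1 = hash_node(&root);

    // Edit one file: the root hash changes, but a.jpg's subtree hash
    // does not, so a versioning tool can skip it entirely.
    let b2 = Node::File { name: "b.jpg".into(), contents: vec![4, 5, 7] };
    let root2 = Node::Dir { name: "train".into(), children: vec![a.clone(), b2] };
    let h2 = hash_node(&root2);

    assert_ne!(h1, h2);
    assert_eq!(hash_node(&a), hash_node(&a));
    println!("ok");
}
```

With a million files, this is the difference between rehashing everything on every commit and rehashing only the handful of nodes along the changed paths.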
To put it all to the test, we decided to benchmark the tool on the 1 million+ images in the classic ImageNet dataset.
The TL;DR: Oxen.ai is faster than raw uploads to S3, 13x faster than git-lfs, and 5x faster than DVC. The full breakdown can be found here:
https://docs.oxen.ai/features/performance
If you are in the ML/AI community, or a Rust aficionado, we would love your feedback on both the tool and the codebase. We would especially welcome community contributions around different storage backends and integrations with other data tools.
Have you measured the speed of moving data to the GPU? For enterprise AI workflows this is a bottleneck to utilization, so improved speed can help reduce compute costs.
Great thought. Right now we are optimizing for moving data from machine A to machine B, but getting data to the GPU is interesting. We're on it.