Torus: A Toolkit for Docker-First Data Science
medium.com
This is interesting! It sounds like this v1 gets your local environment up and running in a Docker container. I maintain something similar for analysts on my team, and we've seen success in terms of decreasing time spent on environment setup.
As another interesting use of Docker in the data space, I'm excited about Pachyderm [0] (though I haven't had the chance to use it in production). In particular, the data provenance story seems compelling.
Thanks for the plug, saamm. I'm one of the creators of Pachyderm. I think Torus and Pachyderm would work very nicely together. With just a few commands, you could go straight from developing code in the image Torus provides to deploying it on Pachyderm as a production pipeline that runs on new data as it comes in. Similarly, their Dockerized data science cookie-cutter could work nicely as a Pachyderm service. This would work much like using the service on your laptop, except that you could easily deploy it on a cloud provider and schedule it with GPUs, and it would get updated with new data as it comes in.
Very exciting to see more people applying containers to data science.
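To make the "runs on new data as it comes in" idea concrete, here is a minimal sketch in plain Python. It is not Pachyderm's API and makes no claim about how Pachyderm works internally; the names `process_new_files`, `seen`, and `transform` are hypothetical and exist only for illustration. The assumption is simply that a pipeline step should run again only for input files whose content is new or has changed since the last run.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List


def file_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def process_new_files(
    input_dir: Path,
    seen: Dict[str, str],
    transform: Callable[[Path], None],
) -> List[str]:
    """Run `transform` on every file that is new or changed since the last call.

    `seen` maps file names to the digest that was last processed; it is
    updated in place so that repeated calls skip unchanged files.
    """
    processed = []
    for path in sorted(input_dir.iterdir()):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if seen.get(path.name) == digest:
            continue  # content unchanged, nothing to do
        transform(path)
        seen[path.name] = digest
        processed.append(path.name)
    return processed
```

Calling `process_new_files` repeatedly with the same `seen` dictionary only touches files that were added or modified in between, which is the incremental behavior described above in its simplest form.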
Yes to containers! We are trying to make it as seamless as possible to be Docker-first in all things without reinventing the devops wheel. It just needs to be adapted to the needs of data scientists. Pachyderm is really cool. I will have to check it out. We've recently moved to Airflow for all our pipeline management... how does Pachyderm fit in that ecosystem?
Pachyderm's pipeline system covers much of the same functionality as Airflow's, so there's generally not much reason to use both.
Not to be confused with this other company at https://manifold.io/ (io, not ai), which deprecated their https://www.torus.sh/ project :)
Or, for that matter, the short-lived distributed storage system from CoreOS.
https://coreos.com/blog/torus-distributed-storage-by-coreos....
Yes, we found out about that later, and it was super confusing. I guess if you name your company Manifold, you will name your projects after specific types of manifolds. We have Torus, next is Mobius, then ... Klein Bottle?
I think a more interesting direction would be for JupyterLab to ship an Electron app that understands how to spin up and talk to containerized kernels.
I made a hacky version for work that proxies to a k8s pod, but first-class support would be cool.
https://github.com/jupyter-incubator/enterprise_gateway to launch kernels on a cluster and https://github.com/jupyter-incubator/nb2kg to make a notebook server aware of them might be of interest (sans Electron app shell).
Thanks for sharing this! I didn't realize this project existed.
I have lots of questions now, like why this isn't using the ZeroMQ-based protocol, so I guess I will need to spend some time with it.
It does look like it closely overlaps with what I was describing. I didn't realize overriding/extending the HTTP API was even a thing that could be done, so I just used ZeroMQ for my own purposes :)
Is cookiecutter running inside the Docker container or on the host? The instructions imply setting up Python, virtualenv, pip and cookiecutter all on the local machine...
Isn't this what Pachyderm was supposed to do?