MLflow v0.8.0 Features Improved Experiment UI and Deployment Tools

databricks.com

51 points by dmatrix 7 years ago · 15 comments

m_ke 7 years ago

I'm looking into switching over to using MLflow or Polyaxon for experiment management and tracking. We currently use a custom-built Django app for experiment tracking and run experiments by hand on desktop workstations, but we're starting to move some of that over to GCP.

For people who have used either of the projects, what are your opinions and are there any hidden issues that you ran into?

Ideally we'd like to have a platform that makes it easy to schedule runs on the desktops or GCP depending on requirements and available resources. Seems like Kubernetes might be the best option for that, and it doesn't look like MLflow supports it out of the box yet.

  • sailingparrot 7 years ago

    Polyaxon is really great in terms of functionality and UX. It's still pretty early stage, so there are some bugs, but overall I am very impressed by it. We have been using it for a few months now with a couple of ML researchers.

  • MostlyAmiable 7 years ago

    My main issues are that if you're using the serving functionality, the containers it builds take a long time to start because the environment/dependencies are loaded at runtime instead of being baked into the image. Also, it doesn't have the ability to use a db or remote file store to save experiment info, so you need to use EBS volumes or something for persistence.

  • mateiz 7 years ago

    While MLflow doesn't submit jobs to Kubernetes for you, it should be possible to integrate it with your favorite scheduler to do that. MLflow is designed to accept experiment results from wherever you are running your code, so you can just submit an "mlflow run ..." command to Kubernetes and have it report results to your tracking server.
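
    For example, here is a minimal sketch of that pattern (the tracking server address and the logged names are placeholders): the training script points MLflow at the tracking server and reports results the same way whether it runs on a workstation or inside a Kubernetes pod.

        import mlflow

        # Placeholder address for a remote tracking server; the script is
        # otherwise identical no matter which scheduler launched it.
        mlflow.set_tracking_uri("http://tracking.example.internal:5000")
        mlflow.set_experiment("demo-experiment")

        with mlflow.start_run():
            mlflow.log_param("learning_rate", 0.01)
            # ... train the model here ...
            mlflow.log_metric("val_accuracy", 0.93)
            mlflow.log_artifact("model.pt")  # any file the run produced (assumed to exist)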

  • manojlds 7 years ago

    We use it for experiment tracking and as a model repository in our CI/CD flow. More details on our approach: https://stacktoheap.com/blog/2018/11/19/mlflow-model-reposit...

antisocial 7 years ago

We are evaluating MLflow. I would like to know if there are any plans for making this an Apache project.

mlthoughts2018 7 years ago

As an ML engineer, I’ve found MLFlow to be really a disastrously bad way to look at the problem. It’s something that managers or executives buy into without understanding it, and my team of engineers (myself included) have hated it.

There are many feature-specific reasons, but the biggest thing is that reproduction of experiments needs to be synonymous with code review and the exact same version control system you use for other code or projects.

This way reproducibility is a genuine constraint on deployment: deploying an experiment, whether that means just training a toy model, incorporating new data, or actually launching a live experiment, is conditional on reproducibility and code review of the code, settings, runtime configs, etc., that fully embody it.

This is much better solved with containers, so that both runtime details and software details are located in the same branch / change set, and a full runtime artifact like a container can be built from them.

Then deployment is just whatever production deployment already is, usually some CI tool that determines where a container (built from a PR of your experiment branch, for example) is deployed to run, along with whatever monitoring or probe-tracking tools you already use.

You can treat experiments just like any other deployable artifact, and monitor their health or progress exactly the same.

Once you think of it this way, you realize that tools like ML Flow are categorically the wrong tool for the job, almost by definition, and they exist mostly just to foster vendor lock-in or support reliance on some commercial entity, in this case Databricks.

  • mateiz 7 years ago

    Don't MLflow Projects exactly meet this use case? A project lives in a Git repo, which can include both code and data, and specifies its software environment (currently Conda but will eventually also support Docker): https://www.mlflow.org/docs/latest/projects.html. You can then run it wherever you want to run code: CI system, Kubernetes, cloud, etc. The reason MLflow doesn't force people to use Projects is because many users like to develop ML in notebooks, but we definitely expect engineering teams to use it with Projects.
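
    As a rough sketch (the repo URL and parameter names below are made up), a CI job could launch a project straight from its Git repo and have the run recorded on the tracking server:

        import mlflow

        # Hypothetical repo containing an MLproject file plus its conda.yaml;
        # MLflow resolves the declared environment and entry point itself.
        submitted = mlflow.projects.run(
            uri="https://github.com/example-org/example-project",
            entry_point="main",
            parameters={"epochs": 10, "lr": 0.01},
        )
        print(submitted.run_id)  # the tracking-server run created for this execution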

    • mlthoughts2018 7 years ago

      I could go on at length about why the MLFlow / Databricks understanding of ML projects is bad to a bonkers degree. I'll give just one example, which has mattered considerably for several production projects my team works on and tried to manage in ML Flow for a while.

      The project was a suite of neural network models that provided face & object detection results in a low-latency web interface where customers can manipulate photos and want automated metadata about people or objects.

      In our case, to optimize for performance we need to frequently experiment with compile-time details of the runtime environment (in our case a container) where the application will run in production.

      So the axis of our experiments was not usually anything to do with neural network layers or data or parameters. It was different compiler optimization flags, different precision approximations and GPU settings that needed to be rolled into a huge number of different underlying runtime environments, and then for each distinct runtime environment the more mundane experiments would be carried out for layer topology, number of neurons, width of CNN filters, etc.

      We found that unless you basically build your own entire “meta” version of ML Flow that wraps around ML Flow, then it falls apart at use cases where custom compile-time details of the runtime are themselves aspects of the experiment. Not to mention that the Projects formatting violates good practices, like 12 Factor stuff, for how to inject settings from the environment, which again leads to wasted effort making special-case deployment handling for ML Flow jobs.

      Whatever deploys and measures your tasks should not also impose any type of special case packaging structure, which is a big reason why MLFlow conceptually fails. Any attempt to make anything at all like a DSL packaging layer for experiments that causes it to diverge from “regular deployment of any old job” is immediately a failed idea. The only thing it’s good for is creating unwitting vendor lock-in once you’re highly dependent on this bespoke, weird packaging template for Projects that makes your ML jobs weirdly (and needlessly) different from other deployment tasks.

  • gidim 7 years ago

    Runs and experiments are not 1:1 mapped. A single container run could generate multiple experiments, such as in the case of parameter search. Additionally, traditional tools for version control are not well suited for ML results and exploration. That said, code is still a big piece of the puzzle. Our approach at Comet.ml is to snapshot everything, whether it runs on a container or not, and tie that back to git.
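
    To make the not-1:1 point concrete, here is a small illustrative sketch (using MLflow's tracking API only because it's the tool under discussion; experiment and metric names are placeholders): a single container process sweeping a parameter grid and producing one tracked run per combination.

        import itertools
        import mlflow

        # One process / one container execution that fans out into many tracked runs.
        mlflow.set_experiment("grid-search-demo")

        for lr, batch_size in itertools.product([0.1, 0.01], [32, 64]):
            with mlflow.start_run():
                mlflow.log_param("lr", lr)
                mlflow.log_param("batch_size", batch_size)
                # ... train and evaluate with this combination ...
                mlflow.log_metric("val_loss", 0.0)  # placeholder value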

    • mlthoughts2018 7 years ago

      This is still perfectly synonymous with regular build tools, like running a rebuild in Jenkins or ‘build with parameters.’ The point is to treat builds and runs of an experiment setup exactly the same, with the same tooling, monitoring, data capturing, etc., as any other deployed program. There is nothing special about a one-off job that trains a model or computes an experimental result compared with jobs that perform an experiment on database tuning or test load on a web service or any other type of deployed job. You have monitoring and probing of key stats and health of the experiment, you can reproduce the exact run or the same run with modified parameters, and the run produces output artifacts or writes data. It’s all perfectly the same.

      Basically if someone shows me a supposed ML experiment tracking system, the first question is, “If I replace the phrase ‘ML experiment’ with ‘generic computing task’, does the tool still handle everything exactly the same?”

      If not, it’s a failed idea, because you’re trying to break model training or tuning jobs out of the regular deployment model and you’re not using consistent tooling to manage deployment of experiment runs and all other types of “jobs” that you can “run.”

      • gidim 7 years ago

        Sure, you can reuse tools to achieve similar results. As with everything else, the devil is in the details. Does your monitoring system save results forever, or does it only let you report 90 days back? Can you compare two runs in a meaningful way, i.e. not just logs but also interactively plotting and exploring your results? Do you need to spend hours instrumenting your code? Can you sort Jenkins jobs by a parameter/metric? What about reporting new results to an existing experiment? There are many more examples. But in any case, if you can reuse your CI/CD system for ML experiment management, you should do that. Another question worth considering: if this is a "failed idea", why would engineering-led tech companies build these systems? Obviously they tried reusing their current tooling.

        The tools we've been building for the past fifty years were designed for software engineering. Machine learning workflows are different in many ways and as such require new tools and approaches. That's at least our perspective.

        • mlthoughts2018 7 years ago

          Literally all the example cases you mention are also needed when comparing results for database tuning, load balancing, A/B testing, etc. etc. None of those asks would differentiate ML projects from any other type of general project. So unless you plan to shoe-horn non-ML projects into an upstart system purportedly for ML projects, you’re just wasting resources (usually egregiously) by using a different tool. Even just thinking ML problems are different somehow is usually already a sign that you’re investing in ML in a way that is very unlikely to map to project success.

  • m_ke 7 years ago

    How is it forcing code review on you?

    I do agree that having things tied to a commit might not be ideal if you're running a lot of experiments in a large shared codebase.

    I've been tempted to use git to version my model runs but always avoid it because it's usually just extra work.

    • mlthoughts2018 7 years ago

      I think you flipped my comment around. I’m saying that the number one defining requirement of model reproducibility tooling is that it does force version control / code review.

      It should force the concept of “running an experiment” to be just another instance of a deployment. Any part of running an experiment that happens outside of the scope of that, such as with “mlflow run ...” for example, is immediately violating the most basic property of the whole thing (I guess unless “mlflow run ...” is hacked to perform actual production deployments of all types of programs).
