Stanford DAWN

A Five-Year Research Project to Democratize AI

Stanford DAWN (Data Analytics for What’s Next) · 2018–2022

Building a machine learning system is a complicated process, but with better tools, we believe any organization could do it.

Better tools are needed

With better data management tools, the process would become easier. The DAWN project set out to research and build these tools. Our vision is that anyone with expertise in their domain, such as a medical lab optimizing clinical procedures or a business group addressing problems specific to its field, can build their own production-quality data products without needing a team of machine learning experts.

“It’s hard in grad school to find a project that pulls together so many different collaborators. It was a really cool team, both from industry and grad students. It was really fun rather than the typical grad school solo-journey student experience. I feel grateful about that.”
— Firas Abuzaid

DAWN addresses every step of the ML production process

Today it is easier than ever to choose, adjust, and train machine learning models, the core algorithms that learn from data to produce the desired results. But a model can only do its job if people have gathered large amounts of good data for it to learn from, and it can only be useful if people make it widely available and monitor its output for errors. DAWN aimed to make all these steps easier, streamlining the process from beginning to end.

Collecting and preparing data

One of the greatest challenges is to acquire or produce enough data in the first place. Many ML models require huge amounts of training data, and the data must often be cleaned of errors and labeled with additional information, tasks that frequently have to be done by hand.

Training and running the model

Thanks to years of ML research, the models and algorithms themselves are often good enough out of the box. The main challenge here is one that affects every step in the process: running systems quickly and cost-effectively when many ML applications are assembled from disparate parts that were not designed to work together efficiently.