Show HN: Openlayer – Test, fix, and improve your ML models
Hey HN, my name is Vikas, and my cofounders Rish, Gabe, and I are building Openlayer: http://openlayer.com/
Openlayer is an ML testing, evaluation, and observability platform designed to help teams pinpoint and resolve issues in their models.
We were ML engineers who experienced firsthand the struggle of properly evaluating models, making them robust to the myriad of unexpected edge cases they encounter in production, and understanding the reasons behind their mistakes. It was like playing an endless game of whack-a-mole with Jupyter notebooks and CSV files — fix one issue and another pops up. This shouldn’t be the case. Error analysis is vital to establishing guardrails for AI and ensuring fairness across model predictions.
Traditional software testing platforms are designed for deterministic systems, where a given input produces an expected output. Since ML models are probabilistic, testing them reliably has been a challenge. What sets Openlayer apart from other companies in the space is our end-to-end approach to tackling both pre- and post-deployment stages of the ML pipeline. This "shift-left" approach emphasizes the importance of thorough validation before you ship, rather than relying solely on monitoring after you deploy. Having a strong evaluation process pre-ship means fewer bugs for your users, shorter and more efficient dev-cycles, and lower chances of getting into a PR disaster or having to recall a model.
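To make the contrast concrete, here is a minimal sketch (our illustration, not Openlayer's API or test runner): a deterministic unit test asserts one exact output per input, while a test for a probabilistic model asserts an aggregate property, such as a metric threshold, over a whole dataset. The synthetic data, model choice, and 0.80 threshold below are arbitrary assumptions.

    # Minimal sketch (not Openlayer's API) contrasting a deterministic
    # software test with a threshold-style test for a probabilistic model.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    def test_deterministic_function():
        # Traditional test: a given input must produce one expected output.
        assert abs(np.sqrt(4.0) - 2.0) < 1e-12

    def test_probabilistic_model():
        # ML test: there is no single "correct" prediction per input, so we
        # assert an aggregate property (a metric threshold) over a dataset.
        X, y = make_classification(n_samples=2_000, n_informative=10, random_state=0)
        X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
        model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
        assert accuracy_score(y_val, model.predict(X_val)) >= 0.80  # the "goal"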
Openlayer provides ML teams and individuals with a suite of powerful tools to understand models and data beyond your typical metrics. The platform offers insights about the quality of your training and validation sets, the performance of your model across subpopulations of your data, and much more. Each of these insights can be turned into a “goal.” As you commit new versions of your models and data, you can see how your model progresses toward these goals, guarding against regressions you might otherwise have missed and continually raising the bar.
Here's a quick rundown of the Openlayer workflow:
1. Add a hook in your training / data ingestion pipeline to upload your data and model predictions to Openlayer via our API (see the sketch after this list)
2. Explore insights about your models and data and create goals around them [1]
3. Diagnose issues with the help of our platform, using powerful tools like explainability (e.g. SHAP values) to get actionable recommendations on how to improve
4. Track your progress toward these goals over time with our UI and API, and create new ones to keep improving
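To give a feel for steps 1 and 2, here is a hypothetical sketch of what such a hook could look like at the end of a training script. The openlayer client class, method names, and argument names below are illustrative assumptions on our part rather than the documented SDK, and the project name and columns are made up; please check the docs linked from http://openlayer.com/ for the real interface.

    # Hypothetical sketch of a training-pipeline hook (steps 1-2 above).
    # The client class, method names, and argument names are assumptions,
    # not the documented Openlayer SDK.
    import os

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    import openlayer  # assumed package name

    # Train as usual (synthetic data stands in for your pipeline's data).
    X, y = make_classification(n_samples=5_000, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = GradientBoostingClassifier().fit(X_train, y_train)

    val_df = pd.DataFrame(X_val, columns=[f"f{i}" for i in range(X_val.shape[1])])
    val_df["label"] = y_val
    val_df["prediction"] = model.predict(X_val)

    # The hook: upload the validation data plus predictions and commit a
    # new version, so insights and goals are evaluated against it.
    client = openlayer.OpenlayerClient(api_key=os.environ["OPENLAYER_API_KEY"])
    project = client.create_project("churn-model", task_type="tabular-classification")
    project.add_dataframe(val_df, label_column="label", prediction_column="prediction")
    project.commit("nightly training run")
    project.push()

Once a commit like this lands, steps 3 and 4 happen in the UI or over the API: explore the generated insights, turn the ones you care about into goals, and watch subsequent commits pass or fail them.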
We've got a free sandbox for you to try out the platform today! You can sign up here: https://app.openlayer.com/. We will also soon be adding support for even more ML tasks, so please reach out if your use case is not yet supported and we can add you to the waitlist.
Give Openlayer a spin and join us in revolutionizing ML development for greater efficiency and success. Let us know what you think, or if you have any questions about Openlayer or model evaluation in general.
[1] A quick run-down of the categories of goals you can track:
- Integrity goals measure the quality of your validation and training sets
- Consistency goals guard against drift between your datasets
- Performance goals evaluate your model's performance across subpopulations of the data
- Robustness goals stress-test your model using synthetic data to uncover edge cases
- Fairness goals help you understand biases in your model on sensitive populations
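For a rough idea of what two of these categories check, here is a generic sketch using off-the-shelf libraries. It is our illustration of the underlying idea, not Openlayer's implementation; the column names, thresholds, and the choice of a KS statistic for drift are arbitrary assumptions.

    # Generic sketch (not Openlayer's implementation) of the checks behind a
    # consistency goal (drift between datasets) and a performance goal
    # (metric thresholds per subpopulation). Names and thresholds are made up.
    import numpy as np
    import pandas as pd
    from scipy.stats import ks_2samp
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)

    train = pd.DataFrame({"age": rng.normal(40, 10, 5_000)})
    val = pd.DataFrame({
        "age": rng.normal(45, 10, 1_000),  # deliberately drifted feature
        "plan": rng.choice(["basic", "pro"], 1_000),
        "label": rng.integers(0, 2, 1_000),
    })
    val["prediction"] = val["label"] ^ (rng.random(1_000) < 0.2)  # ~80%-accurate stand-in

    # Consistency goal: flag drift when the two-sample KS statistic between
    # the training and validation distributions of a feature is too large.
    ks_stat, _ = ks_2samp(train["age"], val["age"])
    print(f"'age' drift goal (KS {ks_stat:.2f} < 0.10):", "pass" if ks_stat < 0.10 else "fail")

    # Performance goal: require a minimum accuracy on every subpopulation,
    # not just in aggregate.
    for plan, cohort in val.groupby("plan"):
        acc = accuracy_score(cohort["label"], cohort["prediction"])
        print(f"accuracy goal for plan={plan} (>= 0.75):", "pass" if acc >= 0.75 else "fail")

On Openlayer, you would define checks like these as goals in the UI or via the API rather than hand-rolling scripts, and every new commit of your models and data would be evaluated against them.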
Thought this was about https://openlayers.org/ and got confused for a moment. The naming is a bit too close to OpenLayers, imo. Looks interesting, though.

Ha, yeah, it's definitely not the most ideal!

I realize it is primitive today, but how is it eventually going to be different from or better than Aporia, Arize, Arthur, Dioptra, Evidently, Gantry, Graphsignal, Guardrails, Metaplane, Superwise, UpTrain, whylogs, etc.? A comparison table on your site would help.

A comparison table is a great idea; we will add one. We noticed that the industry is laser-focused on tackling feature drift in production, but we spotted a gap: most ML teams are wrestling with model validation even before anything is deployed. We also noticed that post-deployment analysis sometimes misses the mark, lacking components like identifying underperforming cohorts or giving actionable insights. This leads to a barrage of alerts and the inevitable alert fatigue. We decided to start Openlayer to offer a more holistic solution that helps teams from the ML development and experiment tracking phase to more advanced tasks like monitoring and fairness. We established a strong baseline with this launch and are now building several features on top. Stay tuned!

Nice to see more data-centric platforms. One I found helpful for CV, NER, and TC: https://rungalileo.io

FYI, you have a typo on your homepage. On one of the charts under “Solve efficiently” you misspelled salary as “salarry”. As a side note: your product is pretty cool :)

Will fix, thanks for catching!

Any real-world examples? How does it work out for them?

We work with both startups and enterprises across a range of task types! Some common ones are fraud and churn detection for financial institutions or e-commerce sites (both tabular classification examples). It's very important for these types of tasks in particular to guard against biases and false negatives, so they use us to set up wide test nets that help give them assurance that their models are working properly before they hit production (and to monitor them post-deployment). Another example is Zuma (https://www.getzuma.com), a startup building an AI-driven chatbot that uses us to track their experiments and improve the accuracy of their NLP intent classification model. Of course, we're also building out support for evaluating LLMs. Because this is an open problem, we've been spending a lot of time interviewing people in the space who are building these models (please reach out if this is you!).