Lilac: Analyze, structure, and clean unstructured data with AI
lilacml.com
Lilac co-creator here :)
Lilac is an open-source tool that enables AI practitioners to see and quantify their datasets.
Lilac allows users to:
- Browse datasets with unstructured data.
- Enrich unstructured fields with structured metadata using Lilac Signals, for instance near-duplicate and personal information detection. Structured metadata allows us to compute statistics, find problematic slices, and eventually measure changes over time.
- Create and refine Lilac Concepts, which are customizable AI models that find and score text matching a concept you have in mind.
- Download the results of the enrichment for downstream applications.
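To make the idea of enriching unstructured text with structured metadata concrete, here is a minimal sketch in plain Python. It does not use Lilac's API and is not how Lilac implements its Signals; the names `pii_signal`, `shingles`, `jaccard`, and `enrich` are hypothetical, the email regex is a crude stand-in for personal information detection, and word-trigram Jaccard similarity is just one simple way to flag near-duplicates.

```python
import re

# Crude email matcher, used here as a stand-in for PII detection (assumption).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pii_signal(text):
    """Return the email-like substrings found in the text."""
    return EMAIL_RE.findall(text)


def shingles(text, n=3):
    """Return the set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def enrich(docs, threshold=0.5):
    """Attach structured metadata to each document.

    Each row records the emails found and the index of the first earlier
    document whose shingle similarity reaches the threshold, or None.
    """
    rows, seen = [], []
    for doc in docs:
        sh = shingles(doc)
        dup_of = next(
            (i for i, s in enumerate(seen) if jaccard(sh, s) >= threshold),
            None,
        )
        rows.append({"text": doc, "emails": pii_signal(doc), "near_duplicate_of": dup_of})
        seen.append(sh)
    return rows


docs = [
    "The movie was great and I loved the acting",
    "The movie was great and I loved the acting a lot",
    "Contact me at jane@example.com for details",
]
rows = enrich(docs)

# Once the metadata is structured, statistics and slices are simple to compute.
duplicate_count = sum(r["near_duplicate_of"] is not None for r in rows)
pii_slice = [r["text"] for r in rows if r["emails"]]
```

With the metadata stored per row, counting near-duplicates or pulling out the slice of rows that contain emails becomes an ordinary filter over structured fields.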
Out of the box, Lilac comes with a set of generally useful Signals and Concepts. This list is not exhaustive, and we will keep working with the OSS community to add more useful enrichments.
Check out the demo on HuggingFace: https://lilacai-lilac.hf.space/
Find us on GitHub: https://github.com/lilacai/lilac
I really like the tooltips when you hover over the text. Exploring the imdb database is a useful example.