Prompts for Cleaning and Processing Data with AI
subsystem.ai
A blog post on using LLMs to clean, process, and enrich data. It includes prompts and code snippets. The post draws on my own experience and two really interesting papers:
- Can Foundation Models Wrangle Your Data? (https://arxiv.org/abs/2205.09911)
- Large Language Models as Data Preprocessors (https://arxiv.org/abs/2308.16361)
I cover:
- Error and Anomaly Detection
- Enriching Data with LLMs
- Matching Data Labels
- Identifying Matching Records
Thank you and I'd appreciate your feedback.