Show HN: A real-world streaming data generator in Python
I've built GlassGen to solve the common problem of generating real-time synthetic data for testing, demos, and ML datasets. While Faker is great for individual data points, GlassGen adds:
- Configurable data publishing (CSV, Kafka, Webhooks)
- Precise rate control (records/second)
- Controlled data duplication
- Extensible architecture for custom generators and sinks
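To make the rate control and duplication ideas concrete, here is a minimal standard-library sketch. It is not GlassGen's actual API: `make_record`, `stream`, and the `duplicate_ratio` parameter are hypothetical names chosen for illustration, and `make_record` stands in for a Faker-backed generator.

```python
import random
import time
from typing import Callable, Dict, Iterator, List, Optional


def make_record(rng: random.Random, seq: int) -> Dict[str, object]:
    """Hypothetical record factory; a real setup might call Faker here."""
    return {
        "id": seq,
        "user": f"user_{rng.randint(1, 1000)}",
        "amount": round(rng.uniform(1, 500), 2),
    }


def stream(
    records_per_second: float,
    total: int,
    duplicate_ratio: float = 0.0,
    seed: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Dict[str, object]]:
    """Yield `total` records paced at roughly `records_per_second`.

    With probability `duplicate_ratio`, a previously emitted record is
    re-emitted instead of a fresh one (assumed semantics, for illustration).
    """
    rng = random.Random(seed)
    interval = 1.0 / records_per_second
    history: List[Dict[str, object]] = []
    next_due = clock()  # the first record is due immediately
    for seq in range(total):
        if history and rng.random() < duplicate_ratio:
            record = dict(rng.choice(history))  # copy of an earlier record
        else:
            record = make_record(rng, seq)
            history.append(record)
        # Pace against a fixed schedule so per-record work does not
        # accumulate as drift.
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
        next_due += interval
        yield record
```

Scheduling against `next_due` rather than sleeping a fixed interval after each record keeps the average rate close to the target even when generation itself takes time. The `sleep` and `clock` parameters are injectable only so the pacing can be tested without waiting.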
Key features:
- Built on top of Faker for reliable data generation
- Simple JSON/YAML configuration
- Support for complex data relationships
- Real-time data streaming to Kafka
- Custom sink implementations
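As a rough picture of what a pluggable sink layer can look like, here is a small standard-library sketch. The `Sink` protocol, `CsvSink`, `ListSink`, and `fan_out` are hypothetical names for illustration and are not taken from GlassGen's codebase; see the docs for the real interface.

```python
import csv
import io
from typing import Dict, Iterable, List, Protocol, TextIO


class Sink(Protocol):
    """Assumed minimal sink interface: accept records, then clean up."""

    def publish(self, record: Dict[str, object]) -> None: ...

    def close(self) -> None: ...


class CsvSink:
    """Writes each record as one CSV row to an open text stream."""

    def __init__(self, stream: TextIO, fieldnames: List[str]) -> None:
        self._stream = stream
        self._writer = csv.DictWriter(stream, fieldnames=fieldnames)
        self._writer.writeheader()

    def publish(self, record: Dict[str, object]) -> None:
        self._writer.writerow(record)

    def close(self) -> None:
        self._stream.flush()


class ListSink:
    """Example custom sink: collects records in memory, handy in tests."""

    def __init__(self) -> None:
        self.records: List[Dict[str, object]] = []

    def publish(self, record: Dict[str, object]) -> None:
        self.records.append(record)

    def close(self) -> None:
        pass


def fan_out(records: Iterable[Dict[str, object]], sinks: List[Sink]) -> int:
    """Send every record to every sink, close the sinks, return the count."""
    count = 0
    for record in records:
        for sink in sinks:
            sink.publish(record)
        count += 1
    for sink in sinks:
        sink.close()
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    sent = fan_out([{"id": 1, "name": "a"}], [CsvSink(buffer, ["id", "name"])])
    print(sent, buffer.getvalue())
```

Because sinks only need `publish` and `close`, a Kafka or webhook sink would slot in the same way by wrapping its client behind those two methods.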
GitHub: https://github.com/glassflow/glassgen
Docs: https://glassgen.glassflow.dev/
Would love feedback from the community, especially on:
1. Additional sink types that would be useful
2. Performance optimization opportunities
3. Ideas for handling more complex data relationships