Show HN: Text data browser for NLP, LLM researchers and developers
I created an app for browsing and analyzing large text datasets, whether local or remote. It supports several data formats, including JSONL and HuggingFace. Key features include:
- Intuitive Navigation: Browse local or remote data in HuggingFace, JSONL, and other formats.
- Efficient Browsing: Stream large datasets without loading a local file fully into memory or downloading a remote one in full.
- Powerful Analysis: Filter and sort data to find the records you care about.
- Pretty-Print Code: Human-friendly rendering of code embedded in your data.
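For anyone wondering what "streaming without loading into memory" plus filter and sort can look like in practice, here is a minimal Python sketch. It is not datahawk's actual implementation, only an illustration of the general idea using the standard library. The function names (`stream_jsonl`, `filter_records`, `top_k`) and the `n` field used in the usage note are hypothetical.

```python
import heapq
import json
from typing import Any, Callable, Iterable, Iterator, List


def stream_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Lazily parse JSONL: yield one record per non-empty line.

    `lines` can be an open file handle, so only the current line
    is held in memory at any time.
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def filter_records(
    records: Iterable[dict], predicate: Callable[[dict], bool]
) -> Iterator[dict]:
    """Lazily keep only the records for which `predicate` returns True."""
    return (record for record in records if predicate(record))


def top_k(
    records: Iterable[dict], k: int, key: Callable[[dict], Any]
) -> List[dict]:
    """Return the k records with the largest `key`, largest first.

    heapq.nlargest keeps only about k records in memory, so the
    stream never has to be fully sorted or materialized.
    """
    return heapq.nlargest(k, records, key=key)
```

With a real file you would pass an open handle, for example `top_k(filter_records(stream_jsonl(f), lambda r: r["n"] > 1), 10, key=lambda r: r["n"])` inside a `with open(path) as f:` block. A full sort of a dataset that does not fit in memory would need a different approach (for example an external sort), which this sketch does not cover.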
The package lives at https://github.com/nihaljn/datahawk, and contributions are welcome!
Setup and usage are very simple: `pip install datahawk; datahawk -p $port`