Ask HN: Local tools for working with LLM datasets?

2 points by platypii 8 days ago · 0 comments · 1 min read


I’ve been doing data science for years, am very familiar with Jupyter notebooks, and more recently have been using DuckDB a lot. But now I have this huge pile of output tokens from my 4090s, and it feels qualitatively different from data I’ve worked with in the past. Notebooks and DuckDB on the CLI don’t feel like they’re built for working with huge volumes of text data like my training set and LLM output traces.

What have you found works well for this? I’m trying to fine-tune on a text dataset and inspect the output from eval runs. I would prefer local, open source tools over a paid service.

