Show HN: CSVFiddle – Query CSV files with DuckDB in the browser
csvfiddle.io
Hey HN,
I made CSVFiddle because I wanted a quick way to query CSV files with SQL and share the results with other people.
The app runs 100% in-browser, so the data you import and the queries you write are never sent to a web server. When you share the URL to a workspace, all of its queries and references to CSV files are just encoded in the URL fragment.
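For anyone curious how a workspace can live entirely in a URL fragment, here is a rough Python sketch of the general idea. It is not CSVFiddle's actual encoding: the JSON layout, the `files`/`queries` keys, and the helper names `encode_workspace`/`decode_workspace` are all assumptions made up for illustration. The part that holds generally is that browsers do not include the fragment (everything after `#`) in the HTTP request, so state stored there stays on the client.

```python
import base64
import json
from urllib.parse import urlsplit


def encode_workspace(base_url: str, workspace: dict) -> str:
    """Pack a workspace dict into the URL fragment (hypothetical format)."""
    payload = json.dumps(workspace, separators=(",", ":")).encode("utf-8")
    # URL-safe base64 keeps the fragment free of characters that need escaping.
    token = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")
    return f"{base_url}#{token}"


def decode_workspace(url: str) -> dict:
    """Recover the workspace dict from a URL built by encode_workspace."""
    token = urlsplit(url).fragment
    # Restore the base64 padding that was stripped when encoding.
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical workspace: file references and queries, no file contents.
workspace = {
    "files": ["https://example.com/students.csv"],
    "queries": ["SELECT state, COUNT(*) FROM students GROUP BY state"],
}
shared = encode_workspace("https://csvfiddle.io/", workspace)
restored = decode_workspace(shared)
```

One trade-off of this approach is that long queries make for long URLs, which is probably why URL shorteners show up in shared links.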
In-browser querying is made possible by DuckDB-Wasm, which has been an awesome project to work with:
https://duckdb.org/2021/10/29/duckdb-wasm.html
There are definitely limitations with CSVFiddle (e.g. sometimes the auto-parsing feature doesn't accurately interpret the imported files), but so far it's been useful for a range of data tasks.
Some demo workspaces you can check out:
University Students by State https://tinyurl.com/6k35anth
Uber Pickups in NYC https://tinyurl.com/5n8av39h
Do you have an option to load a .csv from a URL, e.g.: https://csvfiddle.io/?url=https://foo.com/bar.csv
If not, could you implement it?
I ask because I'm working on a web-based file manager (https://filerion.com/)
One of the feature ideas I have is letting people view CSV files.
I don't want to implement my own csv viewer and would rather integrate with tools like csvfiddle.
I.e. the user would right-click on a .csv file in my file manager, one of the options would be "View in CSVFiddle".
When chosen, I would create a publicly visible, CORS-compatible URL for the .csv file (so that you can fetch() it) and launch csvfiddle.io?url=<url> in a new window.
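A small sketch of the launching side, assuming the `url` query parameter proposed above existed (it is a request in this comment, not a confirmed CSVFiddle feature). The helper name `build_launch_url` is made up. The point is that the CSV URL has to be percent-encoded, otherwise any `?` or `&` inside it would be read as part of the outer query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_launch_url(csv_url: str, viewer: str = "https://csvfiddle.io/") -> str:
    """Build a viewer link carrying csv_url in a hypothetical `url` parameter."""
    # urlencode percent-encodes the value so its own ?, & and = survive intact.
    return f"{viewer}?{urlencode({'url': csv_url})}"


# A CSV URL with its own query string, e.g. a signed download link.
csv_url = "https://foo.com/bar.csv?token=a&b=1"
launch = build_launch_url(csv_url)

# The receiving page can recover the original URL from its query string.
recovered = parse_qs(urlsplit(launch).query)["url"][0]
```

The file host still has to send CORS headers that allow the viewer's origin, since the fetch() happens from the viewer's page.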
Does it work with really large files? Like, >100 MB or so. I was considering making something similar but with sqlite.js [1], but the problem with it is that it loads everything in memory, so I wasn't entirely sure how it would deal with larger workloads.
This sounds like a workaround to your problem:
By no means am I crapping on what you've created — it looks great, and I've always wanted to try DuckDB and now you've made a frictionless entrypoint — just wanted to point out in general that querying CSV with SQL is more accessible than some people might have assumed. e.g. here's a recent TIL blogpost from Simon Willison about him discovering how to do sqlite queries against CSV from the command line: https://til.simonwillison.net/sqlite/one-line-csv-operations
One suggestion I would make: the Uber trips data is interesting, but might be too big for this demo? I was getting a few loading errors when trying it (didn't investigate where in the process the bottleneck was though)
A more appropriate comparison here might be to my Datasette Lite project, which runs SQLite in the browser using WASM and lets you join multiple CSV files by URL: https://simonwillison.net/2022/Jun/20/datasette-lite-csvs/
I think CSVFiddle is a fantastic addition to the ecosystem: making DuckDB more accessible - especially in a browser - is a very useful thing.
On HN: "One-liner for running queries against CSV files with SQLite" https://news.ycombinator.com/item?id=31824030
I have no problem with sqlite, in fact I really like it but that seems like it could be quite a hefty "one-liner".
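For comparison, here is roughly what the same idea looks like as a short script instead of a shell one-liner. This is not the command from the linked post; it is a sketch using only Python's standard `csv` and `sqlite3` modules, and the function name `query_csv` and the sample data are made up. It assumes a header row and a trusted table name, and it loads every row into an in-memory database, so it has the same large-file concern raised earlier in the thread.

```python
import csv
import io
import sqlite3


def query_csv(csv_text: str, table: str, sql: str) -> list:
    """Load CSV text into an in-memory SQLite table and run one query."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row supplies the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(":memory:")
    try:
        # Columns are untyped, so values are stored as the text read from the CSV.
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data for illustration.
sample = "state,students\nCA,120\nNY,80\nCA,30\n"
rows = query_csv(
    sample,
    "enrollment",
    "SELECT state, SUM(CAST(students AS INTEGER)) "
    "FROM enrollment GROUP BY state ORDER BY state",
)
```

The explicit CAST is there because nothing infers column types, which is one of the things tools like DuckDB's CSV auto-detection try to handle for you.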
Not to discourage you or anything, but Observable seems to cover this without too much hassle:
https://observablehq.com/@cmudig/introducing-sql-with-duckdb
I recently streamed Firebase into DuckDB for realtime exploratory analytics in the browser (on Observable).
This is great.
Will definitely be using this.
I wouldn't worry too much about people focused on running from the command line.
I love the command line. But not as a query interface for quick investigation of CSV data.
I've been wanting something like this for a long time!
Excited to give it a try.
I guess if it uses DuckDB it can query Parquet (and DuckDB) files too?
Very useful tool!