csvkit: Command-line tools for working with CSV

csvkit.readthedocs.io

4 points by mxgr 3 years ago · 2 comments

mattewong 3 years ago

I wanted so much to use csvkit and all of its features, but its horrendous performance made it unscalable, so the more I used it, the more technical debt I accumulated.

This was one of the reasons I wrote zsv (https://github.com/liquidaty/zsv). Maybe csvkit could incorporate the zsv engine and we could get the best of both worlds?

Examples (using the Majestic Million CSV):

---

csvcut -c 1,3 = 5.3 seconds

zsv select -n -- 1 3 = 0.19 seconds

28x faster

---

csvsql --query "select count(*) from file" file.csv = 148 seconds

zsv sql "select count(*) from data" file.csv = 0.68 seconds

216x faster

---
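For anyone who wants to reproduce this kind of comparison on their own data, here is a minimal timing sketch in Python. It is not the harness used for the numbers above. The helper names `time_command` and `speedup` are made up for illustration, and the two commands to compare are passed in as quoted strings so the script does not assume any particular csvkit or zsv invocation.

```python
import shlex
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs of argv.

    Hypothetical helper: output is discarded so terminal I/O does not
    dominate the measurement.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,  # raise if the tool exits with an error
        )
        best = min(best, time.perf_counter() - start)
    return best


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # Usage (assumed): python bench.py "<slow command>" "<fast command>"
    slow_cmd, fast_cmd = (shlex.split(arg) for arg in sys.argv[1:3])
    slow = time_command(slow_cmd)
    fast = time_command(fast_cmd)
    print(f"{slow:.2f}s vs {fast:.2f}s = {speedup(slow, fast):.0f}x faster")
```

With the first pair of timings above, `speedup(5.3, 0.19)` comes out to roughly 28. Taking the best of several runs is a deliberate choice here: it reduces noise from disk caching and other processes, at the cost of hiding cold-start effects.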

hermitcrab 3 years ago

It is interesting how much different tools vary in their performance for the same task. For example, R with data.table is much faster than base R. And Excel Power Query performance is, well, see for yourself:

https://www.easydatatransform.com/data_wrangling_etl_tools.h...
