Range joins in DuckDB
duckdb.org
DuckDB has become my preferred tool for hardcore data wrangling. Excel is fine for like 80% of data processing tasks, but the remaining 20% are a pain, especially when you're CPU bound on a remote desktop. Smuggling the DuckDB JDBC driver onto said remote machine was the most productive infosec violation I've ever committed.
I don't know why I waited so long to try it.
I wrangle a ton of raw and aggregate data locally every day. I've had a 10-year habit of massaging it via Unix CLI tools and pipes, then moving to Excel. I guess I didn't wanna write code. Funny thing is I love SQL.
But with `duckdb_cli` it's a game-changer. I'm truly truly impressed.
> I've had a 10-year habit of massaging via unix CLI tools and pipes then moving to excel.
Have you, at any point, considered or used dgsh [0] or a similar tool? If so, how has your experience with it been?
Interesting. Never tried dgsh before, but yeah, it looks like my poor-man's CLI workflow for a lot of tasks.
Our lips are sealed.
I've been beating my head trying to get duckdb to statically link into a Go program (I'm neither an expert with cgo nor ld). If anyone else has been able to do this I'd love to see your build steps.
https://github.com/marcboeker/go-duckdb produces a non-static binary by default.
I'm not familiar with the project. Does it use any net-related code? That won't be static because it will want to load C-libs for using /etc/nsswitch.conf to handle DNS/name stuff.
https://stackoverflow.com/questions/33228809/why-is-my-go-ap...
I don't have the source code in a good state to publish yet but here's where I'm at. At some point before this CGO_LDFLAGS does work and the header is found (omit the -ldflags args). But when it goes to statically link it can no longer find the header.
Edit, nevermind about not being in a good state! Here's my code: https://github.com/multiprocessio/duckdb-tests.

    CGO_LDFLAGS="-L$(pwd)/duckdb/src/include" CGO_CFLAGS="-I$(pwd)/duckdb/src/include" go build -ldflags '-extldflags " -lstdc++ -lm -lduckdb -static"'
    # github.com/marcboeker/go-duckdb
    ../../go/pkg/mod/github.com/marcboeker/go-duckdb@v0.0.0-20220427142532-cd9f33e64d9a/connection.go:4:10: fatal error: duckdb.h: No such file or directory
        4 | #include <duckdb.h>
          |          ^~~~~~~~~~
    compilation terminated.

Put the file in quotes. Angle brackets are for built-in files.

    #include "duckdb.h"
That's not my code.
But also, just to double check, I modified the vendored code and no difference:

    CGO_LDFLAGS="-L$(pwd)/duckdb/src/include" CGO_CFLAGS="-I$(pwd)/duckdb/src/include" go build -ldflags '-extldflags " -lstdc++ -lm -lduckdb -static"'
    # github.com/marcboeker/go-duckdb
    vendor/github.com/marcboeker/go-duckdb/connection.go:4:10: fatal error: duckdb.h: No such file or directory
        4 | #include "duckdb.h"
          |          ^~~~~~~~~~
    compilation terminated.

I tried to replace SQLite with DuckDB for a customized install of better-sqlite3[1] and failed.
We have a node client if that would be helpful! https://duckdb.org/docs/api/nodejs
I tried the same thing, also failed… I am also not an expert however. But I am very interested in this. Anyone reading this that could point me to some resources that might help?
Did you try Zig? https://dev.to/kristoff/zig-makes-go-cross-compilation-just-...
Per this post [0] by Andrew Kelley, Zig's lead developer, projects with "large dependency trees" are better off using other tools than relying on Zig's cross-compile magic.
DuckDB needs Python3 to build as well, so I'm not sure how easy it would be to get it to cross-compile with Zig CC.
Also, the issue isn't cross-compiling; it's just static linking.
Gotcha. Targeting musl instead of glibc with Zig CC should get you a statically linked binary, though I'm unsure whether duckdb and its deps play nice with musl.
Personally, a duckdb golang binary interests me. But: I haven't yet mustered enough patience to sit through a time-consuming duckdb build.
Great to see new features being implemented. I'm using DuckDB for a thesis project and integrating it into my own Python CLI/web tool has been super easy -- I especially love the direct integration with DataFrames, it makes things really seamless.
I've been using DuckDB a fair bit recently and really enjoy it... When it has slightly better IDE support (e.g., I can use it in PyCharm) and can take in geospatial data, I'll be ecstatic.
Well I feel silly, based on a slight mis-reading of the title, I totally thought that Range was some company that was acquired by DuckDB.
Oh dear I can see that - sorry for the confusion! I'll see if we can come up with something a bit longer. It was a bit nerdy...
I also thought that DuckDB had acquired a company named Range. Interesting article regardless!
I was thinking there was some 10x programmer known only as Range that had joined.