DTable: a new distributed table implementation in Julia using Dagger.jl
julialang.org
It's a shame that JuliaDB was basically abandoned. I work in the financial industry, and I could see Julia competing with KDB+. Unfortunately, the Julia data engineering stack is far behind the data science stack.
It is certainly a shame, but I'm confident that Dagger and its new DTable should be able to cover all of the ground that JuliaDB covers, while being far easier to maintain. I think JuliaDB had some great ideas, but it didn't go far enough with composability, instead opting to use a limited set of table types (no internal DataFrames.jl support), fully focusing on loading from CSV (which is a horrible data format, albeit very common), and supporting only one CSV reader/writer (CSVFiles.jl). Of course, all of this could get fixed; but with JuliaComputing no longer funding its direct development, and no one dedicating the large portions of time necessary to fix all the outstanding issues and begin developing and merging features, JuliaDB isn't moving anywhere fast.
Thankfully, Dagger is under active maintenance, and has financial support through the JuliaLab (by employing me). Krystian Guliński, the DTable's author and maintainer, is also interested in developing and maintaining the DTable further (having created it as part of his schooling), and will hopefully stay on the Dagger team for the foreseeable future.
Have you thought about interfacing with DuckDB for out-of-core processing?
Firstly, I'll say that we already have work started to implement out-of-core directly in Dagger: https://github.com/JuliaParallel/Dagger.jl/pull/289.
With that PR in place, it should be possible to define a "storage device" which is backed by a database. I haven't had a chance to actually try this, since the PR still needs quite some work and testing, but it's definitely something on my radar!
20x faster than Dask is pretty good! I hope this becomes production ready.