Datafuse – Modern Real-Time Data Processing in Rust

github.com

80 points by implfuture 4 years ago · 14 comments

budabudimir 4 years ago

Anyone can optimize a database for trivial queries. Would be nice to at least see TPCH results or any other more complex benchmark.

  • BohuTANG 4 years ago

    Sure, datafuse is still a work in progress. TPCH (mainly for JOIN) will be fully supported in the Beta version. The Datafuse team is mainly working on the Alpha version.

themaxdavitt 4 years ago

I almost mixed this up with Apache Arrow DataFusion for a second: https://github.com/apache/arrow-datafusion

gigatexal 4 years ago

Impressive and bold claims. I wish the team well. But I won’t tinker with it until it’s been Jepsen-tested and many of the core features on the roadmap have been finished.

caust1c 4 years ago

Curious what the motivation behind rebuilding it in rust is, versus contributing more to Clickhouse? Obviously memory safety is a big one, but is that the only reason?

What are the other goals of the project?

Personally, I'd love to see an easier-to-manage system with replication considered as a first-class feature rather than bolted on at the end.

  • BohuTANG 4 years ago

    Well. 1. With the improvement of the Rust ecosystem, using Rust has made database development faster and easier. For example, datafuse uses tokio to implement the pipeline: https://github.com/datafuselabs/datafuse/tree/master/fuseque....

    2. Couldn't agree with you more on easier-to-manage as a first-class feature, but sometimes easier-to-manage is built on stability, and that's what datafuse is trying to do.

    • caust1c 4 years ago

      Awesome, great to hear! I've been using clickhouse for a long time and although we haven't contributed significantly to development, random bugs and issues have been quite painful in the past. Looking forward to what you're able to do!

      p.s. Please don't add in-process DNS caching ;-)

      https://github.com/ClickHouse/ClickHouse/issues/5287

neilsense 4 years ago

The problem with these queries is that they just aren't realistic in a production system. Over time the queries become more complex, include more edge cases and cruft, and your main goal is that they complete accurately rather than whether they take 5s or 50s.

threeseed 4 years ago

Would be good to know the background of this project, team etc.
