Uplevel database development with DataSQRL: A compiler for the data layer

datasqrl.com

4 points by mbroecheler 3 years ago · 7 comments

nerpderp82 3 years ago

Is this similar in spirit to Noria?

https://github.com/mit-pdos/noria

  • mbroecheler (OP) 3 years ago

    Yes, the idea of maintaining materialized views based on standing queries, so that queries return instantly, is the same. In addition, DataSQRL handles the ingest (e.g. consuming events off a queue, pre-processing the data, and populating the database) and egress (i.e. serving the data through an API) so that all your data logic can be in one place.
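    To make the "standing query" idea concrete, here is a minimal Python sketch (not DataSQRL or Noria code) of incremental view maintenance for a hypothetical `SELECT key, COUNT(*) FROM events GROUP BY key` query. Each insert or delete adjusts the stored result, so a read is a lookup rather than a re-scan. The class and method names are made up for illustration, and the sketch assumes deletes only retract events that were inserted earlier.

```python
from collections import defaultdict


class CountView:
    """Hypothetical materialized view for:
        SELECT key, COUNT(*) FROM events GROUP BY key
    The result is kept up to date as events arrive, so reads never re-run the query.
    """

    def __init__(self):
        self.counts = defaultdict(int)  # key -> current COUNT(*)

    def on_insert(self, key):
        # A new event only touches the one group it belongs to.
        self.counts[key] += 1

    def on_delete(self, key):
        # Assumes the event being retracted was inserted earlier.
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]  # drop empty groups, as GROUP BY would

    def read(self, key):
        # O(1) lookup; .get() does not create missing keys in a defaultdict.
        return self.counts.get(key, 0)


view = CountView()
for user in ["alice", "bob", "alice"]:
    view.on_insert(user)
view.on_delete("bob")
print(view.read("alice"), view.read("bob"))  # 2 0
```

    A real system applies the same per-change update to joins, filters, and aggregates across a whole dataflow graph, but the payoff is the same: the work happens at write time, not at read time.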

    Another key difference from Noria is that DataSQRL is an abstraction layer on top of existing technologies like Postgres, Flink, Kafka, etc., and does not aim to be another datastore. That way, you can use the technologies you already trust without having to write the integration code.

    • nerpderp82 3 years ago

      This sounds wonderful! And it validates many of my own thoughts. :)

      Your product would align nicely with these DAG recomputation engines like Fluvio and Temporal (Seattle).

      Well, Noria implements the MySQL protocol, so if your system targets MySQL, you could run DataSQRL on Noria!

      • mbroecheler (OP) 3 years ago

        Exactly, there are so many amazing dataflow engines, stream processors, and databases out there. We are not competing with those.

        We are trying to "compile away" all of the data plumbing code you have to write to integrate those systems into your application, so that it becomes easier to use them.

        MySQL support in DataSQRL is definitely on the short-list.

        • nerpderp82 3 years ago

          You support JDBC, so JDBC -> MySQL protocol -> Noria should work, for some definition of "work".

          My one minor nit is the creation of a new language. How well does GPT-4 handle reading or writing it? It is possible to teach it a new language inside the prompt, but you quickly run out of context window.

          I am not being glib, but I mapped out pretty much this exact product. The crux of your success will be schema discovery and versioning your schema, data, and flows in a way that can be tractably upgraded and downgraded.

          • mbroecheler (OP) 3 years ago

            You are totally right. We did not want to create a new language and we are trying to keep it as close to SQL as possible. The problem is that SQL lacks streaming constructs you need for temporal joins or creating streams from relational tables. Jennifer Widom's group at Stanford did a lot of work on this (e.g. [1]). We are adding their operators to SQL in a way that is hopefully "easy enough". The rest is just syntactic sugar.
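            For readers unfamiliar with [1]: CQL bridges relations and streams with three relation-to-stream operators. Istream emits the tuples inserted since the previous instant, Dstream emits the tuples deleted since the previous instant, and Rstream emits the full relation at every instant. The Python sketch below models a relation as a sequence of snapshots, one per time step. For simplicity it assumes set semantics, whereas CQL itself is defined over bags. It illustrates the operators only and is not SQRL syntax; the `open_orders` table is hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

Row = Tuple[str, ...]
Snapshots = Iterable[Set[Row]]  # R(0), R(1), ... one snapshot per time step


def istream(snapshots: Snapshots) -> Iterator[Tuple[int, Row]]:
    """Emit (t, row) for every row in R(t) that was not in R(t-1)."""
    prev: Set[Row] = set()
    for t, current in enumerate(snapshots):
        for row in sorted(current - prev):  # sorted for deterministic output
            yield t, row
        prev = current


def dstream(snapshots: Snapshots) -> Iterator[Tuple[int, Row]]:
    """Emit (t, row) for every row in R(t-1) that is no longer in R(t)."""
    prev: Set[Row] = set()
    for t, current in enumerate(snapshots):
        for row in sorted(prev - current):
            yield t, row
        prev = current


def rstream(snapshots: Snapshots) -> Iterator[Tuple[int, Row]]:
    """Emit (t, row) for every row in R(t), at every time step."""
    for t, current in enumerate(snapshots):
        for row in sorted(current):
            yield t, row


# Hypothetical table of open orders, observed at t = 0, 1, 2.
open_orders = [
    {("o1", "alice")},
    {("o1", "alice"), ("o2", "bob")},
    {("o2", "bob")},
]
print(list(istream(open_orders)))  # [(0, ('o1', 'alice')), (1, ('o2', 'bob'))]
print(list(dstream(open_orders)))  # [(2, ('o1', 'alice'))]
```

            In this example, Istream turns the table of open orders into an "order opened" event stream and Dstream into an "order closed" stream. Standard SQL has no operator for this kind of table-to-stream conversion.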

            But we are not tied to SQRL and totally open to ideas for making the language piece less of a hurdle.

            GPT4 is surprisingly good at writing SQRL scripts with few-shot learning.

            You are also right about the schema piece. We are trying to track schemas like dependencies in software engineering, so you can keep them in a repository and let a package manager and compiler handle schema compatibility and synchronization. https://dev.datasqrl.com/ is an early prototype of the repository idea.
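            One way to picture "schemas as dependencies" is semantic versioning. A schema change that could break existing producers or consumers gets a major version bump, and any other change gets a minor bump. The Python sketch below is a hypothetical rule set in the spirit of Avro- or Protobuf-style compatibility checks. It is not how DataSQRL or its repository actually decides compatibility, and `Field`, `compatibility_errors`, and `next_version` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    dtype: str
    required: bool = True


Schema = Dict[str, Field]  # field name -> field definition


def compatibility_errors(old: Schema, new: Schema) -> List[str]:
    """List changes that would break existing readers or writers of `old`."""
    errors = []
    for name, field in old.items():
        if name not in new:
            # Existing consumers may still read this field.
            errors.append(f"removed field '{name}'")
        elif new[name].dtype != field.dtype:
            errors.append(f"field '{name}' changed type {field.dtype} -> {new[name].dtype}")
    for name, field in new.items():
        if name not in old and field.required:
            # Existing producers do not know they must supply it.
            errors.append(f"added required field '{name}'")
    return errors


def next_version(version: str, old: Schema, new: Schema) -> str:
    """Semver-style bump: major if incompatible, minor if compatible but changed."""
    major, minor, _patch = (int(p) for p in version.split("."))
    if compatibility_errors(old, new):
        return f"{major + 1}.0.0"
    if new != old:
        return f"{major}.{minor + 1}.0"
    return version  # unchanged schema keeps its version


v1 = {"id": Field("int"), "name": Field("string")}
v2 = {**v1, "email": Field("string", required=False)}  # optional field: compatible
v3 = {"id": Field("string"), "name": Field("string")}  # type change: breaking
print(next_version("1.2.0", v1, v2))  # 1.3.0
print(next_version("1.2.0", v1, v3))  # 2.0.0
```

            With versions like these, a package manager can refuse to combine a flow compiled against schema 1.x with data published under 2.0.0, much as it would for a library dependency.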

            [1] Arasu, A., Babu, S., & Widom, J. (2006). The CQL continuous query language: semantic foundations and query execution. The VLDB Journal, 15, 121-142.

mbroecheler (OP) 3 years ago

We'd love for you to join us in building a high-level data development language to simplify data-driven application development.
