Show HN: Spice.ai – materialize, accelerate, and query SQL data from any source

github.com

177 points by lukekim 2 years ago · 56 comments

Hi HN, We're Luke and Phillip, and we're building Spice.ai OSS - a lightweight, portable runtime, built in Rust and powered by Apache DataFusion to locally materialize, accelerate, and query data tables sourced from any database, data warehouse or data lake.

Phillip and I first introduced Spice on Show HN in September 2021. Since then, we’ve been schooled and humbled in every way building 100TB+ data and ML systems for the https://spice.ai cloud platform. Along with our customers, we struggled with getting fast, low-latency, high-concurrency SQL queries within budget, accessing and combining data from many sources, trade-offs between OLTP/OLAP compute engines, and managing datasets as code.

Today, we’re re-launching Spice, completely rebuilt from the ground up, to directly solve several of the problems we had in accessing data quickly and providing it cost-effectively to applications, dashboards, and machine learning. Spice provides federated SQL query across databases (MySQL, PostgreSQL, etc.), data warehouses (Snowflake, BigQuery, etc.) and data lakes (S3, MinIO, Databricks, etc.) with the ability to materialize remote datasets locally using in-memory Arrow, DuckDB, SQLite, or PostgreSQL. Accelerated engines run in your infrastructure, giving you flexibility and control over price and performance.
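To make "materialize remote datasets locally" concrete, here is a minimal Python sketch of the idea, not Spice's implementation or API. It assumes an in-memory SQLite database as a stand-in for the remote source (PostgreSQL, Snowflake, S3, etc.) and a second in-memory SQLite database as the local accelerator; the `materialize` helper is hypothetical.

```python
import sqlite3

# Hypothetical "remote" source; an in-memory SQLite DB stands in for
# PostgreSQL, Snowflake, S3, etc.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 30.0), (2, "bob", 12.5), (3, "alice", 7.5)],
)

def materialize(source, local, table):
    """Copy every row of `table` from `source` into `local` (full snapshot).

    `table` is interpolated into SQL, so it must be a trusted name.
    """
    cols = [r[1] for r in source.execute(f"PRAGMA table_info({table})")]
    local.execute(f"DROP TABLE IF EXISTS {table}")
    local.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    rows = source.execute(f"SELECT {', '.join(cols)} FROM {table}").fetchall()
    placeholders = ", ".join("?" for _ in cols)
    local.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    local.commit()

# Local accelerator: another in-memory SQLite DB (stand-in for Arrow/DuckDB/etc.).
local = sqlite3.connect(":memory:")
materialize(remote, local, "orders")

# Application queries now hit the local copy instead of crossing the network.
totals = local.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 37.5), ('bob', 12.5)]
```

The trade-off is freshness: the local copy is only as current as its last refresh, which is why the choice of where and how to materialize matters.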

You can read the full announcement blog post at https://blog.spiceai.org/posts/2024/03/28/adding-spice-the-n....

We’d appreciate it if you check Spice out, give us feedback, and if you'd like to contribute, we'd love to build with you.

Thanks!

GitHub: https://github.com/spiceai/spiceai

lmeyerov 2 years ago

Any sense of comparison to Dremio, which helped steward the Arrow ecosystem for doing this kind of thing?

(The idea is great fwiw, I've been following them one-off for years, and we have to do elements of these things in how we build louie.ai and Graphistry for the GPU equivalent. Real pain point!)

  • lukekimOP 2 years ago

    Dremio is awesome. We've followed the Dremio journey from one of Jacques' original talks a couple of years back. Dremio's idea of caching tiers and reflections is powerful for performance.

    Spice takes it further and provides flexibility for materialization, giving you full control over where that materialization exists (same machine, same pod, same network, same cluster, same region, etc.), what engine it uses (OLTP: SQLite/PostgreSQL; OLAP: DuckDB/Arrow), and what storage tier it lives on (in-memory, attached NVMe, etc.), configurable down to the individual dataset.

imgdesgen 2 years ago

That's awesome! I'll definitely give it a try if there's a suitable scenario.

alex_hirner 2 years ago

Looks great! Is Flight SQL supported over the wire too, so one could hook it up to Grafana? Any plans to support Iceberg?

nextworddev 2 years ago

Hey guys - how does this compare to cube?

  • phillip-spice 2 years ago

    I'm not too familiar with https://cube.dev/, but my initial impression is that they are focused more on providing APIs backed by SQL. They have a SQL API that emulates the PostgreSQL wire protocol, whereas Spice implements Arrow and Flight SQL natively. Their pre-aggregations are a similar concept to Spice's data accelerators. It also looks like they have their own query language, whereas Spice uses native SQL.

gerenuk 2 years ago

Interesting one. Any plans for a ClickHouse data connector?

neeleshs 2 years ago

Congratulations. Is this similar to Trino/Starburst, Drill?

  • lukekimOP 2 years ago

    Thank you!

    Yes, in terms of federated queries there are similarities, but Spice is designed to be much smaller, faster, and more lightweight (a single 140MB binary), so you can run it next to your application as a sidecar, or eventually even in the browser. Spice also gives you more options and flexibility for materialization, so you can choose where and how to store locally materialized data.

dvdsgl 2 years ago

Congrats on the launch! This is exciting. The video demo is awesome: https://youtu.be/AZyrecVWnEs?si=j7JVKhhcUor1_y-f

cedrone 2 years ago

Congrats Luke & Phillip, exciting day!

prabhatsharma 2 years ago

Do you support subqueries and joins?

  • lukekimOP 2 years ago

    Spice supports what DataFusion supports, which is generally yes, but there is still work to do to push down more queries to TableProviders. For example, joins within a single source are not yet pushed down to the underlying provider.

    You can write a single query across many data sources which is what we show in the demo on the Git repo.
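When a join is not pushed down, a federated engine typically scans each source separately and performs the join itself. The Python sketch below illustrates that general pattern with an in-process inner hash join over two independent in-memory SQLite databases standing in for two data sources. It is an assumption-laden toy, not DataFusion's actual join operator; `hash_join` is a hypothetical helper.

```python
import sqlite3

# Two independent "sources" (stand-ins for, say, PostgreSQL and a data lake).
users_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")])

events_db = sqlite3.connect(":memory:")
events_db.execute("CREATE TABLE events (user_id INTEGER, kind TEXT)")
events_db.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "login"), (1, "purchase"), (3, "login"), (4, "login")],
)

def hash_join(build_rows, build_key, probe_rows, probe_key):
    """Inner hash join: index the build side by key, then probe it row by row."""
    index = {}
    for row in build_rows:
        index.setdefault(row[build_key], []).append(row)
    joined = []
    for row in probe_rows:
        for match in index.get(row[probe_key], []):
            joined.append(match + row)  # concatenate the two tuples
    return joined

# Each source only receives a simple scan; the join itself runs in the engine.
users = users_db.execute("SELECT id, name FROM users").fetchall()
events = events_db.execute("SELECT user_id, kind FROM events").fetchall()
result = hash_join(users, 0, events, 0)
for row in result:
    print(row)
```

Pushing the join down would instead send the whole `JOIN` to a single source when both tables live there, avoiding the transfer of every row from both scans.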

leeholim 2 years ago

Congratulations on the launch!!

jjustin_lawson 2 years ago

Congrats on the launch, team!

dwgray 2 years ago

This looks great - I've been meaning to dig into Rust - seems like a solid choice for you.

watsondoc 2 years ago

Wow, looks promising

martinmao 2 years ago

This looks awesome!

marooned4 2 years ago

Looks great. Going to try this out.

alamb 2 years ago

So great to see another project built on DataFusion!

mritchie712 2 years ago

Very cool!

One thing to keep in mind:

DuckDB can directly query Parquet files (and many other file types[1]), MySQL, PostgreSQL[0], and SQLite. So if you need something like this, DuckDB on its own might work for your use case.

0 - https://duckdb.org/docs/extensions/postgres

1 - https://twitter.com/thisritchie/status/1767922982046015840

  • lukekimOP 2 years ago

    Yes, we're huge fans of DuckDB, Mark, Hannes and the team.

    What we've found is that sometimes you want to materialize data in an OLTP database, so Spice gives you the choice to store some datasets in DuckDB and others in something like SQLite/PostgreSQL, and still join them together in a single SQL query, so you get the best of both worlds.

    • riku_iki 2 years ago

      DuckDB can both read from and write to PG. What use case exactly are you unlocking?

      • lukekimOP 2 years ago

        DuckDB is awesome. As an OLAP columnar-store database it excels at certain operations, like aggregations. If your use-case is row-based lookups where an OLTP database would perform better, you now get a choice of engine, while still having a single place to access your data from your app.
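The row-vs-column distinction can be sketched with toy data structures. The Python below is a hypothetical illustration only (neither DuckDB nor SQLite stores data this way internally): a row-oriented layout with a key index serves a point lookup with one probe, while a column-oriented layout lets an aggregation read only the columns it needs.

```python
# Hypothetical toy layouts, not how DuckDB or SQLite store data internally.
rows = [
    {"id": 1, "region": "us", "amount": 10.0},
    {"id": 2, "region": "eu", "amount": 20.0},
    {"id": 3, "region": "us", "amount": 5.0},
]

# Row-oriented (OLTP-style): whole records kept together, plus a key index.
row_index = {r["id"]: r for r in rows}

def lookup(record_id):
    # One hash probe returns the complete record.
    return row_index.get(record_id)

# Column-oriented (OLAP-style): each column is its own contiguous list.
columns = {name: [r[name] for r in rows] for name in ("id", "region", "amount")}

def total_by_region(region):
    # Scans only the "region" and "amount" columns; "id" is never read.
    return sum(a for g, a in zip(columns["region"], columns["amount"]) if g == region)

print(lookup(2))              # {'id': 2, 'region': 'eu', 'amount': 20.0}
print(total_by_region("us"))  # 15.0
```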

        Originally, we only supported DuckDB in our cloud product Spice Firecache, but we actually lost a customer because their use case was better suited to an OLTP database. Now you get a choice, down to the dataset level, and can still join across them in a single query. With Spice, you can load both SQLite and DuckDB together in the same process for local materialization and acceleration.

        Finally, Spice OSS does more than just data query. You can read about the vision to power AI-driven applications by co-locating data with models at https://docs.spiceai.org/intelligent-applications.

        • riku_iki 2 years ago

          > If your use-case is row-based lookups where an OLTP database would perform better, you now get a choice of engine, while still having a single place to access your data from your app.

          My understanding is that if you run some SQL in DuckDB against PG using the extension, say select * from t where id = 2;, it will perform the actual lookup on the PG server, but the results will be accessible in DuckDB.

          > With Spice, you can load both SQLite and DuckDB together in the same process for local materialization and acceleration.

          You can do this in any Python, Java, C++, or whatever program.

          • lukekimOP 2 years ago

            You're right, and that might be a good choice if you wanted to deploy and operate an additional PostgreSQL server locally.

            ## Using DuckDB:

            app -> duckdb -> network -> remote postgres (data) | local postgres (materialization)

            ## Using Spice:

            app -> localhost gRPC/HTTP -> [Spice <duckdb|sqlite>] -> network -> [postgres|S3|snowflake|etc]

            In addition, Spice manages the materialization for you. In the DuckDB-only case, you'd have to do a COPY FROM [remote postgres] to [local postgres] manually every time, and manage the data lifecycle yourself. That gets even more complicated if you want to do append or incremental updates of data to your local materialization.
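The difference between a full refresh and an append (incremental) refresh of a local materialization can be sketched as follows. This Python example is a hypothetical illustration, not Spice's refresh logic: in-memory SQLite databases stand in for the remote source and the local copy, and the append strategy assumes a monotonically increasing `id` column it can use as a high-water mark.

```python
import sqlite3

remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
remote.executemany("INSERT INTO logs VALUES (?, ?)", [(1, "a"), (2, "b")])

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")

def full_refresh(src, dst):
    """Replace the local copy with a fresh snapshot of the source."""
    rows = src.execute("SELECT id, msg FROM logs").fetchall()
    with dst:  # commits on success, rolls back on error
        dst.execute("DELETE FROM logs")
        dst.executemany("INSERT INTO logs VALUES (?, ?)", rows)

def append_refresh(src, dst):
    """Fetch only rows newer than the local high-water mark (assumes ids only grow)."""
    (high_water,) = dst.execute("SELECT COALESCE(MAX(id), 0) FROM logs").fetchone()
    rows = src.execute("SELECT id, msg FROM logs WHERE id > ?", (high_water,)).fetchall()
    with dst:
        dst.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    return len(rows)

full_refresh(remote, local)
remote.execute("INSERT INTO logs VALUES (3, 'c')")  # new data arrives upstream
new_rows = append_refresh(remote, local)
print(new_rows, local.execute("SELECT COUNT(*) FROM logs").fetchone()[0])  # 1 3
```

The append strategy transfers far less data, but it silently misses updates and deletes to existing rows, which is part of why managing the data lifecycle by hand gets complicated.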

      • phillip-spice 2 years ago

        DuckDB is an in-process DB similar to SQLite - so every application in your stack would need to embed it. Spice is a binary that has Flight SQL and HTTP query endpoints - so multiple applications can connect to it from any language.
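The in-process vs. sidecar distinction can be sketched with the Python standard library: one process owns the data and exposes a query endpoint, and any client that speaks HTTP can query it. This is a hypothetical toy, not Spice's API; the `/sql` route, the JSON response shape, and the SQLite backing store are all assumptions for illustration, and it has no authentication or error handling.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Store owned by the "sidecar" (stand-in for accelerated datasets).
# check_same_thread=False because the server thread uses a connection made here.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE t (id INTEGER, name TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

class SqlHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/sql":  # hypothetical route, not Spice's real endpoint
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        sql = self.rfile.read(length).decode("utf-8")
        body = json.dumps(db.execute(sql).fetchall()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Port 0 lets the OS pick a free port; this single-threaded server handles requests serially.
server = HTTPServer(("127.0.0.1", 0), SqlHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def query(sql):
    """What any client, in any language with an HTTP library, would do."""
    url = f"http://127.0.0.1:{server.server_port}/sql"
    req = urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

result = query("SELECT name FROM t WHERE id = 2")
print(result)  # [['b']]
server.shutdown()
server.server_close()
```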

ignoramous 2 years ago

> Today, we're re-launching Spice...

  Obtaining blockchain and smart-contract data is hard ... Spice makes it easy.
http://web.archive.org/web/20220414105622/https://docs.spice...

A slight detour from the company's original vision (https://archive.is/88IoQ)?

CyberDildonics 2 years ago

There are eight different accounts in this thread "congratulating the launch" with their first comment. Half were created six hours ago right when this was posted.

https://news.ycombinator.com/user?id=martinmao https://news.ycombinator.com/user?id=dwgray https://news.ycombinator.com/user?id=dennispan https://news.ycombinator.com/user?id=peycke https://news.ycombinator.com/user?id=watsondoc https://news.ycombinator.com/user?id=leeholim https://news.ycombinator.com/user?id=cedrone https://news.ycombinator.com/user?id=jjustin_lawson

  • sneilan1 2 years ago

    My first thought when I saw your post pointing out the congratulations comments was one of the ending scenes from Neon Genesis Evangelion where they say congratulations repeatedly.

    https://youtu.be/oyFQVZ2h0V8?si=oOYSIjVmpJK6mwft

    Regardless, this is very spammy marketing.

  • pvg 2 years ago

    If you think someone is posting abusively, email the mods. You've seen the thing about not-posting shillage insinuations in the site guidelines.

    • CyberDildonics 2 years ago

      I posted a straight fact with the links to prove it. You are making that connection from the information I gave you.

      • pvg 2 years ago

        https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

        It's an accusation of abuse. Those go to hn@ycombinator.com, not in the threads where they are meta noise.

        • CyberDildonics 2 years ago

          I posted true, verifiable information about this thread (that a lot of people voted up). I didn't accuse anyone of anything, people are smart, they can make up their own minds.

          • pvg 2 years ago

            This isn't some complicated thing, there's a site guideline specifically about it and you should try to stick to it like everyone else because it trashes the forum. You can just mail this stuff in.

            • riku_iki 2 years ago

              You can mail the mods too if you think something is wrong with the parent comments?

  • zachmu 2 years ago

    People have friends who want to support them
