Apache Pegasus - A horizontally scalable, high-performance key-value store

github.com

110 points by indogooner 3 years ago · 40 comments

seeekr 3 years ago

Note that this seems to be a relatively old project, first commits from 2015. The project seems active, but most of the work seems to have been done around its inception, with some significant activity from 2020 onwards. Speculation/interpretation: So this might be a project that was used internally by some company, but perhaps not any more, and they've decided to open-source it at some point (2017-2018?) because some folks were/are still excited about it and want to keep developing it.

This might explain some of the "why yet another RocksDB-based KV store?" line of questioning.

  • pas 3 years ago

    > yet another RocksDB-based KV store

    Aaah, there was a super informative talk about the different databases at Facebook, most of them built on RocksDB, with different trade-offs. (And I can't find the video :((((( )

    Anyway, it makes sense to have yet another one if it serves a different purpose, e.g. read-heavy workloads (caches, serving user feeds, whatever) or write-heavy ones (monitoring, storing that sweet sweet tracking juice that then gets read once or twice while building the recommendation models), small or large blobs, latency requirements, HA/consistency requirements, how complicated the queries are going to be, whether it supports secondary indices, etc.

  • l2dy 3 years ago

    Pegasus was open-sourced by Xiaomi, and is still used internally according to https://apachecon.com/acasia2022/sessions/ai-1125.html.

    Source: https://www.zhihu.com/question/66719537/answer/245270169 (in Chinese)

  • xani_ 3 years ago

    The Apache Foundation is a retirement home for projects, so it checks out.

    • studmuffin650 3 years ago

      I wouldn't say that's accurate. There are still many super successful and active Apache projects. To name a few:

      Kafka, Cassandra, ZooKeeper, Spark, Tomcat, Superset, Storm, Lucene, Log4j2, Hadoop, etc. The list goes on, but I would safely say that a majority of the world's systems run on Apache projects, which are for the most part actively developed.

      • asjo 3 years ago

        And arguably the venerable Apache HTTP Server · https://httpd.apache.org/ :-)

        • ramesh31 3 years ago

          Despite all the modern alternatives, I'd bet my life that a majority (>50%) of the web is still being served by httpd.

      • mekster 3 years ago

        Not sure about others but tools like Superset have plenty of better alternatives like Grafana and Metabase.

        So when there are better alternatives, it's not too inaccurate.

tmikaeld 3 years ago

Will be interesting to see the benchmarks!

There are a lot of KV engines that use RocksDB now, like CockroachDB (which moved to Pebble, though), YugabyteDB and TiDB.

Those are all many times slower than Redis though, so having a middle-ground aimed to be similar to Redis, that doesn't eat all RAM, is very exciting!

  • hknmtt 3 years ago

    Because you are comparing an in-memory store with persistent storage. Many KV databases can be run from memory if needed.

    • jamesrr39 3 years ago

      Technically, all of them can, if you put the "persisted" storage in a directory on a ramfs or tmpfs mount :)

      This might seem like a bit of a facetious comment, but it does have genuine use cases, e.g. unit tests, or for data that you can easily restore after a shutdown.
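The tmpfs trick for unit tests can be sketched in Python. This is a minimal illustration under stated assumptions: it assumes `/dev/shm` is a tmpfs mount (true on most Linux systems, falling back to the normal temp directory otherwise), it uses the standard-library `dbm` module as a stand-in for a "persisted" KV store, and the helper name `make_scratch_store_dir` is hypothetical.

```python
import dbm
import os
import shutil
import tempfile

# Assumption: /dev/shm is a tmpfs (RAM-backed) mount, as on most Linux systems.
# Elsewhere, fall back to the regular temp dir so the sketch still runs.
RAM_DIR = "/dev/shm" if os.path.isdir("/dev/shm") else None


def make_scratch_store_dir() -> str:
    """Hypothetical helper: create a throwaway data dir, in RAM when possible."""
    return tempfile.mkdtemp(prefix="kv-test-", dir=RAM_DIR)


data_dir = make_scratch_store_dir()
try:
    # dbm writes real files, but on tmpfs those files live in memory,
    # so the "persisted" store is fast and disappears on reboot.
    with dbm.open(os.path.join(data_dir, "store"), "c") as db:
        db[b"user:1"] = b"alice"
        value = db[b"user:1"]
finally:
    # Test-style cleanup: drop the whole data directory afterwards.
    shutil.rmtree(data_dir, ignore_errors=True)
```

The store code itself is unchanged; only the directory it points at differs, which is what makes this convenient for tests or for data that is easy to rebuild.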

  • c0balt 3 years ago

    Isn't TiDB built on top of TiKV?[0]

    [0]: https://github.com/pingcap/tidb

  • yla92 3 years ago

    > Those are all many times slower than Redis though

    I'd appreciate any links/docs I could look into to learn more about this.

    • ddorian43 3 years ago
      • xwowsersx 3 years ago

        Thanks for the link. In that post, he writes "Note: when we talk about disk controller we actually mean the caching performed by the controller or the disk itself. In environments where durability is important system administrators usually disable this layer of caching." I'm not sure I completely understand that note. Can you illuminate?

        • karmakaze 3 years ago

          That's probably for cases where the disk controller can't guarantee that accepted writes will make it to the disk in the event of a power failure. I worked with bare-metal servers that had redundant power supplies, including for the caching/RAID disk controller and the disks, so that even in the event of power loss, writes that had been 'sync'd by software were still guaranteed to reach the disk.

          • xwowsersx 3 years ago

            Ah got it, thanks. What a fantastic read btw, his writing is incredibly concise and clear. I have to binge some of his other writing.
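To make "writes that were 'sync'd by software" concrete, here is a minimal Python sketch of what an application does on the software side. It assumes a POSIX system (opening a directory for `fsync` does not work on Windows), and the helper name `durable_write` is hypothetical. The application can only ask the kernel to flush; whether the bytes actually survive a power cut still depends on the disk controller and drive honoring that flush, which is exactly the caching layer discussed above.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data and request that it reach the device.

    os.fsync asks the kernel to flush the file to the storage device. If a
    controller or drive cache acknowledges the write early and is not
    battery/flash-backed, the data can still be lost on power failure.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to flush the file's data to disk

    # POSIX-only: also fsync the parent directory so the new file's
    # directory entry is durable, not just its contents.
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


# Usage: write a small record into a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "record.bin")
durable_write(path, b"hello")
```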

endisneigh 3 years ago

I'd be curious to see the differences between this and FoundationDB.

scary-size 3 years ago

Is it me or are most of the docs only available in Chinese?

spockz 3 years ago

Why yet another key value store?

  • sidcool 3 years ago

    KV stores are a complex topic and research continues. Each new tool comes with its own set of trade offs. It would help to go through the documentation.

    • jstummbillig 3 years ago

      > Each new tool comes with its own set of trade offs

      Is the cognitive load this produces still worth the consideration? At what level do you have to operate for the gains to actually make viable business sense to even consider?

      Sometimes I am thinking "Well, surely at Google level" – and then I load up one of their interfaces, for example Google Ads, and I have to sit around for 10s before anything even shows up.

      • sidcool 3 years ago

        There are so many startups in the KV DB space that it's absolutely worth it. If one is a consultancy, they can very well use any available key-value store. But beyond a certain scale, it's very useful to try to customize a KV store. That's where it pays off.

      • prox 3 years ago

        Or not at all, like last week. Screens would take forever! Luckily, it was better the next day.

    • ramraj07 3 years ago

      You could do a million other interesting things academically or commercially before diving into yet another KV store though.

      • robertlagrant 3 years ago

        Hard to know what the point is of this line of reasoning. Someone or some company thought it was interesting, so they made it.

hk1337 3 years ago

This seems very interesting and my interest is piqued, but the documentation and web page are lacking a lot in telling me what it is and how it's intended to be used. I know it's a key-value store and it's supposed to be fast, but that's it.

hasperdi 3 years ago

Anyone using this in production? If so, how do you find it, is it good?

pknerd 3 years ago

Another key/value store system. There are already RocksDB, LevelDB and many others!

truth_seeker 3 years ago

Why would I choose it over TiKV or Riak?
