Databases in 2021: A Year in Review

ottertune.com

323 points by jameslao 4 years ago · 135 comments

why-el 4 years ago

Postgres's dominance is well deserved, of course. My only concerns with it, both of which are being actively worked on, are bloat management (significant for update-heavy workloads, and for programmers used to the MySQL model of rollback segments) and concurrency scaling (going past 500 connections). The bloat work was taken over by Cybertec [1] after stalling for a bit and is now funded (yay), while concurrency improvements have also come out of Microsoft [2]. All in all, an excellent future for our beloved Postgres.

[1] https://github.com/cybertec-postgresql/zheap [2] https://techcommunity.microsoft.com/t5/azure-database-for-po...

  • newlisp 4 years ago

    Another concern: no temporal tables. Don't businesses demand this feature?

    • sa46 4 years ago

      In Postgres land, I think most businesses work around temporal tables with audit tables using triggers to dump jsonb or hstore. I wrote up how I used table-inheritance here [1].
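
      A minimal sketch of that trigger-based audit-table pattern, with invented table and column names (shown via SQLite's JSON1 so it's runnable anywhere; in Postgres you'd write a plpgsql trigger function dumping `to_jsonb(OLD)` instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE account_audit (
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
    op         TEXT,
    old_row    TEXT  -- JSON snapshot of the row before the change
);
-- every UPDATE leaves the pre-image behind in the audit table
CREATE TRIGGER account_audit_upd AFTER UPDATE ON account
BEGIN
    INSERT INTO account_audit (op, old_row)
    VALUES ('UPDATE', json_object('id', OLD.id, 'balance', OLD.balance));
END;
""")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("UPDATE account SET balance = 50 WHERE id = 1")
op, old_row = conn.execute("SELECT op, old_row FROM account_audit").fetchone()
print(op, old_row)
```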

      I agree with your point. Postgres is starting to stick out compared to alternatives:

      - MS SQL supports uni-temporal tables using system time.

      - Snowflake has time travel which acts like temporal tables but with a limited retention window. Seems more like a restore mechanism.

      - MariaDB has system-versioned tables (doesn't look like it's in MySQL).

      - Cockroach DB has uni-temporal support with system time but limited to the garbage collection period. The docs indicate you don't want a long garbage collection period since all versions are stored in a single range.

      - Oracle seems to have the best temporal support with their flashback tech. But it's hard to read between the lines to figure out what it actually does.

      [1]: https://news.ycombinator.com/item?id=29010446

    • manigandham 4 years ago

      I've never seen a business actually use them, large or small. Any auditing requirements are usually fed from other sources, like Kafka event streams, files on S3, or an OLAP data warehouse.

      • code_biologist 4 years ago

        How do you set up and feed the warehouse? Temporal-ish tables have been an obvious, simple, and mostly foolproof solution for many of our historical analytics and reporting needs.

        Bitemporal stuff (enabling edited versions of history) is where things get hairy and I definitely question the utility outside of a dedicated use case.

        • manigandham 4 years ago

          Most databases have built-in CDC (change data capture) that can be exported. Otherwise the WAL logs can be read with other tooling.

          Debezium is a great open-source product for streaming changes from many relational databases: https://debezium.io

    • code_biologist 4 years ago

      Would love to see wider support for temporal tables, but application level approaches like https://github.com/jazzband/django-simple-history have worked for the business issues I have.

    • p_l 4 years ago

      Interestingly enough, Postgres used to have time travel in tables before MVCC transactions were added. Apparently it wasn't exactly a heavily used feature.

    • roenxi 4 years ago

      Although temporal tables are a really good idea, it is possible to get away without them being a first-class feature. They aren't hard to mimic if you can give up the guarantee of catching every detail. In an ideal world (ha ha, silly thought) the tables would be designed to be append-only anyway, or the amount of data would be significant, both of which make temporal tables somewhat moot.

      • tpetry 4 years ago

        They are really easy to mimic in PostgreSQL with range types (tstzrange) and an exclusion constraint, so no overlapping values are allowed. I guess they will not add it to the core if a developer can add support for them so easily.
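
        For the curious: in Postgres the core of the trick is an exclusion constraint along the lines of `EXCLUDE USING gist (item WITH =, validity WITH &&)` over a range column, which rejects overlapping validity periods for the same key. A rough, runnable emulation of that rule (invented schema, with a SQLite trigger standing in for the constraint):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price_history (
    item TEXT, price INTEGER,
    valid_from TEXT, valid_to TEXT  -- half-open period [valid_from, valid_to)
);
-- stand-in for Postgres's EXCLUDE constraint: reject overlapping periods
CREATE TRIGGER no_overlap BEFORE INSERT ON price_history
WHEN EXISTS (
    SELECT 1 FROM price_history
    WHERE item = NEW.item
      AND valid_from < NEW.valid_to
      AND NEW.valid_from < valid_to
)
BEGIN
    SELECT RAISE(ABORT, 'overlapping validity period');
END;
""")
conn.execute("INSERT INTO price_history VALUES ('widget', 10, '2021-01-01', '2021-06-01')")
rejected = False
try:
    # overlaps the existing 2021-01..2021-06 period, so the trigger fires
    conn.execute("INSERT INTO price_history VALUES ('widget', 12, '2021-03-01', '2021-09-01')")
except sqlite3.IntegrityError:
    rejected = True
# an adjacent period is fine thanks to the half-open convention
conn.execute("INSERT INTO price_history VALUES ('widget', 12, '2021-06-01', '2021-09-01')")
print(rejected)  # True
```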

    • ivank 4 years ago

      I use https://github.com/xocolatl/periods for this, with some success.

    • srcreigh 4 years ago

      What do temporal tables do that good queries don't?

      • InsaneOstrich 4 years ago

        Auditing of changes. We have to have a second table that stores history for any table that may need to be audited in the future.

    • nightpool 4 years ago

      I've very rarely found that using a full temporal table is the right choice for online analysis; a dedicated schema serves you better in the long run and helps you design your indexes, etc. appropriately. For compliance, PIT backups via WAL shipping should suffice, no?

  • lenkite 4 years ago

    I wish Postgres was more SQL standards compliant. Stuff like using `nextval()` instead of the standard `NEXT VALUE FOR` in SQL sequences is a pain.

  • nicoburns 4 years ago

    Is zheap definitely still an active project? Last commit seems to be Oct 2020

  • srcreigh 4 years ago

    Clustered indexes?

zffr 4 years ago

The author is a professor at CMU who specializes in databases: https://www.cs.cmu.edu/~pavlo/

Not completely related, but his lectures on databases on YouTube are really good. Much better than the DB class I had at college.

thejosh 4 years ago

I'm really excited by all the database love in the last few years. I moved to PG from MySQL in 2014 and haven't regretted it since.

Timescaledb looks very exciting, as it's "just" a PG extension, but their compression work looks great. [0]

I'm also really loving ClickHouse, but haven't deployed it to production yet (haven't had the need yet; almost did for an Apache Arrow reading thing, but didn't end up using Arrow). They do some amazing things there, and the work they do is crazy impressive and fast. Reading their changelog, they power through things.

[0] https://docs.timescale.com/timescaledb/latest/how-to-guides/...

threeseed 4 years ago

So a company that sells PostgreSQL services thinks PostgreSQL is dominating. Brilliant.

The reality is that nothing is dominating. In 2021 there were more databases than ever, each addressing a different use case. Companies don't have just one EDW; they have dozens, even hundreds, of siloed data stores. Startups will start with one for everything, then split out auth, user analytics, telemetry, etc.

There is no evidence of any consolidation in the market. And definitely not some mass trend towards PostgreSQL.

  • Sytten 4 years ago

    Couple of points:

    1. Ottertune doesn't sell PostgreSQL services; they sell a database optimization service that happens to support PostgreSQL (and other databases like MySQL).

    2. PostgreSQL is definitely gaining market share, and fast; see the db-engines graph [1]. You can compare it to the Oracle trend if you are not convinced [2].

    [1] https://db-engines.com/en/ranking_trend/system/PostgreSQL

    [2] https://db-engines.com/en/ranking_trend/system/Oracle

    • srcreigh 4 years ago

      Is this ranking by # of orgs using Postgres, or relative total company value using Postgres, or some even more ambiguous effectiveness metric?

      Answer: https://db-engines.com/en/ranking_definition

      • srcreigh 4 years ago

        A DB system that works for professionals and doesn't require any public ecosystem of training materials won't be mentioned much in public.

        • y4mi 4 years ago

          That's likely why they're also including mentions in job postings in their metric:

          > Number of job offers, in which the system is mentioned

          It's not a silver bullet, but I do think it's at least somewhat representative of popularity.

    • newlisp 4 years ago

      > they sell a database optimization service that happens to support PostgreSQL (and other databases like MySQL)

      A ML program that automatically tunes your production database in real-time. What could possibly go wrong?

      • apavlo 4 years ago

        We are very careful to make sure that we don't allow the tuning algorithms to make changes that could be detrimental to the correctness or availability of the database. This blog article describes some of the safeguards that we employ:

        https://ottertune.com/blog/prevent-machine-learning-from-wre...

        We also advise our customers to not point OtterTune at a production database right away.

    • threeseed 4 years ago

      You can't just compare graphs like that without factoring in the cloud. PostgreSQL is a first-class, cloud managed, supported database in the top three cloud providers whereas Oracle is not. It's a massive impediment to adoption and is in no way a reflection of the database itself.

      Either way, nothing suggests that PostgreSQL is in any way dominating.

      • onphonenow 4 years ago

        PostgreSQL is also well supported as a managed offering by PaaS providers.

        Heroku has https://www.heroku.com/postgres

        Fly.io - https://fly.io/docs/reference/postgres/

        and lots more of these pretty small players that still drive adoption.

        Then it's very well supported on AWS / GCP / Azure.

        So PostgreSQL is just crushing it in terms of adoption.

        I honestly have not seen a major Oracle offering in a while.

        Looking at what tech companies are building on, is Oracle even a major player these days? They used to be pretty much THE only player; those days feel gone now.

      • nightpool 4 years ago

        Your own comment suggests that Postgres is dominating over Oracle, simply by saying that it’s been adopted as a major offering by the top 3 cloud providers. How is that not a reflection of the database?

dreyfan 4 years ago

All you need is Postgres (OLTP), and if you have large datasets where Postgres falls behind for analytical work, you reach for ClickHouse (OLAP) for those features (while Postgres remains your primary operational database and source of truth).

  • mritchie712 4 years ago

    Agreed. I have a good bit of experience in SaaS and analytics, and that's exactly what I landed on for building Luabase [0]: Postgres (specifically Supabase) for the app database, ClickHouse to run the analytics (which is the product).

    0 - https://luabase.com/

  • drchaim 4 years ago

    This is the way for me also.

czhu12 4 years ago

It's weird to put Postgres into the same bucket as Elasticsearch, as they are often used for different things.

No matter how much you tune / denormalize Postgres, you'll never get the free-text search performance Elasticsearch offers. Our best efforts on a 5 million row table yielded 600ms query times vs 30-60ms.

Similarly with Snowflake: you'd never expect Postgres to perform analytical queries at that scale.

I know graph databases and time-series DBs have similar performance tradeoffs.

I think the most interesting and challenging area is how to architect a system that uses many of these databases and keeps them eventually consistent within some bound.

  • code_biologist 4 years ago

    Not affiliated, but for anyone looking to do searches on data stored primarily in Postgres via Elastic, ZomboDB is pretty slick.

    ZomboDB is a Postgres extension that enables efficient full-text searching via the use of indexes backed by Elasticsearch. https://github.com/zombodb/zombodb#readme

  • tpetry 4 years ago

    The author is talking about different classes of RDBMS. I believe his intention was not to compare PostgreSQL to Elasticsearch or ClickHouse, which solve completely different problems.

    But for small to medium datasets his advice to just stick with PostgreSQL is good: start with an easy solution which will give you everything you need (by simply installing a plugin). If you need more specialized software THEN use it, but don't start with an overcomplicated stack just because Elasticsearch and ClickHouse may be the state-of-the-art open source solutions to a specific problem.

  • zxcq544 4 years ago

    Have you tried GIN trigram indexes (https://www.postgresql.org/docs/14/pgtrgm.html, e.g. `CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);`) and GIN full-text search indexes (e.g. `CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);`)? As far as I know, after applying those indexes to full-text search columns you can search about as fast as in Elastic, because those indexes are built the same way as in Elastic.
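
    As a runnable toy of the same idea (invented data; SQLite's FTS5 stands in here, since it builds an inverted index much like a Postgres GIN index over a tsvector column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- FTS5 maintains an inverted index: token -> rows containing it
CREATE VIRTUAL TABLE docs USING fts5(body);
INSERT INTO docs VALUES ('postgres full text search with gin indexes');
INSERT INTO docs VALUES ('elasticsearch is built on top of lucene');
""")
# MATCH consults the inverted index instead of scanning every row
hits = conn.execute("SELECT body FROM docs WHERE docs MATCH 'gin'").fetchall()
print(len(hits), hits[0][0])
```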

  • t-writescode 4 years ago

    How large are your text areas? What types of indexes are you using?

_vvhw 4 years ago

What are the distributed options for Postgres? What mechanisms are available to make it highly available i.e. with a distributed consensus protocol for strict serializability when failing over the primary? How do people typically deploy Postgres as a cluster?

1. Async replication tolerating data loss from slightly stale backup after a failover?

2. Sync replication tolerating downtime during manual failover?

3. Distributed consensus protocol for automated failover, high availability and no data loss, e.g. Viewstamped Replication, Paxos or Raft?

It seems like most managed service versions of databases such as Aurora, Timescale etc. are all doing option 3, but the open-source alternatives otherwise are still options 1 and 2?
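
The durability rule that separates the three options can be made concrete: with async (1) the primary acks before any replica has the write, with sync (2) after one designated standby has it, and with consensus (3) after a majority of the cluster has it. A toy version of the majority rule, for illustration only (it ignores leader election, reconfiguration, etc.):

```python
def write_committed(acks: int, cluster_size: int) -> bool:
    """Majority-quorum rule used by Raft/Paxos/Viewstamped Replication:
    a write is durable once more than half the nodes have acked it
    (the primary's own ack counts as one)."""
    return acks >= cluster_size // 2 + 1

print(write_committed(2, 3))  # True: 2 of 3 is a majority
print(write_committed(2, 5))  # False: 3 of 5 are needed
```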

  • ryanworl 4 years ago

    I think you'd still need to change the core of the database to avoid stale reads when an old primary and client are partitioned away from the new primary, or force all client communication through a proxy smart enough to contact a quorum of replicas to ensure the current primary is still the primary during transaction begin and commit.

    • _vvhw 4 years ago

      Ah yes, good point!

      I was assuming in both cases of manual failover that the operator would have to have some way of physically shutting down the old primary, then starting it again only as a backup that doesn't reply to clients. Alternatively, the cluster would need to remain unavailable if any node is partitioned.

      But none of this is really very practical when compared to a consensus protocol (or R/W quorums) and distributed database. I'm genuinely curious how people solve this with something like Postgres. Or is it perhaps something that isn't much worried about?

  • AtlasBarfed 4 years ago

    I can't see how #3 scales under any write load unless you have no joins.

    Well, unless each node has a complete copy of the data?

eternalban 4 years ago

Databases are the best all-around scratch-every-CS-geek-itch domain there is, with the possible exception of operating systems.

The critical importance of extensibility as a primary concern of successful DB products needs to be highlighted. Realities of the domain dictate that product X matures a few years after inception, at which point the application patterns may have shifted. (Remember map-reduce?) If you pay attention, for example, you'll note that the du jour darlings are scrambling to claim fitness for ML (a subset of big data), and the newcomers are claiming to be "designed for ML".

Smart VC money should be on extensible players...

SPBS 4 years ago

I genuinely couldn't tell if the author was being sarcastic when he said Larry Ellison was down on his luck because he dropped from 5th richest to 10th richest (and the whole thing about pulling himself out of the gutters by clawing up to 5th richest again).

  • apavlo 4 years ago

    I was not being sarcastic. Larry is a good man.

    • hodgesrm 4 years ago

      Larry Ellison is seriously underestimated as a database leader. I worked at Sybase. Oracle beat us fair and square. The Oracle DBMS team is outstanding.

      • zaphirplane 4 years ago

        Didn't Microsoft take Sybase's customers?

        • hodgesrm 4 years ago

          Not really--MS SQL Server was Windows only and took a while to grow. Oracle ran everywhere and was simply a better database by the mid-90s. (Previously Sybase was quite far ahead.) Oracle had row-level locking and MVCC at a time when Sybase was still stuck with a cumbersome page locking model. Oracle was also more reliable at least in my experience. I used to hit page corruption pretty regularly on Sybase but almost never on Oracle.

          Disclaimer: I worked at Sybase. It was an outstanding company in the early days.

  • fuy 4 years ago

    It obviously was sarcastic.

  • rafaele 4 years ago

    Seems like he's being serious and based on the linked tweet, I think he reveres Larry Ellison.

  • srini_reddy 4 years ago

    that's what I felt too. especially after the word "gutters". :)

sriku 4 years ago

I've been intrigued by dgraph (https://dgraph.io) and used it to good effect in a (toy) project, where it felt easy to create and evolve its data model given changing requirements.

Dgraph uses graphql as its native query language.

Does anyone here have some experience to share on it? ... since it isn't mentioned in the article.

divan 4 years ago

My DB discovery and the game changer of 2021 was EdgeDB.

  • PudgePacket 4 years ago

    Thanks for pointing it out; after a quick glance it actually looks like something I want to learn more about. It takes the niceties from prisma.io's schema tooling and brings them closer to Postgres.

  • girvo 4 years ago

    Oh wow. That's what I've been looking for, for years at this point. Thanks for the shout-out, I know what I'm playing with for my next project!

    • Kinrany 4 years ago

      What did you like about it?

      • kbenson 4 years ago

        I can tell you what I liked about it when I looked: it seems to let you easily encapsulate your intent as a programmer creating a record to store data, and to query it the same way. I imagine it obviates some of the reasons to reach for ORMs for certain programmatic database needs.

uvdn7 4 years ago

I agree with Andy that it’s just super fun to work on databases. You get to work on consensus, networking, compute, storage, etc. The workloads are always changing, you can try to optimize across the entire stack. Applications and workloads come and go, but databases will always be around.

tayo42 4 years ago

Wow I kind of feel like I'm reading about Javascript frameworks. I don't recognize any of the dbs or companies/projects. Didn't realize the db world was so busy

ttiurani 4 years ago

> Databases Are the Most Important Thing in My Life After My Family

> I even broke up with a girlfriend once because of sloppy benchmark results.

I can't say I can relate, but I do appreciate being this passionate about things!

  • sigmonsays 4 years ago

    I really gotta go OT here and ask how this happened. Too funny.

    Professional lives should be separate from personal but please, indulge us with a story!

ransom1538 4 years ago

I am so confused. https://vitess.io/ I would check this page out and view its "Who uses Vitess" section. Postgres is awesome if you are running a standalone server with 300 users or creating the next "Uber for cats", but at scale MySQL has all the solutions. DBs are not JS frameworks.

bsdnoob 4 years ago

I think PostgreSQL is an excellent general-purpose solution, especially for OLTP use cases, but where it lags behind is horizontal scaling (sharding). There are solutions for this of course, like Citus, though I haven't experimented with it. I have tried MySQL with Vitess, which almost seems like dark wizardry. I hope one day Vitess works with PostgreSQL.
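
For a sense of what such sharding layers do under the hood: the routing core is a stable hash from the sharding key to a shard, conceptually similar to Vitess's hash vindexes or a Citus distribution column (their actual hash functions differ; this sketch is illustrative only):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # A stable hash: the same key always routes to the same shard,
    # which is what lets the proxy layer fan queries out correctly.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Every lookup for the same customer lands on the same shard.
print(shard_for("customer:42", 4) == shard_for("customer:42", 4))  # True
```

The hard parts these systems actually solve sit on top of this: resharding without downtime, cross-shard queries, and failover, which is where the "dark wizardry" impression comes from.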

FridgeSeal 4 years ago

From the article:

> Rockset joined in, saying its performance was better for real-time analytics than the other two.

So I went and read the linked Rockset comparison blog post, and while I get that it's a marketing piece, it's also so transparently desperate for any advantage over Druid and ClickHouse that its criteria are bizarre at best, and bordering on wildly incorrect at worst.

I’ve been burnt by commercial databases before, and I have a hard time justifying ever using one, especially considering the advent of open source databases that have feature and performance parity (if not outright superiority) and can be self-hosted on K8s, or managed-hosting can be easily purchased.

  • mritchie712 4 years ago

    Altinity is doing a good job of this with ClickHouse. They offer some decent open-source guides for self-hosting [0] and offer a hosted option, though the hosted option isn't as self-serve as I'd like (you have to get "approved").

    0 - https://github.com/Altinity/clickhouse-operator

    • FridgeSeal 4 years ago

      Yeah I’ve been paying attention to the Altinity stuff for a while, they’ve got some good stuff.

      I think we'll get even more hosting options now that ClickHouse has its own backing company.

      • hodgesrm 4 years ago

        Thank you all for the very kind words about Altinity. We have always assumed that the ClickHouse market would be "crowded." By my count there are at least 7 cloud services based on ClickHouse. It's 8 if you include Firebolt, which embeds ClickHouse. There are even more hosting options on the way for ClickHouse, including clickhouse.com but also others. This is clearly going to be a competitive market with many outstanding alternatives for users.

        We have a bunch of ideas at Altinity about how to make ClickHouse even more pervasive. Stay tuned in 2022.

        Disclaimer: I am CEO of Altinity.

hu3 4 years ago

I expected more mentions of Vitess, which honestly looks like some kind of alien black magic from what I saw while consulting for a client this year.

But I guess not much else happened to it other than PlanetScale.

  • leetrout 4 years ago

    Which part most impressed you and which part seems like magic? Their devs / contributors are active on here...

    • hu3 4 years ago

      edit: not sure why the downvotes (-3 so far) since I just stated my experience with the project. There must be something blatantly wrong in what I wrote and I would appreciate criticism.

      An architect demoed the failure of a shard and the automatic promotion of its backup shard to main, in production. They actually test their failure models.

      As I see it, sharding is not very hard. HA is not very hard given a reasonable SLA. But sharding with HA on a large setup that actually works is pretty hard.

      Another thing that stuck in my mind was their high throughput-per-provisioned-hardware ratio. With not much hardware they were pulling 80k queries per second with room to spare.

      Although I have to say, that's not much compared to GitHub which pulls 1.2 million queries/sec on Vitess [0].

      [0] https://github.blog/2021-09-27-partitioning-githubs-relation...

      • samlambert 4 years ago

        I think it's more about the HN community not wanting to hear that anything other than Postgres works.

  • skunkworker 4 years ago

    I would love to use Vitess, but it doesn’t support Postgres at the moment. And that’s a non-starter unfortunately.

  • ransom1538 4 years ago

    All major companies are moving to Vitess. The battle is over. No one at scale uses Postgres.

kaliszad 4 years ago

Does somebody have experience with XTDB https://xtdb.com/index.html ? We would like to use it in our Clojure application perhaps with PostgreSQL as the backend (JDBC) to make it easier to implement a history feature.

Looking forward, instead of backward: it would be great for databases to have some kind of live-patch/live-update feature, so that one does not need any downtime at all if certain rules are obeyed (with an automatic check that this is the case). The same goes for operating systems, where we have parts of the technology and even some limited deployment, but none of it is the default as far as I know. This situation makes it quite a bit harder to develop and maintain systems without introducing extreme complexity. It does not look like we will have fewer bugs or patches any time soon, so we should make updating as easy as possible, drastically reducing the need for maintenance windows without resorting to building clusters for everything.

hbarka 4 years ago

I'm genuinely happy with Redshift for data warehousing purposes. By this I mean a non-transactional data store; I don't want to use the term OLTP or OLAP, as it puts it in a purist's camp. Sometimes I store 3NF normalized data, many times a flattened, denormalized, very large fact table, and often a model similar to a star schema. I don't have to worry about building indexes anymore, which was a real chore with row-store databases like Oracle, MySQL, SQL Server, or PostgreSQL. MPP column-store databases have really been a game-changer for the enterprise. We're talking billions of rows of data easily handled in the query plan.

  • LunaSea 4 years ago

    The SQL dialect Redshift supports is lagging so far behind that it's borderline unusable, in my opinion.

  • hodgesrm 4 years ago

    I have always been a huge fan of Redshift, which extends to Anurag Gupta and the team that delivered it. Redshift has always struck me as one of the real breakthrough products in the history of analytic databases. It collapsed deploying data warehouses from months to about 20 minutes.

    It's great to see the current team is on the move again, as the original ParAccel architecture did not scale very well. There was an excellent talk on Redshift in Andy Pavlo's Vaccination Database Tech Talks, 2nd Dose. [0] It's by Ippokratis Pandis and worth a view. It covers a lot of the recent improvements, which are likely to disappoint the many critics who have counted Redshift out. (Prematurely in my opinion.)

    [0] https://db.cs.cmu.edu/seminar2021-dose2/

leetrout 4 years ago

Excited to see Dgraph in the top 10 mentions and climbing above Neo4j.

  • slekker 4 years ago

    We are experimenting with Neo4j and found that Cypher, albeit foreign-looking in the beginning, feels quite natural to read when you think about graphs. How's your experience with Dgraph been, any thoughts? I haven't really heard about it before reading this post, hence the curiosity!

criticaltinker 4 years ago

Databases in 2030: SQL DB finally succumbs to Graph DB as #1

Does anyone else feel like a caveman when modeling a many-to-many relationship in a normalized schema, and then querying it via SQL?

I'm surprised graph DBs aren't more popular for this reason alone. Maybe it's a far-fetched dream, but perhaps a graph frontend could be slapped onto the Postgres backend.

  • apavlo 4 years ago

    > Databases in 2030: SQL DB finally succumbs to Graph DB as #1

    Graph databases will not overtake relational databases in 2030 by marketshare.

    Bookmark this comment. Reach out to me in 2030. If I'm wrong, I will replace my official CMU photo with one of me wearing a shirt that says "Graph Databases Are #1". I will use that photo until I retire, get fired, or a former student stabs me.

    • hodgesrm 4 years ago

      Count me in on Andy's side of the bet. The most useful features of graph databases will likely be subsumed into RDBMS just as features from JSON stores and object stores were before them.

      For example...One of the hits against RDBMS is that the structure is supposedly "rigid." That's simply not the case in many RDBMS, such as those using column storage. Adding columns in databases like ClickHouse is a trivial metadata operation. This means that many problems that Neo4j solves can be addressed in a more general-purpose RDBMS, because you can add columns easily to track relationships. It's pretty easy to envision other improvements to access methods to make searches more efficient.

      I don't mean to undercut in any way the innovation of graph databases. It's just that the relational model is (a) extremely general and (b) can be extended.

    • cam0 4 years ago

      Not a fan of graph dbs? Surprised the $325m round for Neo4j didn't make your funding paragraph.

      https://techcrunch.com/2021/06/17/neo4j-series-f/

  • PhoenixReborn 4 years ago

    Have you looked at Hasura for the second question (graph frontend + relational backend)? That's basically GraphQL on top of Postgres.

    As for the first question - I've tried using Neo4j and ArangoDB for relatively large-scale graph querying (1-2TB of data) and both couldn't hold a candle to Postgres or MySQL in terms of query performance for cost. Neo requires you to store most of your data in memory and Arango isn't great for cross-shard querying.

    Unless there's some major new graph DB that comes out in the next few years I would still bet on relational being dominant in 2030.

    • jbergens 4 years ago

      Have you tried TigerGraph?

      They say that they scale well. I have not tried any graphdb for prod work yet.

  • PDoyle 4 years ago

    Nonsense. Graph databases pre-date SQL. The relational model was created to overcome the limitations of graph databases.

  • srcreigh 4 years ago

    Relational data schemas are a graph

    • eurasiantiger 4 years ago

      And exactly for that reason, graph DBs can be more intuitive to work with: relational DBMSs generally don’t support any kind of graph operations or traversal queries.
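
      (One traversal primitive stock SQL does offer is the recursive CTE, which Postgres, SQLite, and most relational engines support. A toy reachability query over an invented edge table, in runnable SQLite form:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, dst TEXT);
INSERT INTO edges VALUES ('a','b'), ('b','c'), ('c','d'), ('x','y');
""")
rows = conn.execute("""
WITH RECURSIVE reachable(node) AS (
    SELECT 'a'     -- start node
    UNION          -- UNION (not UNION ALL) dedupes, so cycles terminate
    SELECT e.dst FROM edges e JOIN reachable r ON e.src = r.node
)
SELECT node FROM reachable ORDER BY node
""").fetchall()
print([r[0] for r in rows])  # ['a', 'b', 'c', 'd']
```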

  • option_greek 4 years ago

    You can always use an ORM, which provides better usability for developers. At the end of the day the RDBMS model is suited to a wide variety of workloads, and there are several other factors in play when choosing a good DB, including ecosystem, cloud vendor support, migrations, performance, etc.

  • chishaku 4 years ago

    Which db is that?

    Either way, that’s not happening.

  • will_gottschalk 4 years ago

    I’ll take Hasura for 500

rapnie 4 years ago

Nice collection of open source databases: https://codeberg.org/yarmo/delightful-databases

jimmyed 4 years ago

Andy forgot about the ugliest spat around benchmarks: Yugabyte vs. Cockroach.

endisneigh 4 years ago

I wish there were some API that abstracted away the DB and all its technical details; you could connect nodes to it that are specific databases with specific capabilities, and it would delegate as necessary.

  • Too 4 years ago

    Query language and data modeling for a DB depend heavily on whether it is relational, graph, time-series, denormalized, or KV. I don't think this would be possible beyond what's already available in the form of ORMs. Even getting SQL dialects to agree is a challenge sometimes.

  • srcreigh 4 years ago

    You're basically talking about Airflow/Airbyte/custom ETL + database expertise. The only way to get efficient performance is expertise, expertise is expensive, ETLs are a given when you have expertise... Just hire a DB consultant or two and you're all set.

  • mns06 4 years ago

    You might want to look into debezium. We use it to extract the change log from a generic OLTP database into Materialize, a view maintenance engine. Combining that data with event streams in Kafka is very powerful for us.

  • paulryanrogers 4 years ago

    FDW may be a step in that direction

  • vbezhenar 4 years ago

    ODBC, JDBC.

beamatronic 4 years ago

Not a word about Couchbase, which went IPO and is currently worth $1B

  • qaq 4 years ago

    There are so many Unicorn db companies now. It's hard to mention all of them

peakaboo 4 years ago

And no mention of Exasol which is faster than most, if not all, of these databases for analytics.

dblooman 4 years ago

ELI5, why do people still choose to use mongo?

  • redwood 4 years ago

    The continuous availability and horizontal scalability of the distributed system, coupled with the developer experience (document model, secondary indexes, all of that), have certainly captivated a large and growing developer community... you could boil it down to a confluence of ease of use and the advanced capabilities you may need if you are successful... still, they would have petered out if it weren't for Atlas, which makes all of the above that much more accessible.

  • menaerus 4 years ago

    Obvious, it's because MongoDB is web scale.

  • fullstackchris 4 years ago

    storing files

cloudengineer94 4 years ago

Postgres is amazing; however, I work with SAP HANA every day and I gotta say this thing is completely insane.

  • tpetry 4 years ago

    Can you share some information about SAP HANA? Why is it insane? Insanely good or bad? I have no experience with it.

  • RedShift1 4 years ago

    Well it is hard to beat keeping everything in RAM...

closeparen 4 years ago

The world moved away from Hadoop and MapReduce… onto what?

  • carlineng 4 years ago

    Cloud SQL data warehouses like Snowflake and BigQuery.

    • throwDec21 4 years ago

      I just wish AWS had something as good as BigQuery.

      • throw_me_up 4 years ago

        There are companies that use BigQuery for analytics but their infra is in AWS. BigQuery has support for external tables to S3 now. The BigQuery transfer service can also move data from AWS pretty easily. I agree though, BigQuery is astonishingly good and makes both Snowflake and Redshift look like dinosaurs imo.

  • AtlasBarfed 4 years ago

    Throw a dart at the Apache project list :-)

    My god do we need an atlas of database related Apache projects.

    It's almost as bad as java web frameworks about ten years ago.

    Everyone can do everything and it's hard to know what is better for what.

    • FridgeSeal 4 years ago

      Add to this “Apache Streaming projects”.

      I get that projects can be donated to Apache from disparate sources, but my god it’s still a disaster.

throwDec21 4 years ago

I'm just surprised that in 2021 BigQuery isn't more popular. I thought it would be top 10 by now; I moved to GCP because of it, but I feel like I'm the only one.

  • PhoenixReborn 4 years ago

    Because BQ is great for the ETL/data warehouse/BI use case but is terrible for online applications. I tried using BQ as the backing store for an online analytics application back in late 2019, and it was so much worse than using Clickhouse/Druid/Pinot for the same use case. IDK how much that has changed since, but I'm not too terribly surprised that it isn't higher.

  • jbergens 4 years ago

    I'm not on GCP but the one I would like to try is Spanner or Cloud Spanner.

    I think more scalable systems will continue to gain market share. It will be interesting to see if PlanetScale, CockroachDb or some other actually becomes a big player.

  • RedShift1 4 years ago

    BigQuery seems like a tool for very large but static datasets. I also had a hard time figuring out the pricing, so after some test queries I moved on to other solutions.
