Reducing complexity by integrating through the database

fauna.com

73 points by evanweaver 4 years ago · 44 comments

gunnarmorling 4 years ago

No matter how you integrate different applications, be it via APIs, messaging, or a database, it's vital to separate your application's internal data model from models which it exposes. If you don't do that, you're in for a never-ending story of upstream services unknowingly breaking downstream services, or upstream services not being able to evolve in any meaningful way.

So if they mean directly exposing a service's data model from the database to other services, I'm very skeptical. If they mean providing that access by means of some abstraction, e.g. database views, it can be an option in some cases.

You'll still lose lots of the flexibility you'd gain by putting some middleware in between, e.g. the ability to scale out compute independently from storage, the ability to merge data from multiple sources into one API response, the ability to implement arbitrarily complex business logic, e.g. in Java, etc.
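
For illustration, a minimal Python sketch of that separation; the `UserRecord` fields and the `to_public` mapper are hypothetical names, not anything from the article:

```python
# Hypothetical sketch: the internal record is never serialized directly;
# a mapping function produces the stable, exposed model instead.
from dataclasses import dataclass

@dataclass
class UserRecord:          # internal model: free to evolve
    id: int
    email: str
    password_hash: str     # must never leak to consumers
    signup_channel: str    # internal analytics detail

def to_public(user: UserRecord) -> dict:
    """Map the internal record to the exposed model clients depend on."""
    return {"id": user.id, "email": user.email}

u = UserRecord(1, "a@example.com", "x9f...", "organic")
print(to_public(u))  # {'id': 1, 'email': 'a@example.com'}
```

Because downstream services only ever see the output of `to_public`, internal fields can be added, renamed, or dropped without breaking them.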

  • newaccount74 4 years ago

    > it's vital to separate your application's internal data model from models which it exposes

    It really depends.

    If you have many different client services that need to access the database in a similar way, then you are right, it makes sense to add some type of abstraction. Then you can change the underlying model and only need to update the common logic once.

    Example: if two services both need to create user accounts, it makes sense to encapsulate the logic for creating user accounts somewhere. (A common practice is to use stored procedures inside the database for that)

    But if the services access the database in very different ways, then your abstraction may end up just making things more complicated. You'll have just as much work to update as if you updated all the services individually.

    Example: An internal dashboard may need to access the data in unique ways. If you route the dashboard service through some middle layer, then that middle layer would have lots of APIs that are used only by the dashboard. So you gain nothing from the abstraction. Any change to the dashboard requires updating the middle layer, any change to the database also requires updating the middle layer. It's just as much effort to make changes as if there was no middle layer, you've just split the dashboard logic in two parts and made it harder to understand.
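
The account-creation example above can be sketched in Python, using SQLite and a shared function standing in for a stored procedure; the table and function names are made up:

```python
# Hypothetical sketch: both services call one shared routine instead of
# each writing to the users table directly (a stored procedure inside
# the database would serve the same encapsulation purpose).
import sqlite3

def create_user(conn: sqlite3.Connection, email: str) -> int:
    """The single place that knows how user accounts are created."""
    cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# "signup service" and "admin service" both reuse the same logic:
uid_a = create_user(conn, "alice@example.com")
uid_b = create_user(conn, "bob@example.com")
print(uid_a, uid_b)  # 1 2
```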

  • dennisy 4 years ago

    I agree, but would you mind providing a concrete example of:

    > No matter how you integrate different applications, be it via APIs, messaging, or a database, it's vital to separate your application's internal data model from models which it exposes.

    Especially with respect to this product.

    • ttymck 4 years ago

      Are you asking for an example of separating the data models, or an example of what happens when you don't?

      This is a good example of what separation enables you to do:

      https://www.troyhunt.com/your-api-versioning-is-wrong-which-...

      (it is presented from the perspective of how to version an API, but the examples of what it looks like are there)

    • treeman79 4 years ago

      A field you rely on changed from integer to string. Or goes away. Or was unique but a duplicate shows up. Or the database disappears for a few minutes every hour.

    • joshuanapoli 4 years ago

      Over time, the internal data model might need to change to more accurately model the world. For consistency and efficiency, we usually do not want to maintain the combination of the original low-fidelity model and new high-fidelity model within the internal database.

      The internal data model might need to be reorganized to improve the efficiency of part of the internal application.

      A client might not need a higher fidelity model. The internal reorganization of the data model might not be pertinent to the client's interface. So we normally have some data-mapping in the application to help provide a stable interface for clients.

      It's possible to provide these compatibility mappings within the database through views, but this is usually considered harder to control, test, and scale.

      Maybe in a big application, the application-layer mapping and caching eventually get complicated enough to be something like a custom-made database. And so we might end up with an "integration database" but call it something different.
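
A minimal sketch of the view-based compatibility mapping mentioned above, using SQLite; the table split and view name are hypothetical:

```python
# Hypothetical sketch: the internal table is reorganized (full_name split
# into first_name/last_name), while a view preserves the old shape for
# clients that still expect the original model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO people (first_name, last_name) VALUES ('Ada', 'Lovelace')")

# Compatibility view: old consumers still see a single full_name column.
conn.execute("""
    CREATE VIEW people_v1 AS
    SELECT id, first_name || ' ' || last_name AS full_name FROM people
""")

row = conn.execute("SELECT full_name FROM people_v1").fetchone()
print(row[0])  # Ada Lovelace
```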

      • dennisy 4 years ago

        Great thanks.

        So we have a DB and a service in front of it. The DB gets a new schema, but the service maps to the previous representation in such a way that clients do not need to know about the new schema. Is that the gist?

        Then when it comes to using Fauna, what is it that does not allow such a data flow?

    • dennisy 4 years ago

      Interesting read, but seems like a tangential topic?

nescioquid 4 years ago

While they quickly acknowledge that integrating by means of a database is widely regarded as an anti-pattern, the rest of the article doesn't really address why this wouldn't be an anti-pattern, other than to pretend that the major reasons for avoiding this pattern are deployment complexity and furnishing multi-region services.

> A customer pattern we see solves this problem, and it’s the integration database pattern. This pattern went out of style in the 90s along with Oracle servers in the closet and dedicated DBAs, but with technological advances like scalable multi-region transactions and serverless computing resources, its utility has returned.

Considering they are flogging a product, this feels especially dishonest to me.

  • thruflo22 4 years ago

    Not related to the original article but I had a go at articulating the rationale here: https://hexdocs.pm/integratedb/intro.html

    (This project isn’t ready for prime time but it was a sketch of an approach to mitigating the downsides of integrating through the database, in order to unlock the upsides).

  • mumblemumble 4 years ago

    Also not the author, but I have seen some success here when the database is used as another way to build a service with an API, and not just a simple data store. What that means in practice is that applications interact exclusively through stored procedures, which serve as a sort of RPC API.

    The big downside of this approach is that the human factors are tricky nowadays. Developers expect to be allowed to poke at the tables directly, and don't necessarily take kindly to alternative ways of framing things. Especially if they've been raised on a diet of code-first object relational mapping. And there's not really a great way to enforce this boundary, because, unlike HTTP, an ODBC session will generally let you do pretty much anything you want to the remote service's implementation details.

rossdavidh 4 years ago

Having worked on old systems that used this approach, I recall that it did have a lot of pluses. Now, for the minuses:

1) changing the language you use in a server (say, from PHP to Python) is big, but changing the language you use in your database (from SQL to anything else) is even more intimidating. If you are integrating through the database, this limitation matters more.

2) you need to have DBAs who are not only highly competent, but also have good people skills, since they will often be saying "no", or at least "not that way", and if they don't know how to do that in a constructive manner then it becomes a net productivity drain. Fortunately, where I worked that used this pattern, the DBAs had exceptional people skills as well as technical skills. This is, I am led to believe, not always the case.

rc_hackernews 4 years ago

My first two programming jobs out of college back in the early 2010s took the approach in this article. Albeit, with older technology.

This is bringing back old (bad) memories of the times where I was debugging stored procedures that called triggers that called the same stored procedures.

A lot of that was due to poor design and bad choices. Some of that was due to developers trying to fit processes and patterns into a language, i.e. SQL, that lacked the expressiveness for it.

I'm not a fan of this approach to say the least.

I'd be interested in hearing others' experiences with this though.

And I'm going to check out Fauna since it looks like a cool database and to see if anything has changed with this approach since I encountered it almost a decade ago.

  • robsutter 4 years ago

    I remember those days too - my sins were T-SQL and I remember those headaches quite clearly.

    One huge difference today is that you can actually unit-test user-defined functions and stored procedures using the same tooling you use for your application code. That lends itself naturally to integration testing, and so on.

    I wouldn't necessarily recommend taking it to the property-based testing extreme like I did just to see whether it worked (it does).

    • chrisjc 4 years ago

      > you can actually unit-test user-defined functions and stored procedures using the same tooling you use for your application code

      Can you give an example of how this is possible, or what tools/services it's possible with? I can understand how it would work on UDFs, but not entirely sure how it would be possible for stored procedures, especially when they're oftentimes doing something more complex than simple read operations.
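
As a rough sketch of the general idea (not of any specific vendor tooling): SQLite supports triggers, and a plain unit test can exercise database-resident logic with the same test runner as the rest of the application code; all names here are made up:

```python
# Hypothetical sketch: database logic (a trigger, standing in for a
# stored procedure) exercised by an ordinary unit test.
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
        CREATE TABLE audit (account_id INTEGER, old_balance INTEGER);
        CREATE TRIGGER log_balance BEFORE UPDATE OF balance ON accounts
        BEGIN
            INSERT INTO audit VALUES (OLD.id, OLD.balance);
        END;
    """)
    return conn

def test_trigger_writes_audit_row():
    conn = make_db()
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.execute("UPDATE accounts SET balance = 50 WHERE id = 1")
    assert conn.execute("SELECT * FROM audit").fetchone() == (1, 100)

test_trigger_writes_audit_row()
print("ok")
```

Each test gets a fresh in-memory database, so the suite stays fast and isolated, just like ordinary unit tests of application code.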

  • GordonS 4 years ago

    I remember doing the same almost 30 years back, copying data from an Oracle database to an MSSQL database.

    It was a fairly limiting approach, performance wasn't great, and diagnosing problems was a total PITA.

    Afterwards I did an integration using an antiquated version of BizTalk, which was all based on COM and to this day remains my most loathed software ever. It was way too complex, the UI was crap, it was flakey as hell, and ridiculously slow. Later versions got better, but it was a very low bar.

    Later I used a Java-based Apache project to do an integration - I forget the name. It was actually really good, easy to use but also allowing full control.

    Later still, I did some integrations using custom C# code based around WWF (Windows Workflow Foundation) - the UI in Visual Studio was horrendously slow and flakey, it was too complex to build your own pipeline segments, and difficult to diagnose issues in production. But it did work fairly well for simple stuff.

    And then later still I used PowerCenter - I was the architect, so wasn't that hands on, but the PowerCenter guys liked it, and we ended up adopting it as our standard integrations tool, with hundreds of integrations across the company.

    • meepmorp 4 years ago

      Was it Apache Camel? I used that a while back and liked how low hassle it was.

contingencies 4 years ago

Use the right tool for the job. What is the most deployed database? SQLite3. In what pattern is it most often deployed? One language from one environment accessing one database with full read/write. Low call volume, high data complexity, and embedded (tightly application-coupled) use. This is observably the normal use case for a database and the simplest mode of implementation. Problems occur when software people start solving problems that don't exist: typically performance, future scalability, dubious theories of security and future language/database migrations. KISS.

taylodl 4 years ago

If I'm reading this correctly, the crux of the argument appears to be: if your serverless database engine provides cross-region data replication and consistency and provides a GraphQL interface, then applications can use GraphQL to go directly to the database, thus solving a lot of the problems we endured in the past when using SQL to go to the database. It's an interesting idea that at first glance appears to be worth looking into further, though I still have an uneasy feeling because I remember all the pain in the past!
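
A hedged sketch of what that data flow looks like from the client side; the endpoint URL, schema, and credential are entirely hypothetical:

```python
# Hypothetical sketch: a client sends a GraphQL query straight to the
# database's HTTP endpoint, with no application server in between.
import json
import urllib.request

query = 'query { user(id: "123") { email } }'  # made-up schema

payload = json.dumps({"query": query}).encode()
req = urllib.request.Request(
    "https://graphql.example.com/graphql",  # hypothetical endpoint
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <token>",  # per-client credential checked by the DB
    },
)
# urllib.request.urlopen(req) would return the JSON result; the database
# resolves the query itself and enforces access rules server-side.
```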

dennisy 4 years ago

The article is provocative!

But I do not see why the product itself is different to serverless DBs such as SupaBase and PlanetScale?

tlarkworthy 4 years ago

At Firebase it was sometimes called the client-database-server architecture. The pattern was documented in 2013 [1]

If you use Firebase as an ephemeral message bus it's a great pattern. It has problems if you use it like a traditional database because migrations are very tricky. DBs that support views (or GraphQL) can make migrations much easier.

[1] https://firebase.googleblog.com/2013/03/where-does-firebase-...

evanweaverOP 4 years ago

Stored procedures and the integration database have come back for our users in a big way. It would be great to hear examples of how others are applying this pattern with other databases and APIs.

  • robsutter 4 years ago

    My favorite thing about this pattern is that it allows developers to build globally distributed applications while reasoning about them like a single, centralized monolith.

    A lot of complicated practices that we’ve adopted to overcome the challenges of distributed systems just disappear when you have fast, strongly consistent transactional writes on a single data store. Either a thing happened or it didn’t. No need to wait around and see.

    This matters even more as applications move from single failover to multi-region to edge computing with hundreds of locations of compute. How do you get consistent state out to each PoP?

    You don’t, you integrate through the database.

  • chrisjc 4 years ago

    Since you're the one making the claim that they're making a comeback, I'd love to hear your own personal story or some of the stories you've heard.

    Personally I haven't noticed anything resembling a comeback, but I'm certainly using them more than ever... and I'm loving it.

mdoms 4 years ago

I think anyone with experience on this type of system will recoil in horror at the thought of building it intentionally today.

  • cogman10 4 years ago

    Tell me about it. This is the current system I maintain. While new products have (mostly) started using non-integrated DBs, all the legacy systems we have use a giant integrated db.

    It SUCKS.

    Let me count the ways.

    - You can't update a table schema without updating and deploying a bunch of applications in sync (even if that's part of a stored proc). Which for something like this, means that getting out of this situation is WAY harder than getting into it.

    - You end up putting WAY too much logic into the DB which makes it hard to ultimately figure out WHAT is supposed to happen

    - DBs have TERRIBLE development stories. That's ultimately because the code and data all live in the same place and you can update code without any sort of revision control to help you understand or see a change that's been made to db schema (Forcing a bunch of painful process around updating DB capabilities).

    - DBs are resource bottlenecks that SUCK to figure out how to scale out. Putting a bunch of apps into one DB complicates that process. Scaling a single app in a single DB is simply WAY easier.

    - At least my DB (and I assume a bunch of other DBs) has really crappy performance diagnostic tools. Further, the more complicated the queries against it, the more likely you are to go from "Hey, stats are making things fast" to "OMG, why is this thing taking 10 seconds to run now!". It's really bad when the only solution that seems to fix things is dumping stats.

    I could MAYBE see something like this for a macro service dedicated to a domain, but I'd never build a complex system like this from scratch. Colocating apps in the same DB would have to be justified by some crazy performance reason where bypassing a microservice makes sense. An exception, not the regular course of action (And I'd still hate it :) )

    • evanweaverOP 4 years ago

      I mean, yeah, this is why people stopped using this pattern. But these problems are getting solved, especially in Fauna:

      1. Schemaless/document/schema-on-need databases like Fauna don't mandate the application breakage on every change that SQL does

      2. It's hard to reason about if it's not transparent, but it can be transparent now, see below

      3. Fauna is a temporal database, which acts like version control on your stored procedures, so you can easily check and revert any change

      4. Fauna is serverless and horizontally scalable without consistency/latency impact

      5. This was definitely a problem when you were occupying precious CPU cores on a vertically scaled RDBMS with business logic, but compute in Fauna or in serverless lambdas scales horizontally indefinitely

DeathArrow 4 years ago

I wouldn't integrate through a database but I wouldn't refuse to integrate through a distributed cache.

Anyway using REST for inter services communication decreases performance and increases latency and 99% of projects still do it.

  • evanweaverOP 4 years ago

    Curious what makes a cache better to you.

    • DeathArrow 4 years ago

      It would be faster and I still get to keep microservices from accessing the data they don't need or shouldn't access.

      But I'd rather use RPC for communication.

snicker7 4 years ago

Putting more of your application logic in your database is great for DB vendors. It's pretty much impossible to migrate to a competitor.

  • newaccount74 4 years ago

    Putting more of your application logic in the middleware is great for middleware vendors. It's pretty much impossible to migrate to a competitor.

orev 4 years ago

The contempt towards the “old ways” in this piece is really irksome, especially since those “old ways” are perfectly acceptable for 95% of applications (you only need this hyper scale stuff if you’re running a MAANG level app). The rest of us still doing things the “old way” are perfectly happy to have simple systems that run reliably, as we sit and watch the complete train wreck of complexity everyone else is building using these “new ways” that are supposedly better.

  • chrisjc 4 years ago

    While I see your point, it really felt like they were trying to empathize with "new" developers that might feel that way about doing things the "old way"... and that they may want to reconsider those feelings now that there are new technologies and services to overcome the limitations that existed in the "olden times".

  • robsutter 4 years ago

    I actually think this piece aligns with your thinking. Using a single, strongly consistent data store to eliminate the "train wreck of complexity" allows for a better developer experience while still delivering a responsive user experience. That matters at all kinds of scale, not just hyper.

  • iwintermute 4 years ago

    I know it's off-topic but maybe "MAANG level app" >> "MANGA" if we want to go there?

    • pred_ 4 years ago

      But if F->M, then why not G->A? MAAAN.

      • jayd16 4 years ago

        Referring to the tech powers that be as "The MAAAN" is pretty fitting.

        • BoxOfRain 4 years ago

          I like this, I suppose under this new acronym cutting Google and Facebook out of your life would be "sticking it to the MAAAN".

    • 6gvONxR4sf7o 4 years ago

      If you're going to rename F to M, why not rename G to A?
