Attacking your monolithic database with a swarm—a whole swarm!


Loosely coupled microservices and swarms of databases are the new blob-busting hotness.

As culture constantly shows us, monoliths are a tricky thing to deal with. Credit: Metro-Goldwyn-Mayer (MGM)

Imagine one huge, monolithic relational database—say, a MySQL or Oracle installation—squatting in the middle of an organization’s business like Jabba the Hutt. The big blob is kind of comforting. Its massive gut keeps all the data in one place, making it an attractive integration platform.

The problem is that the blob only speaks the language of structured data: SQL. Integrating it with non-relational and unstructured data can be an adventure. And because of its size and structure, adapting it to new tasks can be slow going at best. Unfortunately, nowadays, such blobs must move.

The world of apps is in constant flux and, with it, so are the demands on data. APIs are constantly changing to meet those demands (a social media connection here, a new mobile platform there). But throughout all this, core business can’t be bogged down; it has to move fast. And that’s where microservices—the dissection of the data monolith into agile little services—come in.

Slicing and dicing

Leena Joshi, Redis Labs’ VP of product marketing, gives this example. In a typical online business model, a site has a storefront, modules for orders, a catalog, and a module for product reviews. A customer looks at the catalog and places an order, but eventually the business wants to put in a recommendations service: “Like that? People who liked that also liked this!” or “Based on your past orders, we’re recommending these 20 other things.” Maybe there’s even a desire to add a “share what you purchased on social media” service.

As you start loading those services up, that monolithic database architecture gets fatter and slower. With every service tied to the same database, and to the expectation of 100 percent consistency across all the data those services share, each service’s response time slows to that of a bug stuck in tree sap.

Monoliths are created, as ThoughtWorks chief scientist and microservices expert Martin Fowler has written, when entire applications are built as a single unit. This often occurs in three parts: a client-side user interface consisting of HTML pages and JavaScript running in a browser on the user’s machine; a database consisting of many tables inserted into a common (and usually relational) database management system; and a server-side application. The server-side application handles HTTP requests, executes domain logic, retrieves and updates data from the database, and selects and populates HTML views to be sent to the browser. It’s one, single, logical executable. And as such, if you need to change the system, you have to build and deploy a new version of the server-side application.

“It’s much more scalable having microservices in because you’re not storing everything in a classic relational database.”

As additional features and services get tied into a monolithic application, things begin to get complicated. “When you try to integrate the system through shared, consistent data, you run into consistency problems,” Fowler says. “Everybody needs slightly different data organized in slightly different ways. It becomes more and more difficult, and more of a bottleneck, with every system more and more tightly coupled to another system.”

In order to overcome those challenges, businesses carve up the monolith—a very involved process that adds even more complexity.

The micro-thinking approach means snipping dependencies and giving each microservice its own database and its own autonomy. Fowler points to Amazon as an example. The company started with a monolithic app, a page-rendering engine dubbed Obidos. By 2001, Amazon finally gave up on scaling the back-end databases to hold ever more items, customers, orders, and multiple international sites. Instead, the company moved to a service-oriented architecture before SOA was even a thing.

To Amazon, moving to SOA meant encapsulating the data with the business logic that operates on the data. The only access had to come through a published service interface. As Amazon CTO and VP Werner Vogels told the Association for Computing Machinery, that meant, yes, everything needed its own database. “No direct database access is allowed from outside the service, and there’s no data sharing among the services.”

Chris Richardson, creator of the original CloudFoundry.com and the man behind the microservices application platform Eventuate.io, says that it’s more precise to say that a microservice’s data should be private to that microservice. “Sharing data entails coupling,” he says. “Imagine you have a service that stores orders and one that stores customers. One may need the other’s data to define what a customer’s credit limit is, and the other needs it to calculate credit. That introduces coupling. As a developer, I’m no longer free to change that service’s schema arbitrarily. I’d have to talk to the other team and plan that change carefully.”
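Richardson’s order-and-customer example can be sketched in a few lines. The sketch below is illustrative only; the service names, methods, and credit-limit rule are all invented. The point is that the order service depends on the customer service’s published interface, never on its tables:

```python
# Illustrative sketch (hypothetical names): each service owns its data
# privately and exposes it only through a published interface.

class CustomerService:
    def __init__(self):
        # Private storage: only this service knows this schema, so the
        # customer team remains free to change it.
        self._credit_limits = {"cust-1": 500.0}

    def get_credit_limit(self, customer_id: str) -> float:
        return self._credit_limits.get(customer_id, 0.0)

class OrderService:
    def __init__(self, customers: CustomerService):
        # The coupling is to the API, not to the customer tables.
        self._customers = customers

    def can_place_order(self, customer_id: str, total: float) -> bool:
        # Asks the customer service instead of joining across databases.
        return total <= self._customers.get_credit_limit(customer_id)

orders = OrderService(CustomerService())
print(orders.can_place_order("cust-1", 250.0))  # True
```

If the customer team later renames its tables or moves to a different store entirely, the order service never notices, because the contract is the method call, not the schema.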

The motivation behind microservices is to break up the development organization into a bunch of teams, each working on its own without having to wait on input from the other teams. As it is, communication flows well inside a team, and that enables a team to work fast. But as soon as you have to communicate across team boundaries, development efforts slow down. You may want to make a database change today, but the customer service team is tied up for a few weeks implementing something, so you have to wait. The solution: separate teams, separate databases, no waiting around.

All those little databases

Once you get used to the idea of each microservice having its own data, Richardson says that it readily leads to a new idea. You can pick the most appropriate database technology for each microservice based on what it needs. “For instance, if a microservice needs to handle search queries, it can use Elasticsearch,” he says. “If it needs a high-performance data store, it can use Redis. If it’s doing graph search, Neo4j.”

Fowler says there has been some interesting synergy going on in the database landscape vis-a-vis nontraditional, often so-called NoSQL databases. First, he says, there’s the question of scale. Amazon wasn’t the only company that found itself bursting at the seams and needing to scale. The changing database landscape led to aggregate-oriented databases such as MongoDB, Basho’s Riak, and Cassandra, which store related data together as a single unit, simplifying application code and making it easier to distribute. Complexity led to richer data models with graph databases such as Neo4j. These changes brought along the sense that you no longer had to fit given data into a predetermined database and, typically, into a predetermined database vendor.

Microservices play well in that evolution since the architecture decentralizes the question of what kind of data you store. “If you’re dealing with the question of large scale, you go down the Cassandra route,” Fowler says. “If you’ve got a complex dynamic schema of joining things together, a graph database fits that. And if it’s straightforward tabular data without replication challenges, a standard RDBMS [relational database management system] still fits.”

Basho CTO Dave McCrory says that microservices are bringing flexibility to databases. “It’s much more scalable having microservices in this way, because you’re not storing everything in a classic relational database,” he says. “Let’s say you have 20 services, and one is very demanding.

“You can point that specific database to a separate database, or cluster. It might be a very busy, very heavily loaded service. You can scale out and give it its own storage and all these things. Meanwhile, other services could simply live on a separate database shared amongst all of them. Later, if another service became data-intense, you could break it out to its own data instance. You get a lot of flexibility in doing this that you wouldn’t get otherwise.”
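McCrory’s scenario lends itself to a tiny routing sketch. Everything here (service names, connection strings) is hypothetical; the idea is simply that one data-hungry service gets its own cluster while the rest share a database, and a later breakout is a one-line change:

```python
# Hypothetical sketch: route each service to its datastore. Most services
# share one database; a heavily loaded service points at its own cluster.

SHARED_DB = "postgres://shared-db:5432/app"

DATASTORES = {
    # The one demanding service gets dedicated storage it can scale out.
    "orders": "postgres://orders-cluster:5432/orders",
    # Every other service falls back to the shared database below.
}

def datastore_for(service: str) -> str:
    """Return the connection string a given service should use."""
    return DATASTORES.get(service, SHARED_DB)

print(datastore_for("orders"))   # postgres://orders-cluster:5432/orders
print(datastore_for("reviews"))  # postgres://shared-db:5432/app
```

Breaking a newly data-intense service out later means adding one entry to the map, which is the flexibility McCrory is describing.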

The benefits of microservices are considerable. As Fowler has written, they are easier to deploy, more autonomous, and less likely to take the whole system down when they fail. You can mix multiple languages, development frameworks, and data-storage technologies.

But this landscape has as many thorns as roses. Microservices, like all technologies, have tradeoffs.

Yes, Bret Michaels once used a similar metaphor (but stay with us). Credit: Ian Waldie for Getty Images

Microservices’ macro-tradeoffs

One of the problems with microservices is their distributed nature. Distributed systems are tougher to program since remote calls are slow and are always at risk of failure. “If your service calls half a dozen remote services, each of which calls another half a dozen remote services, these response times add up to some horrible latency characteristics,” Fowler notes.

There are workarounds, such as making fewer, coarser-grained service calls. But that makes for a more complicated programming model, as you figure out how to batch those inter-service interactions. “It will also only get you so far,” Fowler says. “You are going to have to call each collaborating service at least once.”

For his part, Richardson explains that distributed transactions are best avoided entirely because of the CAP theorem, which holds that you can only have two out of three when it comes to consistency, availability, and partition tolerance.

Moreover, many nontraditional (NoSQL) databases don’t even support distributed transactions. Richardson has been working on a solution for that. His startup, Eventuate.io, is building a platform that makes it easy to build apps using an eventually consistent, event-driven architecture, otherwise known as event sourcing. The idea there is to store an entity in the database as a sequence of state-changing events.

For an order, for example, rather than storing it as a row in a database table, you store it as a sequence of events: the order was created, the order was shipped, and so on. When you need the current state, you reload the events from the database and replay them. Services publish events when they update data, and other services subscribe to those events and update their own data in response.
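As a rough illustration of that idea (not Eventuate.io’s actual API; all the names here are invented), an order’s current state can be rebuilt by replaying its event log:

```python
# Minimal event-sourcing sketch with hypothetical names: an order is
# persisted as a sequence of state-changing events, and its current state
# is rebuilt by replaying them in order.

from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str                       # e.g. "OrderCreated", "OrderShipped"
    data: dict = field(default_factory=dict)

@dataclass
class Order:
    status: str = "new"
    items: list = field(default_factory=list)

    def apply(self, event: Event) -> None:
        # Each event type changes the in-memory state in exactly one way.
        if event.kind == "OrderCreated":
            self.items = event.data["items"]
            self.status = "created"
        elif event.kind == "OrderShipped":
            self.status = "shipped"

def rehydrate(events: list) -> Order:
    """Rebuild an order's current state by replaying its event log."""
    order = Order()
    for event in events:
        order.apply(event)
    return order

log = [
    Event("OrderCreated", {"items": ["widget"]}),
    Event("OrderShipped"),
]
print(rehydrate(log).status)  # shipped
```

The event log itself is the source of truth; other services can subscribe to it and keep their own, eventually consistent copies of the data.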

Another microservices trade-off is operational complexity. A handful of applications turned into a swarm of hundreds of little microservices can be too much for some organizations to handle. All those little services have to be managed and monitored, and there’s no way to do it without a whole lot of automation, Fowler says.

And then there’s the issue of service discovery. Organizations can have hundreds of microservices. Within one business, after you decompose a monolith, you could have 50 services handling payments. How does a client locate the service it needs? James Lewis, the ThoughtWorks principal consultant who co-wrote the canonical microservices definition with Fowler, says this is actually two problems. The first is technical: you know that in your organization there’s a thing to do customer payment that you need to use. How do you find it? You use a technical solution, such as DNS lookup. You can install a service-discovery tool, and it will sort things out, he says.
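The technical half of Lewis’s answer can be sketched as a toy, in-process registry. Real discovery systems (DNS, or tools like Consul) add health checks, load balancing, and distribution; the names and addresses below are invented:

```python
# Toy sketch of the lookup half of service discovery (hypothetical names):
# services register where they live, and clients look them up by name.

class ServiceRegistry:
    def __init__(self):
        self._services = {}  # service name -> list of instance addresses

    def register(self, name: str, address: str) -> None:
        self._services.setdefault(name, []).append(address)

    def lookup(self, name: str) -> str:
        # Real registries filter out unhealthy instances and balance load;
        # here we just return the first registered instance.
        instances = self._services.get(name)
        if not instances:
            raise LookupError(f"no instances registered for {name!r}")
        return instances[0]

registry = ServiceRegistry()
registry.register("customer-payments", "10.0.0.12:8080")
print(registry.lookup("customer-payments"))  # 10.0.0.12:8080
```

The client never hardcodes an address; it asks the registry by logical name, which is what lets instances move, scale, and fail behind the scenes.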

But on the other side, there’s the issue of having multiple redundant services in the organization. “That’s a question of organizational design,” Lewis says. “Is it OK to have duplicate services? Many different user services? Many sources of truth for data? How do I, as someone interested in the design of organizations, allow different parts to evolve independently? An insurance example I use a lot is, I’m offering home, life, and motor insurance. Should they all have a service that does lookup for, say, risk? Or should there be a centralized, single version of that? That’s about how an organization is designed and not about technical architecture. Whichever one you choose, you can find a technical solution for it. It’s more about how you organize team structures and organizational boundaries.”

In other words, it boils down to Conway’s Law. Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.

Matthew Skelton, the co-founder and principal consultant at Skelton Thatcher Consulting, offers up a simple solution for discovery: “Just get teams to talk to each other.” He suggests using something simple like a wiki page. “You might be able to document service endpoints, but can you accurately describe the intent of a service in a nuanced way? Quite possibly not with a service catalog. Actually, the intent of a service might need human interaction. That seems like news to lots of teams.”

How do you split up a database?

Beyond all this talk of tradeoffs, the hardest bit about splitting up a monolith is the database, Skelton says. “Where in the past an organization has been able to rely on a single RDBMS to keep data consistent across the organization or its software systems, splitting it means bringing in new technologies around publish and subscribe, event-driven [architecture, etc.]. They’re new [technologies], and they can be seen as challenging or awkward depending on who’s doing it.”

The classic way to approach carving up the database is by domain segment: things that map to a particular part of the business. That’s probably the best place to start, Skelton says, but it’s not the only consideration. Another is regulation, such as the compliance pressures that drive change in the financial and healthcare industries.

There are things that can help guide a safe splitting of a database into smaller chunks. They include well-tested, well-known processes such as the database refactoring techniques of Scott J. Ambler and Pramod J. Sadalage, which evolve a database over time, even with multiple clients connecting, without breaking it.

But the first thing to do, Skelton says, is to put modern logging and metrics into the existing system in order to get a proper understanding of what’s going on. Then, as services get pulled out, the tools are already in place to give you an idea of what’s happening. “It gives us some data to tell us what’s the right size of thing to split up,” he says. “The goal isn’t to split into the smallest chunks. The goal is to get the right size for a team, sized for the right kind of chunk, so we can deploy it effectively based on [regulatory or business] pressures. If we have the right kind of metrics and logging, [we can have] the confidence that we can do a split, at the right size, and we’re getting a rich set of information back, as we’re making it more and more independent. There’s information coming back all the time as to how applications are performing and if data is still consistent.”

A lot of teams seem to miss how essential logging and metrics are, Skelton says. In the 1990s, logging and metrics were seen as optional. He believes that’s not possible anymore with microservices. You have to have a high degree of logging and metrics. “Otherwise, we haven’t got any hope of building them and running them effectively,” he says.

Microservices aren’t for everybody; they’re no silver bullet. In order for them to be worth it, Fowler says, you must have reached the point where you’ve lost the ability to handle that Jabba the Hutt data architecture. If a data monolith is just chugging away in the middle of the business without much reason to scale up, just let that blobby monolith be.

Lisa Vaas is a freelance technology journalist and blogger based in Boston who writes about database technology, information security, careers, and the applicant tracking systems where resumes go to die.

Listing image: Metro-Goldwyn-Mayer (MGM)
