DevNexus Day 2: Metrics, Monolith Decomposition

Together with four “AwesomeSauce” colleagues from Info Support, I’m attending DevNexus this year. It’s my second time here, as I also spoke here in 2018. Besides delivering my own “React in 50 minutes” session, I’m attending sessions to catch up on new technology advancements. After a great first day, let’s move on to the second (and last) day.

Metrics for the Win

Erin Schnabel started off her talk with a short history of how infrastructure and deployments have rapidly become more flexible over the last couple of years. Our systems are basically assembled from boxes (containers), orchestrated by tools like Kubernetes. But an old law still applies:

If you want your system to be sane, it must be observable.

So even orchestrated boxes require observability, if only because you want to understand reasonably well what happened without having to replay the problem.

Metrics are much broader than just throughput and HTTP response times; they can cover everything that happens in your application. In fact, metrics can serve you by indicating whether the components of your application do their one thing and do it well. Whether you run microservices or a plain old service-oriented architecture, in both cases collecting meaningful metrics about what your application does can help you spot problems at an early stage.

But be careful: every metric becomes a separate time series, so you must pay attention to which metrics you measure. If any part of a metric’s name comes from an unbounded set (a user ID, for example), you will sooner or later explode your storage.

Types of Metrics

There are different types of metrics:

  • counters for incrementing values
  • gauges for observed values
  • histograms for a distribution of values in buckets
  • summaries, which are basically a combination of buckets over time
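
To make these types a bit more tangible, here is a minimal sketch using Micrometer, the library behind the PrometheusMeterRegistry mentioned below. The metric names and values are hypothetical, and Micrometer models the histogram/summary pair with a single DistributionSummary:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.concurrent.atomic.AtomicInteger;

public class MetricTypesExample {

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();

        // Counter: a value that only ever increments.
        Counter ordersPlaced = Counter.builder("orders.placed")
                .description("Total number of orders placed")
                .tag("channel", "web") // keep tag values to a bounded set!
                .register(registry);
        ordersPlaced.increment();

        // Gauge: an observed value that can go up and down.
        AtomicInteger queueSize = new AtomicInteger(0);
        Gauge.builder("orders.queue.size", queueSize, AtomicInteger::get)
                .description("Number of orders currently waiting")
                .register(registry);
        queueSize.set(7);

        // Distribution summary: records individual values so their distribution
        // (histogram buckets, percentiles) can be reported over time.
        DistributionSummary orderAmount = DistributionSummary.builder("orders.amount")
                .baseUnit("euros")
                .publishPercentileHistogram()
                .register(registry);
        orderAmount.record(42.50);

        registry.getMeters().forEach(meter -> System.out.println(meter.getId()));
    }
}
```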

Code

The example code for the talk is on GitHub. There’s one class that manages all the Prometheus plumbing: it processes events fired by the application and updates the relevant metrics using a PrometheusMeterRegistry. It also provides a method to scrape all the collected metrics. Finally, there is a /prometheus endpoint that exposes the metrics for use in Grafana.
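
The repository wires this up in its own way; as an illustration, a scrape endpoint built on a PrometheusMeterRegistry could look roughly like this hypothetical JAX-RS resource:

```java
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/prometheus")
public class PrometheusResource {

    // In a real application this registry would be a single shared (injected) instance
    // that the event-handling code also writes its metrics to.
    private final PrometheusMeterRegistry registry =
            new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String scrape() {
        // Renders all collected metrics in the Prometheus text exposition format,
        // ready to be scraped and visualised in Grafana.
        return registry.scrape();
    }
}
```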

Decompose your Monolith

Next up in my schedule was Chris Richardson talking about decomposing a monolith. All the typical -ilities of quality tend to decline over time when building monoliths, which is exactly the opposite of what you would like to see. Microservice architectures provide a better mechanism to keep these -ilities healthy, since the size of the individual components stays relatively small.

That’s all nice and good, but many organisations have large code bases combined with slow delivery processes. Is there a way to move to microservices? First ask yourself whether you can improve your current situation. Monoliths aren’t an anti-pattern in themselves, and many of the organisational problems can be solved without refactoring the architecture. Only if that doesn’t sufficiently improve your agility and delivery speed should you consider migrating to microservices.

A popular pattern for doing this is the Strangler Pattern. In this approach, you put a new application (or microservices in this case) next to the monolith. New features preferably go into the new microservices. Over time, you also start migrating features from the monolith to the microservices.

But how do you know whether your approach is heading in the right direction? Of course, it’s not about the number of services. Instead, there are other, more useful metrics:

  1. lead time — the time it takes to bring a change in the codebase to production.
  2. deployment frequency — how often you actually deploy new releases to production.
  3. change failure rate — how often a change leads to a failure.

Keep in mind that this process can take years. You keep doing this until your -ilities are at the level you want them to be.

Getting Started

First, take time to define what the microservice architecture should look like. This is not to set it in stone, but rather to have a direction in which you can start moving. Then start working on the pieces where you expect the highest return on investment. On the benefits side we find increased velocity, solving a problem and increased scalability. On the cost side we find the cost of changing, adapting and rewriting, but also breaking dependencies and coupling, and having the new services participate in distributed transactions or sagas. In a monolith, transactions are relatively simple, but in a distributed world you will probably need “compensating transactions”. Such a compensating transaction undoes the effects of an earlier transaction that succeeded in one microservice but failed in another.
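
As a rough illustration (with hypothetical service names, not taken from the talk), a saga step with a compensating transaction could look like this:

```java
/**
 * Minimal saga sketch: two local interfaces stand in for remote microservices.
 * If the second step fails, a compensating transaction undoes the first step,
 * which had already succeeded.
 */
public class CreateOrderSaga {

    interface PaymentService {
        String reserve(String customerId, double amount); // returns a reservation id
        void cancelReservation(String reservationId);     // compensating action
    }

    interface InventoryService {
        void reserveStock(String productId, int quantity); // may throw if out of stock
    }

    private final PaymentService payments;
    private final InventoryService inventory;

    public CreateOrderSaga(PaymentService payments, InventoryService inventory) {
        this.payments = payments;
        this.inventory = inventory;
    }

    public void placeOrder(String customerId, String productId, int quantity, double amount) {
        // Step 1: reserve the payment (succeeds in the payment service).
        String reservationId = payments.reserve(customerId, amount);
        try {
            // Step 2: reserve the stock (may fail in the inventory service).
            inventory.reserveStock(productId, quantity);
        } catch (RuntimeException stockFailure) {
            // Compensating transaction: undo the payment reservation made in step 1.
            payments.cancelReservation(reservationId);
            throw stockFailure;
        }
    }
}
```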

When doing this, take the “Law of Holes” into account:

If you find yourself in a hole, stop digging.

This means that you should avoid increasing the size of the monolith, since doing so will probably decrease the -ilities even further. As a consequence, you might need some “data integration glue”: code that allows a pristine new service to access data that still lives in the monolithic database. Alternatively, database triggers or application events may be effective ways to synchronise data between the monolith and the new pristine service. It is important to think about this: you cannot simply refactor the code away, you will need to migrate the data used by that code as well. Whether it’s entire tables or just columns, you need to address this.
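
For the application-event variant, a minimal sketch (all names hypothetical) could look like this: the monolith publishes an event whenever it changes a customer record, and the new service keeps its own copy of that data up to date by handling the event:

```java
import java.util.function.Consumer;

// Hypothetical event published by the monolith whenever customer data changes.
class CustomerChanged {
    final long customerId;
    final String email;

    CustomerChanged(long customerId, String email) {
        this.customerId = customerId;
        this.email = email;
    }
}

// The new, pristine service subscribes to these events and maintains its own
// copy of the customer data it needs.
class CustomerReplicator implements Consumer<CustomerChanged> {

    interface CustomerStore { // the new service's own persistence (assumed)
        void upsert(long customerId, String email);
    }

    private final CustomerStore store;

    CustomerReplicator(CustomerStore store) {
        this.store = store;
    }

    @Override
    public void accept(CustomerChanged event) {
        // Upsert the replicated data; the monolithic database remains the source
        // of truth until this part of the data has fully moved to the new service.
        store.upsert(event.customerId, event.email);
    }
}
```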

Remotable Interface

Define a Remotable Interface as the primary way the monolith interacts with the soon-to-be refactored code. That interface will eventually become the foundation of the public service interface. Then have all write operations on the relevant data go through that Remotable Interface. As you move code from the monolith to the microservice, you can replace the implementation of that Remotable Interface with invocations of the new microservice. Again, keep doing this until the success metrics are at the desired level.
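
In Java terms, that could look roughly like the following sketch (interface, class names, endpoint and payload are made up): at first the interface is implemented by the existing monolith code, and later that implementation is swapped for a client that calls the new microservice:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// The Remotable Interface: the only way the rest of the monolith may use this functionality.
public interface InvoiceService {
    void createInvoice(long orderId);
}

// Phase 1: implemented by the existing code, still running inside the monolith.
class LocalInvoiceService implements InvoiceService {
    @Override
    public void createInvoice(long orderId) {
        // ... the current monolithic logic, unchanged ...
    }
}

// Phase 2: the implementation is replaced by an invocation of the new microservice,
// here over HTTP.
class RemoteInvoiceService implements InvoiceService {

    private final HttpClient client = HttpClient.newHttpClient();

    @Override
    public void createInvoice(long orderId) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://invoice-service/invoices"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"orderId\": " + orderId + "}"))
                .build();
        try {
            client.send(request, HttpResponse.BodyHandlers.discarding());
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("Call to the invoice service failed", e);
        }
    }
}
```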

Summarising the process:

  1. Split the code — Define a (potentially bi-directional) API between the monolith and the soon-to-be service. Its implementation is the existing code, moved to a new module in the same Maven or Gradle project. Database access keeps on using the existing database.
  2. Split the database — In this stage, you define the new database schema and optimise it for the new service. Changes to either database must be replicated to the other. The new schema can actually live on the same database server, which makes the replication easier to implement with triggers.
  3. Define and deploy the new service — Deploy the new module as a stand-alone service in your deployment architecture. You can test it and verify it works as expected, but it’s not getting any production traffic yet.
  4. Use the new service — Start routing traffic to the new service. Some clients may connect directly to the new service, while others may connect to the monolith, and some others may do both. Interaction between the monolith and the new service can be done using asynchronous messaging.
  5. Remove old code — Finally it’s time to remove code from the monolith. You have now completed one iteration of the process and are ready for the next one!

To Batch or Not To Batch