Your team has decided to migrate your monolithic application to a microservices architecture. Youāve modularized your business logic, containerized your codebase, allowed your developers to do polyglot programming, replaced function calls with API calls, built a Kubernetes environment, and fine-tuned your deployment strategy. But soon after hitting deploy, you start noticing problems. Services take a long time to start, failures cascade from one container to the next, and small changes involve redeploying the entire application. Werenāt microservices supposed to solve these problems?
If this sounds familiar, you might have fallen into a common microservice anti-pattern: the distributed monolith. In this article, weāll explain what a distributed monolith is, why you should avoid them, and how you can use Chaos Engineering to validate whether your application falls under this anti-pattern.
What is a distributed monolith, and why is it bad?
A distributed monolith is an application thatās deployed like a microservice but is built like a monolith. It leverages platforms like Kubernetes and a distributed systems architecture but isnāt designed to do so efficiently or reliably.
In a monolithic architecture, the entire application is bundled into a single package that includes source code, libraries, configurations, and all other dependencies required for it to run. Monoliths arenāt inherently bad, but they have limitations:
- Deployment artifacts are typically large, slow to start, and use a significant chunk of resources.
- Changing one part of the application means rebuilding and redeploying the entire application, slowing developer productivity.
- Scaling horizontally is inefficient, as monoliths arenāt inherently designed for distributed systems.
In comparison, microservices split an application into discrete units that have clearly defined service boundaries, are less resource-intensive, and are more readily scalable. These benefits come at the cost of deployment complexity, managerial overhead, and observability challenges. Besides managing our application, we now need to manage images and containers, orchestration tools, networking and security policies, distributed data, and more.
With a distributed monolith, we have the heaviness and inflexibility of monoliths, the complexity of microservices, and few of the benefits of either architecture. We still have slow deployments and poor scalability, but now weāve added operational complexity and removed isolation between services. Our engineers need to learn the new architecture, adopt new tooling, and rebuild their applications to tolerate the dynamic and transient world of containers, adding tons of time and labor.
How can you tell whether your microservice is a distributed monolith, and what does this mean for your development strategy and application reliability? Weāll list some common characteristics of this anti-pattern and show you how you can test for them by using Chaos Engineering.
Sign #1: Services are tightly coupled
Coupling refers to the degree of separability between two functions. In a monolith, different functions are tightly coupled due to sharing a common code base and running within the same process space. Tight coupling can take different forms, such as:
- Requiring a dependency to be available to complete a task (behavioral coupling).
- Requiring fast, low-latency communication with other services (temporal coupling).
- Having to change multiple services as a consequence of changing a single service (implementation coupling).
Think of an e-commerce application. When a customer views a product page, we need to:
- Query the database for information about the product.
- Process that information on the backend.
- Render it on the page.
Using a traditional monolithic application, we might create a model-view-controller (MVC) framework to logically separate our backend logic (retrieving product details) from our frontend logic (rendering the web page). These are two different functions developed by two different development teams, but theyāre part of the same monolithic codebase. We canāt deploy or modify one without doing the same to the other.
Now letās restructure this application as a microservice. Weāll create two services: one for the frontend and another for the product catalog. Weāll replace direct function calls with API calls and connect the services over the network. Everything looks and works great, but then our product catalog service crashes. What happens to the frontend? Does it continue working despite the failure, or are our services so tightly coupled that the frontend also fails? If our backend team deploys a fix to the product catalog, do we also need to deploy a fix to the frontend? Most importantly, what does the customer experience when this happens?
How do you test for tight coupling?
When testing for tight coupling, we need to determine whether changing the state of one service affects another service. To demonstrate this, weāll use an open-source microservice-based e-commerce website called Online Boutique. Online Boutique uses ten distinct services to provide features such as the frontend, payment processing, and shopping cart management. We want to validate whether the product catalog is decoupled from the frontend.
Weāll perform this test using Gremlin, an enterprise SaaS Chaos Engineering platform. Weāll create a chaos experiment, which is an intentional, proactive injection of harm into a working system to test its resilience with the goal of improvement. By deliberately causing failure in the product catalog service and observing the impact on the frontend, weāll know how tightly coupled or loosely coupled these services are.
Chaos experiments have four components: a hypothesis (what we think will happen during the experiment), an attack (the method of injecting harm), abort conditions (when to halt the test to avoid unintended harm), and a conclusion based on our hypothesis. Our hypothesis is this: if the product catalog is unavailable, the frontend displays an error message to the user. Weāll use a blackhole attack to drop all network traffic between the frontend and product catalog services. While the attack is running, weāll refresh the web page in a browser. Weāll abort the test if the website crashes or takes more than 60 seconds to load.
Letās run the attack in the Gremlin web app:
When we refresh the product page, we see that it gets stuck in a loading state. Looking at the source code shows a timeout period of 3 seconds, but our observations prove otherwise. We've just discovered an unintended temporal coupling and should focus on decoupling these services.
We can take quick steps to mitigate this by lowering the timeout threshold or by making our API calls asynchronous so that our frontend doesnāt have to wait on the product catalog before responding to the user. We can reduce the risk of tight coupling in other services by using domain-driven design (DDD) to set well-defined boundaries between microservices (known as bounded contexts). This step is best performed before starting a migration to microservices, but itās still beneficial even after development has started.
Sign #2: Services donāt scale easily
Scalability is one of the most important qualities of a microservice. Being able to scale quickly and efficiently lets us handle sudden changes in demand, prevent our systems from being overloaded, and keep our operating costs low. Vertical scaling works up to a certain point, but modern cloud-native applications must also be able to scale horizontally. Cloud platforms and orchestration tools have simplified scaling, but we still need to design our applications with scalability in mind. Distributed monoliths are technically scalable, but only in very inefficient ways.
Letās say that our website is seeing a surge in traffic, and we need to horizontally scale our frontend to handle the increased number of connections. With a monolith, we need to deploy new instances of the full application and all of its dependencies. Not only is this process complex and prone to failure, but it takes longer and uses resources less efficiently than with a true microservice.
When testing for a distributed monolith, we want to know how long it takes our application to scale, what the potential failure points are while scaling, and what impact this has on other services (specifically, dependencies).
How do you test for scalability?
For this test, weāll deploy WordPress, an open-source CMS, to a Kubernetes cluster using the Bitnami Helm chart. This chart deploys the WordPress application in one Pod, and its attached MariaDB database in another Pod. Weāll create auto-scaling rules to increase the number of WordPress Pods when CPU usage exceeds 50% utilization across the deployment:
1helm repo add bitnami https://charts.bitnami.com/bitnami2helm install wordpress bitnami/wordpress3kubectl autoscale deployment wordpress --cpu-percent=50 --min=1 --max=10
Note that this only scales the WordPress application, not MariaDB. This means that all of our newly-provisioned Pods will attach to the same data store. When testing scalability, we want to see:
- Whether our application can scale and load balance traffic reliably.
- How long it takes for the new Pod to get up and running.
- Whether MariaDB acts as a bottleneck.
Weāll do this by using a CPU attack to increase CPU usage on the deployment. Weāll set our target utilization to 60% while monitoring the state of the deployment using kubectl
.
1watch -n 1 'kubectl get hpa wordpress'23NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE4wordpress Deployment/wordpress <unknown>/50% 1 10 1 3m54s
Note how long it takes for the new Pods to deploy. Once traffic is load balanced to the new Pods, monitor the amount of load on your database. Is there a significant enough increase to where you need to scale up your database as well? If so, you may have a distributed monolith.
This is a useful experiment regardless of whether you think your application is a monolith as it can help you prepare for traffic spikes and maintain high rates of throughput during unexpected load.
Sign #3: Services are overly chatty
Sharing data between distributed systems is challenging due to the added latency of network calls. In a monolith, data can flow between functions nearly instantaneously. But in a microservice, extensive communication between services will significantly reduce the applicationās throughput, possibly causing timeouts and other unexpected failures.
For example, WordPress uses a MySQL (or MariaDB) database to store data such as user accounts, page contents, configuration settings, and much more. A single page load can generate multiple MySQL queries. As such, WordPress works best with a low-latency (ideally local) database connection, but this creates a temporal dependency between the two services.
In Kubernetes, one option is to deploy WordPress and MySQL together in the same Pod. But this creates a Pod that serves multiple functions, requires significant system resources, has a longer startup time, requires data replication and synchronization, and is owned by both the application and database team. In other words, it's a monolith packaged as a microservice.
Networking two services might not seem like a problem until our network becomes saturated or degraded. With Chaos Engineering, we can see the impact that this might have on our application.
How do you test for chatty services?
Imagine that a network error is adding 20 ms of delay to all network traffic in our environment. This doesnāt seem like much, and a well-architected microservice should be able to tolerate this with only a minor increase in response time.
Using Gremlin, weāll use a latency attack to add 20 ms of latency between our WordPress Pod and its MySQL Pod. Weāll use Apache Bench to benchmark response times before and during the attack.
First, letās get our baseline metrics before running the attack. We see that each request takes roughly 128 ms to complete:
1$ ab -t 30 http://kubernetes-cluster:30003/2Requests per second: 7.78 [#/sec] (mean)3Time per request: 128.549 [ms] (mean)4Time per request: 128.549 [ms] (mean, across all concurrent requests)5Transfer rate: 197.97 [Kbytes/sec] received
Now, letās run the attack. Once the attack enters the Running state, weāll re-run the benchmark:
1$ ab -t 30 http://kubernetes-cluster:30003/2Requests per second: 1.39 [#/sec] (mean)3Time per request: 718.532 [ms] (mean)4Time per request: 718.532 [ms] (mean, across all concurrent requests)5Transfer rate: 35.42 [Kbytes/sec] received
Just 20 ms of additional latency caused a 5.5x increase in response time, an 82% decrease in transfer rate, and a noticeable delay for users. Almost anything could cause a similar outcome in the real world, from network saturation to a faulty switch to Kubernetes scheduling the MySQL Pod on another node. Imagine if we had a 40 ms delay, or a 100 ms delay!
To mitigate the effect of latency, reduce the number of round-trip network calls between services. Consider making calls asynchronous, or using a message queue like Apache Kafka to decouple these services. As with tight coupling, this is another good example of why you should use domain-driven design to split your services and data into different domains.
Conclusion
Moving to microservices involves more than just repackaging monolithic applications into containers. There are fundamental differences in architecture that affect everything from how we transfer data to how we recover from failures. Failing to account for these differences can lead to restricted scalability, diminished performance, and unexpected outages.
To determine whether your application is a distributed monolith, sign up for a Gremlin Free account and start running experiments on your applications within minutes. Use the CPU attack to test your applicationās ability to auto-scale, or use a shutdown attack to test for tight coupling between services.