Deploying Microservices on Kubernetes is not as easy as 1–2–3. Many of them suffer from frequent OutOfMemory kills and long startup times. In particular, Java applications with their resource-hungry JVM need special attention.
In this article we’ll analyse the CPU and memory resource consumption of a Spring Boot microservice during startup and runtime. Then we’ll find out how to tweak resource requests and limits to tune startup and prevent OOM kills.
And last but not least, the curious among you will get an explanation of the mysterious 137 and 143 exit codes ;)
But first, let's start with some dry theory
Before you begin to configure your pod, you should know that Kubernetes will assign it a Quality of Service (QoS) class depending on its resource requests and limits. There are three classes: Guaranteed, Burstable and Best Effort. When a node runs short of resources, Kubernetes looks at the QoS classes of its pods to decide which ones to evict first.
Note: You can see the class of a pod, e.g., in the "QoS Class" field of the "kubectl describe pod" output.
Which class a pod is assigned depends only on its resource requests and resource limits.
To gain the very best classification, Guaranteed, your pod (or more precisely, each of its containers) needs to meet the following requirements:
- CPU request and limit must be given and be the same
- Memory request and limit must be given and be the same
Note: if there is no request but only a limit for a resource, then Kubernetes will set a request equal to the limit!
A pod is classified Burstable if it does not meet the requirements for Guaranteed but at least one of its containers has a resource request (remember that a limit alone implies a request).
Pods whose containers have neither requests nor limits are classified Best Effort.
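The classification rules above can be sketched as a small Java function. This is a simplified model under stated assumptions, not the actual Kubernetes code: the hypothetical Container record holds CPU in millicores and memory in MiB, null means "not set", and init containers as well as resources other than CPU and memory are ignored. It needs Java 16 or later for records.

```java
import java.util.List;

public class QosClassifier {

    public enum QosClass { GUARANTEED, BURSTABLE, BEST_EFFORT }

    // Hypothetical container model: CPU in millicores, memory in MiB, null = not set.
    public record Container(Long cpuRequest, Long cpuLimit, Long memRequest, Long memLimit) {

        // A missing request defaults to the limit, as described in the note above.
        Container withDefaults() {
            return new Container(
                    cpuRequest != null ? cpuRequest : cpuLimit,
                    cpuLimit,
                    memRequest != null ? memRequest : memLimit,
                    memLimit);
        }

        boolean hasAnySetting() {
            return cpuRequest != null || cpuLimit != null
                    || memRequest != null || memLimit != null;
        }

        // Both limits set, and each request equal to its limit.
        boolean isGuaranteed() {
            return cpuLimit != null && memLimit != null
                    && cpuLimit.equals(cpuRequest) && memLimit.equals(memRequest);
        }
    }

    public static QosClass classify(List<Container> containers) {
        List<Container> defaulted = containers.stream().map(Container::withDefaults).toList();
        if (defaulted.stream().noneMatch(Container::hasAnySetting)) {
            return QosClass.BEST_EFFORT; // no requests or limits anywhere
        }
        if (defaulted.stream().allMatch(Container::isGuaranteed)) {
            return QosClass.GUARANTEED; // every container meets the Guaranteed rules
        }
        return QosClass.BURSTABLE; // something is set, but not enough for Guaranteed
    }

    public static void main(String[] args) {
        // requests == limits, as in the configuration used in this article
        System.out.println(classify(List.of(new Container(500L, 500L, 512L, 512L)))); // GUARANTEED
        // limits only: requests default to the limits
        System.out.println(classify(List.of(new Container(null, 500L, null, 512L)))); // GUARANTEED
        // CPU request without a CPU limit
        System.out.println(classify(List.of(new Container(500L, null, 512L, 512L)))); // BURSTABLE
        System.out.println(classify(List.of(new Container(null, null, null, null))));  // BEST_EFFORT
    }
}
```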
The following contracts apply to each class:
- Guaranteed: will only be killed when it exceeds its memory limit
- Burstable: might be killed when the node runs out of resources and the pod uses more than it requested
- Best Effort: first to be killed when the node runs out of resources
As you can see, it is important to choose a pod's resource requests and limits carefully, so that Kubernetes does not overcommit a node with pods and your pods won't get evicted due to resource overuse.
Now let’s get hands-on
For our analysis I’ve taken a very nice Spring Boot microservice demo from here: https://github.com/springframeworkguru/springboot_swagger_example
I’ve built the artifact with “mvn package” and created a Docker image with the following Dockerfile:
FROM openjdk:alpine
COPY spring-boot-web-0.0.1-SNAPSHOT.jar /
ENTRYPOINT ["java", "-jar", "/spring-boot-web-0.0.1-SNAPSHOT.jar"]
Knowing nothing about the resource usage of the microservice, we search for examples and land on this page: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
Here we find out that a MySQL database gets half a CPU and 128 MB RAM. So we decide to give our microservice the same CPU and a little bit more RAM:
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
Note that we want to reach the Guaranteed Quality of Service class, so we set requests equal to limits.
CPU utilization
When we now launch a pod with that microservice, we notice that it takes a very long time to come up: about 60 seconds. That's really bad. Imagine a production environment where you have scaled your microservice to 10 replicas with a deployment, and you want to deploy a new version with a rolling update. If the pods are replaced one after another, it will take more than 10 minutes to complete! Now imagine that you encounter a problem and need to roll back. Another 10 minutes pass by. Your DevOps team will be tearing their hair out.
But what's the problem? The microservice demo is really not that complex: it has an in-memory database and no external dependencies.
To figure it out, we run some experiments with Docker. Let's start a Docker container of our demo app on our local machine, limiting the CPU to 0.5:
docker run --cpus 0.5 --rm spring-boot-test:3
We see the same problem, a startup time of more than 80 seconds:
2019-03-18 20:24:31.182 INFO 1 --- [ main] g.s.SpringBootWebApplication : Started SpringBootWebApplication in 82.214 seconds (JVM running for 86.134)
Now we start the container without a CPU limit and monitor its CPU usage in a separate terminal window with:
docker stats
The output of docker stats shows that the Spring Boot app uses more than 3 CPUs during startup!
And here is the startup duration — only 12 seconds:
2019-03-18 20:33:45.682 INFO 1 --- [ main] g.s.SpringBootWebApplication : Started SpringBootWebApplication in 11.919 seconds (JVM running for 12.725)
The following list summarizes the startup duration for different CPU limits (in millicores):
- 500m — 80 seconds
- 1000m — 35 seconds
- 1500m — 22 seconds
- 2500m — 17 seconds
- 3000m — 12 seconds
We can see that CPU limit directly affects startup times.
Okay, no problem, you say: just set the pod's CPU request and limit to 3 CPUs (3000m). But this approach has two drawbacks. First, your microservice will likely never use that much CPU during its normal workload (otherwise you should consider a redesign). Second, and much more important, your cluster nodes will be fully booked with far fewer pods, because the scheduler reserves the requested CPU for each pod whether it uses it or not. E.g. on a node with 8 CPU cores, Kubernetes would only be able to schedule 2 of your pods.
But what else could we do? First, we should get a rough idea of the pod's normal CPU usage under real conditions. A good tool for that purpose is Prometheus (together with Grafana). Alternatively, you could make an educated guess and add a small buffer on top.
Next, we set that value as our CPU request and leave out the CPU limit! This is a really good decision, even though the pod loses the Guaranteed class and becomes Burstable.
Get Stephan Hartmann’s stories in your inbox
Let's say we configure a CPU request of half a CPU core (500m) for all our microservices. This means that on our 8-CPU node, Kubernetes could schedule 16 pods. The good thing is that not all pods use that much CPU all the time. For example, without any traffic, our demo app uses less than a tenth of a CPU core. So while some of your pods are idling, others can burst beyond their request and use more CPU (e.g. during startup). Kubernetes won't kill a pod just because it uses more CPU than requested.
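The capacity arithmetic from this paragraph and the previous one can be checked with a few lines of Java. The sketch assumes, for simplicity, that only CPU requests matter and that all 8 cores are allocatable; in a real cluster the kubelet reserves some capacity for system daemons, so the actual pod count can be lower. podsPerNode is a hypothetical helper.

```java
public class NodeCapacity {

    // Hypothetical helper: how many pods fit on a node, judged by CPU requests alone.
    // Assumes the whole node capacity is allocatable (real nodes reserve some CPU).
    public static long podsPerNode(long nodeMillicores, long cpuRequestMillicores) {
        if (cpuRequestMillicores <= 0) {
            throw new IllegalArgumentException("CPU request must be positive");
        }
        // The scheduler only places a pod if the sum of requests fits into the capacity,
        // so the number of pods is the integer quotient.
        return nodeMillicores / cpuRequestMillicores;
    }

    public static void main(String[] args) {
        long node = 8_000; // 8 CPU cores = 8000m
        System.out.println(podsPerNode(node, 3_000)); // 2
        System.out.println(podsPerNode(node, 500));   // 16
    }
}
```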
Excursion on compressible and non-compressible resources
CPU is considered a “compressible” resource while memory is “non-compressible”.
Compressible means that pods can work with less of the resource although they would like to use more of it. For example, if you deploy a pod with a request of 1 CPU and no limit, it can use more than that if available. But when other pods on the same node get busy, they will have to share the available CPUs and might get throttled back to their request. However, they won’t be evicted and can still do their job.
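To see why a CPU request and a CPU limit behave so differently, it can help to look at how they are typically translated into Linux cgroup settings. The sketch below assumes cgroup v1 and the kubelet's default CFS period of 100 ms (cgroup v2 uses cpu.weight and cpu.max instead, with different numbers); the method names are hypothetical. A request becomes a relative cpu.shares weight that only matters when the CPUs are contended, while a limit becomes a hard CFS quota: with 500m, the container may use at most 50 ms of CPU time per 100 ms period, even if other cores sit idle. Docker's --cpus option works the same way, which is consistent with the slow startup measured with --cpus 0.5.

```java
public class CpuCgroupMapping {

    // Assumptions: cgroup v1 and the kubelet's default CFS period of 100 ms.
    static final long CFS_PERIOD_MICROS = 100_000;
    static final long MIN_SHARES = 2;

    // CPU request -> cpu.shares: a relative weight that only matters under contention.
    public static long requestToShares(long milliCpu) {
        return Math.max(milliCpu * 1024 / 1000, MIN_SHARES);
    }

    // CPU limit -> CFS quota: CPU time (in microseconds) allowed per 100 ms period.
    // Simplified: the kubelet may additionally clamp very small quotas.
    public static long limitToQuotaMicros(long milliCpu) {
        return milliCpu * CFS_PERIOD_MICROS / 1000;
    }

    public static void main(String[] args) {
        System.out.println(requestToShares(500));      // 512
        System.out.println(limitToQuotaMicros(500));   // 50000, i.e. 50 ms per 100 ms
        System.out.println(limitToQuotaMicros(3_000)); // 300000, i.e. 3 cores' worth per period
    }
}
```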
For memory, on the other hand, a pod with a memory request but no limit might also use more RAM than requested. However, when resources run low, it can't be throttled back to the requested amount of memory to free up the rest. There is a possibility that Kubernetes will evict such pods. Therefore it is crucial to always set a memory limit and to make sure that your microservice never exceeds it.
Memory utilization
For RAM our considerations need to be a little different. As noted above, RAM is a non-compressible resource. It is crucial to determine how much memory your microservice will use. Then you should set your requests and limits to that value.
To make sure that your Java microservice won’t use more RAM than what you have configured, you also need to set some JVM settings accordingly.
By default, the JVM uses up to a quarter of the physically available RAM for its heap (https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html). The problem is that Java 8 looks at the node's total memory to compute that value, so it may end up using more RAM than the pod's limit allows.
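A quick way to check what the JVM actually sees inside a container is to print its maximum heap size and CPU count and compare them with the pod's limits. This is a minimal diagnostic sketch; the class and method names are made up for illustration. Note that newer JDKs (10 and later, and Java 8 from update 191) are container-aware by default, so on those versions the defaults are derived from the container limits rather than from the whole node.

```java
public class JvmResourceReport {

    private static final long MIB = 1024L * 1024L;

    // Maximum heap the JVM will use: -Xmx if set, otherwise its ergonomic default.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // CPUs the JVM thinks it may use; this influences defaults such as the number of GC threads.
    public static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        System.out.println("CPUs:     " + cpus());
    }
}
```

If the printed heap is roughly a quarter of the node's RAM rather than of the pod's memory limit, the JVM is not taking the container limit into account.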
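A solution might be to use the JVM switch
-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
(UseCGroupMemoryLimitForHeap is an experimental option, available since Java 8u131, so it has to be unlocked first.)
However, this wastes RAM: remember that the JVM will then use only a quarter of the configured limit for its heap.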
In the end, it is best practice to determine the needed heap space and the total RAM consumption by monitoring your microservice for some time. Then set the heap size with -Xmx and tell the JVM the total RAM available with -XX:MaxRAM.
More background information about Java 8 in a container: https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/
Readiness and Liveness probes
For a detailed explanation, see https://medium.com/faun/understanding-how-kubernetes-readiness-and-liveness-probes-do-correlate-or-better-how-not-81d0ad15fd39
Troubleshooting pod restarts
Troubleshooting your microservice can be a hassle. If you see that one of your pods restarts frequently, you will likely use kubectl describe pod to find out what caused the restart.
If the reason is OOMKilled, then it is quite clear: a container in your pod used more RAM than its limit and was killed. Tune your JVM settings or the pod's memory limit.
However, there are some more subtle causes that are less obvious, especially when you see an exit code of 137 or 143, which occurs quite often for a Java microservice.
Excursion: Java exit codes
A Java application can terminate explicitly and return a specific exit code by calling System.exit(n), where n is the exit code.
However, it can also terminate on receiving external signals like SIGTERM or SIGKILL. In that case, the JVM will exit with a code calculated by
EXIT-CODE = 128 + SIGNAL-CODE
The code for SIGKILL is 9 whereas the code for SIGTERM is 15.
So when your microservice has exited with code 137, you know now that it received a kill signal (SIGKILL). With code 143 it received a term signal (SIGTERM).
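The exit-code arithmetic above can be expressed in a few lines of Java. The describe helper is a hypothetical illustration and only a heuristic: a program may also call System.exit(137) itself, so a code above 128 strongly suggests, but does not prove, termination by a signal.

```java
import java.util.Map;

public class ExitCodes {

    // Linux signal numbers for the two signals discussed here.
    private static final Map<Integer, String> SIGNALS = Map.of(9, "SIGKILL", 15, "SIGTERM");

    // Exit code reported for a process terminated by a signal: 128 + signal number.
    public static int exitCodeForSignal(int signal) {
        return 128 + signal;
    }

    // Heuristic: codes above 128 are decoded as "terminated by signal (code - 128)".
    public static String describe(int exitCode) {
        if (exitCode > 128) {
            int signal = exitCode - 128;
            return "terminated by signal " + signal + " (" + SIGNALS.getOrDefault(signal, "other") + ")";
        }
        return "exited with code " + exitCode + " (e.g. via System.exit)";
    }

    public static void main(String[] args) {
        System.out.println(describe(137)); // terminated by signal 9 (SIGKILL)
        System.out.println(describe(143)); // terminated by signal 15 (SIGTERM)
    }
}
```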
The difference is that your Java application can intercept a SIGTERM with a shutdown hook and perform some cleanup while it cannot intercept a SIGKILL.
Kubernetes always sends a SIGTERM first, to give your pod the chance to shut down gracefully. If the pod is still running after the grace period, it sends a SIGKILL.
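A shutdown hook is registered with the standard Runtime.addShutdownHook method. The sketch below is minimal and the cleanup body is a placeholder for whatever your service needs, such as finishing in-flight requests or closing connections. The hook runs on SIGTERM or a normal exit, never on SIGKILL, and it has to finish within the grace period. Spring Boot already registers its own hook that closes the application context, so in a Spring application its lifecycle callbacks may be the more natural place for cleanup.

```java
public class GracefulShutdown {

    // Registers cleanup code that the JVM runs on SIGTERM or on a normal exit
    // (never on SIGKILL). Returns the hook thread so it can be removed again.
    public static Thread registerCleanup(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        registerCleanup(() -> {
            // Placeholder: finish in-flight work, close connections, flush buffers.
            // This must complete within the pod's grace period (30 s by default).
            System.out.println("Shutting down gracefully...");
        });
        System.out.println("Running. Send SIGTERM (kill -TERM <pid>) to trigger the hook.");
        Thread.currentThread().join(); // block like a server's main thread
    }
}
```

After the hook has run on SIGTERM, the JVM exits and the container reports exit code 143.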
Process of termination:
- Run preStop hooks
- send SIGTERM to containers
- wait for a grace period (30 seconds by default) to give the containers time for cleanup
- Still running after grace period => send SIGKILL to containers
The most common unplanned reasons why Kubernetes would terminate a pod are:
- node runs out of resources
- liveness probe failed (is the initialDelaySeconds value of the probe lower than the startup duration? Maybe because of a CPU limit? Hint: don't use burstable T instance types on AWS in production!)
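Whether a liveness probe restarts a slowly starting container can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: the first probe runs after initialDelaySeconds, further probes follow every periodSeconds (default 10), and the container is restarted after failureThreshold consecutive failures (default 3); probe timeouts and timing jitter are ignored. The initialDelaySeconds value of 30 in main is hypothetical, while the startup durations are the ones measured earlier in this article.

```java
public class LivenessCheck {

    // Rough time (seconds after container start) at which an unhealthy container is restarted:
    // probes run at initialDelay, initialDelay + period, ... and the
    // failureThreshold-th consecutive failure triggers the restart.
    public static int restartAfterSeconds(int initialDelaySeconds, int periodSeconds, int failureThreshold) {
        return initialDelaySeconds + (failureThreshold - 1) * periodSeconds;
    }

    // True if the app is likely restarted before it has finished starting up.
    public static boolean killedDuringStartup(int startupSeconds, int initialDelaySeconds,
                                              int periodSeconds, int failureThreshold) {
        return startupSeconds > restartAfterSeconds(initialDelaySeconds, periodSeconds, failureThreshold);
    }

    public static void main(String[] args) {
        // Hypothetical probe: initialDelaySeconds=30 with the default period and threshold.
        int delay = 30, period = 10, threshold = 3;
        System.out.println(killedDuringStartup(80, delay, period, threshold)); // true  (500m CPU limit)
        System.out.println(killedDuringStartup(12, delay, period, threshold)); // false (no CPU limit)
    }
}
```

With these values, the last tolerated probe happens about 50 seconds after start, so the 80-second startup under a 500m limit would end in a restart loop.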
For completeness, there are also some common reasons for planned terminations:
- rolling update of a deployment
- draining a node
Conclusions and Recommendations
- Use CPU requests, but no limit
- Use memory limit equal to memory request
- Set JVM Heap, e.g. 80 percent of the memory limit
- Set JVM MaxRAM close to the memory limit
- Implement readiness and liveness probes, ideally not just a simple health endpoint but one for each purpose
- Implement a shutdown hook
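The memory recommendations above can be turned into concrete JVM flags. The helper below is a hypothetical sketch that uses the 80 percent heap ratio suggested in the list; the remaining 20 percent is left for non-heap memory such as metaspace, thread stacks and the code cache, and whether that is enough depends on your service, so verify it by monitoring. Note that -XX:MaxRAM only tells the JVM how much memory to assume for its sizing decisions; it is not a hard cap on the process's total memory.

```java
import java.util.List;

public class JvmMemoryFlags {

    // Hypothetical helper: builds heap and MaxRAM flags from a container memory limit in MiB.
    public static List<String> flagsFor(long memoryLimitMiB, int heapPercent) {
        if (heapPercent <= 0 || heapPercent > 100) {
            throw new IllegalArgumentException("heapPercent must be in 1..100");
        }
        long heapMiB = memoryLimitMiB * heapPercent / 100; // integer MiB, rounded down
        return List.of(
                "-Xmx" + heapMiB + "m",               // upper bound for the Java heap
                "-XX:MaxRAM=" + memoryLimitMiB + "m"  // memory size the JVM assumes for ergonomics
        );
    }

    public static void main(String[] args) {
        // The 512Mi limit used in this article with an 80 percent heap:
        System.out.println(String.join(" ", flagsFor(512, 80))); // -Xmx409m -XX:MaxRAM=512m
    }
}
```

The resulting flags could be added to the ENTRYPOINT in the Dockerfile or passed via the JAVA_TOOL_OPTIONS environment variable.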
Links
Java signals, shutdown hooks and exit codes:
http://journal.thobe.org/2013/02/jvms-and-kill-signals.html
openjdk in a container:
https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/
Java memory calculation:
https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html
Kubernetes: termination of pods:
https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
Kubernetes: readiness and liveness probes:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
Kubernetes: Quality of Service:
https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
Kubernetes: managing compute resources (limits and requests):
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
Spring Boot example app:
https://github.com/springframeworkguru/springboot_swagger_example