TL;DR: Dynamically route alerts to the relevant teams’ Slack channels by labelling Kubernetes resources with a team label and extracting it within alert rules.
At loveholidays, the “you build it, you run it” ethos is deeply ingrained in our engineering culture. Teams are expected to operate their systems with scalability, high availability and reliability in mind, with little red tape in the way of their productivity. Our teams love this freedom of technological choice; however, it is really inefficient to reinvent common building blocks like observability and alerting. This is where our Platform Infrastructure team steps in. We solve problems common to all tech teams, like CI/CD, Infrastructure as Code (learn how we enforce best practices through self-service Terraform), Observability (high-throughput logging with Loki) and Security. Some areas require joint effort to achieve the best overall outcome. Alerting based on Prometheus rules is one of them.
Monitoring at loveholidays
The majority of our services run on Kubernetes (we use Google Kubernetes Engine). Many teams run their own Kubernetes clusters; however, customer-facing services run on shared clusters. The Platform Infrastructure team is responsible for the operation of our monitoring stack in the shared clusters. The core building blocks of our monitoring stack are Prometheus (with a myriad of exporters), Thanos, Grafana, Loki, Tempo, Promtail and Alertmanager. With Prometheus alone, we capture and store over a million time series across 100 different sources, retained for nearly 3 years, which now amounts to 10TB of data.
Prometheus and Alertmanager
Our Prometheus comes bundled with good default alerting rules that are proven to improve the operational resilience of production-grade services. Teams are also encouraged to author and deploy alerting rules of their own, as they have the most context on what good looks like for their applications.
Below we create a generic alert that fires when a Horizontal Pod Autoscaler (HPA) has been running at near-maximum capacity for at least 15 minutes. This is a problem because we can run out of autoscaling headroom and fail to serve surges in traffic. The alert will trigger on a violation in any namespace and label it as severity: warning. However, such an alert does not have a clear owner and will be sent to a general alerting channel. How can we route this alert directly to the team responsible for the misbehaving HPA instead of sending it to everyone first?
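A minimal sketch of such a rule is shown below; the expression, duration and severity match those used later in this post, while the group name and annotation are illustrative:

```yaml
groups:
  - name: hpa.rules
    rules:
      - alert: HorizontalPodAutoscalerMaximumCapacity
        # Fires when an HPA has been at (or above) its replica ceiling for 15 minutes.
        expr: (kube_hpa_status_current_replicas / kube_hpa_spec_max_replicas) * 100 >= 100
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.namespace }}/{{ $labels.hpa }} is running at near-maximum capacity"
```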
One possible solution for routing alerts to teams is to assign a team label to the alert rule itself. In the example below, we have assigned the alert to the Platform Infrastructure team.
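A sketch of the same rule with a statically assigned owner; the team label is the only difference:

```yaml
- alert: HorizontalPodAutoscalerMaximumCapacity
  expr: (kube_hpa_status_current_replicas / kube_hpa_spec_max_replicas) * 100 >= 100
  for: 15m
  labels:
    severity: warning
    team: platform-infrastructure   # statically assigned owner
```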
There are a number of problems with this approach:
- Each team needs to write a slightly different version of the alert rules to support team-specific routing. This creates a lot of code duplication.
- The need for complex allow/deny lists of parameters on alerts, like {hpa!~"app1|app2|app3"} (see the sketch after this list). This is the very definition of toil: each new application requires rework of multiple alert rules, teams will forget to make these changes, and alerts will end up in the wrong places, unnoticed.
We have also tried, with limited success, to route all alerts from a particular namespace to a particular team. Teams are dynamic, and namespaces can outlive teams, breaking this heuristic. We needed a more automated approach to generating alerts.
Dynamically route alerts to teams without duplication of alert rules
What does good look like with alerting?
- Alert rules are not duplicated.
- Alerts are routed to owner teams automatically.
- No need to reconfigure or recreate alert rules when teams onboard new applications or pass the ownership of an application onto a newly formed team.
- Alert rules remain easy to author.
How can this be implemented?
In order to route an alert to a particular team, we need a way to associate an alert with a team. Taking our HorizontalPodAutoscalerMaximumCapacity alert rule as an example, evaluating the rule below in Prometheus:

```
kube_hpa_status_current_replicas / kube_hpa_spec_max_replicas * 100 >= 100
```

returns the following time series:

```
{hpa="booking-store",instance="x.x.x.x:x",job="kube-state-metrics",namespace="bookings"}
```

It indicates that the HPA named booking-store in the bookings namespace is at near-maximum capacity. The original query gives us no reliable metadata to associate the time series with the bookings team. Hardcoding the HPA name or namespace is brittle and leads to duplication.
By using Kube State Metrics as a data source, we can augment the original time series with label metadata from the kube_*_labels metrics; kube_hpa_labels is the most interesting one to us.
Querying kube_hpa_labels{hpa="booking-store"} returns the following time series:

```
kube_hpa_labels{hpa="booking-store",instance="x.x.x.x:x",job="kube-state-metrics",label_team="bookings",namespace="bookings"}
```

This is looking promising. We now know that there is a time series available with a team label that we can match via the namespace and hpa attributes.
Configuring team labels on the Kubernetes resources
You are hiding something from me!
I don’t have a label_team when I query kube_hpa_labels!
You are right. For kube_hpa_labels to have values, you need to set labels on your HPAs. The same goes for kube_cronjob_labels, kube_pod_labels and many other kube_*_labels metrics.
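As a minimal sketch, here is what labelling the example HPA could look like; the API version and spec values are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: booking-store
  namespace: bookings
  labels:
    team: bookings   # exposed by kube-state-metrics as label_team="bookings"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: booking-store
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```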
With the team label set, the time series will start including label_team="bookings". Notice that we set a team label, while kube-state-metrics automatically adds the label_ prefix.
Doesn’t this just move complexity from alert rules to Kubernetes manifests?
Not really. Organisationally, we want teams to apply uniform labels to their Kubernetes resources for accountability, security, FinOps and observability reasons. The effort of adding a label to a resource is minor and falls on the team creating the resource. Enforcing uniform labelling across all teams and resources is hard; however, the Platform Infrastructure team has solved this with GitOps, Conftest and Rego, but that is a topic for another time.
OK, we’ve set team labels, what’s next?
Brilliant!
We can now consume label_team in our time series via kube_hpa_labels. Next, we need to propagate it into our alert rule so that alerts can be routed dynamically based on the team label.
Without label_team propagation, our alert rule query looks like this:

```
(kube_hpa_status_current_replicas / kube_hpa_spec_max_replicas) * 100 >= 100
```

With label_team propagation, it looks like this:

```
kube_hpa_labels * on (hpa, namespace) group_right(label_team) (
  (kube_hpa_status_current_replicas / kube_hpa_spec_max_replicas) * 100 >= 100
)
```

It seems like a lot was added. Are we in breach of the “Alert rules remain easy to author” criterion?
Let’s break it down.
- We did not modify the core of the original query.
- We added the secret sauce as line #1.
- We closed the parenthesis with ) on the last line.
Alert rule query secret sauce
- We take a time series with the desired label: kube_hpa_labels.
- We multiply kube_hpa_labels with the results of the original query. The kube_*_labels time series all return a value of 1, so the multiplication does not change any values and is only a means of deriving new label sets (see the sketch after this list).
- We use the on (hpa, namespace) clause to indicate which labels to match between kube_hpa_labels and our original query. Warning: do not cut corners here by leaving out namespace; one day you will have two identically named HPAs in different namespaces and it will break your alerting rule. Behind the scenes, Prometheus ensures that multiplication is only done where kube_hpa_labels’ values for hpa and namespace are identical to the original query’s values.
- group_right(label_team) uses the group_right operator to take the label_team label available within kube_hpa_labels and make it available to the entire query evaluation.
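To make the multiplication step concrete, here is a simplified sketch of what each side returns, with instance and job labels elided and values illustrative:

```
# kube_hpa_labels always has the value 1:
kube_hpa_labels{hpa="booking-store",namespace="bookings",label_team="bookings"}  1

# The original query returns the capacity percentage:
{hpa="booking-store",namespace="bookings"}  100

# 1 * 100 = 100: the product keeps the original value and gains label_team:
{hpa="booking-store",namespace="bookings",label_team="bookings"}  100
```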
Combined together, our query now returns:

```
{hpa="booking-store",instance="x.x.x.x:x",job="kube-state-metrics",label_team="bookings",namespace="bookings"}
```

instead of:

```
{hpa="booking-store",instance="x.x.x.x:x",job="kube-state-metrics",namespace="bookings"}
```

That’s a lot of words to include label_team="bookings". What do we do with it?
Sending notifications dynamically
We can now complete the alerting rule by adding labels to the rule itself.
Notice that we are now enriching our alerts with a team label that gets its value dynamically from the label_team label we derived earlier.
```yaml
labels:
  severity: warning
  team: "{{ $labels.label_team }}"
```

Now the team label on the HPA will be included in the alert.
Alertmanager’s role
Alertmanager is the final piece of the puzzle. It receives labelled alerts from Prometheus and routes them to pre-configured receivers, which send notifications to Slack channels or PagerDuty. This is where each team needs a configured receiver. While this is an O(N) solution, where N is the number of teams, we know that there will always be far fewer teams, and far fewer changes to them, than applications.
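As a sketch, a team-based Alertmanager routing tree could look like this; receiver and channel names are illustrative, and the fragment assumes a Slack API URL is configured globally:

```yaml
route:
  receiver: default-alerts          # fallback for alerts without a team label
  routes:
    - match:
        team: bookings
      receiver: bookings-slack
    - match:
        team: platform-infrastructure
      receiver: platform-infrastructure-slack

receivers:
  - name: default-alerts
    slack_configs:
      - channel: "#alerts"
  - name: bookings-slack
    slack_configs:
      - channel: "#bookings-alerts"
  - name: platform-infrastructure-slack
    slack_configs:
      - channel: "#platform-infrastructure-alerts"
```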
Now, alerts with the team label propagated will be dynamically routed to the Slack channels of the resource-owning teams.
Conclusion
You should now be able to reduce toil and code duplication with dynamically routed Prometheus alerts. Here is a Gist to give you a quick start for converting static alerts into dynamic ones.