Open Sourcing Our Incident Management System


Mykola Kondratiuk


We are excited to announce that our incident management system is now open source!

Our incident management system helps teams in the tech industry respond to and resolve incidents quickly and effectively.

Features

It includes features such as incident categorization, incident escalation, and real-time communication tools.

Open Sourcing

By open-sourcing the system, we hope to make it more widely accessible to tech companies and organizations and to encourage collaboration and contributions from the wider community.

We believe that by working together, we can improve the system and make it even more powerful and effective in addressing tech-related incidents.

Getting Started

To help you get started, we have provided links to our documentation, LinkedIn, official website, and a live demo of the system on our website.

We also have a signup form where you can create a new account for free to work with the system.

You can also install the platform on your own server or inside Kubernetes.

Contribution

If you are interested in contributing to the project, please visit our GitHub repository to learn more and to access the code.

Our team is also available for questions or collaborations at nikolay.k@harpia.io or via GitHub Issues.

Subscription-based Support

We also provide a subscription-based support service for our incident management system.

This includes access to priority support, regular updates, and additional features.

If you are interested in this service, please get in touch with us at nikolay.k@harpia.io for more information and pricing.

Technologies used

We use a combination of technologies such as Vue.js, Python, Aerospike, Kafka, VictoriaMetrics, and MariaDB to build the system.

Architecture Overview

Our incident management system is built on a microservices architecture, utilizing a combination of APIs and event-driven communication.

The system comprises several services that work together to provide the functionality of incident categorization, escalation, and real-time communication.

Each service is designed to be independent and can be deployed independently, allowing for flexibility and scalability.


1. Technical flow to process alerts:

  • harp-collectors: receive alerts from the monitoring system, unify the structure, and push them to the Kafka topic
  • harp-alert-decorator: read the alert from the Kafka topic (produced by harp-collectors) and add additional info about environments and scenarios that should be applied to the alert
  • harp-daemon: read the alert from the Kafka topic (produced by harp-alert-decorator), describe the logic and state of the alert, and write the result to MariaDB
  • harp-aggregator: read alerts from MariaDB, aggregate them, and send them to Aerospike
  • harp-bridge: read alerts from Aerospike and send them to the UI via WebSockets
  • harp-ui: the main user interface of the platform
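
The flow above can be sketched in a few lines of Python. This is only an illustration of the stages, not the platform's actual code: in-memory queues stand in for the Kafka topics, plain dictionaries stand in for MariaDB and Aerospike, and every function name, field name, and lookup table here is an assumption made for the example.

```python
import json
from collections import defaultdict
from queue import Queue

# Hypothetical lookup tables; the real decorator's data source is not shown here.
ENVIRONMENTS = {"db-01": "production", "web-07": "staging"}
SCENARIOS = {"critical": "page-on-call", "warning": "notify-channel"}


def collect(source: str, raw: dict) -> dict:
    """Collector step: map a monitoring-specific payload to one common shape."""
    return {
        "source": source,
        "host": raw.get("host") or raw.get("instance") or "unknown",
        "name": raw.get("alertname") or raw.get("trigger") or "unknown",
        "severity": str(raw.get("severity", "info")).lower(),
    }


def decorate(alert: dict) -> dict:
    """Decorator step: attach environment and scenario info to the alert."""
    return {
        **alert,
        "environment": ENVIRONMENTS.get(alert["host"], "unknown"),
        "scenario": SCENARIOS.get(alert["severity"], "log-only"),
    }


def run_pipeline(raw_alerts: list[tuple[str, dict]]) -> str:
    """Run all stages in order and return the JSON a bridge could push to the UI."""
    collected, decorated = Queue(), Queue()  # stand-ins for the two Kafka topics

    # Collectors: unify each alert and publish it to the first topic.
    for source, raw in raw_alerts:
        collected.put(json.dumps(collect(source, raw)))

    # Decorator: consume, enrich, and publish to the second topic.
    while not collected.empty():
        decorated.put(json.dumps(decorate(json.loads(collected.get()))))

    # Daemon: record the alert state, keyed by host and alert name.
    state_store = {}  # stand-in for MariaDB
    while not decorated.empty():
        alert = json.loads(decorated.get())
        alert["state"] = "open"
        state_store[(alert["host"], alert["name"])] = alert

    # Aggregator: group stored alerts by environment.
    aggregated = defaultdict(list)  # stand-in for Aerospike
    for alert in state_store.values():
        aggregated[alert["environment"]].append(alert)

    # Bridge: serialize the aggregated view for delivery over WebSockets.
    return json.dumps(aggregated, sort_keys=True)


if __name__ == "__main__":
    print(run_pipeline([
        ("prometheus", {"instance": "db-01", "alertname": "DiskFull", "severity": "CRITICAL"}),
        ("zabbix", {"host": "web-07", "trigger": "HighLoad", "severity": "warning"}),
    ]))
```

Passing JSON strings between the stages mirrors how independent services would exchange messages over a topic: each stage depends only on the message shape, not on the code of the stage before it.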

2. Additional Services:

3. Platform Monitoring:

  • Prometheus metrics in VictoriaMetrics
  • Traces in Grafana Tempo
  • Logs in Grafana Loki
  • Dashboards and Alerts in Grafana

Conclusion

We look forward to working with you to make our incident management system the best it can be for the tech industry!