Open Sourcing Our Incident Management System

We are excited to announce that our incident management system is now open source!

Our incident management system is designed to help technology teams respond to and resolve incidents quickly and effectively.

Features

It includes features such as incident categorization, incident escalation, and real-time communication tools.

Open Sourcing

By open-sourcing the system, we hope to make it more widely accessible to tech companies and organizations and to encourage collaboration and contributions from the wider community.

We believe that by working together, we can improve the system and make it even more powerful and effective in addressing tech-related incidents.

Getting Started

To help you get started, we have provided links to our documentation, LinkedIn page, and official website, as well as a live demo of the system on our website.

We also have a signup form where you can create a new account for free to work with the system.

You can also install the platform on your own server or in a Kubernetes cluster.

Contribution

If you are interested in contributing to the project, please visit our GitHub repository to learn more and to access the code.

Our team is also available for questions or collaborations by email at nikolay.k@harpia.io or through GitHub Issues.

Subscription-based Support

We also provide a subscription-based support service for our incident management system.

This includes access to priority support, regular updates, and additional features.

If you are interested in this service, please get in touch with us at nikolay.k@harpia.io for more information and pricing.

Technologies used

We use a combination of technologies such as Vue.js, Python, Aerospike, Kafka, VictoriaMetrics, and MariaDB to build the system.

Architecture Overview

Our incident management system is built on a microservices architecture, utilizing a combination of APIs and event-driven communication.

The system comprises several services that work together to provide the functionality of incident categorization, escalation, and real-time communication.

Each service is self-contained and can be deployed independently, which allows for flexibility and scalability.

1. Technical flow to process alerts:

harp-collectors: receive alerts from the monitoring system, unify the structure, and push them to the Kafka topic
harp-alert-decorator: read the alert from the Kafka topic (produced by harp-collectors) and add additional info about environments and scenarios that should be applied to the alert
harp-daemon: read the alert from the Kafka topic (produced by harp-alert-decorator), describe the logic and state of the alert, and write the result to MariaDB
harp-aggregator: read alerts from MariaDB, aggregate them, and send them to Aerospike
harp-bridge: read alerts from Aerospike and send them to the UI via WebSockets
harp-ui: the main user interface of the platform
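
The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual harp-* code: in-memory queues stand in for the Kafka topics, a plain dict stands in for MariaDB, and the `Alert` fields, the payload keys (`alertname`, `title`, `instance`, `host`), and the "open"/"closed" state rule are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical unified alert structure (field names are assumptions)."""
    source: str
    name: str
    host: str
    severity: str
    environment: str = "unknown"
    scenario: str = "default"
    state: str = "new"


def collect(raw: dict, source: str) -> Alert:
    """harp-collectors step: map a source-specific payload onto one structure."""
    return Alert(
        source=source,
        name=raw.get("alertname") or raw.get("title", "unnamed"),
        host=raw.get("instance") or raw.get("host", "unknown"),
        severity=str(raw.get("severity", "info")).lower(),
    )


def decorate(alert: Alert, env_by_host: dict, scenario_by_name: dict) -> Alert:
    """harp-alert-decorator step: attach environment and scenario info."""
    alert.environment = env_by_host.get(alert.host, "unknown")
    alert.scenario = scenario_by_name.get(alert.name, "default")
    return alert


def apply_state(alert: Alert, store: dict) -> None:
    """harp-daemon step: decide the state and persist it (dict stands in for MariaDB)."""
    alert.state = "closed" if alert.severity in ("ok", "resolved") else "open"
    store[(alert.name, alert.host)] = alert  # the latest event for an alert wins


def aggregate(store: dict) -> dict:
    """harp-aggregator step: count open alerts per environment."""
    counts: dict = {}
    for alert in store.values():
        if alert.state == "open":
            counts[alert.environment] = counts.get(alert.environment, 0) + 1
    return counts


def run_pipeline(raw_alerts, env_by_host, scenario_by_name) -> dict:
    collected_topic: deque = deque()   # stand-in for the collectors' Kafka topic
    decorated_topic: deque = deque()   # stand-in for the decorator's Kafka topic
    store: dict = {}
    for source, raw in raw_alerts:
        collected_topic.append(collect(raw, source))
    while collected_topic:
        decorated_topic.append(
            decorate(collected_topic.popleft(), env_by_host, scenario_by_name)
        )
    while decorated_topic:
        apply_state(decorated_topic.popleft(), store)
    return aggregate(store)


raw_alerts = [
    ("prometheus", {"alertname": "HighCPU", "instance": "web-1", "severity": "Critical"}),
    ("zabbix", {"title": "DiskFull", "host": "db-1", "severity": "warning"}),
    ("prometheus", {"alertname": "HighCPU", "instance": "web-1", "severity": "ok"}),
]
print(run_pipeline(raw_alerts, {"web-1": "production", "db-1": "staging"}, {}))
# {'staging': 1}
```

In the real platform each stage runs as its own service, so the stages only share the message format, which is what lets them be deployed and scaled independently.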

2. Additional Services:

harp-filters: create and manage the user-specific filters in the UI
harp-actions: manage alerts — handle, snooze, acknowledge
harp-environments: create and manage environments
harp-bots: configure your own bots to send auto notifications to different channels — Email, SMS, Slack, etc.
harp-integrations: create and manage the integrations with your monitoring systems
harp-licenses: monitor the usage of the alerts and notification channels
harp-scenarios: create and manage scenarios for alerts
harp-users: create and manage users inside the platform, including authentication and authorization
harp-notifications-gmail: responsible for sending auto email notifications
harp-notifications-msteams: responsible for sending auto notifications to Microsoft Teams
harp-notifications-slack: responsible for sending auto notifications to Slack channels
harp-notifications-sms: responsible for sending auto SMS notifications via Twilio integration
harp-notifications-telegram: responsible for sending auto notifications to Telegram channels
harp-notifications-voice: responsible for making automated phone calls via Twilio integration
harp-clientevents: receive and analyze metrics from the frontend
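
To illustrate how a bot configuration might fan an alert out to the notification services listed above, here is a minimal Python sketch. It is an assumption-laden illustration rather than the platform's implementation: the route format (`channel`, `target`, `severities`), the alert keys, and the sender functions are hypothetical, and the senders only format a string instead of calling Slack, Twilio, or an email provider.

```python
from typing import Callable, Dict, List

# A sender takes (target, message) and returns a delivery record.
Sender = Callable[[str, str], str]


def make_sender(channel: str) -> Sender:
    """Build a placeholder sender; a real one would call the channel's API."""
    def send(target: str, message: str) -> str:
        return f"[{channel}] to {target}: {message}"
    return send


# Hypothetical registry, one entry per notification service.
SENDERS: Dict[str, Sender] = {ch: make_sender(ch) for ch in ("email", "slack", "sms")}


def notify(alert: dict, bot_config: List[dict], senders: Dict[str, Sender] = SENDERS) -> List[str]:
    """Send the alert to every route whose severity filter matches."""
    message = f"{alert['severity'].upper()}: {alert['name']} on {alert['host']}"
    delivered = []
    for route in bot_config:
        if alert["severity"] not in route["severities"]:
            continue
        sender = senders.get(route["channel"])
        if sender is None:
            continue  # unknown channel: skip instead of failing the whole bot
        delivered.append(sender(route["target"], message))
    return delivered


bot_config = [
    {"channel": "slack", "target": "#ops", "severities": {"critical", "warning"}},
    {"channel": "sms", "target": "+10000000000", "severities": {"critical"}},
    {"channel": "email", "target": "team@example.com", "severities": {"info"}},
]
alert = {"name": "HighCPU", "host": "web-1", "severity": "critical"}
for line in notify(alert, bot_config):
    print(line)
```

Keeping the senders behind a registry means a new channel can be added as a separate service without touching the routing logic.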

3. Platform Monitoring:

Prometheus metrics in VictoriaMetrics
Traces in Grafana Tempo
Logs in Grafana Loki
Dashboards and Alerts in Grafana
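
As a small illustration of the first item, the sketch below renders a counter in the Prometheus text exposition format, which VictoriaMetrics is able to scrape. How the harp services actually expose their metrics is not described here, so treat this as an assumption: the metric name `harp_alerts_processed_total` and the `service` label are hypothetical, and a real service would more likely use a Prometheus client library than hand-built strings.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples) -> str:
    """Render one counter metric; samples is a list of (labels dict, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(str(val))}"'
            for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


print(
    render_counter(
        "harp_alerts_processed_total",       # hypothetical metric name
        "Alerts processed by a service.",
        [({"service": "harp-daemon"}, 3), ({"service": "harp-aggregator"}, 2)],
    ),
    end="",
)
```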

Conclusion

We look forward to working with you to make our incident management system the best it can be for the tech industry!