Automated SRE Agent
An automated Site Reliability Engineering (SRE) tool that monitors application logs, detects errors, and automates incident management by creating Jira tickets.
Features
- Continuous log monitoring for error detection
- Intelligent detection of on-call employees
- Automatic Jira ticket creation and assignment to appropriate personnel
You can see an end-to-end execution demo in experimental/exp.ipynb.
The script main.py runs continuously, periodically scanning the log file for new errors.
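The periodic scan described above can be sketched roughly as follows. This is not the code in main.py: the function names `read_new_errors` and `monitor`, the `ERROR` marker, and the byte-offset bookkeeping are all assumptions made for illustration.

```python
import os
import time


def read_new_errors(path, offset=0, marker="ERROR"):
    """Return (error_lines, new_offset) for complete lines appended since `offset`.

    Hypothetical helper: the marker string and the offset scheme are assumptions.
    """
    # If the file is now shorter than our offset (e.g. it was rotated), restart.
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume up to the last newline so a half-written line is re-read later.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    errors = [line for line in text.splitlines() if marker in line]
    return errors, offset + end


def monitor(path, interval=60, cycles=None):
    """Poll `path` every `interval` seconds; `cycles=None` means run until stopped."""
    offset = 0
    done = 0
    while cycles is None or done < cycles:
        errors, offset = read_new_errors(path, offset)
        for line in errors:
            # Placeholder: the real tool would create a Jira ticket at this point.
            print(f"new error: {line}")
        done += 1
        time.sleep(interval)
```

Tracking a byte offset means each cycle only reads what was appended since the previous one, so an error is reported once rather than on every scan. Starting from offset 0 reports errors already present in the file at startup; a tool that should ignore old entries could start from the current file size instead.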
Configuration
To get started:
- Copy .env.example to .env and fill in the necessary configuration values.
Additional configuration options:
- MONITORING_INTERVAL: Time interval in seconds between log checks (default: 60)
- LOG_FILE_PATH: Path to the log file to monitor
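As an illustration of how these two options might be read, here is a minimal sketch. It assumes the values from .env have already been loaded into the process environment. The function name `load_settings` is hypothetical, and the fallback path for LOG_FILE_PATH is an assumption, since only the default for MONITORING_INTERVAL is documented.

```python
import os


def load_settings(env=None):
    """Read monitoring settings from an environment-like mapping.

    Hypothetical helper, not the project's actual configuration code.
    """
    env = os.environ if env is None else env
    # Documented default: 60 seconds between log checks.
    interval = int(env.get("MONITORING_INTERVAL", "60"))
    if interval <= 0:
        raise ValueError("MONITORING_INTERVAL must be a positive number of seconds")
    # Assumed fallback: the sample log file shipped with the project.
    log_path = env.get("LOG_FILE_PATH", "output/logs.log")
    return {"interval": interval, "log_path": log_path}
```

Passing a plain dictionary instead of relying on `os.environ` makes the function easy to exercise without touching the real environment, for example `load_settings({"MONITORING_INTERVAL": "5"})`.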
Testing
If you don't have a log file to test with:
- Use the helper script utils/random_log_generator.py to generate synthetic logs.
- Or try the provided sample log file: output/logs.log
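For a sense of what synthetic log generation can look like, the sketch below produces timestamped lines at mixed severity levels. It is not the contents of utils/random_log_generator.py: the line format, the level weights, the sample messages, and the name `generate_lines` are all assumptions.

```python
import random
from datetime import datetime, timedelta

# Assumed levels and messages, chosen only for illustration.
LEVELS = ["INFO", "WARNING", "ERROR"]
MESSAGES = {
    "INFO": ["Request handled", "Health check passed"],
    "WARNING": ["Slow response from upstream", "Retrying connection"],
    "ERROR": ["Database connection failed", "Unhandled exception in worker"],
}


def generate_lines(count, seed=None, start=None):
    """Return `count` synthetic log lines with increasing timestamps."""
    rng = random.Random(seed)  # a fixed seed gives reproducible output
    ts = start or datetime(2024, 1, 1)
    lines = []
    for _ in range(count):
        ts += timedelta(seconds=rng.randint(1, 30))
        # Weight the levels so errors are the rarest entries.
        level = rng.choices(LEVELS, weights=[8, 2, 1])[0]
        message = rng.choice(MESSAGES[level])
        lines.append(f"{ts:%Y-%m-%d %H:%M:%S} {level} {message}")
    return lines
```

Writing the result to a file, for example with `"\n".join(generate_lines(100, seed=1))`, gives a small log to point LOG_FILE_PATH at.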
Footnotes
This project was developed as part of the course CS 595 - TCPS: MLOps for Generative AI.
Special thanks to Professor Santosh Nukavarapu for such an interesting project!
