SIARA (torayeff/siara): Service Intelligence for Agent Reliability and Availability

SIARA is a monitoring system for AI agents. It evaluates and tracks agent performance over time by running standardized challenges.

The Origin Story

In 2025, my team and I relied heavily on AI coding tools and LLMs such as Gemini and Claude for many kinds of work. Over time, we noticed a strange pattern: at certain times of day, the model outputs became almost unusable.

From London, the pattern was pretty clear. The models worked well until around noon, but the quality dropped badly in the evening. I suspected it had something to do with developers in San Francisco coming online (a 9 a.m. start in San Francisco is usually 5 p.m. in London) and some kind of throttling or prioritization kicking in.

That's how the idea for SIARA came up. I wanted something like uptime monitoring, but for AI agents.

The Challenge: Image Puzzle

The hard part was defining a good agent task. Almost any task can be turned into a script once you understand the problem, so it's tricky to measure true agent behavior. The task had to be not too easy, not too hard, and easy to generate without manual curation.

Inspired by image puzzles, I landed on a simple idea: take any image and split it into N overlapping tiles. The agent's job is to infer the original image dimensions and reconstruct the full image.
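The tiling step can be sketched in a few lines. This is a minimal illustration, not SIARA's actual generator: it assumes a grayscale image stored as a list of pixel rows, and it assumes the tiles are vertical strips in which each strip shares a fixed number of columns (`overlap`) with its right neighbor. The function name `make_overlapping_tiles` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image: a list of pixel rows


def make_overlapping_tiles(image: Image, n: int, overlap: int) -> List[Image]:
    """Split `image` into `n` vertical strips; neighboring strips share `overlap` columns."""
    width = len(image[0])
    if n < 1 or not 0 <= overlap < width // n:
        raise ValueError("need n >= 1 and 0 <= overlap < width // n")
    # Evenly spaced cut points; strip i starts at column bounds[i].
    bounds = [i * width // n for i in range(n + 1)]
    tiles = []
    for i in range(n):
        left = bounds[i]
        # Every strip except the last extends `overlap` columns into its right neighbor.
        right = width if i == n - 1 else bounds[i + 1] + overlap
        tiles.append([row[left:right] for row in image])
    return tiles
```

Raising `n` yields more, narrower tiles, so the same source image can produce puzzles of different sizes.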

This task is:

Easy to generate
Easy to scale by increasing N
Easy to evaluate using an SSIM (structural similarity) score against the original image
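SSIM compares the luminance, contrast, and structure of two images and reaches 1.0 only when they are identical. The sketch below is a simplified, single-window variant that applies the SSIM formula to the whole image at once, with the usual constants C1 = (0.01 * L)^2 and C2 = (0.03 * L)^2, where L is the pixel data range (255 for 8-bit images). Standard SSIM, as implemented for example by scikit-image's `skimage.metrics.structural_similarity`, instead averages the formula over local sliding windows. The README does not say which variant or parameters SIARA uses, so treat this only as an illustration of the metric; `global_ssim` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]  # grayscale image: a list of pixel rows


def global_ssim(a: Image, b: Image, data_range: float = 255.0) -> float:
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    if [len(r) for r in a] != [len(r) for r in b]:
        raise ValueError("images must have the same shape")
    xs = [float(p) for row in a for p in row]
    ys = [float(p) for row in b for p in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # Population variances and covariance of the pixel intensities.
    var_x = sum((x - mu_x) * (x - mu_x) for x in xs) / n
    var_y = sum((y - mu_y) * (y - mu_y) for y in ys) / n
    cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    # Stabilizing constants with the commonly used K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    luminance = (2 * mu_x * mu_y + c1) / (mu_x * mu_x + mu_y * mu_y + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure
```

A perfect reconstruction scores 1.0, a uniformly brightened copy scores slightly below 1.0, and an inverted image scores below zero.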

Task Details

The agent is given a set of overlapping image tiles in a directory. All information is preserved; nothing is missing.

Example input tiles:

[Images: Tile 0, Tile 1, Tile 2]

The Goal:

Determine the original image dimensions
Reconstruct the full original image from the tiles
Submit a single image file within the time limit
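Because the tiles overlap, one way to recover the dimensions is to find how many pixel columns neighboring tiles share and drop those duplicates while stitching. The sketch below assumes the tiles are vertical strips already in left-to-right order that overlap only horizontally; a real agent would also have to discover the order and layout. `find_overlap`, `stitch_horizontal`, and `infer_dimensions` are hypothetical helpers, and taking the largest exact match is a heuristic that can pick the wrong overlap on images with repeated columns.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image: a list of pixel rows


def find_overlap(left: Image, right: Image) -> int:
    """Largest k such that the last k columns of `left` equal the first k columns of `right`."""
    max_k = min(len(left[0]), len(right[0]))
    for k in range(max_k, 0, -1):  # try the widest overlap first
        if all(l_row[-k:] == r_row[:k] for l_row, r_row in zip(left, right)):
            return k
    return 0  # no shared columns found


def stitch_horizontal(tiles: List[Image]) -> Image:
    """Join left-to-right ordered strips, dropping each strip's overlapping columns."""
    height = len(tiles[0])
    if any(len(t) != height for t in tiles):
        raise ValueError("all tiles must have the same height")
    rows = [list(r) for r in tiles[0]]
    for prev, tile in zip(tiles, tiles[1:]):
        k = find_overlap(prev, tile)
        for row, tile_row in zip(rows, tile):
            row.extend(tile_row[k:])
    return rows


def infer_dimensions(tiles: List[Image]) -> Tuple[int, int]:
    """(height, width) of the reconstructed image."""
    image = stitch_horizontal(tiles)
    return len(image), len(image[0])
```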

Expected output (reconstructed image):

See challenges/image_puzzle/ for the full example:

task/tiles/ - Input tiles
task/README.md - Challenge instructions
solution/image.png - Expected solution

The Solver Agent

The solver agent is a standard LangChain-based agent equipped with tools that mimic a human developer's capabilities:

list_directory - List files and subdirectories
read_text_file - Read UTF-8 text files
read_binary_file - Read binary files as base64
read_image_file - Read and analyze image files (PNG, JPG, GIF, WebP)
write_file - Create or overwrite files
execute_shell - Run shell commands
execute_python - Run Python scripts
submit_solution - Submit a solution file to the verification API
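Several of these tools map directly onto the Python standard library. The sketch below shows plain-function versions of a subset of them; the names match the list above, but the signatures and return formats are assumptions, not SIARA's actual tool definitions. In a LangChain agent, functions like these would typically be wrapped as tools (for example with the `@tool` decorator from `langchain_core.tools`) so the model can call them by name. `submit_solution` is left out because its verification API is not described here, and `execute_python` as written does no sandboxing.

```python
import base64
import os
import subprocess
import sys


def list_directory(path: str) -> list[str]:
    """Sorted names of the files and subdirectories in `path`."""
    return sorted(os.listdir(path))


def read_text_file(path: str) -> str:
    """Contents of a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def read_binary_file(path: str) -> str:
    """Contents of any file, base64-encoded so it can be passed back as text."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def write_file(path: str, content: str) -> str:
    """Create or overwrite a text file and report what was written."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return f"wrote {len(content)} characters to {path}"


def execute_python(code: str, timeout: float = 60.0) -> str:
    """Run a snippet in a fresh interpreter and return stdout followed by stderr."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=timeout
    )
    return proc.stdout + proc.stderr


# Name -> function registry that an agent loop can dispatch on.
TOOLS = {
    fn.__name__: fn
    for fn in (list_directory, read_text_file, read_binary_file, write_file, execute_python)
}
```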

The agent operates in a loop: observing the environment, thinking about the next step, executing tools (like writing Python scripts to stitch images), and observing the output until it decides to submit a solution.
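The observe, think, act cycle can be sketched without any framework. Here the `decide` callable is a hypothetical stand-in for the LLM: it looks at the history of tool calls and observations and returns the next `ToolCall`. The `max_steps` budget is an assumption standing in for the challenge's time limit, and the loop stops as soon as the agent calls `submit_solution`. `ToolCall` and `run_agent` are illustrative names, not part of SIARA or LangChain.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToolCall:
    """One action chosen by the model: a tool name plus keyword arguments."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


History = List[Tuple[ToolCall, Any]]


def run_agent(
    decide: Callable[[History], ToolCall],
    tools: Dict[str, Callable[..., Any]],
    max_steps: int = 20,
) -> Any:
    """Repeat think -> act -> observe until submit_solution is called or the budget runs out."""
    history: History = []
    for _ in range(max_steps):
        call = decide(history)  # think: pick the next action from what was observed so far
        tool = tools.get(call.name)
        if tool is None:
            observation: Any = f"error: unknown tool {call.name!r}"
        else:
            observation = tool(**call.args)  # act: execute the chosen tool
        history.append((call, observation))  # observe: feed the result back next turn
        if call.name == "submit_solution":
            return observation
    raise TimeoutError(f"no solution submitted within {max_steps} steps")
```

Presumably, in SIARA `decide` would wrap a tool-calling chat model and `tools` would hold functions like the ones listed above.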

Setup

Install dependencies from requirements.txt:

pip install -r requirements.txt

Looking Ahead

SIARA is still at an early stage, but I believe that by 2026, agents will need real monitoring tools instead of just trusting provider-reported results.

If you have any feedback, feel free to reach out.

— Agajan
https://github.com/torayeff