BOSS - Multi-Agent LLM Operating System For Offensive Security
BOSS is an intelligent task orchestration system that leverages Large Language Models (LLMs) to coordinate and execute agent-based workflows using Monte Carlo Tree Search (MCTS) and Self-Play for dynamic task planning and optimization.
Core Features
- Intelligent Task Decomposition: Uses LLM-guided MCTS to break down complex tasks into optimal sequences of steps
- Dynamic Agent Selection: Employs LLM analysis to match agent capabilities with task requirements
- Self-Play Optimization: Continuously improves task planning through simulated execution and evaluation
- Real-time Monitoring & Adaptation: Tracks execution progress and optimizes workflows dynamically
- Robust Error Handling: Implements multiple retry strategies with intelligent failure analysis
- Human-in-the-Loop: Recognizes when to escalate tasks for human intervention
- Performance Monitoring: Continuously monitors system health and agent performance
Note: This project is still under development and not all features are fully implemented. Do not use in production.
This project currently focuses on network security related reasoning tasks, but BOSS can be extended to other domains with ease.
Architecture
+--------------------------------+
| BOSS OPERATING SYS |
+--------------------------------+
|
v
+--------------------------------+
| Task Planning System |
| +----------------------------+|
| | Monte Carlo Tree Search ||
| | - Step Generation ||
| | - Agent Selection ||
| | - Path Optimization ||
| +----------------------------+|
| | Self-Play ||
| | - Simulation ||
| | - Evaluation ||
| | - Experience Collection ||
| +----------------------------+|
+--------------------------------+
|
v
+--------------------------------+
| Message Bus (Kafka) |
+--------------------------------+
|
v
+--------------------------------+
| Agent Network |
| - Ping |
| - WHOIS |
| - SSL |
| - REST Test |
| - WebSocket Test |
| - Scan Ports |
| - Get SSL Cert |
| - API Explorer |
| - Conversation |
| - DIG Agent |
+--------------------------------+
|
v
+--------------------------------+
| Result Processing |
| - Performance Evaluation |
| - State Updates |
+--------------------------------+
|
v
+--------------------------------+
| MongoDB |
| - Tasks |
| - Agent Status |
| - Tree States |
+--------------------------------+
MCTS and Self-Play Implementation
Monte Carlo Tree Search (MCTS)
BOSS uses an advanced implementation of MCTS for task planning:
-
Node Structure: Each node represents a task state and contains:
- Task/Step description
- Agent assignment
- Evaluation metrics
- Visit counts and value estimates
- Child nodes and unexplored actions
-
Tree Policy:
- Balances exploration and exploitation using UCT formula
- Dynamically expands nodes based on LLM-generated steps
- Limits tree depth and breadth for computational efficiency
-
Simulation:
- Uses actual agent execution results for state evaluation
- Incorporates LLM-based performance assessment
- Tracks simulation status to handle asynchronous execution
Self-Play Integration
The self-play system:
- Simulates task execution with selected agents
- Evaluates performance using LLM-based criteria
- Collects experience data for optimization
- Updates tree statistics based on execution results
LLM Integration
The LLM serves multiple roles:
- Policy Network: Generates possible next steps and actions
- Value Network: Evaluates state quality and action effectiveness
- Agent Selection: Analyzes agent capabilities for task matching
- Performance Evaluation: Assesses execution results and provides feedback
Quick Start
Local Setup
-
Clone the Repository
-
Build Web Components:
cd web && docker compose build
-
Start Infrastructure Services:
# In root directory docker compose upThis starts:
- Web UI
- Kafka Message Broker
- MongoDB Database
Run
docker compose down -v && docker compose down && docker compose upto clear kafka topics and volumes and restart the services. -
Run BOSS:
Environment Variables
Required environment variables:
- OPENAI_API_KEY: API key for OpenAI services
- MONGODB_URI: MongoDB connection string
- KAFKA_BOOTSTRAP_SERVERS: Kafka broker address
- ANTHROPIC_API_KEY: API key for Anthropic services
Agent Integration
To integrate new agents:
- Create Agent Class:
from boss.wrappers.wrapper_agent import WrapperAgent class WrapperNewAgent(WrapperAgent): def process_task(self, task: Dict) -> Dict[str, Any]: result = { "task_id": task["_id"], "result": "Processed successfully", "metadata": {}, } return result
- Register Agent:
Add to
components_to_startinstart.py:
components_to_start = [ BOSS, WrapperPing, # ... other agents ... WrapperNewAgent, # Your new agent ]
Task States
class TaskState(str, Enum): CREATED = "Created" IN_PROGRESS = "In_Progress" WAITING_FOR_EVALUATION = "Waiting_For_Evaluation" AWAITING_HUMAN = "Awaiting_Human" COMPLETED_STEP = "Completed_Step" COMPLETED_WORKFLOW = "Completed_Workflow" FAILED = "Failed" PENDING_NEXT_STEP = "Pending_Next_Step" PAUSED = "Paused" FINAL_COMPLETION = "Final_Completion"
References
This work builds upon and is inspired by several key papers in the field of LLM reasoning and optimization:
-
LLaMA-Berry: LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
- Introduces novel approaches for mathematical reasoning optimization in LLMs
-
Marco-o1: Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
- Paper: arXiv
- Model: Hugging Face - AIDC-AI/Marco-o1
-
LLaVA-o1: LLaVA-o1: Let Vision Language Models Reason Step-by-Step
- Demonstrates step-by-step reasoning capabilities in vision-language models
-
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
- Implementation of Chain-of-Thought reasoning for vision-language tasks
Citation
If you use BOSS in your research or applications, please cite:
@software{boss2024, author = {Stanislav Kirdey}, title = {BOSS: Multi-Agent LLM Operating System For Offensive Security}, year = {2024}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/skirdey/boss}}, commit = {253d93f48dfffe51fd7203f596f8ccdfd068fb96}, note = {A multi-agent system leveraging LLMs for orchestrating offensive security tasks} } ## License This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.