Overview
Cairo CDP is an open-source Customer Data Platform that collects, processes, and routes customer data from any source to any destination. Transform your applications into a comprehensive data ecosystem with real-time event tracking, intelligent routing, and powerful analytics.
Documentation
- User Guide - Complete guide for non-technical users
- Quick Start - Get started with SDKs in 5 minutes
- Roadmap - Platform evolution and features
- Technical Docs - API references and advanced guides
Key Features
Universal SDK Support
- Node.js, React/Next.js, and Browser JavaScript SDKs
- Segment-compatible API for easy migration
- TypeScript support with full type definitions
- Event batching, retries, and queue management
Intelligent Routing
- Plugin-based destination architecture
- Pre-built integrations: Slack, Mixpanel, Webhooks
- Custom transformation rules
- Real-time and batch processing
Real-Time Analytics
- Live event debugging with WebSocket streaming
- Modern React dashboard with dark/light themes
- Advanced filtering and search capabilities
- Export functionality for analysis
Enterprise Ready
- Multi-tenant data segregation via namespaces
- GDPR/CCPA compliant with consent management
- Auto-scaling with intelligent rate limiting
- Comprehensive monitoring and health checks
AI-Powered Enrichment
- Cost-effective lead enrichment ($0.005/lead)
- Intelligent lead scoring (ICP + behavioral)
- Smart CRM sync for engaged leads only
- Background job processing with status monitoring
How to Use Cairo CDP
For Non-Technical Users
- Access the Dashboard - Open Cairo CDP in your browser
- Monitor Live Events - See customer actions in real-time
- Configure Destinations - Set up Slack notifications and analytics
- Review Analytics - Use charts and reports to understand customer behavior
Complete User Guide - Step-by-step instructions
For Developers
- Install SDK - Choose Node.js, React, or Browser SDK
- Track Events - Add customer action tracking to your app
- Configure Routing - Set up data destinations
- Monitor & Debug - Use real-time debugging tools
SDK Quick Start - Get coding in 5 minutes
Table of Contents
- How to Use
- Quick Start
- Multi-Tenant Namespaces
- API Documentation
- Full Sync System
- Lead Scoring
- Periodic Sync & Automation
- AI-First Enrichment
- Integrations
- Deployment
- Configuration
- Development
- Monitoring
- Contributing
- License
Quick Start
Prerequisites
- Node.js >= 18.0.0
- PostgreSQL >= 14
- npm or yarn
1. Clone the Repository
git clone git@github.com:outcome-driven-studio/cairo.git
cd cairo
npm install
2. Set Up Environment Variables
You can set up your environment variables in two ways:
Option A: Interactive Setup (Recommended)
# Run the interactive setup script
npm run setup-env
# OR
./setup-env.sh
Option B: Manual Setup
# For local development, use .env.local (recommended)
cp .env.example .env.local
# Edit .env.local with your actual values
# OR use .env (fallback)
cp .env.example .env
# Edit .env with your actual values
Note: .env.local takes priority over .env for local development. In production/cloud (Railway), environment variables are provided directly and .env files are not used.
Key environment variables you need to configure:
# Database (Required)
DATABASE_URL=postgresql://user:password@localhost:5432/cairo
# OR for NeonDB:
# POSTGRES_URL=postgresql://user:password@host.neon.tech/db?sslmode=require
# Lead Enrichment APIs (at least one required for ICP scoring)
APOLLO_API_KEY=your_apollo_api_key      # Primary enrichment service
HUNTER_API_KEY=your_hunter_api_key      # Fallback enrichment service
# AI Enrichment (Required for AI features)
GEMINI_API_KEY=your_gemini_api_key      # Required - Google Gemini API key ($0.002/lead)
GEMINI_MODEL_PRO=gemini-1.5-pro         # Optional - Pro model name (default: gemini-1.5-pro)
GEMINI_MODEL_FLASH=gemini-1.5-flash     # Optional - Flash model name (default: gemini-1.5-flash)
ENABLE_AI_LEAD_SCORING=false            # Optional - Enable AI-enhanced lead scoring
ENABLE_AI_INSIGHTS=true                 # Optional - Enable AI insights generation
ENABLE_AI_QUERIES=true                  # Optional - Enable natural language queries
# CRM Integration (Required for lead sync)
ATTIO_API_KEY=your_attio_api_key
# Analytics & Email Marketing (Optional)
MIXPANEL_PROJECT_TOKEN=your_mixpanel_token
LEMLIST_API_KEY=your_lemlist_api_key
SMARTLEAD_API_KEY=your_smartlead_api_key
# Server Configuration
PORT=8080
NODE_ENV=development
# Periodic Sync (Optional - auto-sync every 4 hours)
USE_PERIODIC_SYNC=true
SYNC_INTERVAL_HOURS=4
MIN_BEHAVIOR_SCORE_FOR_ATTIO=1
For a complete list of all environment variables, see .env.example.
3. Initialize Database
# Run migrations (creates all required tables)
npm run setup
This command will:
- Create all core database tables (playmaker_user_source, event_source, campaigns, etc.)
- Initialize namespace system (namespaces table with default namespace)
- Set up proper indexes for performance
- Insert default lead scoring configurations
- Handle existing tables gracefully (no data loss)
Alternatively, you can run migrations directly:
node src/migrations/run_migrations.js
4. Start the Server
# Development
npm run dev
# Production
npm start
The API will be available at http://localhost:8080
5. Test the Setup
# Check health
curl http://localhost:8080/health
# Test integrations via API
curl -X POST http://localhost:8080/api/test/apollo \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "company": "Test Company"}'
# Check all service integrations
curl http://localhost:8080/api/test/health
# Test namespace system
curl http://localhost:8080/api/namespaces
Multi-Tenant Namespaces
Cairo supports multi-tenant data segregation through namespaces, allowing agencies and service providers to separate data for different customers/clients automatically based on campaign keywords.
How It Works
- Campaign Detection: Cairo analyzes campaign names from Lemlist and Smartlead
- Keyword Matching: Matches campaigns against configured keywords per namespace
- Automatic Routing: Routes data to separate {namespace}_user_source tables
- Isolated CRM Sync: Each namespace can have its own Attio configuration
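The routing steps above can be sketched roughly as follows. This is a minimal illustration, not Cairo's actual implementation: the `Namespace` shape, the `routeCampaign` and `tableFor` names, the case-insensitive substring match, and the first-match-wins rule are all assumptions. The table naming follows the acme-corp to acme_corp_user_source example in this README.

```typescript
// Hypothetical shape of a namespace record; fields mirror the API examples.
interface Namespace {
  name: string;       // e.g. "acme-corp"
  keywords: string[]; // e.g. ["ACME", "ACME Corp"]
}

// Default table listed in the Database Schema section.
const DEFAULT_TABLE = "playmaker_user_source";

// Assumption: non-alphanumeric characters in the namespace name become "_".
function tableFor(namespaceName: string): string {
  return namespaceName.toLowerCase().replace(/[^a-z0-9]+/g, "_") + "_user_source";
}

// Assumption: case-insensitive substring match; the first matching namespace wins.
function routeCampaign(campaignName: string, namespaces: Namespace[]): string {
  const haystack = campaignName.toLowerCase();
  for (const ns of namespaces) {
    const matched = ns.keywords.some(
      (keyword) => haystack.indexOf(keyword.toLowerCase()) !== -1
    );
    if (matched) {
      return tableFor(ns.name);
    }
  }
  // Campaigns that match no namespace stay in the default table.
  return DEFAULT_TABLE;
}
```

Under these assumptions, a campaign named "ACME Corp Q1 Campaign" would resolve to the acme_corp_user_source table, and unmatched campaigns would fall back to playmaker_user_source.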
Example Use Cases
- Marketing Agency: Separate data for "ACME Corp", "TechStart", "Startup Co" clients
- SaaS Company: Segment data by product lines or customer tiers
- Consulting Firm: Isolate client data for compliance and reporting
Quick Example
# Create a new namespace for ACME Corp
curl -X POST http://localhost:8080/api/namespaces \
  -H "Content-Type: application/json" \
  -d '{
    "name": "acme-corp",
    "keywords": ["ACME", "ACME Corp", "acme-corp"]
  }'
# Now any Lemlist/Smartlead campaigns with "ACME Corp Q1 Campaign"
# will automatically route to the acme_corp_user_source table
Namespace Management
| Endpoint | Method | Description |
|---|---|---|
| /api/namespaces | GET | List all namespaces |
| /api/namespaces | POST | Create new namespace |
| /api/namespaces/{name} | GET | Get specific namespace |
| /api/namespaces/{name} | PUT | Update namespace |
| /api/namespaces/{name}/stats | GET | Get namespace statistics |
Key Benefits
- Zero Configuration: Works immediately with existing sync processes
- Automatic Detection: Smart keyword matching for campaign routing
- Complete Isolation: Each customer gets their own database table
- Scalable: Add unlimited customer namespaces via API
- Backward Compatible: No changes to existing functionality
Examples
# List all namespaces
curl http://localhost:8080/api/namespaces
# Get namespace statistics
curl http://localhost:8080/api/namespaces/acme-corp/stats
# Update namespace keywords
curl -X PUT http://localhost:8080/api/namespaces/acme-corp \
  -H "Content-Type: application/json" \
  -d '{
    "keywords": ["ACME", "ACME Corp", "acme-corp", "Acme Corporation"]
  }'
API Documentation
Core Endpoints
Health & Status
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Basic health check |
| /health/detailed | GET | Detailed health with dependencies |
| /health/simple | GET | Simple health check for containers |
Namespace Management
| Endpoint | Method | Description |
|---|---|---|
| /api/namespaces | GET | List all active namespaces |
| /api/namespaces | POST | Create new namespace |
| /api/namespaces/{name} | GET | Get specific namespace details |
| /api/namespaces/{name} | PUT | Update namespace configuration |
| /api/namespaces/{name}/stats | GET | Get namespace usage statistics |
Lead Scoring
| Endpoint | Method | Description |
|---|---|---|
| /api/scoring/calculate | POST | Calculate lead scores for all users |
| /api/scoring/sync-to-attio | POST | Sync existing scores to Attio CRM |
| /api/scoring/score-and-sync | POST | Calculate scores and sync to Attio |
| /api/scoring/master-score-all | POST | Complete pipeline: Import, enrich, score, sync |
Event Tracking
| Endpoint | Method | Description |
|---|---|---|
| /api/events/track | POST | Track single product event |
| /api/events/batch | POST | Track multiple events in batch |
| /api/events/identify | POST | Identify/update user properties |
| /api/events/stats | GET | Get event tracking statistics |
| /api/events/health | GET | Check event tracking service health |
New: Event tracking now supports automatic Slack alerts for important events like signups, payments, and high-value actions. See Event Tracking Guide for configuration.
Periodic Sync Management
| Endpoint | Method | Description |
|---|---|---|
| /api/periodic-sync/status | GET | Get periodic sync status & schedule |
| /api/periodic-sync/start | POST | Start periodic sync |
| /api/periodic-sync/stop | POST | Stop periodic sync |
| /api/periodic-sync/sync-now | POST | Force sync now (supports type param) |
| /api/periodic-sync/history | GET | View sync history |
| /api/periodic-sync/schedules | GET | View sync schedules |
| /api/periodic-sync/config | PUT | Update sync configuration |
Background Jobs & Processing
| Endpoint | Method | Description |
|---|---|---|
| /api/jobs | GET | List all background jobs |
| /api/jobs/status/:jobName | GET | Get specific job status |
| /api/jobs/logs/:jobName | GET | Get job logs |
| /api/jobs/stop/:jobName | POST | Stop running job |
| /api/sync/users-background | POST | Sync users to Attio in background |
| /api/sync/events-background | POST | Sync events to Attio in background |
| /api/sync/full-background | POST | Full sync (users + events) in background |
Data Sync (Legacy & V1)
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/sync/lemlist/users | POST | Sync users from Lemlist |
| /api/v1/sync/lemlist/events | POST | Sync events from Lemlist |
| /api/v1/sync/smartlead/users | POST | Sync users from Smartlead |
| /api/v1/sync/smartlead/events | POST | Sync events from Smartlead |
| /initial-sync | GET | Run initial sync (all sources) |
| /delta-sync | GET | Run delta sync (recent changes) |
| /sync-status | GET | Check sync status |
External Profile Processing
| Endpoint | Method | Description |
|---|---|---|
| /api/process-linkedin-profiles | POST | Process LinkedIn profiles with AI enrichment |
| /api/external-profiles/status | GET | Get external profile processing status |
Webhooks
| Endpoint | Method | Description |
|---|---|---|
| /webhook/lemlist | POST | Receive Lemlist webhook events |
| /webhook/smartlead | POST | Receive Smartlead webhook events |
| /api/bridge | POST | Event bridge: forward to Discord only (not persisted) |
| /api/bridge/notion | POST | Notion automation webhook to Discord (parses Notion payloads) |
Notion setup and payload types: Notion bridge guide.
Testing & Debugging
| Endpoint | Method | Description |
|---|---|---|
| /api/test/apollo | POST | Test Apollo enrichment |
| /api/test/apollo/usage | GET | Check Apollo credits & rate limits |
| /api/test/hunter | POST | Test Hunter enrichment |
| /api/test/enrichment | POST | Test enrichment with fallback |
| /api/test/database | GET | Test database connection |
| /api/test/health | GET | Check all service integrations |
Dashboard & Stats
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Dashboard UI |
| /api/stats | GET | Get system statistics |
| /api/sync/:type | POST | Run specific sync type |
| /api/check/:type | GET | Check specific service status |
Example Requests
Create and Manage Namespaces
# Create a new namespace for a client
curl -X POST http://localhost:8080/api/namespaces \
  -H "Content-Type: application/json" \
  -d '{
    "name": "tech-startup",
    "keywords": ["TechStartup", "Tech Startup Inc", "TSI"],
    "attio_config": { "workspace": "tech-startup-workspace" }
  }'
# List all namespaces
curl http://localhost:8080/api/namespaces
# Get specific namespace details
curl http://localhost:8080/api/namespaces/tech-startup
# Update namespace keywords
curl -X PUT http://localhost:8080/api/namespaces/tech-startup \
  -H "Content-Type: application/json" \
  -d '{
    "keywords": ["TechStartup", "Tech Startup Inc", "TSI", "TechStart"]
  }'
# Get namespace usage statistics
curl http://localhost:8080/api/namespaces/tech-startup/stats
Calculate Lead Scores with Enrichment
curl -X POST http://localhost:8080/api/scoring/calculate \
  -H "Content-Type: application/json" \
  -d '{
    "forceReenrich": true,
    "maxEnrichment": 100,
    "maxUsers": 500
  }'
Track Product Event
curl -X POST http://localhost:8080/api/events/track \
  -H "Content-Type: application/json" \
  -d '{
    "user_email": "user@company.com",
    "event": "Feature Used",
    "properties": { "feature": "Export", "format": "CSV" }
  }'
Force Periodic Sync Now
# Full sync (behavior + ICP + sync to Attio)
curl -X POST http://localhost:8080/api/periodic-sync/sync-now \
  -H "Content-Type: application/json" \
  -d '{"type": "full"}'
# Behavior scoring only (no API calls)
curl -X POST http://localhost:8080/api/periodic-sync/sync-now \
  -H "Content-Type: application/json" \
  -d '{"type": "behavior"}'
# ICP scoring only (uses AI-first enrichment)
curl -X POST http://localhost:8080/api/periodic-sync/sync-now \
  -H "Content-Type: application/json" \
  -d '{"type": "icp"}'
Check Periodic Sync Status
curl http://localhost:8080/api/periodic-sync/status
Background Job Management
# List all running jobs
curl http://localhost:8080/api/jobs
# Check specific job status
curl http://localhost:8080/api/jobs/status/calculate-lead-scores
# Stop a running job
curl -X POST http://localhost:8080/api/jobs/stop/calculate-lead-scores
Process External LinkedIn Profiles
curl -X POST http://localhost:8080/api/process-linkedin-profiles \
  -H "Content-Type: application/json" \
  -d '{
    "profiles": [
      {
        "email": "john@company.com",
        "linkedinUrl": "https://linkedin.com/in/johndoe",
        "firstName": "John",
        "lastName": "Doe"
      }
    ]
  }'
Postman Collection
Import the complete API collection for easy testing and development:
Download Postman Collection
What's Included:
- Health & System - Health checks and monitoring
- Namespace Management - Multi-tenant data segregation
- Dashboard - Dashboard UI and stats endpoints
- Legacy Sync - Original sync endpoints
- New Sync API (v1) - Enhanced sync with better performance
- Full Sync System - Bulk sync with intelligent rate limiting
- Background Jobs - Asynchronous processing endpoints
- External Profiles - LinkedIn profile processing
- Product Events - Event tracking and analytics
- Periodic Sync - Automated sync scheduling
- Testing - API testing and integration validation
- Scoring - Lead scoring and calculation endpoints
Setup Instructions:
- Import the collection file into Postman
- Update the base_url variable to your deployment URL (default: http://localhost:8080)
- Configure environment variables for API keys if testing external integrations
- Each endpoint includes detailed descriptions and example request bodies
Full Sync System
Cairo includes a powerful full synchronization system designed to handle massive data imports and historical syncs from Smartlead and Lemlist while preventing duplicates and maintaining data integrity.
Key Capabilities
- Massive Scale - Sync hundreds of thousands of records efficiently
- 3 Sync Modes - Full historical, date range, and reset from date
- Smart Rate Limiting - API-specific limits prevent quota exhaustion
- Deduplication Built-In - Events by key, users by email
- Namespace Control - Sync all or specific client partitions
- Progress Tracking - Real-time updates with ETA calculations
- Background Processing - Async jobs with webhook callbacks
- Mixpanel Integration - Automatic analytics tracking
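As an illustration of the deduplication rule listed above (events by key, users by email), here is a minimal sketch. The `dedupeBy` helper, the `SyncUser` and `SyncEvent` shapes, the `event_key` field name, and the lowercase/trim email normalization are assumptions made for the example, not Cairo's confirmed internals.

```typescript
// Keep the first item seen for each key; later duplicates are dropped.
function dedupeBy<T>(items: T[], keyOf: (item: T) => string): T[] {
  // Object.create(null) avoids collisions with inherited keys like "constructor".
  const seen: { [key: string]: boolean } = Object.create(null);
  const unique: T[] = [];
  for (const item of items) {
    const key = keyOf(item);
    if (!seen[key]) {
      seen[key] = true;
      unique.push(item);
    }
  }
  return unique;
}

// Hypothetical record shapes, for illustration only.
interface SyncUser { email: string; }
interface SyncEvent { event_key: string; }

// Assumption: emails are compared case-insensitively, ignoring surrounding spaces.
function dedupeUsers(users: SyncUser[]): SyncUser[] {
  return dedupeBy(users, (user) => user.email.trim().toLowerCase());
}

function dedupeEvents(events: SyncEvent[]): SyncEvent[] {
  return dedupeBy(events, (event) => event.event_key);
}
```

Keeping the first occurrence means a re-run of the same sync leaves existing records untouched, which is one simple way to make repeated syncs idempotent.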
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/full-sync/execute | POST | Execute synchronous full sync |
| /api/full-sync/execute-async | POST | Execute asynchronous full sync |
| /api/full-sync/status/:jobId | GET | Get job status and progress |
| /api/full-sync/health | GET | Check full sync system health |
| /api/full-sync/config/validate | POST | Validate sync configuration |
| /api/full-sync/namespaces | GET | Get available namespaces for sync |
| /api/full-sync/jobs | GET | List job history and management |
Sync Modes
1. Full Historical Sync
Syncs all historical data, ignoring last_sync_time timestamps.
curl -X POST http://localhost:8080/api/full-sync/execute-async \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "FULL_HISTORICAL",
    "platforms": ["smartlead", "lemlist"],
    "namespaces": "all",
    "batchSize": 100,
    "enableMixpanelTracking": true
  }'
2. Date Range Sync
Syncs data from a specific time period with precise control.
curl -X POST http://localhost:8080/api/full-sync/execute-async \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "DATE_RANGE",
    "platforms": ["smartlead"],
    "namespaces": ["playmaker"],
    "dateRange": {
      "start": "2024-01-01T00:00:00.000Z",
      "end": "2024-01-31T23:59:59.999Z"
    },
    "batchSize": 50
  }'
3. Reset From Date
Resets sync timestamps and syncs from a specific date forward.
curl -X POST http://localhost:8080/api/full-sync/execute-async \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "RESET_FROM_DATE",
    "platforms": ["lemlist"],
    "namespaces": ["client1", "client2"],
    "resetDate": "2024-02-01T00:00:00.000Z",
    "batchSize": 75,
    "rateLimitDelay": 1000
  }'
Configuration Options
| Parameter | Type | Description |
|---|---|---|
| mode | String | FULL_HISTORICAL, DATE_RANGE, or RESET_FROM_DATE |
| platforms | Array | ["smartlead"], ["lemlist"], or both |
| namespaces | String/Array | "all" or specific namespaces ["client1", "client2"] |
| dateRange | Object | Required for DATE_RANGE mode: {start, end} |
| resetDate | String | Required for RESET_FROM_DATE mode |
| batchSize | Number | Records per batch (1-1000, default: 100) |
| rateLimitDelay | Number | Milliseconds between requests (default: 500) |
| enableMixpanelTracking | Boolean | Track sync events in Mixpanel |
| callbackUrl | String | Webhook URL for job completion notifications |
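The rules in the table above can be checked client-side before a job is submitted. The sketch below is one possible reading of those rules; the `FullSyncConfig` type and `validateFullSyncConfig` function are hypothetical names, and the server's own /api/full-sync/config/validate endpoint remains the authority on what is accepted.

```typescript
type SyncMode = "FULL_HISTORICAL" | "DATE_RANGE" | "RESET_FROM_DATE";

// Field names follow the Configuration Options table.
interface FullSyncConfig {
  mode: SyncMode;
  platforms: string[];
  namespaces: "all" | string[];
  dateRange?: { start: string; end: string };
  resetDate?: string;
  batchSize?: number;
  rateLimitDelay?: number;
  enableMixpanelTracking?: boolean;
  callbackUrl?: string;
}

// Returns a list of problems; an empty list means the config passed these checks.
function validateFullSyncConfig(config: FullSyncConfig): string[] {
  const errors: string[] = [];
  if (config.platforms.length === 0) {
    errors.push("platforms must not be empty");
  }
  for (const platform of config.platforms) {
    if (platform !== "smartlead" && platform !== "lemlist") {
      errors.push("unknown platform: " + platform);
    }
  }
  // Documented range is 1-1000 with a default of 100.
  const batchSize = config.batchSize === undefined ? 100 : config.batchSize;
  if (Math.floor(batchSize) !== batchSize || batchSize < 1 || batchSize > 1000) {
    errors.push("batchSize must be an integer from 1 to 1000");
  }
  if (config.mode === "DATE_RANGE") {
    if (!config.dateRange) {
      errors.push("dateRange is required for DATE_RANGE mode");
    } else {
      const start = Date.parse(config.dateRange.start);
      const end = Date.parse(config.dateRange.end);
      // Assumption: start must not be later than end.
      if (isNaN(start) || isNaN(end) || start > end) {
        errors.push("dateRange must contain valid dates with start <= end");
      }
    }
  }
  if (config.mode === "RESET_FROM_DATE" && !config.resetDate) {
    errors.push("resetDate is required for RESET_FROM_DATE mode");
  }
  return errors;
}
```

Returning every problem at once, rather than failing on the first, lets a caller fix a config in one pass.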
Rate Limits & Performance
The system includes intelligent rate limiting based on API documentation:
| Platform | Requests/Sec | Max Batch Size | Notes |
|---|---|---|---|
| Smartlead | 10 | 100 | Conservative limits |
| Lemlist | 10 | 50 | Respects 20/2sec rule |
| Attio | 5 | 25 | CRM-specific limits |
| Mixpanel | 50 | 200 | Analytics-optimized |
| Database | 100 | 500 | High-performance |
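To get a feel for what these limits imply, the sketch below derives the minimum spacing between requests and a rough lower bound on sync time. It assumes one API request per batch, which is a simplification; the `LIMITS` map copies the table above, and `minDelayMs` and `estimateSeconds` are hypothetical helper names.

```typescript
interface PlatformLimit {
  requestsPerSec: number;
  maxBatchSize: number;
}

// Values copied from the Rate Limits & Performance table.
const LIMITS: { [platform: string]: PlatformLimit } = {
  smartlead: { requestsPerSec: 10, maxBatchSize: 100 },
  lemlist: { requestsPerSec: 10, maxBatchSize: 50 },
  attio: { requestsPerSec: 5, maxBatchSize: 25 },
  mixpanel: { requestsPerSec: 50, maxBatchSize: 200 },
  database: { requestsPerSec: 100, maxBatchSize: 500 },
};

function limitFor(platform: string): PlatformLimit {
  const limit = LIMITS[platform];
  if (!limit) {
    throw new Error("unknown platform: " + platform);
  }
  return limit;
}

// Minimum spacing between requests that stays within the per-second limit.
function minDelayMs(platform: string): number {
  return 1000 / limitFor(platform).requestsPerSec;
}

// Rough lower bound on sync time, assuming one request per batch.
function estimateSeconds(platform: string, totalRecords: number, requestedBatchSize: number): number {
  const limit = limitFor(platform);
  // The requested batch size is clamped to the platform maximum.
  const batchSize = Math.min(requestedBatchSize, limit.maxBatchSize);
  const batches = Math.ceil(totalRecords / batchSize);
  return batches / limit.requestsPerSec;
}
```

For example, under these assumptions 10,000 Lemlist records requested at a batch size of 100 would be clamped to 50 per batch, giving 200 requests and at least 20 seconds of request time.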
Progress Monitoring
Check Job Status
curl -X GET http://localhost:8080/api/full-sync/status/full-sync-1234567890
Response:
{
"success": true,
"data": {
"id": "full-sync-1234567890",
"status": "running",
"progress": {
"processed": 2500,
"total": 10000,
"percentage": 25,
"eta": "15 minutes"
},
"result": {
"platforms": {
"smartlead": { "users": 1200, "events": 8500 },
"lemlist": { "users": 800, "events": 4200 }
}
}
}
}
System Health Check
curl -X GET http://localhost:8080/api/full-sync/health
Job History
curl -X GET "http://localhost:8080/api/full-sync/jobs?limit=20&status=completed"
Configuration Validation
Validate your sync configuration before executing:
curl -X POST http://localhost:8080/api/full-sync/config/validate \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "DATE_RANGE",
    "platforms": ["smartlead"],
    "namespaces": ["playmaker"],
    "dateRange": {
      "start": "2024-01-01T00:00:00.000Z",
      "end": "2024-01-31T23:59:59.999Z"
    }
  }'
Production Best Practices
- Start Small: Begin with batchSize: 25-50 for testing
- Monitor Health: Use /api/full-sync/health for system monitoring
- Use Date Range: For regular syncs, avoid FULL_HISTORICAL
- Namespace Filtering: Sync specific clients instead of "all" when possible
- Async Jobs: Use /execute-async for large datasets
- Rate Limiting: Adjust rateLimitDelay based on API responses
Error Handling & Recovery
The system includes comprehensive error handling:
- Automatic Retry: Failed batches are retried with exponential backoff
- Partial Success: Completed portions are preserved if sync fails
- Rate Limit Recovery: Automatic delay adjustments when limits are hit
- Progress Preservation: Jobs can be resumed from the last successful batch
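The retry behavior described above can be pictured with a small delay calculation. This is a generic exponential backoff sketch, not Cairo's actual retry code: the function names, the doubling factor, and the cap are assumptions, and the README does not state the real base delay or retry count.

```typescript
// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

// Full list of delays for a batch that keeps failing until retries run out.
function retrySchedule(maxRetries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt, baseMs, capMs));
  }
  return delays;
}
```

With illustrative values of five retries, a 500 ms base, and a 5000 ms cap, the delays grow as 500, 1000, 2000, 4000, and then stay at 5000 ms. The cap keeps a long outage from producing unbounded waits.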
Use Cases
Marketing Agency Full Client Onboarding
# Sync all historical data for a new client
curl -X POST http://localhost:8080/api/full-sync/execute-async \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "FULL_HISTORICAL",
    "platforms": ["smartlead", "lemlist"],
    "namespaces": ["new-client"],
    "batchSize": 100,
    "callbackUrl": "https://mycrm.com/webhooks/sync-complete"
  }'
Data Recovery After Sync Issues
# Reset sync timestamps and re-sync from specific date
curl -X POST http://localhost:8080/api/full-sync/execute-async \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "RESET_FROM_DATE",
    "platforms": ["smartlead", "lemlist"],
    "namespaces": "all",
    "resetDate": "2024-01-01T00:00:00.000Z"
  }'
Monthly Reporting Data Sync
# Sync specific month for reporting
curl -X POST http://localhost:8080/api/full-sync/execute-async \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "DATE_RANGE",
    "platforms": ["smartlead", "lemlist"],
    "namespaces": "all",
    "dateRange": {
      "start": "2024-01-01T00:00:00.000Z",
      "end": "2024-01-31T23:59:59.999Z"
    },
    "enableMixpanelTracking": true
  }'
Lead Scoring
How It Works
Cairo uses a dual-scoring system:
Total Lead Score = ICP Score (max 100) + Behavior Score (unlimited)
ICP Score (Company Fit)
Based on Apollo-enriched company data:
| Criteria | Points |
|---|---|
| Company Size | |
| 1-10 employees | 10 |
| 11-50 employees | 30 |
| 51-250 employees | 40 |
| Annual Revenue | |
| $1M - $10M | 20 |
| $10M - $50M | 40 |
| Funding Stage | |
| Seed | 10 |
| Series A | 15 |
| Series B | 20 |
Behavior Score (Engagement)
Based on user actions:
| Event | Points |
|---|---|
| Email Opened | 5 |
| Email Clicked | 5 |
| Email Replied | 10 |
| LinkedIn Message | 10 |
| Website Visit | 5 |
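A behavior score under this table is a sum of points per event. The sketch below assumes that every occurrence of an event counts and that events missing from the table score zero; neither rule is stated in this README, and `EVENT_POINTS` and `behaviorScore` are illustrative names.

```typescript
// Points copied from the Behavior Score table.
const EVENT_POINTS: { [event: string]: number } = {
  "Email Opened": 5,
  "Email Clicked": 5,
  "Email Replied": 10,
  "LinkedIn Message": 10,
  "Website Visit": 5,
};

// Sums points over a list of event names.
function behaviorScore(events: string[]): number {
  let total = 0;
  for (const event of events) {
    // hasOwnProperty guards against inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(EVENT_POINTS, event)) {
      total += EVENT_POINTS[event];
    }
  }
  return total;
}
```

A lead who opened, clicked, and replied to an email would score 5 + 5 + 10 = 20 under these assumptions.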
Lead Grades
| Total Score | Grade |
|---|---|
| 90+ | A+ |
| 80-89 | A |
| 70-79 | B+ |
| 60-69 | B |
| 50-59 | C+ |
| 40-49 | C |
| 20-39 | D |
| <20 | F |
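Putting the formula and the grade table together, a minimal sketch looks like this. The function names are hypothetical, and treating the ICP cap as a simple clamp at 100 is an assumption based on the "max 100" note in the formula.

```typescript
// Total Lead Score = ICP Score (max 100) + Behavior Score (unlimited).
function totalLeadScore(icpScore: number, behaviorScore: number): number {
  return Math.min(icpScore, 100) + behaviorScore;
}

// Maps a total score to the letter grades in the Lead Grades table.
function leadGrade(total: number): string {
  if (total >= 90) return "A+";
  if (total >= 80) return "A";
  if (total >= 70) return "B+";
  if (total >= 60) return "B";
  if (total >= 50) return "C+";
  if (total >= 40) return "C";
  if (total >= 20) return "D";
  return "F";
}
```

For instance, an ICP score of 70 plus a behavior score of 25 gives a total of 95, which falls in the A+ band.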
Periodic Sync & Automation
Cairo includes intelligent periodic sync that optimizes API costs while maintaining data freshness.
How It Works
- Every 4 hours: Behavior scoring (database-only, no API calls)
- Weekly: ICP scoring for unscored leads (AI-first enrichment)
- Smart Attio sync: Only sync leads with behavior score > 0
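The Attio filter in the last step can be sketched as a simple threshold check. The `ScoredLead` shape and `leadsToSync` name are hypothetical; the threshold parameter stands in for MIN_BEHAVIOR_SCORE_FOR_ATTIO, and with the default of 1 and whole-number scores, "score >= 1" is equivalent to "score > 0".

```typescript
// Hypothetical lead record with only the fields this example needs.
interface ScoredLead {
  email: string;
  behaviorScore: number;
}

// Keeps only leads whose behavior score meets the configured minimum.
function leadsToSync(leads: ScoredLead[], minBehaviorScore: number): ScoredLead[] {
  return leads.filter((lead) => lead.behaviorScore >= minBehaviorScore);
}
```

Raising the threshold is one way to send fewer, more engaged leads to the CRM.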
Configuration
Enable periodic sync in your environment:
USE_PERIODIC_SYNC=true
SYNC_INTERVAL_HOURS=4
ENABLE_WEEKLY_ICP_SCORING=true
ICP_SCORING_DAY=0   # Sunday
ICP_SCORING_HOUR=2  # 2 AM
MIN_BEHAVIOR_SCORE_FOR_ATTIO=1
Benefits
- Cost Optimized: ICP scoring only for new/unscored leads
- CRM Quality: Only engaged leads enter Attio
- Performance: Behavior scoring processes 1000+ users quickly
- Flexibility: Manual triggers available for all sync types
Manual Control
# Check periodic sync status
curl http://localhost:8080/api/periodic-sync/status
# Force different sync types
curl -X POST http://localhost:8080/api/periodic-sync/sync-now \
  -H "Content-Type: application/json" \
  -d '{"type": "behavior"}'  # or "icp" or "full"
AI-First Enrichment
Cairo supports cost-effective AI enrichment as the primary method, falling back to traditional APIs when needed.
Cost Comparison
| Method | Cost per Lead | Data Quality | Speed |
|---|---|---|---|
| AI (Perplexity) | $0.005 | High | Fast |
| AI (OpenAI) | $0.01 | High | Fast |
| AI (Anthropic) | $0.008 | High | Fast |
| Hunter.io | $0.08 | Medium | Medium |
| Apollo | $0.15 | Very High | Slow |
How It Works
- AI Primary: Uses LLM to extract company data from web sources
- Confidence Check: AI scores its own confidence (0-100%)
- Smart Fallback: If confidence < 60%, tries Hunter.io then Apollo
- Result: 95%+ cost reduction versus Apollo ($0.005 vs. $0.15 per lead in the table above) whenever the AI result is accepted, with comparable data quality
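The fallback chain above can be sketched as follows. This is a simplified, synchronous illustration: real enrichment providers would be asynchronous network calls, and the `EnrichmentResult` shape, the `Provider` type, and the choice to keep a low-confidence AI result when every fallback fails are all assumptions.

```typescript
// Hypothetical result shape; `confidence` is the AI's self-reported 0-100 score.
interface EnrichmentResult {
  source: string;
  confidence: number;
  data: { [field: string]: string };
}

// A provider returns a result, or null when it finds nothing.
type Provider = (email: string) => EnrichmentResult | null;

// Threshold from the steps above: below 60% the fallbacks are tried.
const CONFIDENCE_THRESHOLD = 60;

function enrichWithFallback(
  email: string,
  ai: Provider,
  fallbacks: Provider[]
): EnrichmentResult | null {
  const aiResult = ai(email);
  if (aiResult !== null && aiResult.confidence >= CONFIDENCE_THRESHOLD) {
    return aiResult; // AI answer accepted, no paid API call needed
  }
  // Fallbacks are tried in order, e.g. Hunter.io first, then Apollo.
  for (const provider of fallbacks) {
    const result = provider(email);
    if (result !== null) {
      return result;
    }
  }
  // Assumption: a low-confidence AI result is better than nothing.
  return aiResult;
}
```

Because the more expensive providers are only called when the AI result falls below the threshold, the average cost per lead depends on how often that happens.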
Configuration
# Enable AI enrichment
ENABLE_AI_ENRICHMENT=true
# AI API keys (at least one required)
PERPLEXITY_API_KEY=your_key  # Recommended - cheapest
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
# Fallback enrichment
HUNTER_API_KEY=your_key      # Fallback 1
APOLLO_API_KEY=your_key      # Fallback 2
Integrations
Apollo
Used for lead enrichment with company data:
- Employee count
- Annual revenue
- Funding information
- Technologies used
- Company industry
Attio CRM
Syncs lead scores and metadata to custom fields:
- icp_score - Company fit score
- behaviour_score - Engagement score
- lead_score - Total score
- icp - Letter grade (A+, B, etc.)
- scoring_meta - JSON metadata
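As a rough picture of how a scored lead might map onto these fields, consider the sketch below. The field names come from the list above; the `LeadScore` input shape, the `toAttioFields` name, and serializing scoring_meta as a JSON string are assumptions, and the actual Attio request format is not shown here.

```typescript
// Hypothetical internal representation of a scored lead.
interface LeadScore {
  icpScore: number;
  behaviorScore: number;
  grade: string;
  meta: { [key: string]: unknown };
}

// Builds a flat field map using the custom field names listed above.
function toAttioFields(score: LeadScore): { [field: string]: string | number } {
  return {
    icp_score: score.icpScore,
    behaviour_score: score.behaviorScore,
    lead_score: score.icpScore + score.behaviorScore,
    icp: score.grade,
    // Assumption: metadata is stored as a JSON string.
    scoring_meta: JSON.stringify(score.meta),
  };
}
```

Note that the Attio field uses the spelling behaviour_score, so a mapping layer like this is where the naming difference would be handled.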
Mixpanel
Tracks all events for analytics:
- User properties sync
- Event tracking with properties
- Real-time analytics
Lemlist & Smartlead
Imports campaign data and tracks engagement:
- Email events (sent, opened, clicked, replied)
- LinkedIn events
- Campaign performance
Deployment
Railway (Recommended)
- Click the button above
- Add environment variables
- Deploy
Docker
# Build image
docker build -t cairo .
# Run container
docker run -p 8080:8080 --env-file .env cairo
Manual Deployment
- Set up PostgreSQL database
- Configure environment variables
- Run migrations
- Start with PM2:
npm install -g pm2
pm2 start server.js --name cairo
pm2 save
pm2 startup
Configuration
Database Schema
The system uses these main tables:
- playmaker_user_source - Default user profiles with scores
- {namespace}_user_source - Namespace-specific user tables (auto-created)
- namespaces - Namespace configurations and keywords
- event_source - All tracked events
- campaigns - Campaign data
- sent_events - Deduplication tracking
- scoring_config - Scoring rules
Environment Variables
See .env.example for all available options.
Scoring Configuration
Customize scoring rules in the scoring_config table or via the API.
Development
Running Tests
# Run test suite
npm test
Testing Integrations via API
# Test Apollo enrichment
curl -X POST http://localhost:8080/api/test/apollo \
  -H "Content-Type: application/json" \
  -d '{"email": "test@company.com", "company": "Test Company"}'
# Test Hunter enrichment
curl -X POST http://localhost:8080/api/test/hunter \
  -H "Content-Type: application/json" \
  -d '{"email": "test@company.com", "company": "Test Company"}'
# Test database connection
curl http://localhost:8080/api/test/database
# Test all service integrations
curl http://localhost:8080/api/test/health
Development Mode
Running npm run dev starts the server with nodemon for auto-reloading and debug logging.
Database Migrations
# Run migrations (recommended)
npm run setup
# Or run migrations directly
node src/migrations/run_migrations.js
Migrations are automatically run when the server starts, but you can run them manually during development.
Monitoring
Health Check
curl http://localhost:8080/health
Job Status
# Check scoring job status
curl http://localhost:8080/api/jobs/status/calculate-lead-scores
# View job logs
curl http://localhost:8080/api/jobs/logs/calculate-lead-scores
Error Tracking
Configure Sentry for production error monitoring:
- Create account at sentry.io
- Add SENTRY_DSN to environment variables
- Errors will be automatically tracked
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Code Style
- Use ESLint for code linting
- Follow existing patterns in the codebase
- Add tests for new features
- Update documentation
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with Node.js and Express
- PostgreSQL for data storage
- Apollo.io for enrichment
- Attio for CRM
- Mixpanel for analytics
Support
- Issues: GitHub Issues
Roadmap
- Add more data sources (HubSpot, Salesforce)
- Machine learning for score optimization
- Custom scoring rules UI
- Data warehouse export (Snowflake, BigQuery)
- Multi-tenant support - Complete namespace-based data segregation
- Full Sync System - Bulk sync with intelligent rate limiting for hundreds of thousands of records
- GraphQL API
- Real-time WebSocket updates
- Namespace-specific dashboard views