AI text improvement through research-backed critique with complete observability
Status: Alpha software (v0.3.0). Functional but early-stage. Best suited for evaluation, experimentation, and development.
Why Sifaka?
The Problem: AI-generated text often needs refinement. How do you know if AI output is good enough? How can you systematically improve it without manual review of every output?
What Sifaka Provides:
- Research-Backed Improvement: Implements peer-reviewed critique techniques (Reflexion, Constitutional AI, Self-Refine, etc.)
- Complete Observability: Full audit trail showing exactly how text improved
- Iterative Refinement: Automatic multi-round critique and improvement cycles
- Provider-Agnostic: Works with OpenAI, Anthropic, Google, Groq
Use Case Example: Generate product descriptions for e-commerce. Sifaka:
- Critiques initial draft for clarity, persuasiveness, SEO
- Iteratively refines through multiple improvement cycles
- Validates against your criteria (length, required keywords, tone)
- Provides complete transparency into every improvement step
Installation
# Clone the repository git clone https://github.com/sifaka-ai/sifaka cd sifaka # Install with uv (recommended) uv pip install -e . # Or with standard pip pip install -e .
Setup
Configure your LLM provider API keys using environment variables or .env file:
# OpenAI (default provider) export OPENAI_API_KEY=sk-... # Or Anthropic export ANTHROPIC_API_KEY=sk-ant-... # Or Google export GOOGLE_API_KEY=... # Or Groq export GROQ_API_KEY=...
Quick Start
from sifaka import improve import asyncio async def main(): result = await improve("Write about renewable energy benefits") print(result.final_text) print(f"\nImprovement score: {result.improvement_score:.2f}") print(f"Iterations: {result.iteration}") asyncio.run(main())
Synchronous API
from sifaka import improve_sync result = improve_sync("Write about renewable energy benefits") print(result.final_text)
Core Features
1. Research-Backed Critics
Sifaka implements peer-reviewed critique techniques from academic research:
| Critic | Best For | Research Paper |
|---|---|---|
| SELF_REFINE | General improvement | Self-Refine (2023) |
| REFLEXION | Learning from mistakes | Reflexion (2023) |
| CONSTITUTIONAL | Safety & ethics | Constitutional AI (2022) |
| SELF_CONSISTENCY | Balanced perspectives | Self-Consistency (2022) |
| SELF_RAG | Fact-checking | Self-RAG (2023) |
| META_REWARDING | Self-evaluation | Meta-Rewarding (2024) |
| N_CRITICS | Multiple perspectives | N-Critics (2023) |
| STYLE | Tone & style | Custom implementation |
2. Complete Observability
result = await improve("Your text") # Access complete audit trail for iteration in result.trace: print(f"Iteration {iteration.number}") print(f" Critique: {iteration.critique}") print(f" Improvement: {iteration.improvement}") print(f" Time: {iteration.processing_time:.2f}s")
3. Provider-Agnostic Design
# OpenAI result = await improve(text, provider="openai", model="gpt-4o-mini") # Anthropic result = await improve(text, provider="anthropic", model="claude-3-5-sonnet") # Google result = await improve(text, provider="google", model="gemini-1.5-flash") # Groq (fast inference) result = await improve(text, provider="groq", model="llama3-8b-8192")
4. Validation & Quality Control
from sifaka.validators import LengthValidator, ContentValidator result = await improve( "Write a product description", validators=[ LengthValidator(min_length=100, max_length=200), ContentValidator(required_terms=["features", "benefits"]) ] )
Usage Examples
Example 1: Basic Improvement
from sifaka import improve result = await improve("AI is important for business.") print(result.final_text) # Output: "Artificial intelligence transforms business operations by automating..."
Example 2: Using Specific Critics
from sifaka import improve from sifaka.core.types import CriticType # Single critic result = await improve( "Explain quantum computing", critics=[CriticType.REFLEXION] ) # Multiple critics result = await improve( "Explain quantum computing", critics=[CriticType.REFLEXION, CriticType.SELF_REFINE] )
Example 3: Style Transformation
from sifaka.critics.style import StyleCritic result = await improve( "We offer comprehensive solutions for your needs.", critics=[StyleCritic( style_description="Casual and friendly", style_examples=["Hey there!", "No worries!"] )] )
Example 4: Fact-Checking with SELF_RAG
result = await improve( "The Great Wall of China is visible from space.", critics=[CriticType.SELF_RAG] ) # Critiques factual accuracy and suggests corrections
Example 5: Safety & Ethics Check
result = await improve( "Guide on pest control methods", critics=[CriticType.CONSTITUTIONAL] ) # Evaluates against safety principles
Example 6: Multiple Perspectives
result = await improve( "Product launch announcement", critics=[CriticType.N_CRITICS] ) # Gets feedback from technical expert, general audience, editor, skeptic perspectives
Example 7: Iteration Control
# More iterations for higher quality result = await improve( "Draft email to client", max_iterations=5 # Default is 3 ) # Force improvements even if validation passes result = await improve( "Good text that passes validation", force_improvements=True )
Example 8: Configuration
from sifaka import Config config = Config( model="gpt-4", temperature=0.7, max_iterations=5, timeout_seconds=120 ) result = await improve("Your text", config=config)
Example 9: Storage Backends
from sifaka.storage.file import FileStorage from sifaka.storage.redis import RedisStorage # File storage result = await improve( "Your text", storage=FileStorage("./results") ) # Redis storage result = await improve( "Your text", storage=RedisStorage("redis://localhost:6379") )
Example 10: Error Handling
from sifaka.core.exceptions import ValidationError, CriticError try: result = await improve(text) except ValidationError as e: print(f"Validation failed: {e}") except CriticError as e: print(f"Critic error: {e}")
Example 11: Batch Processing
import asyncio texts = ["Text 1", "Text 2", "Text 3"] tasks = [improve(text) for text in texts] results = await asyncio.gather(*tasks)
Example 12: Custom Validators
from sifaka.validators import BaseValidator class CustomValidator(BaseValidator): async def validate(self, text: str) -> ValidationResult: # Your custom validation logic passed = "important_keyword" in text.lower() return ValidationResult( validator="custom", passed=passed, message="Must contain 'important_keyword'" ) result = await improve(text, validators=[CustomValidator()])
Example 13: Combining Critics for Comprehensive Review
# Technical accuracy + readability result = await improve( "Technical documentation", critics=[CriticType.REFLEXION, CriticType.STYLE] ) # Safety + factual accuracy result = await improve( "Health advice article", critics=[CriticType.CONSTITUTIONAL, CriticType.SELF_RAG] ) # Comprehensive review result = await improve( "Important business document", critics=[ CriticType.SELF_REFINE, CriticType.N_CRITICS, CriticType.META_REWARDING ] )
Configuration
Environment Variables
# LLM Provider Keys OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... GOOGLE_API_KEY=... GROQ_API_KEY=... # Optional: Default settings SIFAKA_DEFAULT_MODEL=gpt-4o-mini SIFAKA_MAX_ITERATIONS=3 SIFAKA_TEMPERATURE=0.7
Config Object
from sifaka import Config config = Config( # Model settings model="gpt-4", # LLM model to use temperature=0.7, # Creativity (0.0-2.0) max_tokens=1000, # Max response length # Critic settings critic_temperature=0.3, # Lower = more consistent critic_context_window=3, # Previous critiques to consider # Behavior settings max_iterations=3, # Max improvement cycles force_improvements=False, # Improve even if valid timeout_seconds=300, # Overall timeout )
Architecture Overview
┌─────────────────────────────────────────────┐
│ Sifaka Improvement Loop │
└─────────────────────────────────────────────┘
│
▼
┌──────────────────────────┐
│ 1. Generate/Modify │
│ (LLM Provider) │
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ 2. Critique │
│ (Critics: Reflexion, │
│ Constitutional, etc) │
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ 3. Validate │
│ (Validators: Length, │
│ Content, Custom) │
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ 4. Improve │
│ (Apply Suggestions) │
└──────────────────────────┘
│
▼
[Repeat or Return Result]
Key Components
- Core Engine (
core/engine/): Orchestrates improvement loop - Critics (
critics/core/): Research-backed critique implementations - Validators (
validators/): Quality checks and requirements - Storage (
storage/): File and Redis storage backends - Config (
core/config/): Configuration management
FAQ
General Questions
Q: Which LLM providers are supported?
A: OpenAI (GPT-4, GPT-3.5), Anthropic (Claude), Google (Gemini), Groq. Any OpenAI-compatible API also works.
Q: Do I need API keys for all providers?
A: No, only for the provider you want to use. Sifaka auto-detects available providers from environment variables.
Q: Can I use multiple critics at once?
A: Yes! Combine critics for comprehensive review: critics=[CriticType.SELF_REFINE, CriticType.REFLEXION]
Q: How much does it cost?
A: Costs depend on your LLM provider, model choice, text length, iterations, and critic count. Typical improvements cost $0.001-0.01 per text with efficient models (GPT-3.5-turbo, Gemini Flash).
Critic Selection
Q: Which critic should I use?
- General improvement:
SELF_REFINE - Academic/technical:
REFLEXIONorSELF_RAG - Marketing/creative:
STYLE - Safety-critical:
CONSTITUTIONAL - Balanced perspectives:
SELF_CONSISTENCYorN_CRITICS
Q: Can I create custom critics?
A: Yes! Implement the CriticPlugin interface (see examples/ for reference implementations).
Performance
Q: How can I improve performance?
- Use faster models: Gemini Flash or GPT-3.5-turbo
- Reduce iterations: Set
max_iterations=1or2 - Batch processing: Process multiple texts concurrently
- Connection pooling: Automatically enabled
Q: Does Sifaka cache results?
A: Not by default. Use FileStorage or RedisStorage to save results, or implement custom caching.
Troubleshooting
Q: Why am I getting timeout errors?
A: Increase timeout_seconds, reduce max_iterations, or use faster models.
Q: Why isn't my text improving?
A: Try different temperature settings (0.7-0.9), different critics, larger models, or check input text quality.
Q: How do I debug issues?
A: Enable logging: logging.basicConfig(level=logging.DEBUG) or use Logfire integration.
Production Use
Q: Can I use Sifaka in production?
A: Yes, but it's alpha software. Features: error handling, timeouts, connection pooling, storage backends, monitoring integration.
Q: Does Sifaka work with async frameworks?
A: Yes! Fully async, works with FastAPI, aiohttp, Django (async views), and any async Python framework.
Q: Is there a synchronous API?
A: Yes: from sifaka import improve_sync
Development
For developers and contributors, see AGENTS.md for:
- Development setup and workflow
- Critical design patterns
- Code quality standards
- Testing guidelines
- Common development tasks
Quick Commands
# Run tests pytest tests/ # Type checking mypy sifaka/ # Linting ruff check . # Formatting black . # Coverage pytest --cov=sifaka
Roadmap
Phase 1: Core Functionality (v0.3.0) ✅
- PydanticAI 1.14+ integration
- Research-backed critics
- Provider-agnostic design
- Storage backends
- Comprehensive documentation consolidation
Phase 2: Enhanced Critics (v0.4.0)
- More specialized critics
- Custom critic templates
- Critic performance optimization
- Enhanced validation framework
Phase 3: Production Features (v0.5.0)
- Advanced caching strategies
- Distributed processing
- Cost optimization tools
- Performance monitoring
Phase 4: v1.0 Release
- Production-grade stability
- Comprehensive documentation site
- Plugin ecosystem
- Enterprise features
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! This is alpha software under active development.
- Check GitHub Issues for open tasks
- Read AGENTS.md for development guidelines
- Submit PRs with tests and documentation
Research Citations
If you use Sifaka in research, please cite the underlying papers:
@article{madaan2023self, title={Self-Refine: Iterative Refinement with Self-Feedback}, author={Madaan, Aman and others}, journal={arXiv preprint arXiv:2303.17651}, year={2023} } @article{shinn2023reflexion, title={Reflexion: Language Agents with Verbal Reinforcement Learning}, author={Shinn, Noah and others}, journal={arXiv preprint arXiv:2303.11366}, year={2023} } @article{bai2022constitutional, title={Constitutional AI: Harmlessness from AI Feedback}, author={Bai, Yuntao and others}, journal={arXiv preprint arXiv:2212.08073}, year={2022} }