BAML VCR - Cache LLM calls in tests in 1 line
    from baml_vcr import baml_vcr
    import baml_client
    from baml_client import b

    class TestMyBAMLFunctions:
        @baml_vcr.use_cassette()  # <--- All you need
        def test_simple_function(self):
            # First run: makes real LLM call and saves to cassette
            result = b.MyBAMLFunction(arg1="value1", arg2="value2")
            assert result.value == 100
            # Subsequent runs: loads from cassette without LLM call
BAML VCR is a recording and playback system for BAML function calls, inspired by VCR.py. It captures LLM interactions during test runs and replays them without making actual API calls. This makes your tests faster and cheaper, and lets you choose which parts of an LLM pipeline to run live and which to serve from cache in any given run.
Source: https://github.com/BoundaryML/baml/tree/canary
Features
- Record & Replay: Capture BAML function calls and their responses, then replay them in subsequent test runs
- Streaming Support: Full support for streaming BAML functions with chunk-by-chunk recording
- Multiple Recording Modes: Choose between "once", "new_episodes", "none", or "all" recording strategies
- Async Support: Works with both synchronous and asynchronous BAML functions
- YAML Storage: Human-readable cassette files stored in YAML format
- Automatic Test Discovery: Cassettes are automatically named based on test class and method names
- Type Preservation: Preserves complex BAML response types during serialization
Installation
BAML VCR is not yet available on PyPI. To install, clone the repository and install from source:
    git clone https://github.com/gr-b/baml_vcr.git
    cd baml_vcr
    pip install -e .
Or install directly from GitHub:
pip install git+https://github.com/gr-b/baml_vcr.git
Recording Modes
once (default)
- Records interactions if cassette doesn't exist
- Replays from cassette if it exists
- Perfect for standard test scenarios
new_episodes
- Replays existing interactions
- Records any new, unmatched calls
- Useful when adding new test cases
none
- Only replays, never records
- Raises error if interaction not found
- Use in CI/CD pipelines
all
- Always records, overwrites existing cassette
- Useful for refreshing test data
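The four modes above can be read as a small decision table. The sketch below is not BAML VCR's implementation; `Action` and `decide` are hypothetical names used only for illustration. It also assumes that in "once" mode an unmatched call against an existing cassette fails, as it does in VCR.py, which the descriptions above do not state explicitly.

```python
from enum import Enum


class Action(Enum):
    REPLAY = "replay"
    RECORD = "record"
    ERROR = "error"


def decide(record_mode: str, cassette_exists: bool, has_match: bool) -> Action:
    """Hypothetical decision table for a single intercepted call."""
    if record_mode == "all":
        # Always make the real call and overwrite what was stored.
        return Action.RECORD
    if record_mode == "none":
        # Replay only; an unmatched call is an error.
        return Action.REPLAY if cassette_exists and has_match else Action.ERROR
    if record_mode == "new_episodes":
        # Replay known interactions, record anything new.
        return Action.REPLAY if cassette_exists and has_match else Action.RECORD
    if record_mode == "once":
        if not cassette_exists:
            return Action.RECORD
        # Assumption: an unmatched call against an existing cassette fails.
        return Action.REPLAY if has_match else Action.ERROR
    raise ValueError(f"unknown record_mode: {record_mode!r}")
```

For example, `decide("new_episodes", True, False)` yields `Action.RECORD`, while the same inputs under "none" yield `Action.ERROR`.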
Advanced Usage
Custom Cassette Names
    @baml_vcr.use_cassette(cassette_name="custom_test_name")
    def test_with_custom_name(self):
        result = b.MyFunction(input="test")
Different Recording Modes
    @baml_vcr.use_cassette(record_mode="new_episodes")
    def test_incremental_recording(self):
        # Existing calls are replayed
        result1 = b.Function1(input="test")
        # New calls are recorded
        result2 = b.Function2(input="new test")
Streaming Functions
    @baml_vcr.use_cassette()
    async def test_streaming_function(self):
        stream = b.stream.StreamingFunction(prompt="Generate a story")
        # First run: records each chunk
        async for chunk in stream:
            print(chunk.message)
        final = await stream.get_final_response()
        # Subsequent runs: replays chunks with realistic timing
Cassette File Structure
Cassettes are stored in a baml_cassettes/ directory next to your test files:
tests/
├── test_my_functions.py
└── baml_cassettes/
    ├── TestClass_test_method.cassette.yaml
    └── TestClass_test_streaming.streaming.cassette.yaml
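The naming scheme in the tree above can be sketched as a small path helper. `cassette_path` is a hypothetical function, not part of BAML VCR's API; it only reproduces the layout shown (class name and method name joined by an underscore, with a separate extension for streaming cassettes).

```python
from pathlib import Path


def cassette_path(test_file: str, class_name: str, method_name: str,
                  streaming: bool = False) -> Path:
    """Hypothetical helper mirroring the directory layout shown above."""
    # Streaming cassettes use a distinct extension.
    suffix = ".streaming.cassette.yaml" if streaming else ".cassette.yaml"
    # Cassettes live in baml_cassettes/ next to the test file.
    return Path(test_file).parent / "baml_cassettes" / f"{class_name}_{method_name}{suffix}"


path = cassette_path("tests/test_my_functions.py", "TestClass", "test_method")
```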
Example Cassette Content
    version: '1.0'
    interactions:
    - function_name: ExtractUserInfo
      args:
        text: "John Doe, 30 years old, john@example.com"
      response:
        _type: UserInfo
        _module: baml_client.types
        name: John Doe
        age: 30
        email: john@example.com
      response_type: baml_client.types.UserInfo
      usage:
        input_tokens: 15
        output_tokens: 12
      is_streaming: false
      created_at: '2024-01-15T10:30:00.123456'
How It Works
- Interception: BAML VCR patches BAML client functions at runtime
- Recording: When recording is enabled, it uses BAML's Collector to capture function calls and responses
- Storage: Interactions are serialized to YAML, preserving type information
- Playback: On replay, responses are returned from the cassette without making API calls
- Streaming: For streaming functions, individual chunks are recorded and replayed with realistic timing
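The record-then-replay flow described above can be illustrated with a toy in-memory version. This is a simplified stand-in, not how BAML VCR works internally: the real library patches the BAML client and uses BAML's Collector, whereas the hypothetical `use_memory_cassette` decorator below wraps a plain function and keeps interactions in a list instead of a YAML file.

```python
import functools


def use_memory_cassette(cassette: list):
    """Hypothetical decorator: replay on a hit, record on a miss."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            # Playback: look for an interaction with the same name and args.
            for interaction in cassette:
                if (interaction["function_name"] == func.__name__
                        and interaction["args"] == kwargs):
                    return interaction["response"]
            # Recording: make the "real" call and store the result.
            response = func(**kwargs)
            cassette.append({"function_name": func.__name__,
                             "args": kwargs,
                             "response": response})
            return response
        return wrapper
    return decorator


calls = []     # tracks how often the stand-in "LLM" is actually invoked
cassette = []  # in-memory replacement for a cassette file


@use_memory_cassette(cassette)
def extract_user_info(text):
    calls.append(text)
    return {"name": "John Doe", "age": 30}


first = extract_user_info(text="John Doe, 30 years old")   # recorded
second = extract_user_info(text="John Doe, 30 years old")  # replayed
```

After both calls, the underlying function has run only once and the cassette holds a single interaction.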
Best Practices
- Commit Cassettes: Include cassette files in version control for consistent test behavior
- Refresh Periodically: Use record_mode="all" occasionally to update test data
- Separate Test Data: Use different cassettes for different test scenarios
- Review Changes: Check cassette diffs when updating to ensure expected behavior
- CI/CD: Use record_mode="none" in CI to ensure deterministic tests
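One way to apply the CI recommendation is to pick the record mode from the environment. The helper below is a sketch under assumptions: `pick_record_mode` is a hypothetical name, and it assumes your CI system sets a `CI` environment variable to a truthy string (GitHub Actions, for instance, sets `CI=true`).

```python
import os


def pick_record_mode(environ=os.environ) -> str:
    """Hypothetical helper: replay-only in CI, default mode elsewhere."""
    # Assumption: the CI system sets CI to a truthy string such as "true".
    in_ci = environ.get("CI", "").lower() in {"1", "true", "yes"}
    return "none" if in_ci else "once"


# In a test module, the result would be passed to the decorator:
#   @baml_vcr.use_cassette(record_mode=pick_record_mode())
```

Locally this falls back to "once", so a missing cassette is recorded on the first run, while in CI an unrecorded call fails instead of reaching the LLM.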
Troubleshooting
No Cassette Found
If you see "No recorded response found", either:
- Delete the cassette to re-record
- Change record_mode to "once" or "all"
- Check that the function arguments match exactly
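When checking whether arguments match, it can help to compare the `args` stored in the cassette with the arguments your test actually passes. The snippet below is a standalone debugging sketch; `diff_args` and `MISSING` are hypothetical names, not part of BAML VCR.

```python
MISSING = "<missing>"  # marker for a key present on only one side


def diff_args(recorded: dict, actual: dict) -> dict:
    """Return {key: (recorded_value, actual_value)} for every differing key."""
    diffs = {}
    for key in sorted(set(recorded) | set(actual)):
        old = recorded.get(key, MISSING)
        new = actual.get(key, MISSING)
        if old != new:
            diffs[key] = (old, new)
    return diffs


recorded = {"text": "John Doe, 30 years old"}
actual = {"text": "John Doe, 31 years old", "lang": "en"}
mismatches = diff_args(recorded, actual)
```

An empty result means the arguments are identical; otherwise each entry shows the recorded value next to the value from the current call.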
Streaming Issues
Streaming cassettes are saved with a .streaming.cassette.yaml extension. Make sure you are not mixing streaming and non-streaming calls in the same test.
Type Errors
BAML VCR preserves type information during serialization. If you encounter type errors, check that your BAML client version matches between recording and playback.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. If it does not break existing functionality, I will merge it.
License
MIT License - see LICENSE file for details