Generate Evals from Debugging LLM Applications
Evals take a lot of effort to set up, and the results are not always helpful. What if you could generate evals automatically based on how you debug your LLM applications?
Demo
Video: Pixie - Generate Evals from debugging (mp4)
Get Started
1. Setup
In your project folder, install the pixie-sdk package:
Start the local debug server by running:
2. Connect Your Application
Add the @pixie.session decorator to any function you'd like to debug, and use pixie.print(...) to log data to the debugger UI.
# my_chatbot.py
import asyncio

from pydantic_ai import Agent

import pixie.sdk as pixie

# You can implement your application using any major AI development framework
agent = Agent(
    name="Simple chatbot",
    instructions="You are a helpful assistant.",
    model="gpt-4o-mini",
)


@pixie.session
async def my_chatbot():
    """Chatbot application example."""
    await pixie.print("How can I help you today?")
    messages = []
    while True:
        user_msg = await asyncio.to_thread(input)
        await pixie.print(user_msg, from_user=True)
        response = await agent.run(user_msg, message_history=messages)
        messages = response.all_messages()
        await pixie.print(response.output)
3. Debug with web UI
Visit the web UI at gopixie.ai to start debugging.
Run your application as usual while the Pixie debug server is running, and your session will show up in the debugger UI.
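To make the idea of turning a debugging session into evals more concrete, here is a minimal sketch in plain Python. It does not use or describe Pixie's actual implementation: `LoggedMessage`, `EvalCase`, and `transcript_to_eval_cases` are hypothetical names invented for illustration, and the transcript format simply mirrors the `from_user` flag used in the chatbot example above. The sketch pairs each user message with the assistant reply that follows it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoggedMessage:
    """Hypothetical record of one logged message (not a Pixie type)."""
    text: str
    from_user: bool = False


@dataclass
class EvalCase:
    """Hypothetical eval case: a user input and the reply observed while debugging."""
    user_input: str
    observed_output: str


def transcript_to_eval_cases(transcript: List[LoggedMessage]) -> List[EvalCase]:
    """Pair each user message with the first assistant message that follows it."""
    cases: List[EvalCase] = []
    pending_input: Optional[str] = None
    for message in transcript:
        if message.from_user:
            # A newer user message replaces one that never got a reply.
            pending_input = message.text
        elif pending_input is not None:
            cases.append(EvalCase(pending_input, message.text))
            pending_input = None
    return cases


# Example transcript shaped like the chatbot session above (made-up content).
transcript = [
    LoggedMessage("How can I help you today?"),
    LoggedMessage("What is 2 + 2?", from_user=True),
    LoggedMessage("2 + 2 equals 4."),
    LoggedMessage("Thanks!", from_user=True),
    LoggedMessage("You're welcome."),
]
eval_cases = transcript_to_eval_cases(transcript)
```

The opening greeting is skipped because no user message precedes it, so only the two user/assistant exchanges become eval cases. A real workflow would likely also record which replies you judged correct during debugging, which this sketch leaves out.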
Important Links
- Documentation - Complete documentation with tutorials and API reference
- Examples - Real-world examples and sample applications
- Demo - Live demo with the examples server set up
- Discord - Join our community for support and discussions
Acknowledgments
This project is built on top of many awesome open-source projects:
- Langfuse for instrumentation
- Pydantic for structured data validation
- FastAPI for web API
- Strawberry for GraphQL
- Uvicorn for web server
- Janus for sync-async queue
- docstring-parser for docstring parsing