shahules
- Karma: 130
- Created: 4 years ago
Recent Submissions
- 1. PA bench: Evaluating web agents on real world personal assistant workflows (vibrantlabs.com)
- 2. PA Bench: Evaluating Frontier Models on Multi-Tab PA Tasks (vibrantlabs.com)
- 3. Show HN: Ragas – Open-source library for evaluating RAG pipelines (github.com)
- 4. Show HN: Ragas – Open-source library for evals and testing RAG systems (github.com)