MCP servers are shaping the future of software. MCPMark is a comprehensive, stress-testing MCP benchmark and a collection of diverse, verifiable tasks designed to evaluate model and agent capabilities in real-world MCP use. MCPMark will be continuously updated with emerging MCP servers to stay in step with the vibrant ecosystem!
An MCP Benchmark initiated by EVAL SYS
NUS TRAIL × LobeHub
Average MCP Benchmark task resolution success rate for top and select models on MCPMark's dataset of 127 tasks
View MCP Benchmark tasks
Playwright · Reddit
Create a sports analytics account, collect NBA player statistics from forum discussions, analyze basketball performance metrics, and compile a comprehensive statistical report with community insights.
User Interaction, Data Extraction, Comparative Analysis, Content Submission
Created by Fanqing Meng
2025-08-12
Filesystem · Desktop Template
Extract contact details from various file formats on the desktop and perform reasoning analysis on the collected relationship data.
Data Extraction, Cross Referencing
Created by Lingjun Chen
2025-08-14
Playwright · Eval Web
Navigate websites with Cloudflare Turnstile protection, handle security challenges, bypass bot detection mechanisms, and successfully access protected content using automated browser interactions.
Created by Allison Zhan
2025-07-27
Notion · Toronto Guide
Navigate to the Toronto Guide page and change all pink-colored elements to different colors.
Visual Formatting, Conditional Filtering
Created by Xiangyan Liu
2025-08-14
Postgres · Lego
Create a PostgreSQL function to handle inventory part transfers between LEGO sets with validation and audit logging.
Transactional Operations, Stored Procedures And Functions, Audit And Compliance
Created by Jiawei Wang
2025-08-16
GitHub · MCPMark CI/CD
Set up an ESLint workflow for code quality enforcement on all pull requests with proper CI integration.
CI/CD Automation, PR Workflows
Created by Zijian Wu
2025-08-15