MCPMark - Stress-Testing Comprehensive MCP Benchmark

MCP servers are shaping the future of software. MCPMark is a comprehensive, stress-testing MCP benchmark: a collection of diverse, verifiable tasks designed to evaluate model and agent capabilities in real-world MCP use. MCPMark will be continuously updated with emerging MCP servers to stay in step with the vibrant ecosystem!

An MCP Benchmark initiated by EVAL SYS

NUS TRAIL × LobeHub

Average MCP Benchmark task resolution success rate for top and select models on MCPMark's dataset of 127 tasks

View MCP Benchmark tasks

Playwright: Reddit

Create sports analytics account, collect NBA player statistics from forum discussions, analyze basketball performance metrics, and compile comprehensive statistical report with community insights.

User Interaction, Data Extraction, Comparative Analysis, Content Submission

Created by Fanqing Meng

2025-08-12

Filesystem: Desktop Template

Extract contact details from various file formats on desktop and perform reasoning analysis on the collected relationship data.

Data Extraction, Cross Referencing

Created by Lingjun Chen

2025-08-14

Playwright: Eval Web

Navigate websites with Cloudflare Turnstile protection, handle security challenges, bypass bot detection mechanisms, and successfully access protected content using automated browser interactions.

Created by Allison Zhan

2025-07-27

Notion: Toronto Guide

Navigate to the Toronto Guide page and change all pink-colored elements to different colors.

Visual Formatting, Conditional Filtering

Created by Xiangyan Liu

2025-08-14

Postgres: Lego

Create PostgreSQL function to handle inventory part transfers between LEGO sets with validation and audit logging.

Transactional Operations, Stored Procedures and Functions, Audit and Compliance

Created by Jiawei Wang

2025-08-16

GitHub: MCPMark CI/CD

Set up ESLint workflow for code quality enforcement on all pull requests with proper CI integration.

CI/CD Automation, PR Workflows

Created by Zijian Wu

2025-08-15