Overview
The Seed2.0 series has been officially released, offering three general-purpose agent models of different sizes: Pro, Lite, and Mini. The models in this series deliver a comprehensive upgrade in multimodal understanding, with strengthened LLM and Agent capabilities that enable steady progress on real-world long-horizon tasks. Seed2.0 also expands its capability frontier from competition-level reasoning to research-grade tasks, achieving first-tier industry performance in evaluations of high economic-value and high scientific-value workloads.
Seed2.0 Pro
Focuses on long-chain reasoning and robustness in complex workflows; optimized for demanding real-world scenarios.
Seed2.0 Lite
Balances output quality and response speed; ideal as a general-purpose, production-grade model.
Seed2.0 Mini
Optimized for inference throughput and deployment density. Designed for high concurrency and batch generation scenarios.
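To make the tier descriptions above concrete, here is a minimal illustrative sketch of routing requests to the three sizes, assuming an OpenAI-compatible chat-completions endpoint. The base URL, environment variables, and model identifiers below are placeholders for illustration, not official names.

```python
# Minimal sketch: choosing a Seed2.0 size tier per workload.
# Assumes an OpenAI-compatible endpoint; the endpoint and model IDs are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("SEED_BASE_URL", "https://example.com/api/v1"),  # placeholder endpoint
    api_key=os.environ.get("SEED_API_KEY", "YOUR_API_KEY"),
)

# Hypothetical mapping from workload type to model tier (identifiers are placeholders).
MODEL_BY_WORKLOAD = {
    "complex_long_horizon": "seed-2.0-pro",   # long-chain reasoning, complex workflows
    "general_production": "seed-2.0-lite",    # balanced quality and response speed
    "high_concurrency": "seed-2.0-mini",      # throughput and deployment density
}

def ask(prompt: str, workload: str = "general_production") -> str:
    """Send a single-turn prompt to the model tier chosen for the given workload."""
    response = client.chat.completions.create(
        model=MODEL_BY_WORKLOAD[workload],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask("Outline a week-long experiment plan.", workload="complex_long_horizon"))
```

In practice, the choice of tier would follow latency budgets and concurrency requirements rather than a static mapping; the sketch only illustrates the intended division of labor among Pro, Lite, and Mini.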
Model Performance
Seed2.0 delivers significant enhancements in visual reasoning and perception, achieving SOTA performance on benchmarks such as MathVision. For dynamic scenarios, it strengthens temporal-sequence understanding and motion perception, taking a leading position on key benchmarks such as MotionBench. Seed2.0 also further improves its instruction-following capabilities and achieves top-tier industry performance on complex Agent-capability evaluations.
Multimodal Visual Understanding and Interactive Applications
Seed2.0 processes complex visual inputs and enables real-time interaction and app generation. Whether extracting structured information from images or generating interactive content from visual inputs, it handles these tasks quickly and reliably.
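As a rough sketch of the structured-extraction use case described above, the snippet below sends an image to a multimodal chat endpoint and asks for a JSON reply. It again assumes an OpenAI-compatible API; the endpoint, model identifier, and extracted field names are placeholders.

```python
# Minimal sketch: extracting structured information from an image.
# Assumes an OpenAI-compatible multimodal endpoint; names below are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://example.com/api/v1", api_key="YOUR_API_KEY")  # placeholders

def extract_invoice_fields(image_url: str) -> dict:
    """Ask the model to read an invoice image and return selected fields as JSON."""
    response = client.chat.completions.create(
        model="seed-2.0-pro",  # placeholder model identifier
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract the vendor, date, and total from this invoice. "
                         "Reply with a single JSON object and nothing else."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    print(extract_invoice_fields("https://example.com/sample-invoice.png"))
```

In practice, the reply should be validated before parsing, since models may wrap JSON in code fences or add commentary.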
Steady Progress on Sophisticated Professional Tasks
Seed2.0 significantly strengthens its LLM and Agent performance, maintaining high stability and reliability when executing long-horizon, multi-step instructions.
Evaluation Results
Seed2.0 demonstrates comprehensive improvements over Seed1.8 in LLM, VLM, and Agent task evaluations, and particularly excels in reasoning, complex instruction execution, and multimodal understanding.
Each benchmark row below lists the compared models' scores; a dash indicates that no score is reported.
Science
MMLU-Pro: 87 | 87.7 | 83.6 | 85.9 | 84.1 | 88 | 89.3 | 90.1 | 87.8
HLE (no tool, text only): 32.4 | 28.2 | 13.3 | 29.9 | 17.6 | 14.5 | 23.7 | 33.3 | 31.7
SimpleQA Verified: 36 | 24 | 18.9 | 36.8 | 26 | 29.3 | 48.6 | 72.1 | 65.4
HealthBench: 57.7 | 51.2 | 30 | 63.3 | 62.5 | 28.7 | 36.3 | 37.9 | 51.6
HealthBench - Hard: 29.1 | 20 | 15.3 | 42.0 | 38.6 | 10.9 | 11 | 15 | 21.5
SuperGPQA: 68.7 | 67.5 | 61.6 | 67.9 | 60.5 | 65.5 | 70.6 | 73.8 | 72.7
LPFQA: 52.6 | 50.9 | 47.2 | 54.4 | 50.7 | 54.9 | 52.6 | 51.2 | 51.6
Encyclo-K: 65.7 | 64.5 | 52.1 | 61 | 53 | 58 | 63.3 | 64.9 | 60

Math
AIME 2026: 94.2 | 88.3 | 86.7 | 93.3 | 92.5 | 82.5 | 92.5 | 93.3 | 93.3
AIME 2025: 98.3 | 93 | 87 | 99 | 90.3 | 87 | 91.3 | 95 | 95.2
HMMT Feb 2025: 97.3 | 90 | 70 | 100 | 93.3 | 79.2 | 92.9 | 97.3 | 100
HMMT Nov 2025: 93.3 | 86.7 | 80 | 100 | 96.7 | 81.7 | 93.3 | 93.3 | 96.7
MathArenaApex: 20.3 | 4.7 | 4.2 | 18.2 | 2.1 | 1 | 1.6 | 24.5 | 17.7
MathArenaApex (shortlist): 82.1 | 52.6 | 31.1 | 80.1 | 43.4 | 26 | 47.4 | 71.4 | 71.9
BeyondAIME: 86.5 | 76 | 69 | 86 | 72 | 57 | 69 | 83 | 82
IMOAnswerBench (no tool): 89.3 | 81.6 | 71.6 | 86.6 | 72.1 | 60.7 | 72.6 | 83.3 | 84.4

Code
Codeforces: 3020 | 2233 | 1644 | 3148 | 1985 | 1485 | 1701 | 2726 | 2727
AetherCode: 60.6 | 41.5 | 29.8 | 73.8 | 42.6 | 16.4 | 31.6 | 57.8 | 56.1
LiveCodeBench (v6): 87.8 | 81.7 | 64.1 | 87.7 | 62.6 | 64 | 84.8 | 90.7 | 84.7

STEM
GPQA Diamond: 88.9 | 85.1 | 79 | 92.4 | 82.1 | 84.3 | 86.9 | 91.9 | 90.7
Superchem (text-only): 51.6 | 48 | 16.2 | 58 | 34.8 | 32.4 | 43.2 | 63.2 | 54.4
BABE: 50 | 50.2 | 40.4 | 58.1 | 49.2 | 44.7 | 49.3 | 51.3 | 55.2
Phybench: 74 | 73 | 56 | 74 | 60 | 48 | 69 | 80 | 77
FrontierSci-research: 25 | 18.3 | 3.3 | 25 | 18.3 | 16.7 | 21.7 | 15 | 11.7
FrontierSci-olympiad: 74 | 70 | 44 | 75 | 69 | 60 | 71 | 73 | 73

General Reasoning
ARC-AGI-1: 85.4 | 75.7 | 43.3 | 89.9 | 54.5 | 70.9 | 84 | 85 | 86.9
ARC-AGI-2: 37.5 | 14.8 | 2.3 | 57.5 | 3.5 | 13.6 | 29.1 | 31.1 | 34.3
KORBench: 77.5 | 77 | 72.8 | 79.2 | 74.2 | 73 | 77.4 | 73.9 | 76
ProcBench: 96.6 | 92.4 | 80.1 | 95 | 87.5 | 87.5 | 92.5 | 90 | 90

Long Context Performance
MRCR v2 (8-needle): 54 | 33.6 | 51.4 | 89.4 | 50.1 | 47.1 | 56.2 | 79.7 | 79
Graphwalks BFS (<128k): 68.9 | 82.5 | 64.1 | 98 | 85.5 | 80.5 | 92 | 79.9 | 84.2
Graphwalks Parents (<128k): 97.6 | 100 | 93 | 99.7 | 96.6 | 99 | 96.2 | 99.7 | 99.7
LongBench v2 (128k): 63.8 | 59.6 | 52.3 | 63.2 | 56.7 | 62 | 65 | 67.4 | 64
Frames: 84.5 | 83.4 | 80.5 | 84 | 82.9 | 78.7 | 84.7 | 81.9 | 83.7
DeR2 Bench: 58.2 | 57.3 | 46.6 | 69 | 50.3 | 58.9 | 60.4 | 66.1 | 66
CL-Bench: 20.8 | 20 | 14.8 | 23.9 | 25.2 | 18.1 | 22.6 | 15.6 | 16.1

Multilingual
Global PIQA: 92.3 | 92.1 | 89.2 | 93.2 | 91.6 | 93.9 | 93.9 | 95 | 95.6
MMMLU: 88.1 | 87.7 | 81.6 | 90.3 | 86.3 | 89.9 | 91 | 91.8 | 91.8
Disco-X: 82 | 80.3 | 73 | 76.3 | 67.7 | 70.3 | 78.6 | 76.8 | 71.9

Instruction Following
MultiChallenge: 68.3 | 63.2 | 61.1 | 59.5 | 59 | 57.3 | 59 | 68.7 | 69.3
COLLIE: 93.9 | 94 | 91.2 | 96.9 | 97.4 | 77.3 | 79.8 | 95 | 96.5
MARS-Bench: 85.6 | 80.5 | 62.4 | 87.9 | 66.1 | 72.9 | 87.7 | 85.6 | 84.6
Inverse IFEval: 78.9 | 77.1 | 69.3 | 72.3 | 74.8 | 69.3 | 72.4 | 79.6 | 80.9

Hallucination
LongFact-Objects: 92.9 | 92.2 | 87 | 99.2 | 99.2 | 98.8 | 99 | 98.1 | 97.9
LongFact-Concepts: 92.8 | 92.4 | 91.4 | 99.7 | 99.5 | 98.5 | 98.8 | 98.5 | 98.6
FactScore: 71.2 | 62.4 | 50.4 | 91.9 | 96.1 | 90.6 | 91.1 | 92.6 | 92

Multimodal Math
MathVista: 89.8 | 89 | 85.5 | 83.1 | 87.7 | - | 80.6 | 89.8 | -
MathVision: 88.8 | 86.4 | 78.1 | 86.8 | 81.3 | - | 74.3 | 86.1 | -
DynaMath: 68.9 | 70.5 | 58.9 | 70.1 | 61.5 | - | 52.5 | 63.3 | -
MathKangaroo: 90.5 | 86.3 | 79.8 | 86.9 | 73.8 | - | 69.6 | 84.4* | -
MathCanvas: 61.9 | 61.1 | 53.2 | 55.3 | 53.6 | - | 52.9 | 58.8 | -

Multimodal STEM
MMMU: 85.4 | 83.7 | 79.7 | 83.7 | 83.4 | - | 81.6 | 87 | -
MMMU-Pro: 78.2 | 76 | 71.4 | 79.5* | 73.2 | - | 70.8 | 81.0* | -
EMMA: 72 | 65.5 | 57 | 69.4 | 60.9 | - | 60.4 | 66.5 | -
SFE: 55.6 | 53.4 | 48.4 | 50.1 | 51.2 | - | 55.8 | 61.9 | -
HiPhO: 74.1 | 72.5 | 55.8 | 77.7 | 58.3 | - | 81.8 | 79.1 | -
XLRS-Bench (macro): 54.6 | 53.7 | 49.9 | 49.9 | 39.9 | - | 50.4 | 51.7 | -
PhyX (openended): 72.1 | 62.8 | 65 | 71.5 | 65.9 | - | 61.3 | 71 | -

Visual Puzzles
LogicVista: 81.9 | 79.6 | 73.8 | 81 | 78.3 | - | 68.9 | 80.8 | -
VPCT: 76 | 73 | 48 | 56 | 61 | - | 29 | 90 | -
ZeroBench (main): 12 | 8 | 7 | 11 | 11 | - | 4 | 10 | -
ZeroBench (sub): 47.6 | 42.2 | 36.2 | 38.9 | 37.7 | - | 30.8 | 42.2 | -
ArcAGI1-Image: 88.8 | 80.9 | 29.8 | 93.1 | 31.4 | - | 75.8 | 69.4 | -
ArcAGI2-Image: 43.3 | 28.3 | 1.5 | 54.4 | 1.3 | - | 26.1 | 21.5 | -
VisuLogic: 47.4 | 47.3 | 40.4 | 37 | 35.8 | - | 27.6 | 39 | -

Perception & Recognition
VLMsAreBiased: 77.4 | 74.8 | 58.4 | 28 | 62 | - | 21.4 | 50.6* | -
VLMsAreBlind: 98.6 | 97 | 93.1 | 84.2 | 93 | - | 77.2 | 97.5 | -
VisFactor: 36.8 | 33.4 | 23.6 | 33.6 | 20.4 | - | 24.5 | 45.8 | -
RealWorldQA: 86 | 81.7 | 81.6 | 82.1 | 78 | - | 75.9 | 84.7 | -
BabyVision: 60.6 | 57.5 | 38.7 | 37.4 | 30.2 | - | 16.2 | 49.7* | -

General VQA
SimpleVQA: 71.4 | 67.2 | 68.7 | 54.1 | 65.4 | - | 57.9 | 69.7 | -
HallusionBench: 68 | 66 | 65.1 | 67.7 | 63.9 | - | 65.3 | 69.9 | -
MME-CC: 57 | 50.2 | 40.8 | 44.4 | 43.4 | - | 25.2 | 56.9 | -
MMStar: 83 | 80.7 | 79.1 | 78.2 | 79.9 | - | 73.9 | 83.1 | -
MUIRBench: 81.8 | 76.2 | 78 | 77.4 | 78.7 | - | 78.9 | 78.2 | -
MTVQA: 51.1 | 51.1 | 50.6 | 48.5 | 47.3 | - | 53.1 | 50.8 | -
WorldVQA: 49.9 | 44 | 47.6 | 26.3 | 40.4 | - | 36.6 | 47.5 | -
VibeEval: 81.4 | 76.5 | 76.5 | 73.1 | 74 | - | 70.3 | 77.7 | -
ViVerBench: 75.9 | 80 | 73.9 | 74.8 | 74.6 | - | 72.4 | 75.9 | -

Pointing & Counting
CountBench: 95.5 | 97.1 | 95.5 | 91.2 | 96.3 | - | 90.3 | 97.3 | -
FSC-147 ↓: 11.3 | 11.9 | 17.3 | 21.1 | 13.6 | - | 20.9 | 12.1 | -
Point-Bench: 81.4 | 79 | 77 | - | 76.5 | - | - | 85.5* | -

2D & 3D Spatial Understanding
BLINK: 79.5 | 75.6 | 73.4 | 70.3 | 74.3 | - | 68.1 | 77.1 | -
MMSIBench (circular): 32.5 | 28.3 | 19.7 | 26.1 | 25.8 | - | 20.2 | 25.4 | -
TreeBench: 64.7 | 64.2 | 57.3 | 58.8 | 58.5 | - | 53.6 | 62.7 | -
RefSpatialBench: 72.6 | 66.4 | 55.6 | 25.5 | 56.3 | - | - | 65.5* | -
DA-2K: 92.3 | 90.3 | 86.4 | 78.9 | 90.7 | - | 70.3 | 82.1 | -
All-Angles: 72.1 | 65.2 | 61.3 | 71.5 | 61.6 | - | 63.1 | 73.5 | -
ERQA: 68.5 | 65.8 | 56.3 | 59.8 | 58.8 | - | 48.3 | 70.5* | -

Document & Chart Understanding
ChartQAPro: 71.2 | 70.3 | 65.2 | 67.6 | 63 | - | - | 69 | -
OCRBenchv2: 62.5 | 62.4 | 58.5 | 55.6 | 52.6 | - | 55.5 | 63.3 | -
OmniDocBench 1.5 ↓: 0.099 | 0.102 | 0.11 | 0.143* | 0.106 | - | 0.153 | 0.115* | -
CharXiv-DQ: 93.5 | 93.3 | 91.9 | 93.8 | 88 | - | 92.7 | 94.4 | -
CharXiv-RQ: 80.5 | 79.9 | 70.8 | 82.1* | 71.4 | - | 65.5 | 81.4* | -

Multimodal LongContext Understanding
DUDE: 72.4 | 72.1 | 68.9 | 68.2 | 69.4 | - | 55.6 | 70.1 | -
MMLongBench: 74.8 | 70.8 | 66.7 | - | 72.4 | - | - | 73.6 | -
LongDocURL: 74.7 | 75.1 | 71.3 | - | 74.5 | - | - | 72 | -
MMLongBench-Doc: 61.4 | 55.1 | 48.9 | - | 57 | - | - | 59.5 | -

Video Knowledge
VideoMMMU: 86.9 | 84.1 | 80.6 | 82.7 | 87.6* | 88.1 | 74.4 | - | -
MMVU: 78.2 | 75 | 69 | 73.1 | 76.3 | 77.9 | 49.7 | - | -
VideoSimpleQA: 71.9 | 66.6 | 67.7 | 67.8 | 71.9 | 70.7 | - | - | -

Video Reasoning
VideoReasonBench: 77.8 | 64.2 | 40.5 | 52.8 | 59.5 | 61.2 | 73.8 | - | -
Morse-500: 37.4 | 32.2 | 32.2 | 29.2 | 33 | 32.4 | 55.4 | - | -
VideoHolmes‡: 67.4 | 63.8 | 58.6 | 65.5 | 64.2 | 65.6 | - | - | -
Minerva‡: 66.5 | 63.8 | 54.7 | 62.4 | 65 | 64.4 | - | - | -

Motion & Perception
TVBench: 75 | 71.5 | 70.5 | 71.5 | 71.1 | 69.6 | 94.8 | - | -
ContPhy: 67.4 | 56.1 | 55.9 | 54.9 | 58 | 60.5 | - | - | -
TempCompass: 89.6 | 87 | 83.7 | 86.9 | 88 | 88.3 | 97.3 | - | -
EgoTempo: 71.8 | 61.8 | 67.2 | 67 | 65.4 | 58.4 | 63.2 | - | -
MotionBench: 75.2 | 70.9 | 64.4 | 70.6 | 70.3* | 68.9 | - | - | -
TOMATO: 59.9 | 57.3 | 47.4 | 60.8 | 59.6 | 60.8 | 95.2 | - | -
+ Thinking with Tracking: 65.3 | 59.2 | 51.3 | 61 | - | 64 | 95.2 | - | -

Long Video
VideoMME‡: 89.5 | 87.7 | 81.2 | 87.8 | 88.4* | 85.2 | - | - | -
CGBench: 65 | 59.3 | 59.2 | 62.4 | 65.5 | 65.3 | - | - | -
LongVideoBench: 80.3 | 77.3 | 74.8 | 77.4 | 76.7 | 74.5 | - | - | -
VideoEval-Pro: 48 | 44.3 | 43.7 | 45.9 | 52.7 | 51.9 | - | - | -
LVBench: 76.4 | 73 | 66.6 | 73 | - | - | - | - | -

Multi Video
CrossVid: 60.3 | 57.7 | 58.6 | 57.3 | 53 | 48.7 | 89.2 | - | -

Streaming Video
OVBench: 69.2 | 65.5 | 60.1 | 65.1 | 62.7 | 59.2 | - | - | -
LiveSports-3K: 78 | 77.8 | 73.3 | 77.5 | 74.5 | 71.5 | - | - | -
OVOBench: 77 | 76.7 | 70.4 | 72.6 | 70.1 | 68.7 | 92.8 | - | -
ODVBench: 72.5 | 69.6 | 65.1 | 63.5 | 63.6 | 56.7 | 91.4 | - | -
ViSpeak: 78.5 | 84 | 77.5 | 79 | 89 | 86 | 96 | - | -

Coding Agent
Terminal Bench 2.0¹: 55.8 | 45 | 36.9 | 62.4 | 45.2 | 60.2 | 56.9 | 60 | -
SWE-Lancer: 49.4 | 47.1 | 43.1 | 48.9 | 45.7 | 56.1 | 44.3 | 51.7 | -
SWE-Bench Verified: 76.5 | 73.5 | 67.9 | 80 | 77.2 | 80.9 | 76.2 | 78 | -
Multi-SWE-Bench: 45.2 | 41.1 | 49.3 | 47.7 | 47.7 | 52.8 | 50.2 | 59 | -
SWE-Bench Pro: 46.9 | 46 | 51.7 | 55.6 | 48.4 | 55.4 | 49.7 | 46.7 | -
SWE Multilingual: 71.7 | 64.4 | 63.8 | 68.8 | 64.1 | 74 | 72.7 | 71.1 | -
Scicode: 48.5 | 52.4 | 40.2 | 49.7 | 47.9 | 52.8 | 57.7 | 55 | -
SWE-Evo: 8.5 | 10.6 | 10.4 | 12.5 | 16.7 | 27.1 | 8.9 | 12.8 | -
Aider Polyglot: 80 | 76 | 79.1 | 91.1 | 82.2 | 92.4 | 94.2 | 92 | -
ArtifactsBench: 66.6 | 62.6 | 68.9 | 71.1 | 59.1 | 68.5 | 58.4 | 52.5 | -
CodeSimpleQA: 58 | 53.1 | 57.6 | 62.3 | 59.6 | 63 | 54.7 | 53.7 | -
SpreadsheetBench Verified: 79.1 | 82.3 | 58.1 | 69.9 | 75.9 | 78.6 | 70.8 | 65.7 | -

Search Agent
BrowseComp: 77.3 | 72.1 | 48.1 | 77.9 (65.3) | 43.9 (29.5) | 67.8 (57.2) | 59.2 | 41.5 | -
BrowseComp-zh: 82.4 | 82 | 49.5 | 76.1 | 42.4 | 62.4 | 66.8 | 63 | -
HLE-text: 54.2 | 49.5 | 35.8 | 45.5 | 32 | 43.2 | 46.9 | 47.6 | -
HLE-Verified: 73.6 | 70.7 | 56.4 | 68.5 | 37.6 | 56.6 | 67.5 | 71.8 | -
WideSearch: 74.7 | 74.5 | 37.7 | 76.8 | 65.1 | 76.2 (71.7) | 67.3 | 64 | -
FinSearchComp: 70.2 | 65.1 | 38.1 | 73.8 | 58.6 | 66.2 | 52.7 | 54.8 | -
DeepSearchQA: 77.4 | 67.7 | 16.7 | 71.3 (66.4) | 36.3 | (76.1)41.6 | 63.9 | 54.7 | -
Seal-0: 49.5 | 52.3 | 34.2 | 51.4 | 53.4 | 47.7 | 45.5 | 37.8 | -

Tool Use
τ2-Bench (retail): 90.4 | 90.9 | - | 82 | 86.2 | 88.9 | 85.3 | 88.6 | -
τ2-Bench (telecom): 94.2 | 92.1 | - | 98.7 | 98 | 98.2 | 98 | 94.7 | -
MCP-Mark: 54.7 | 46.7 | 30.2 | 57.5 | 32.1 | 42.3 | 53.9 | 40.5 | -
BFCL-v4: 73.4 | 72.9 | 57.9 | 65.9 | 72.9 | 76.5 | 71 | 65 | -
VitaBench: 47 | 41.8 | 20.8 | 41.8 | 40.8 | 55.3 | 48.8 | 46.7 | -

Deep Research
DeepConsult: 61.1 | 60.3 | 49.8 | 54.3 | 55.8 | 61 | 48 | 26 | -
DeepResearchBench: 53.3 | 54.4 | 50.7 | 52.2 | 47.2 | 50.6 | 49.6 | 46.1 | -
ResearchRubrics: 50.7 | 50.8 | 43.6 | 42.3 | 38.6 | 45 | 37.7 | 36.9 | -

Vision Agent
Minedojo Verified: 49 | 39.7 | 20.3 | 18.3 | - | - | 23.3 | 24.3 | -
MM-BrowseComp: 48.8 | 45.1 | 17.9 | 26.3 | - | - | 25 | 22.8 | -
HLE-VL: 39.2 | 35.8 | 18.7 | 31 | - | - | 36 | 36.8 | -

Science Discovery
Scicode: 52.1 | 52.4 | 40.2 | 49.7 | 47.9 | 52.8 | 57.7 | 55 | -
FrontierSci-research: 23.3 | 18.3 | 18.3 | 25 | 16.7 | 21.7 | 15 | 11.7 | -
Superchem (text-only): 53 | 48 | 34.8 | 58 | 32.4 | 43.2 | 63.2 | 54.4 | -
BIObench: 53.5 | 50.2 | 49.2 | 58.1 | 44.7 | 49.3 | 51.3 | 55.2 | -
AInstein Bench: 47.7 | 38.3 | 35 | 41.3 | 33.7 | 44 | 42.8 | 44 | -

Vibe Coding
NL2Repo-Bench: 27.9 | 24.6 | 19.5 | 49.3 | 39.9 | 43.2 | 34.2 | 27.6 | -
NL2Repo (Pass@1): 3 | 1 | 2 | 8 | 3 | 3 | 4 | 2 | -
ArtifactsBench: 66.6 | 62.6 | 68.9 | 71.1 | 59.1 | 68.5 | 58.4 | 52.5 | -
SWE-Bench Pro: 46.9 | 46 | 51.7 | 55.6 | 48.4 | 55.4 | 49.7 | 46.7 | -
Terminal Bench 2.0¹: 55.8 | 45 | 36.9 | 62.4 | 45.2 | 60.2 | 54.2 | 60 | -

Context Learning
KORBench: 77.2 | 77 | 74.2 | 79.2 | 73 | 77.4 | 73.9 | 76 | -
DeR2 Bench: 58.2 | 57.3 | 50.3 | 69 | 58.9 | 60.4 | 66.1 | 66 | -
CL-Bench: 21.5 | 20 | 25.2 | 23.9 | 18.1 | 22.6 | 15.6 | 16.1 | -
ToB-Complex Workflows: 64.7 | 68.2 | 53.4 | 45 | 61 | 64.8 | 69.2 | 64.9 | -
ToB-Reference Q&A: 72.4 | 62 | 42.5 | 63.6 | 58.9 | 67.9 | 68.3 | 61.3 | -

Real World Tasks
HealthBench - Hard: 28.3 | 20 | 38.6 | 36.6 | 10.9 | 11 | 15 | 21.5 | -
GDPVal-Diamond: 21.3 | 23.2 | 11.5 | 26.9 | 15.2 | 20.7 | 19.4 | 6.46 | -
XPert Bench: 64.5 | 63.3 | 47.6 | 53.3 | 44.7 | 50.5 | 53.1 | 50.1 | -
ToB-K12 Education: 62.8 | 63.8 | 53.7 | 61.6 | 50.1 | 56.2 | 59.4 | 59 | -
ToB-Compositional Tasks: 59.1 | 54.8 | 47.5 | 51.5 | 57.3 | 63.6 | 64.8 | 57.7 | -
ToB-Text Classification: 69 | 64.5 | 52.7 | 62.1 | 64.5 | 63.9 | 67.5 | 61.9 | -
ToB-Information Extraction: 52 | 48.4 | 40.5 | 44.7 | 48.3 | 50.1 | 49 | 45.4 | -
World Travel (VLM): 12 | 14 | 2.7 | 19.33 | 2.67 | 14 | 8 | 7.3 | -
World Travel (TEXT): 23.3 | 24 | 15.3 | 32.67 | 10 | 21.3 | 14.7 | 13.3 | -
Results marked with an asterisk (*) are sourced from the corresponding technical report.
For benchmarks marked with a ‡, subtitles are included in the evaluation inputs.
¹ Evaluated using Terminus2.