ByteDance Seed


Overview

The Seed2.0 model series has been officially released, offering three general-purpose agent models of varying sizes: Pro, Lite, and Mini. The models in this series deliver a comprehensive upgrade in multimodal understanding, with strengthened LLM and Agent capabilities that enable steady progress on real-world long-horizon tasks. Seed2.0 also expands its capability frontier from competition-level reasoning to research-grade tasks, achieving first-tier industry performance in evaluations of high economic-value and high scientific-value workloads.

Seed2.0 Pro

Focuses on long-chain reasoning and robustness in complex workflows. Optimized for complex scenarios in real-world tasks.

Seed2.0 Lite

Balances output quality and response speed.
Ideal as a general-purpose, production-grade model.

Seed2.0 Mini

Optimized for inference throughput and deployment density. Designed for high concurrency and batch generation scenarios.

Model Performance

Seed2.0 delivers significant enhancements in visual reasoning and perception, achieving SOTA performance on benchmarks such as MathVision. For dynamic scenarios, Seed2.0 strengthens its understanding of temporal sequences and motion perception, reaching a leading position on key benchmarks such as MotionBench. Seed2.0 further improves its instruction-following capabilities and achieves top-tier industry performance when evaluated on complex Agent tasks.

Multimodal Visual Understanding and Interactive Applications

Seed2.0 can process complex visual inputs and enables real-time interaction and app generation. Whether extracting structured information from images or generating interactive content from visual inputs, Seed2.0 handles these tasks quickly and reliably.
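
As an illustration of the structured-extraction use case, the sketch below shows how such a multimodal model might be called. It assumes an OpenAI-compatible chat-completions endpoint; the base_url, model id, and image URL are placeholders, not official Seed2.0 values.

```python
# Hypothetical sketch: structured extraction from an image via an
# OpenAI-compatible endpoint. base_url, model id, and the image URL
# are placeholders, not official Seed2.0 values.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="seed-2.0-pro",  # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract every line item from this receipt as JSON "
                     "with the fields item, quantity, and price."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/receipt.png"}},
        ],
    }],
)
print(response.choices[0].message.content)  # JSON produced by the model
```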

Steady Progress on Sophisticated Professional Tasks

Seed2.0 significantly enhances its LLM and Agent performance, maintaining high stability and reliability when executing long-horizon, multi-step instructions.
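
To make the long-horizon behavior concrete, here is a minimal sketch of a bounded multi-step tool-calling loop. It assumes an OpenAI-compatible endpoint with function calling; the base_url, model id, and get_weather tool are illustrative placeholders, not part of the official Seed2.0 interface.

```python
# Hypothetical sketch: a bounded multi-step tool-calling loop against an
# OpenAI-compatible endpoint. base_url, model id, and get_weather are
# illustrative placeholders, not official Seed2.0 values.
import json
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stub tool implementation for the sketch.
    return json.dumps({"city": city, "condition": "sunny", "temp_c": 23})

messages = [{"role": "user",
             "content": "Plan a one-day picnic in Paris; check the weather first."}]

for _ in range(5):  # cap the number of agent steps
    resp = client.chat.completions.create(
        model="seed-2.0-pro",  # placeholder model id
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:  # no tool requested: treat as the final answer
        print(msg.content)
        break
    for call in msg.tool_calls:  # run each requested tool, feed results back
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
```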

Evaluation Results

Seed2.0 demonstrates comprehensive improvements over Seed1.8 in LLM, VLM, and Agent task evaluations, and particularly excels in reasoning, complex instruction execution, and multimodal understanding.

Science

MMLU-Pro: 87 | 87.7 | 83.6 | 85.9 | 84.1 | 88 | 89.3 | 90.1 | 87.8
HLE (no tool, text only): 32.4 | 28.2 | 13.3 | 29.9 | 17.6 | 14.5 | 23.7 | 33.3 | 31.7
SimpleQA Verified: 36 | 24 | 18.9 | 36.8 | 26 | 29.3 | 48.6 | 72.1 | 65.4
HealthBench: 57.7 | 51.2 | 30 | 63.3 | 62.5 | 28.7 | 36.3 | 37.9 | 51.6
HealthBench - Hard: 29.1 | 20 | 15.3 | 42.0 | 38.6 | 10.9 | 11 | 15 | 21.5
SuperGPQA: 68.7 | 67.5 | 61.6 | 67.9 | 60.5 | 65.5 | 70.6 | 73.8 | 72.7
LPFQA: 52.6 | 50.9 | 47.2 | 54.4 | 50.7 | 54.9 | 52.6 | 51.2 | 51.6
Encyclo-K: 65.7 | 64.5 | 52.1 | 61 | 53 | 58 | 63.3 | 64.9 | 60

Math

AIME 2026: 94.2 | 88.3 | 86.7 | 93.3 | 92.5 | 82.5 | 92.5 | 93.3 | 93.3
AIME 2025: 98.3 | 93 | 87 | 99 | 90.3 | 87 | 91.3 | 95 | 95.2
HMMT Feb 2025: 97.3 | 90 | 70 | 100 | 93.3 | 79.2 | 92.9 | 97.3 | 100
HMMT Nov 2025: 93.3 | 86.7 | 80 | 100 | 96.7 | 81.7 | 93.3 | 93.3 | 96.7
MathArenaApex: 20.3 | 4.7 | 4.2 | 18.2 | 2.1 | 1 | 1.6 | 24.5 | 17.7
MathArenaApex (shortlist): 82.1 | 52.6 | 31.1 | 80.1 | 43.4 | 26 | 47.4 | 71.4 | 71.9
BeyondAIME: 86.5 | 76 | 69 | 86 | 72 | 57 | 69 | 83 | 82
IMOAnswerBench (no tool): 89.3 | 81.6 | 71.6 | 86.6 | 72.1 | 60.7 | 72.6 | 83.3 | 84.4

Code

Codeforces: 3020 | 2233 | 1644 | 3148 | 1985 | 1485 | 1701 | 2726 | 2727
AetherCode: 60.6 | 41.5 | 29.8 | 73.8 | 42.6 | 16.4 | 31.6 | 57.8 | 56.1
LiveCodeBench (v6): 87.8 | 81.7 | 64.1 | 87.7 | 62.6 | 64 | 84.8 | 90.7 | 84.7

STEM

GPQA Diamond: 88.9 | 85.1 | 79 | 92.4 | 82.1 | 84.3 | 86.9 | 91.9 | 90.7
Superchem (text-only): 51.6 | 48 | 16.2 | 58 | 34.8 | 32.4 | 43.2 | 63.2 | 54.4
BABE: 50 | 50.2 | 40.4 | 58.1 | 49.2 | 44.7 | 49.3 | 51.3 | 55.2
Phybench: 74 | 73 | 56 | 74 | 60 | 48 | 69 | 80 | 77
FrontierSci-research: 25 | 18.3 | 3.3 | 25 | 18.3 | 16.7 | 21.7 | 15 | 11.7
FrontierSci-olympiad: 74 | 70 | 44 | 75 | 69 | 60 | 71 | 73 | 73

General Reasoning

ARC-AGI-1: 85.4 | 75.7 | 43.3 | 89.9 | 54.5 | 70.9 | 84 | 85 | 86.9
ARC-AGI-2: 37.5 | 14.8 | 2.3 | 57.5 | 3.5 | 13.6 | 29.1 | 31.1 | 34.3
KORBench: 77.5 | 77 | 72.8 | 79.2 | 74.2 | 73 | 77.4 | 73.9 | 76
ProcBench: 96.6 | 92.4 | 80.1 | 95 | 87.5 | 87.5 | 92.5 | 90 | 90

Long Context Performance

MRCR v2 (8-needle): 54 | 33.6 | 51.4 | 89.4 | 50.1 | 47.1 | 56.2 | 79.7 | 79
Graphwalks Bfs (<128k): 68.9 | 82.5 | 64.1 | 98 | 85.5 | 80.5 | 92 | 79.9 | 84.2
Graphwalks Parents (<128k): 97.6 | 100 | 93 | 99.7 | 96.6 | 99 | 96.2 | 99.7 | 99.7
LongBench v2 (128k): 63.8 | 59.6 | 52.3 | 63.2 | 56.7 | 62 | 65 | 67.4 | 64
Frames: 84.5 | 83.4 | 80.5 | 84 | 82.9 | 78.7 | 84.7 | 81.9 | 83.7
DeR2 Bench: 58.2 | 57.3 | 46.6 | 69 | 50.3 | 58.9 | 60.4 | 66.1 | 66
CL-Bench: 20.8 | 20 | 14.8 | 23.9 | 25.2 | 18.1 | 22.6 | 15.6 | 16.1

Multilingual

Global PIQA: 92.3 | 92.1 | 89.2 | 93.2 | 91.6 | 93.9 | 93.9 | 95 | 95.6
MMMLU: 88.1 | 87.7 | 81.6 | 90.3 | 86.3 | 89.9 | 91 | 91.8 | 91.8
Disco-X: 82 | 80.3 | 73 | 76.3 | 67.7 | 70.3 | 78.6 | 76.8 | 71.9

Instruction Following

MultiChallenge: 68.3 | 63.2 | 61.1 | 59.5 | 59 | 57.3 | 59 | 68.7 | 69.3
COLLIE: 93.9 | 94 | 91.2 | 96.9 | 97.4 | 77.3 | 79.8 | 95 | 96.5
MARS-Bench: 85.6 | 80.5 | 62.4 | 87.9 | 66.1 | 72.9 | 87.7 | 85.6 | 84.6
Inverse IFEval: 78.9 | 77.1 | 69.3 | 72.3 | 74.8 | 69.3 | 72.4 | 79.6 | 80.9

Hallucination

LongFact-Objects: 92.9 | 92.2 | 87 | 99.2 | 99.2 | 98.8 | 99 | 98.1 | 97.9
LongFact-Concepts: 92.8 | 92.4 | 91.4 | 99.7 | 99.5 | 98.5 | 98.8 | 98.5 | 98.6
FactScore: 71.2 | 62.4 | 50.4 | 91.9 | 96.1 | 90.6 | 91.1 | 92.6 | 92

Multimodal Math

MathVista: 89.8 | 89 | 85.5 | 83.1 | 87.7 | - | 80.6 | 89.8 | -
MathVision: 88.8 | 86.4 | 78.1 | 86.8 | 81.3 | - | 74.3 | 86.1 | -
DynaMath: 68.9 | 70.5 | 58.9 | 70.1 | 61.5 | - | 52.5 | 63.3 | -
MathKangaroo: 90.5 | 86.3 | 79.8 | 86.9 | 73.8 | - | 69.6 | 84.4* | -
MathCanvas: 61.9 | 61.1 | 53.2 | 55.3 | 53.6 | - | 52.9 | 58.8 | -

Multimodal STEM

MMMU: 85.4 | 83.7 | 79.7 | 83.7 | 83.4 | - | 81.6 | 87 | -
MMMU-Pro: 78.2 | 76 | 71.4 | 79.5* | 73.2 | - | 70.8 | 81.0* | -
EMMA: 72 | 65.5 | 57 | 69.4 | 60.9 | - | 60.4 | 66.5 | -
SFE: 55.6 | 53.4 | 48.4 | 50.1 | 51.2 | - | 55.8 | 61.9 | -
HiPhO: 74.1 | 72.5 | 55.8 | 77.7 | 58.3 | - | 81.8 | 79.1 | -
XLRS-Bench (macro): 54.6 | 53.7 | 49.9 | 49.9 | 39.9 | - | 50.4 | 51.7 | -
PhyX (openended): 72.1 | 62.8 | 65 | 71.5 | 65.9 | - | 61.3 | 71 | -

Visual Puzzles

LogicVista: 81.9 | 79.6 | 73.8 | 81 | 78.3 | - | 68.9 | 80.8 | -
VPCT: 76 | 73 | 48 | 56 | 61 | - | 29 | 90 | -
ZeroBench (main): 12 | 8 | 7 | 11 | 11 | - | 4 | 10 | -
ZeroBench (sub): 47.6 | 42.2 | 36.2 | 38.9 | 37.7 | - | 30.8 | 42.2 | -
ArcAGI1-Image: 88.8 | 80.9 | 29.8 | 93.1 | 31.4 | - | 75.8 | 69.4 | -
ArcAGI2-Image: 43.3 | 28.3 | 1.5 | 54.4 | 1.3 | - | 26.1 | 21.5 | -
VisuLogic: 47.4 | 47.3 | 40.4 | 37 | 35.8 | - | 27.6 | 39 | -

Perception & Recognition

VLMsAreBiased: 77.4 | 74.8 | 58.4 | 28 | 62 | - | 21.4 | 50.6* | -
VLMsAreBlind: 98.6 | 97 | 93.1 | 84.2 | 93 | - | 77.2 | 97.5 | -
VisFactor: 36.8 | 33.4 | 23.6 | 33.6 | 20.4 | - | 24.5 | 45.8 | -
RealWorldQA: 86 | 81.7 | 81.6 | 82.1 | 78 | - | 75.9 | 84.7 | -
BabyVision: 60.6 | 57.5 | 38.7 | 37.4 | 30.2 | - | 16.2 | 49.7* | -

General VQA

SimpleVQA: 71.4 | 67.2 | 68.7 | 54.1 | 65.4 | - | 57.9 | 69.7 | -
HallusionBench: 68 | 66 | 65.1 | 67.7 | 63.9 | - | 65.3 | 69.9 | -
MME-CC: 57 | 50.2 | 40.8 | 44.4 | 43.4 | - | 25.2 | 56.9 | -
MMStar: 83 | 80.7 | 79.1 | 78.2 | 79.9 | - | 73.9 | 83.1 | -
MUIRBench: 81.8 | 76.2 | 78 | 77.4 | 78.7 | - | 78.9 | 78.2 | -
MTVQA: 51.1 | 51.1 | 50.6 | 48.5 | 47.3 | - | 53.1 | 50.8 | -
WorldVQA: 49.9 | 44 | 47.6 | 26.3 | 40.4 | - | 36.6 | 47.5 | -
VibeEval: 81.4 | 76.5 | 76.5 | 73.1 | 74 | - | 70.3 | 77.7 | -
ViVerBench: 75.9 | 80 | 73.9 | 74.8 | 74.6 | - | 72.4 | 75.9 | -

Pointing & Counting

CountBench: 95.5 | 97.1 | 95.5 | 91.2 | 96.3 | - | 90.3 | 97.3 | -
FSC-147↓: 11.3 | 11.9 | 17.3 | 21.1 | 13.6 | - | 20.9 | 12.1 | -
Point-Bench: 81.4 | 79 | 77 | - | 76.5 | - | - | 85.5* | -

2D & 3D Spatial Understanding

BLINK: 79.5 | 75.6 | 73.4 | 70.3 | 74.3 | - | 68.1 | 77.1 | -
MMSIBench (circular): 32.5 | 28.3 | 19.7 | 26.1 | 25.8 | - | 20.2 | 25.4 | -
TreeBench: 64.7 | 64.2 | 57.3 | 58.8 | 58.5 | - | 53.6 | 62.7 | -
RefSpatialBench: 72.6 | 66.4 | 55.6 | 25.5 | 56.3 | - | - | 65.5* | -
DA-2K: 92.3 | 90.3 | 86.4 | 78.9 | 90.7 | - | 70.3 | 82.1 | -
All-Angles: 72.1 | 65.2 | 61.3 | 71.5 | 61.6 | - | 63.1 | 73.5 | -
ERQA: 68.5 | 65.8 | 56.3 | 59.8 | 58.8 | - | 48.3 | 70.5* | -

Document & Chart Understanding

ChartQAPro: 71.2 | 70.3 | 65.2 | 67.6 | 63 | - | - | 69 | -
OCRBenchv2: 62.5 | 62.4 | 58.5 | 55.6 | 52.6 | - | 55.5 | 63.3 | -
OmniDocBench 1.5 ↓: 0.099 | 0.102 | 0.11 | 0.143* | 0.106 | - | 0.153 | 0.115* | -
CharXiv-DQ: 93.5 | 93.3 | 91.9 | 93.8 | 88 | - | 92.7 | 94.4 | -
CharXiv-RQ: 80.5 | 79.9 | 70.8 | 82.1* | 71.4 | - | 65.5 | 81.4* | -

Multimodal LongContext Understanding

DUDE: 72.4 | 72.1 | 68.9 | 68.2 | 69.4 | - | 55.6 | 70.1 | -
MMLongBench: 74.8 | 70.8 | 66.7 | - | 72.4 | - | - | 73.6 | -
LongDocURL: 74.7 | 75.1 | 71.3 | - | 74.5 | - | - | 72 | -
MMLongBench-Doc: 61.4 | 55.1 | 48.9 | - | 57 | - | - | 59.5 | -

Video Knowledge

VideoMMMU: 86.9 | 84.1 | 80.6 | 82.7 | 87.6* | 88.1 | 74.4 | - | -
MMVU: 78.2 | 75 | 69 | 73.1 | 76.3 | 77.9 | 49.7 | - | -
VideoSimpleQA: 71.9 | 66.6 | 67.7 | 67.8 | 71.9 | 70.7 | - | - | -

Video Reasoning

VideoReasonBench: 77.8 | 64.2 | 40.5 | 52.8 | 59.5 | 61.2 | 73.8 | - | -
Morse-500: 37.4 | 32.2 | 32.2 | 29.2 | 33 | 32.4 | 55.4 | - | -
VideoHolmes: 67.4 | 63.8 | 58.6 | 65.5 | 64.2 | 65.6 | - | - | -
Minerva: 66.5 | 63.8 | 54.7 | 62.4 | 65 | 64.4 | - | - | -

Motion & Perception

TVBench: 75 | 71.5 | 70.5 | 71.5 | 71.1 | 69.6 | 94.8 | - | -
ContPhy: 67.4 | 56.1 | 55.9 | 54.9 | 58 | 60.5 | - | - | -
TempCompass: 89.6 | 87 | 83.7 | 86.9 | 88 | 88.3 | 97.3 | - | -
EgoTempo: 71.8 | 61.8 | 67.2 | 67 | 65.4 | 58.4 | 63.2 | - | -
MotionBench: 75.2 | 70.9 | 64.4 | 70.6 | 70.3* | 68.9 | - | - | -
TOMATO: 59.9 | 57.3 | 47.4 | 60.8 | 59.6 | 60.8 | 95.2 | - | -
+ Thinking with Tracking: 65.3 | 59.2 | 51.3 | 61 | - | 64 | 95.2 | - | -

Long Video

VideoMME: 89.5 | 87.7 | 81.2 | 87.8 | 88.4* | 85.2 | - | - | -
CGBench: 65 | 59.3 | 59.2 | 62.4 | 65.5 | 65.3 | - | - | -
LongVideoBench: 80.3 | 77.3 | 74.8 | 77.4 | 76.7 | 74.5 | - | - | -
VideoEval-Pro: 48 | 44.3 | 43.7 | 45.9 | 52.7 | 51.9 | - | - | -
LVBench: 76.4 | 73 | 66.6 | 73 | - | - | - | - | -

Multi Video

CrossVid: 60.3 | 57.7 | 58.6 | 57.3 | 53 | 48.7 | 89.2 | - | -

Streaming Video

OVBench: 69.2 | 65.5 | 60.1 | 65.1 | 62.7 | 59.2 | - | - | -
LiveSports-3K: 78 | 77.8 | 73.3 | 77.5 | 74.5 | 71.5 | - | - | -
OVOBench: 77 | 76.7 | 70.4 | 72.6 | 70.1 | 68.7 | 92.8 | - | -
ODVBench: 72.5 | 69.6 | 65.1 | 63.5 | 63.6 | 56.7 | 91.4 | - | -
ViSpeak: 78.5 | 84 | 77.5 | 79 | 89 | 86 | 96 | - | -

Coding Agent

Terminal Bench 2.0¹: 55.8 | 45 | 36.9 | 62.4 | 45.2 | 60.2 | 56.9 | 60 | -
SWE-Lancer: 49.4 | 47.1 | 43.1 | 48.9 | 45.7 | 56.1 | 44.3 | 51.7 | -
SWE Bench Verified: 76.5 | 73.5 | 67.9 | 80 | 77.2 | 80.9 | 76.2 | 78 | -
Multi-SWE-Bench: 45.2 | 41.1 | 49.3 | 47.7 | 47.7 | 52.8 | 50.2 | 59 | -
SWE-Bench Pro: 46.9 | 46 | 51.7 | 55.6 | 48.4 | 55.4 | 49.7 | 46.7 | -
SWE Multilingual: 71.7 | 64.4 | 63.8 | 68.8 | 64.1 | 74 | 72.7 | 71.1 | -
Scicode: 48.5 | 52.4 | 40.2 | 49.7 | 47.9 | 52.8 | 57.7 | 55 | -
SWE-Evo: 8.5 | 10.6 | 10.4 | 12.5 | 16.7 | 27.1 | 8.9 | 12.8 | -
Aider Polyglot: 80 | 76 | 79.1 | 91.1 | 82.2 | 92.4 | 94.2 | 92 | -
ArtifactsBench: 66.6 | 62.6 | 68.9 | 71.1 | 59.1 | 68.5 | 58.4 | 52.5 | -
CodeSimpleQA: 58 | 53.1 | 57.6 | 62.3 | 59.6 | 63 | 54.7 | 53.7 | -
SpreadsheetBench Verified: 79.1 | 82.3 | 58.1 | 69.9 | 75.9 | 78.6 | 70.8 | 65.7 | -

Search Agent

BrowseComp: 77.3 | 72.1 | 48.1 | 77.9(65.3) | 43.9(29.5) | 67.8(57.2) | 59.2 | 41.5 | -
BrowseComp-zh: 82.4 | 82 | 49.5 | 76.1 | 42.4 | 62.4 | 66.8 | 63 | -
HLE-text: 54.2 | 49.5 | 35.8 | 45.5 | 32 | 43.2 | 46.9 | 47.6 | -
HLE-Verified: 73.6 | 70.7 | 56.4 | 68.5 | 37.6 | 56.6 | 67.5 | 71.8 | -
WideSearch: 74.7 | 74.5 | 37.7 | 76.8 | 65.1 | 76.2(71.7) | 67.3 | 64 | -
FinSearchComp: 70.2 | 65.1 | 38.1 | 73.8 | 58.6 | 66.2 | 52.7 | 54.8 | -
DeepSearchQA: 77.4 | 67.7 | 16.7 | 71.3(66.4) | 36.3 | (76.1)41.6 | 63.9 | 54.7 | -
Seal-0: 49.5 | 52.3 | 34.2 | 51.4 | 53.4 | 47.7 | 45.5 | 37.8 | -

Tool Use

τ2-Bench (retail): 90.4 | 90.9 | - | 82 | 86.2 | 88.9 | 85.3 | 88.6 | -
τ2-Bench (telecom): 94.2 | 92.1 | - | 98.7 | 98 | 98.2 | 98 | 94.7 | -
MCP-Mark: 54.7 | 46.7 | 30.2 | 57.5 | 32.1 | 42.3 | 53.9 | 40.5 | -
BFCL-v4: 73.4 | 72.9 | 57.9 | 65.9 | 72.9 | 76.5 | 71 | 65 | -
VitaBench: 47 | 41.8 | 20.8 | 41.8 | 40.8 | 55.3 | 48.8 | 46.7 | -

Deep Research

DeepConsult: 61.1 | 60.3 | 49.8 | 54.3 | 55.8 | 61 | 48 | 26 | -
DeepResearchBench: 53.3 | 54.4 | 50.7 | 52.2 | 47.2 | 50.6 | 49.6 | 46.1 | -
ResearchRubrics: 50.7 | 50.8 | 43.6 | 42.3 | 38.6 | 45 | 37.7 | 36.9 | -

Vision Agent

Minedojo Verified: 49 | 39.7 | 20.3 | 18.3 | - | - | 23.3 | 24.3 | -
MM-BrowseComp: 48.8 | 45.1 | 17.9 | 26.3 | - | - | 25 | 22.8 | -
HLE-VL: 39.2 | 35.8 | 18.7 | 31 | - | - | 36 | 36.8 | -

Science Discovery

Scicode: 52.1 | 52.4 | 40.2 | 49.7 | 47.9 | 52.8 | 57.7 | 55 | -
FrontierSci-research: 23.3 | 18.3 | 18.3 | 25 | 16.7 | 21.7 | 15 | 11.7 | -
Superchem (text-only): 53 | 48 | 34.8 | 58 | 32.4 | 43.2 | 63.2 | 54.4 | -
BIObench: 53.5 | 50.2 | 49.2 | 58.1 | 44.7 | 49.3 | 51.3 | 55.2 | -
AInstein Bench: 47.7 | 38.3 | 35 | 41.3 | 33.7 | 44 | 42.8 | 44 | -

Vibe Coding

NL2Repo-Bench: 27.9 | 24.6 | 19.5 | 49.3 | 39.9 | 43.2 | 34.2 | 27.6 | -
NL2Repo (Pass@1): 3 | 1 | 2 | 8 | 3 | 3 | 4 | 2 | -
ArtifactsBench: 66.6 | 62.6 | 68.9 | 71.1 | 59.1 | 68.5 | 58.4 | 52.5 | -
SWE-Bench Pro: 46.9 | 46 | 51.7 | 55.6 | 48.4 | 55.4 | 49.7 | 46.7 | -
Terminal Bench 2.0¹: 55.8 | 45 | 36.9 | 62.4 | 45.2 | 60.2 | 54.2 | 60 | -

Context Learning

KORBench: 77.2 | 77 | 74.2 | 79.2 | 73 | 77.4 | 73.9 | 76 | -
DeR2 Bench: 58.2 | 57.3 | 50.3 | 69 | 58.9 | 60.4 | 66.1 | 66 | -
CL-Bench: 21.5 | 20 | 25.2 | 23.9 | 18.1 | 22.6 | 15.6 | 16.1 | -
ToB-Complex Workflows: 64.7 | 68.2 | 53.4 | 45 | 61 | 64.8 | 69.2 | 64.9 | -
ToB-Reference Q&A: 72.4 | 62 | 42.5 | 63.6 | 58.9 | 67.9 | 68.3 | 61.3 | -

Real World Tasks

HealthBench - Hard: 28.3 | 20 | 38.6 | 36.6 | 10.9 | 11 | 15 | 21.5 | -
GDPVal-Diamond: 21.3 | 23.2 | 11.5 | 26.9 | 15.2 | 20.7 | 19.4 | 6.46 | -
XPert Bench: 64.5 | 63.3 | 47.6 | 53.3 | 44.7 | 50.5 | 53.1 | 50.1 | -
ToB-K12 Education: 62.8 | 63.8 | 53.7 | 61.6 | 50.1 | 56.2 | 59.4 | 59 | -
ToB-Compositional Tasks: 59.1 | 54.8 | 47.5 | 51.5 | 57.3 | 63.6 | 64.8 | 57.7 | -
ToB-Text Classification: 69 | 64.5 | 52.7 | 62.1 | 64.5 | 63.9 | 67.5 | 61.9 | -
ToB-Information Extraction: 52 | 48.4 | 40.5 | 44.7 | 48.3 | 50.1 | 49 | 45.4 | -
World Travel (VLM): 12 | 14 | 2.7 | 19.33 | 2.67 | 14 | 8 | 7.3 | -
World Travel (TEXT): 23.3 | 24 | 15.3 | 32.67 | 10 | 21.3 | 14.7 | 13.3 | -

Results marked with an * are sourced from the technical report.

For benchmarks marked with a ‡, we include subtitles in the evaluation.
¹ Using Terminus 2.