My benchmark for large language models

nicholas.carlini.com

4 points by cheviethai123 2 years ago · 2 comments

cheviethai123OP 2 years ago

Notice how low Gemini scores here compared with other LLM tests. I'm also impressed that the evaluation method can assess performance without relying on tailored prompts.

  • hoamatcuoi 2 years ago

    But the benchmark only scores Gemini-Pro 1. I'm curious how Gemini Ultra would perform here, but I guess we can't know yet.
