New Gemini model significantly outperforms others on Chatbot Arena (LMSYS)

lmarena.ai

110 points by zopper a year ago · 18 comments

impulser_ a year ago

Based on my testing, this model is significantly better than other Gemini models, especially on programming/math-related tasks. The current Gemini models are pretty useless for anything related to programming/math, but this experimental model puts Gemini ahead of GPT-4o and pretty close to Claude 3.5.

The major problem with Claude 3.5 is that you can't have a conversation with a large amount of text, because you constantly hit rate limits, and it's very annoying.

This model with a 2 million context window is probably the best model right now for programming.

  • Alifatisk a year ago

    I wish I had known about Google Studio way earlier. I don't understand why Google hasn't marketed it; I found out about it through word of mouth.

chenxi9649 a year ago

I feel like we're at the point where I'm not sure how these rankings affect my choice of LLM. Every time a new model tops the charts, I try it for a bit and then go back to claude-3.5-sonnet, both for coding and for day-to-day questions.

I don't know if I'm just used to the Claude style of response, or to the orangey UI that I find kind of cozy, but I think we need better ways to convey the differences between models.

  • maxglute a year ago

    >orangy UI that I kind of find cozy

    Yeah, it is strangely cozy. I also can't dissociate Claude from Jean-Claude Van Damme, and it makes me giggle to think he is helping me code.

    • wkat4242 a year ago

      I also think it's really cool how he parodies himself. Most of the other martial arts actors from the 80s take themselves way too seriously now, like Steven Seagal, who just phones it in in B-movies. Jean-Claude Van Johnson was awesome.

Alifatisk a year ago

Claude has been my go-to, mainly because of its huge context window. But today that doesn't seem to be the case anymore, or rather, you hit the rate limit pretty quickly and have to wait a whole day.

Google Studio, with its 2M context window, plus this experimental version could be a good replacement.

  • a2128 a year ago

    I would bet that Google will also add rate limits once they've burned enough money to attract users.

leobg a year ago

Google has one moat that is often overlooked: Googlebot. Thanks to Cloudflare and paywalls, it gets to scrape content that is invisible to pretty much every other crawler.

  • stormfather a year ago

    And they have the absolutely massive advantage of being able to associate content with the queries that led to it, and to know which piece of content the user selected. That can surely be used to give them a leg up, both in choosing good training data and in building o1-type agentic models.

    • leobg a year ago

      You’re right. They can actually do RLHF just using their users: show each of them slightly different generations and watch their behavior.

  • achempion a year ago

    Most of the content they crawl is SEO spam; I'm not sure it's that helpful for model training.

    • leobg a year ago

      SEO spam is the façade you get to see as a user. The gold is all that you don’t see. Just because they don’t show it on page one doesn’t mean it won’t be useful for training.

jug a year ago

I feel like these are test versions of Gemini Pro 2.0. The changes are too foundational to be mere iterations or break date updates of 1.5 Pro.

ralfd a year ago

What is the new Gemini model? 1.5-pro-002?
