New Gemini model significantly outperforms others on Chatbot Arena (LMSYS)

110 points by zopper 2 years ago · 18 comments

Based on my testing, this model is significantly better than other Gemini models, especially on programming and math tasks. The current Gemini models are pretty useless for anything related to programming or math, but this experimental model puts Gemini ahead of GPT-4o and pretty close to Claude 3.5.

The major problem with Claude 3.5 is that you can't have a conversation involving a large amount of text, because you constantly hit rate limits, which is very annoying.

This model, with its 2 million token context window, is probably the best model right now for programming.

Alifatisk 2 years ago

I wish I had known about Google AI Studio way earlier. I don't understand why Google hasn't marketed it; I found out about it through word of mouth.

chenxi9649 2 years ago

I feel like it's at the point where I'm not too sure how these rankings impact my choice of LLM. Every time a new model tops the charts, I try it for a bit and then go back to claude-3.5-sonnet, both for coding and for day-to-day questions.

I don't know if I'm just getting used to the claude style of response, or the orangy UI that I kind of find cozy, but I think we need better ways to convey the difference between models.

maxglute 2 years ago

>orangy UI that I kind of find cozy
Yeah, it is strangely cozy. I also can't disassociate Claude from Jean-Claude Van Damme, and it makes me giggle thinking he is helping me code.
- wkat4242 2 years ago
  
  I also think it's really cool how he parodies himself. Most of the other martial arts actors from the 80s take themselves way too seriously now, like Steven Seagal, who just phones it in in B-movies. Jean-Claude Van Johnson was awesome.

Alifatisk 2 years ago

Claude has been my go-to, mainly because of the huge context window. But today that doesn't seem to be the case, or you hit the rate limit pretty quickly and have to wait a whole day.

Google AI Studio with its 2M context window + this experimental version could be a good replacement.

a2128 2 years ago

I would bet that Google will also add rate limits once they've burned enough money to attract users

leobg 2 years ago

Google has one moat that is often overlooked: Googlebot. They get to scrape content that is invisible to pretty much every other crawler, thanks to Cloudflare and paywalls.

stormfather 2 years ago

And they have the absolutely massive advantage of being able to associate content with the queries that led to it, and of knowing which piece of content the user selected. That can surely be used in some way to give them a leg up both in choosing good training data and in building o1-type agentic models.
- leobg 2 years ago
  
  You’re right. They can actually do RLHF just using their users, showing each of them slightly different generations and watching their behavior.
achempion 2 years ago

Most of the content they crawl is SEO spam; I'm not sure it's that helpful for model training.
- leobg 2 years ago
  
  SEO spam is the façade you get to see as a user. The gold is all that you don’t see. Just because they don’t show it on page one doesn’t mean it won’t be useful for training.

jug 2 years ago

I feel like these are test versions of Gemini Pro 2.0. The changes are too foundational to be mere iterations/break date updates for 1.5 Pro.

ralfd 2 years ago

What is the new Gemini model? 1.5-pro-002?

alphabetting 2 years ago

Here is a link to the latest one: https://aistudio.google.com/app/prompts/new_chat?model=gemin...
1.5 Pro-002 came out a couple months ago.
- d4rkp4ttern 2 years ago
  
  Where’s the info on context length etc? Can’t seem to find the official specs page.
  - kvn8888 2 years ago
    
    It shows the context length on the AI Studio site
    2 million for gemini-exp-1206, 32k for the other experimental Gemini, which I think is gemini-exp-1121.
famouswaffles 2 years ago

Gemini Experimental 1206. It's on AI Studio.
