New Gemini model significantly outperforms others on Chatbot Arena (LMSYS)
lmarena.ai
Based on my testing, this model is significantly better than other Gemini models, especially at programming and math tasks. The current Gemini models are pretty useless for anything related to programming or math, but this experimental model puts Gemini ahead of GPT-4o and pretty close to Claude 3.5.
The major problem with Claude 3.5 is that you can't have a conversation involving a large amount of text, because you constantly hit rate limits, which is very annoying.
This model, with its 2 million token context window, is probably the best model right now for programming.
I wish I had known about Google AI Studio way earlier. I don't understand why Google hasn't marketed it; I found out about it through word of mouth.
I feel like it's at the point where I'm not too sure how these rankings impact my choice of LLM. Every time a new model tops the charts, I try it for a bit and go back to claude-3.5-sonnet, both for coding and for day-to-day questions.
I don't know if I'm just getting used to the Claude style of response, or the orangy UI that I kind of find cozy, but I think we need better ways to convey the difference between models.
>orangy UI that I kind of find cozy
Yeah, it is strangely cozy. I also can't disassociate Claude from Jean-Claude Van Damme, and it makes me giggle thinking he is helping me code.
I also think it's really cool how he parodies himself. Most of the other martial arts actors from the 80s take themselves way too seriously now, like Steven Seagal, who just phones it in in B-movies. Jean-Claude Van Johnson was awesome.
Claude has been my go-to, mainly because of the huge context window. But today that doesn't seem to be the case, or you hit the rate limit pretty quickly and have to wait a whole day.
Google AI Studio, with its 2M context window plus this experimental version, could be a good replacement.
I would bet that Google will also add rate limits once they've burned enough money to attract users.
Google has one moat that is often overlooked: Googlebot. They get to scrape content that is invisible to pretty much every other crawler, thanks to Cloudflare and paywalls.
And they have the absolutely massive advantage of being able to associate content with the queries that led to it, and of knowing which piece of content the user selected. That surely can be used in some way to give them a leg up both in choosing good training data and in building o1-type agentic models.
You’re right. They can actually do RLHF using just their users, showing each of them slightly different generations and watching their behavior.
Most of the content they crawl is SEO spam. I'm not sure it's that helpful for model training.
SEO spam is the façade you get to see as a user. The gold is all that you don’t see. Just because they don’t show it on page one doesn’t mean it won’t be useful for training.
I feel like these are test versions of Gemini Pro 2.0. The changes are too foundational to be mere iterations/break date updates for 1.5 Pro.
What is the new Gemini model? 1.5-pro-002?
Here is a link to the latest one: https://aistudio.google.com/app/prompts/new_chat?model=gemin...
1.5 Pro-002 came out a couple months ago.
Where’s the info on context length etc? Can’t seem to find the official specs page.
It shows the context length on the AI Studio site
2 million for gemini-exp-1206, 32k for the other experimental Gemini, which I think is gemini-exp-1121.
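For anyone wondering what that difference means in practice, here is a rough sketch that checks whether a block of text would fit in each window. The window sizes are just the figures quoted in this thread (not official specs, and 32k is taken as 32,000), the 4-characters-per-token ratio is a common rule of thumb rather than the real Gemini tokenizer, and the function names are made up for illustration.

```python
# Context window sizes in tokens, as quoted in this thread (not official specs).
# The gemini-exp-1121 entry is the commenter's guess at which model has 32k.
CONTEXT_WINDOWS = {
    "gemini-exp-1206": 2_000_000,
    "gemini-exp-1121": 32_000,
}

# Assumption: roughly 4 characters per token for English text.
# A real tokenizer will give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a very rough token estimate using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(model: str, text: str) -> bool:
    """Return True if the estimated token count fits the model's window."""
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model]


if __name__ == "__main__":
    # A hypothetical 400,000-character codebase dump.
    codebase = "x" * 400_000
    for model in CONTEXT_WINDOWS:
        print(model, fits(model, codebase))
```

Under these assumptions, a 400,000-character dump is about 100,000 tokens, so it would fit the 2 million window but not the 32k one.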
Gemini Experimental 1206. It's on AI Studio.