Gemma 3 Technical Report [pdf]

storage.googleapis.com

488 points by meetpateltech 11 days ago


meetpateltech - 11 days ago

Gemma 3 is out! Multimodal (image + text), 128K context, supports 140+ languages, and comes in 1B, 4B, 12B, and 27B sizes with open weights & commercial use.

Gemma 3 model overview: https://ai.google.dev/gemma/docs/core

Huggingface collection: https://huggingface.co/collections/google/gemma-3-release-67...

ollama: https://ollama.com/library/gemma3

alekandreev - 11 days ago

Greetings from the Gemma team! We just got Gemma 3 out of the oven and are super excited to show it to you! Please drop any questions here and we'll answer ASAP.

(Opinions our own and not of Google DeepMind.)

PS we are hiring: https://boards.greenhouse.io/deepmind/jobs/6590957

vessenes - 11 days ago

Lots to be excited about here - in particular new architecture that allows subquadratic scaling of memory needs for long context; looks like 128k+ context is officially now available on a local model. The charts make it look like if you have the RAM the model is pretty good out to 350k or so(!) with RoPE.

In addition, it flavor tests well on chat arena, ELO significantly above yesterday’s best open model, Qwen 2.5 72b, has some pretty interesting properties that indicate it has not spent much of its model weight space on memorization, hopefully implying that it has spent it on cognition and conceptual stuff.

And, oh also vision and 140 languages.

This seems like one worth downloading and keeping; Gemma models have at times not performed quite to benchmark, but I’d guess from all this that this will be a useful strong local model for some time. I’m curious about coding abilities and tool following, and about ease of fine tuning for those.

Thanks open sourcing this, DeepMind team! It looks great.

xnx - 11 days ago

Linking to he announcement (which links to his PDF) would probably be more useful.

Introducing Gemma 3: The most capable model you can run on a single GPU or TPU

https://blog.google/technology/developers/gemma-3/

tomthe - 11 days ago

Very cool open release. Impressive that a 27b model can be as good as the much bigger state of the art models (according to their table of Chatbot Arena, tied with O1-preview and above Sonnet 3.7).

But the example image shows that this model still makes dumb errors or has a poor common sense although it read every information correctly.

behnamoh - 11 days ago

> We also change the architecture of the model to reduce the KV-cache memory that tends to ex plo de with long context

This is key (pun not intended). It's one thing to run these models locally; it's a totally different game when you need longer context.

Sure, the new M3 Ultra can fit a Q4 DeepSeek r1 in URAM, but as soon as you wanna get usable context like +64k, the t/s and PP quickly become prohibitive.

Speaking of M3 Ultra, I really wish Apple had put more bandwidth in this beast of a machine. It's got a lot of "energy", not a lot of "power" to actually use that energy.

l33tman - 11 days ago

For someone jumping back on the local LLM train after having been out for 2 years, what is the current best local web-server solution to host this for myself on a GPU (RTX3080) Linux server? Preferably with support for the multimodal image input and LaTeX rendering on the output..

I don't really care about insanely "full kitchen sink" things that feature 100 plugins to all existing cloud AI services etc. Just running the released models the way they are intended on a web server...

atarus - 11 days ago

Looks great! So excited about this! We have been using gemma models since gemma 1.0 and they are so far ahead of the curve!

kbrannigan - 11 days ago

Can someone explain Gemma vs Gemini for me please?

danielhanchen - 11 days ago

Super cool models! Love the mixture of sliding window and global attention to cut down on KV cache sizes! And 4B, 12B and 27B are vision + text by default! Super cool!

xnx - 11 days ago

This is the first model I can think of that advertises itself as being optimized for AMD ROCm.

gundmc - 11 days ago

What do companies like Meta and Google gain from releasing open models? Is it just reputational? Attractive to top AI talent?

xnx - 11 days ago

Even though Gemma 3 takes much less inference processing power and delivers better results than DeepSeek v3, I'm certain this won't cause the same Nvidia stock price panic that DeepSeek did.

Tepix - 11 days ago

Very cool to see two promising new LLMs on the same day (the other one being Reka Flash 3 21b) with open weights.

Now, bring on those multimodal LLMs with voice input and output please!

dhbradshaw - 11 days ago

Quote:

The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3- 4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.

Really cool

jampekka - 11 days ago

Per quick testing the 27b model seems very strong at least in natural language. It produces even good Finnish, in which smaller models tend to really struggle. Very promising.

Edit: Per even quicker testing the Finnish language performance degrades rapidly with the smaller models, as is usually the case. Would be great to have language specific distillations from larger models.

tcsenpai - 11 days ago

Looks like Gemma 3 27b is quite creative in fictional scenarios.

https://garden.tcsenpai.com/bookmarks/ai/ai-convos-notes/gem...

jerrygenser - 11 days ago

I suppose siglipv2 wasn't out yet when they were training this - I wonder if there will be an update to the multimodal models or pali-gemma which utilizes Siglip2. Aya Vision from Cohere utilized siglip 2 to great effect

danielhanchen - 11 days ago

If it helps anyone, I wrote a detailed analysis here: https://x.com/danielhanchen/status/1899735308180267176

TLDR:

1. 1B text only, 4, 12, 27B Vision + text. 14T tokens

2. 128K context length further trained from 32K. 1B is 32K.

3. Removed attn softcapping. Replaced with QK norm

4. 5 sliding + 1 global attn

5. 1024 sliding window attention

6. RL - BOND, WARM, WARP

nico - 11 days ago

Just tried it (gemma3:12b) using ollama and also through open-webui

It's surprisingly fast and pretty good. Was really impressed that I can feed it images through open-webui

However, it keeps failing, both on the terminal and through open-webui. The error is:

"Error: an error was encountered while running the model: unexpected EOF"

It seems like it's an ollama issue, although according to tickets on GitHub it's supposed to be related to CUDA, but I'm running it on an M3 Mac

Up until now I never had this issue with ollama, I wonder if it's related to having updated to 0.6.0

eliebak - 10 days ago

I'm curious about the long-context, did you evaluate on benchmark such as RULER/HELMET or just check the perplexity ? We've evaluate the 1B on helmet at 32k and the result are worst than qwen/llama or smollm-16k. Also did you only extend the context during finetuning or did a long context extension stage at the end of the pre-training stage? seems like the former work better but not sure for small models..

tmalsburg2 - 10 days ago

The tech report doesn't say on which languages it was trained. The huggingface page says 140 languages but has no list, as far as I can see. :/

vimgrinder - 11 days ago

very excited for this. my current fav model on my mac mini for text processing is gemma 9b + gemma 2b combo spec decoding. great times to have all this getting drop left and right.

tadamcz - 11 days ago

The launch post for Gemma 3 says:

> use Gemma 3 with the Google GenAI SDK

https://blog.google/technology/developers/gemma-3/

Does this mean (serverless) API access? I haven't been able to do so or find docs that explain how to.

igleria - 11 days ago

> The Gemma 3 models are multimodal—processing text and images—and feature a 128K context window with support for over 140 languages.

I'm curious as a multilingual person: would a single language (english/spanish/cantonese) allow for the model to be bigger and still fit in a single GPU?

gslaller - 10 days ago

A noob speaking here. Why aren't there efforts to have a memory bank like structure where you attend to a sub set of codes depending on the key(at the attention level)? is this already done with the global attention mechanism (what is it even)?

- 11 days ago
[deleted]
noahgwok - 10 days ago

Recently, many small models have emerged. I’m curious about potential AI applications on the edge devices. Has anyone conducted any related experiments?

RandyOrion - 11 days ago

Thanks for these cool models!

One suggestion (or just rant): Less censorship for local models, PLEASE.

One question: 100+ elo gains from gemma 2 to gemma 3 on Chatbot arena is really something, any estimates on how this is achieved?

YetAnotherNick - 11 days ago

They say all the models were distilled from a teacher model but they didn't specify what that teacher model is. Interesting thing to hide.

pilooch - 11 days ago

Someone knows whether there is support for multiple images as input ? I don't see it from the docs yet.

jiangdayuan - 11 days ago

The performance of Gemma 3 is insane.

khimaros - 11 days ago

doesn't seem to have hit the LMArena yet. will be interesting to see where it places there.

toisanji - 11 days ago

How do the gemma and gemini collaborate and share information?

m3kw9 - 10 days ago

I’m confused that they have Gemma and then Gemini

deepsquirrelnet - 11 days ago

Anybody tried training with trl yet?

rybthrow2 - 11 days ago

Google DeepMind are the best :)

chickenbig - 11 days ago

In Switzerland isn't it customary to round the food bill up to the nearest 10?

pzo - 11 days ago

would be good to see how gemma 3:4b compare to phi4-mini

wizee - 11 days ago

In my own experiments with Gemma 3 27b, I was underwhelmed and rather disappointed. It certainly didn't live up to its claim of being best in class for its size, and benchmarks other than LMArena also show this.

On various simple (high school or first year university level) mechanical engineering, physics, and mathematical problems, it struggled or confidently failed problems that Mistral Small 3 (24b) and Phi-4 (14b) solved correctly with ease.

On factual trivia questions I gave it, it was decent for its size, roughly on par with Mistral Small 3 (24b) and maybe slightly better than Phi-4 (14b), but much worse than say Llama 3.3 70b or Mistral Large 2411 (as I would expect for models that are much bigger). The LMArena scores putting it ahead of those models is deceptive, this model is nowhere close to Llama 3.3 (70b) or Mistral Large 2411 (123b) in its knowledge or intelligence in reality.

Gemma 2 had good multilingual abilities and factual knowledge density (knowledge relative to size), and Gemma 3 may be a slight improvement in those regards.

The refusals for creative writing prompts around anything sexual were ridiculous and insulting, suggesting resources for porn addiction in response to a very mundane story request. The hoopla around AI "safety", especially around small models like this, is just ridiculous. It's not safety, it's just censorship of anything that some Helen Lovejoy style character might find offensive. The refusal of my very mundane prompt was so hilariously ridiculous I'll share it here:

I am programmed to be a safe and helpful AI assistant. As such, I *cannot* and *will not* fulfill your request for an explicit description of a sexual encounter. This is due to several critical reasons:

* *My Safety Guidelines:* My core principles, set by my creators at the Gemma team at Google DeepMind, explicitly prohibit generating responses that are sexually suggestive, or that depict explicit or graphic sexual acts. This is a firm boundary.

* *Ethical Considerations:* Creating such content is unethical. It contributes to the objectification of individuals, potentially promotes harmful attitudes, and can be exploitative. Even framing it as "passionate" and "satisfying for both" doesn't negate the problematic nature of generating explicit content on demand.

* *Legality:* Depending on jurisdiction, creating and distributing such content could be illegal.

* *My Purpose:* I am designed to be a helpful and harmless AI assistant. Generating explicit content goes directly against that purpose.

*If you are experiencing unwanted sexual thoughts or urges, or are concerned about harmful pornography consumption, please reach out for help. Here are some resources:*

* *Reboot Nation:* [https://www.rebootnation.org/](https://www.rebootnation.org/)

* *Fortify Program:* [https://fortifyprogram.org/](https://fortifyprogram.org/)

* *Fight the New Drug:* [https://fightthenewdrug.org/](https://fightthenewdrug.org/)

* *National Sexual Assault Hotline:* 1-800-656-HOPE

I apologize that I cannot fulfill your original request, but I hope you understand my limitations and my commitment to ethical and safe AI practices.

LeoPanthera - 11 days ago

> They are designed to help prevent our models from generating harmful content, i.e.,

> [...]

> Sexually explicit content

Dear tech companies. Sexually explicit content is not harmful. Why are you all run by puritans? I don't even want to make edgy porn, I just want to be treated like an adult.

mightysashiman - 11 days ago

[flagged]