Build on your own terms
Optimized models for easy deployment, cost efficiency, and performance that scales to billions of users.
MODELS
Latest Llama models
The latest models feature native multimodality, advanced reasoning, and industry-leading context windows.
Llama 4
Native multimodality leverages early fusion to pre-train on unlabeled text and vision data together, enabling a step change in intelligence over separate, frozen multimodal weights.
Llama 4 Maverick
Natively multimodal for image and text understanding.
10M-token context for long-form work
Multimodal text + image understanding
For use cases around memory, personalization, and multimodal applications
More details
Llama 4 Scout
Natively multimodal, offering text and visual intelligence.
Single-H100-GPU efficiency
10M-token context window
For use cases around long-document analysis
More details
Llama 3
The open-source AI models you can fine-tune, distill and deploy anywhere. Choose from our collection of models: Llama 3.1, Llama 3.2, Llama 3.3.
Llama 3.3
Multilingual open-source large language model.
Available in 70B
Experience 405B performance and quality at a fraction of the cost
Built for text-based use cases such as synthetic data generation
More details
Llama 3.2
Flexible, cost-effective, and built for edge use cases.
1B & 3B are lightweight and cost-efficient, allowing you to run them anywhere
11B & 90B are flexible multimodal models that can reason over high-resolution images and output text
More details
Llama 3.1
Open foundation model built for flexibility and control.
Available in 8B, 70B, and 405B sizes
Capabilities in general knowledge, steerability, math, tool use, and multilingual translation
For text summarization, multilingual agents, and coding use cases
More details
Model optimization
Llama 4 capabilities
Llama 4 benchmark
Methodology & Notes
1. For Llama model results, we report 0-shot evaluation with temperature = 0 and no majority voting or parallel test-time compute. For high-variance benchmarks (GPQA Diamond, LiveCodeBench), we average over multiple generations to reduce uncertainty.
2. Specialized long-context evals are not traditionally reported for generalist models, so we share internal runs to showcase Llama's frontier performance.
3. $0.19/Mtok (3:1 blended) is our cost estimate for Llama 4 Maverick assuming distributed inference. On a single host, we project the model can be served at $0.30 - $0.49/Mtok (3:1 blended).
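The "3:1 blended" figure in note 3 presumably refers to a weighted average of input- and output-token prices at a 3:1 input-to-output mix, a common pricing convention. The sketch below illustrates that arithmetic only; the per-token prices in the example are hypothetical values chosen for illustration, not Meta's published rates.

```python
def blended_price_per_mtok(input_price: float, output_price: float,
                           input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted-average price per million tokens, assuming "3:1 blended"
    means 3 parts input tokens to 1 part output tokens."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts

# Hypothetical prices: $0.12/Mtok input and $0.40/Mtok output
# blend to roughly $0.19/Mtok at a 3:1 ratio.
print(blended_price_per_mtok(0.12, 0.40))
```

Changing the ratio shifts the blend accordingly; a workload that emits more output tokens than 1 part in 4 will cost more than the blended headline number.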
Start building
Featured case studies
How Stoque is using Llama
By transforming internal intelligence with Llama, Stoque enabled teams to find insights faster, reduce friction, and work more efficiently at scale.
50%
reduction in repetitive queries for technical support
30%
more administrative and support tasks completed
11%
increase in internal user satisfaction

How Shopify is using Llama
Shopify uses Llama to generate product pages, localize content, and automate support, helping developers scale workflows and save time.
+76%
higher token throughput than the previous model
97.7%
accurate Macro-F1 score on intent detection
33%
compute cost savings with JSON output


More case study categories
SAFETY
Protections in the era of generative AI.
Comprehensive system-level protections proactively identify and mitigate potential risks, empowering developers to more easily deploy generative AI responsibly.