Build on your own terms
Optimized models for easy deployment, cost efficiency, and performance that scales to billions of users.
MODELS
Latest Llama models
The latest models feature native multimodality, advanced reasoning, and industry-leading context windows.
Llama 4
Native multimodality leverages early fusion to pre-train on unlabeled text and vision data together, enabling a step change in intelligence over separate, frozen multimodal weights.
Llama 4 Maverick
Natively multimodal for image and text understanding.
10M-token context for long-form work
Multimodal text + image understanding
For use cases around memory, personalization, and multimodal applications
More details
Llama 4 Scout
Natively multimodal, offering text and visual intelligence.
Single-H100-GPU efficiency
10M-token context window
For use cases around long-document analysis
More details
Llama 3
The open-source AI models you can fine-tune, distill and deploy anywhere. Choose from our collection of models: Llama 3.1, Llama 3.2, Llama 3.3.
Llama 3.3
Multilingual open-source large language model.
Available in 70B
Experience 405B performance and quality at a fraction of the cost
Built for text-based use cases such as synthetic data generation
More details
Llama 3.2
Flexible, cost-effective, and built for edge use cases.
1B & 3B are lightweight and cost-efficient, allowing you to run them anywhere
11B & 90B are flexible multimodal models that can reason over high-resolution images and output text
More details
Llama 3.1
Open foundation model built for flexibility and control.
Available in 8B, 70B, and 405B sizes
Capabilities in general knowledge, steerability, math, tool use, and multilingual translation
For text summarization, multilingual agents, and coding use cases
More details
Model optimization
Llama 4 capabilities
Llama 4 benchmark
Methodology & Notes
1. For Llama model results, we report 0-shot evaluation with temperature = 0 and no majority voting or parallel test-time compute. For high-variance benchmarks (GPQA Diamond, LiveCodeBench), we average over multiple generations to reduce uncertainty.
2. Specialized long-context evals are not traditionally reported for generalist models, so we share internal runs to showcase Llama's frontier performance.
3. $0.19/Mtok (3:1 blended) is our cost estimate for Llama 4 Maverick assuming distributed inference. On a single host, we project the model can be served at $0.30 - $0.49/Mtok (3:1 blended).
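The "3:1 blended" figure in note 3 presumably refers to a weighted average of input- and output-token prices at a 3:1 input-to-output mix, a common pricing convention. The sketch below illustrates that arithmetic only; the per-token prices in the example are hypothetical values chosen for illustration, not Meta's published rates.

```python
def blended_price_per_mtok(input_price: float, output_price: float,
                           input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted-average price per million tokens, assuming "3:1 blended"
    means 3 parts input tokens to 1 part output tokens."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts

# Hypothetical prices: $0.12/Mtok input and $0.40/Mtok output
# blend to roughly $0.19/Mtok at a 3:1 ratio.
print(blended_price_per_mtok(0.12, 0.40))
```

Changing the ratio shifts the blend accordingly; a workload that emits more output tokens than 1 part in 4 will cost more than the blended headline number.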
Start building
Featured case studies
How Stoque is using Llama
By transforming internal intelligence with Llama, Stoque enabled teams to find insights faster, reduce friction, and work more efficiently at scale.
50%
reduction in repetitive queries for technical support
30%
more administrative and support tasks completed
11%
increase in internal user satisfaction

How Shopify is using Llama
Shopify uses Llama to generate product pages, localize content, and automate support, helping developers scale workflows and save time.
+76%
higher token throughput than the previous model
97.7%
accurate Macro-F1 score on intent detection
33%
compute cost savings with JSON output


More case study categories
SAFETY
Protections in the era of generative AI.
Comprehensive system-level protections proactively identify and mitigate potential risks, empowering developers to more easily deploy generative AI responsibly.