Industry Leading, Open-Source AI | Llama


Build on your own terms

Optimized models for easy deployment, cost efficiency, and performance that scales to billions of users.

MODELS

Latest Llama models

The latest models feature native multimodality, advanced reasoning, and industry-leading context windows.

Llama 4

Native multimodality that leverages early fusion to pre-train on unlabeled text and vision data, enabling a step change in intelligence over models with separate, frozen multimodal weights.

Llama 4 Maverick: Natively multimodal for image and text understanding.

  • 10M-token context for long-form work

  • Multimodal text + image understanding

  • For use cases around memory, personalization, and multi-modal applications

More details

Llama 4 Scout: Natively multimodal, offering text and visual intelligence.

  • Offers single H100 GPU efficiency

  • 10M context window

  • For use cases around long document analysis

More details

Llama 3

The open-source AI models you can fine-tune, distill, and deploy anywhere. Choose from our collection: Llama 3.1, Llama 3.2, and Llama 3.3.

Llama 3.3: Multilingual open-source large language model.

  • Available in 70B

  • Experience 405B performance and quality at a fraction of the cost

  • Built for text-based use cases such as synthetic data generation

More details

Llama 3.2: Flexible, cost-effective, and built for edge use cases.

  • 1B & 3B are lightweight and cost-efficient, allowing you to run them anywhere

  • 11B & 90B are flexible multimodal models that can reason over high-resolution images and output text

More details

Llama 3.1: Open foundation model built for flexibility and control.

  • Available in 8B, 70B, and 405B sizes

  • Capabilities in general knowledge, steerability, math, tool use, and multilingual translation

  • Text summarization, multilingual agents, and coding use cases

More details

Model optimization

Llama 4 capabilities

Llama 4 benchmark

Methodology & Notes

1. For Llama model results, we report 0-shot evaluation with temperature = 0 and no majority voting or parallel test-time compute. For high-variance benchmarks (GPQA Diamond, LiveCodeBench), we average over multiple generations to reduce uncertainty.

2. Specialized long-context evals are not traditionally reported for generalist models, so we share internal runs to showcase Llama's frontier performance.

3. $0.19/Mtok (3:1 blended) is our cost estimate for Llama 4 Maverick assuming distributed inference. On a single host, we project the model can be served at $0.30 - $0.49/Mtok (3:1 blended).
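A "3:1 blended" price weights the per-token input price three times as heavily as the output price, reflecting a workload with three input tokens for every output token. As a minimal sketch, the blend can be computed like this (the component input/output prices below are hypothetical placeholders chosen to illustrate the arithmetic; the page only quotes the blended figure):

```python
def blended_price(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Blend per-million-token prices at a given input:output token ratio.

    For a 3:1 blend, the input price carries three times the weight of
    the output price: (3 * input + 1 * output) / 4.
    """
    return (ratio * input_price + output_price) / (ratio + 1)

# Hypothetical component prices in $/Mtok; not quoted anywhere on the page.
print(round(blended_price(0.11, 0.43), 2))  # → 0.19
```

The same formula with other ratios (e.g. 10:1 for summarization-heavy workloads) lets you compare providers whose quoted blends assume different traffic mixes.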


Start building

Featured case studies


How Stoque is using Llama

Transforming internal intelligence with Llama. Stoque enabled teams to find insights faster, reduce friction, and work more efficiently at scale.

50%

reduction in repetitive queries for technical support

30%

more administrative and support tasks completed

11%

increase in internal user satisfaction



How Shopify is using Llama

Shopify uses Llama to generate product pages, localize content, and automate support, helping developers scale workflows and save time.

+76%

higher token throughput than the previous model

97.7%

accurate Macro-F1 score on intent detection

33%

compute cost savings with JSON output




More case study categories

SAFETY

Protections in the era of generative AI.

Comprehensive system-level protections proactively identify and mitigate potential risks, empowering developers to more easily deploy generative AI responsibly.
