Supported Models


Explore all available models on GroqCloud.

Note: Production models are intended for use in your production environments. They meet or exceed our high standards for speed, quality, and reliability. Read more here.

| MODEL (MODEL ID) | SPEED (T/SEC) | PRICE PER 1M TOKENS | RATE LIMITS (DEVELOPER PLAN) | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE |
| --- | --- | --- | --- | --- | --- | --- |
| Meta Llama 3.1 8B (llama-3.1-8b-instant) | 560 | $0.05 input / $0.08 output | 250K TPM, 1K RPM | 131,072 | 131,072 | - |
| Meta Llama 3.3 70B (llama-3.3-70b-versatile) | 280 | $0.59 input / $0.79 output | 300K TPM, 1K RPM | 131,072 | 32,768 | - |
| Meta Llama Guard 4 12B (meta-llama/llama-guard-4-12b) | 1200 | $0.20 input / $0.20 output | 30K TPM, 100 RPM | 131,072 | 1,024 | 20 MB |
| OpenAI GPT OSS 120B (openai/gpt-oss-120b) | 500 | $0.15 input / $0.60 output | 250K TPM, 1K RPM | 131,072 | 65,536 | - |
| OpenAI GPT OSS 20B (openai/gpt-oss-20b) | 1000 | $0.075 input / $0.30 output | 250K TPM, 1K RPM | 131,072 | 65,536 | - |
| OpenAI Whisper (whisper-large-v3) | - | $0.111 per hour | 200K ASH, 300 RPM | - | - | 100 MB |
| OpenAI Whisper Large V3 Turbo (whisper-large-v3-turbo) | - | $0.04 per hour | 400K ASH, 400 RPM | - | - | - |
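The Whisper entries above are speech-to-text models billed per hour of audio rather than per million tokens, and whisper-large-v3 accepts files up to 100 MB. As a minimal sketch of a transcription request (the filename is a placeholder, and the OpenAI-compatible audio/transcriptions endpoint is assumed here):

import requests
import os

api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/audio/transcriptions"

headers = {"Authorization": f"Bearer {api_key}"}

# The uploaded file must stay within the model's max file size (100 MB here).
with open("meeting.m4a", "rb") as audio_file:
    response = requests.post(
        url,
        headers=headers,
        files={"file": audio_file},
        data={"model": "whisper-large-v3"},
    )

response.raise_for_status()
print(response.json()["text"])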

Systems are collections of models and tools that work together to answer a user query.


| MODEL (MODEL ID) | SPEED (T/SEC) | PRICE PER 1M TOKENS | RATE LIMITS (DEVELOPER PLAN) | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE |
| --- | --- | --- | --- | --- | --- | --- |
| Groq Compound (groq/compound) | 450 | - | 200K TPM, 200 RPM | 131,072 | 8,192 | - |
| Groq Compound Mini (groq/compound-mini) | 450 | - | 200K TPM, 200 RPM | 131,072 | 8,192 | - |


Discover how to build powerful applications with real-time web search and code execution.
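These systems are requested like regular chat models. The sketch below assumes they are called through Groq's OpenAI-compatible chat completions endpoint, with the model field set to one of the system IDs from the table:

import requests
import os

api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# The system decides internally whether to use tools such as web search
# or code execution while answering.
payload = {
    "model": "groq/compound",
    "messages": [
        {"role": "user", "content": "Summarize today's top AI news in two sentences."}
    ]
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])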

Note: Preview models are intended for evaluation purposes only and should not be used in production environments as they may be discontinued at short notice. Read more about deprecations here.

| MODEL (MODEL ID) | SPEED (T/SEC) | PRICE PER 1M TOKENS | RATE LIMITS (DEVELOPER PLAN) | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE |
| --- | --- | --- | --- | --- | --- | --- |
| Canopy Labs Orpheus Arabic Saudi (canopylabs/orpheus-arabic-saudi) | - | $40.00 per 1M characters | 50K TPM, 250 RPM | 200 | 50,000 | - |
| Meta Llama 4 Maverick 17B 128E (meta-llama/llama-4-maverick-17b-128e-instruct) | 600 | $0.20 input / $0.60 output | 300K TPM, 1K RPM | 131,072 | 8,192 | 20 MB |
| Meta Llama 4 Scout 17B 16E (meta-llama/llama-4-scout-17b-16e-instruct) | 750 | $0.11 input / $0.34 output | 300K TPM, 1K RPM | 131,072 | 8,192 | 20 MB |
| Meta Llama Prompt Guard 2 22M (meta-llama/llama-prompt-guard-2-22m) | - | $0.03 input / $0.03 output | 30K TPM, 100 RPM | 512 | 512 | - |
| Meta Prompt Guard 2 86M (meta-llama/llama-prompt-guard-2-86m) | - | $0.04 input / $0.04 output | 30K TPM, 100 RPM | 512 | 512 | - |
| Moonshot AI Kimi K2 0905 (moonshotai/kimi-k2-instruct-0905) | 200 | $1.00 input / $3.00 output | 250K TPM, 1K RPM | 262,144 | 16,384 | - |
| OpenAI Safety GPT OSS 20B (openai/gpt-oss-safeguard-20b) | 1000 | $0.075 input / $0.30 output | 150K TPM, 1K RPM | 131,072 | 65,536 | - |
| PlayAI TTS (playai-tts) | - | $50.00 per 1M characters | 50K TPM, 250 RPM | 10,000 | 8,192 | - |
| PlayAI TTS Arabic (playai-tts-arabic) | - | $50.00 per 1M characters | 50K TPM, 250 RPM | 10,000 | 8,192 | - |
| Alibaba Cloud Qwen3-32B (qwen/qwen3-32b) | 400 | $0.29 input / $0.59 output | 300K TPM, 1K RPM | 131,072 | 40,960 | - |
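The Llama 4 Scout and Maverick previews accept image input, which is what the 20 MB max file size refers to. A sketch of a vision request, assuming the OpenAI-compatible image_url message format and a placeholder image URL:

import requests
import os

api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# The image URL is a placeholder; the file it points to must respect
# the model's 20 MB max file size.
payload = {
    "model": "meta-llama/llama-4-scout-17b-16e-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])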

Deprecated models are no longer supported, or are scheduled to lose support in the future. See our deprecation guidelines and deprecated models here.

Hosted models are directly accessible through the GroqCloud Models API endpoint using the model IDs listed above. You can query the https://api.groq.com/openai/v1/models endpoint for a JSON list of all active models:

import requests
import os

# Read the API key from the environment rather than hard-coding it.
api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/models"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# GET returns a JSON list of all currently active models.
response = requests.get(url, headers=headers)
response.raise_for_status()

print(response.json())
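Since the endpoint mirrors the OpenAI models list format, the active model IDs can be pulled out of the response's data array. A short sketch continuing from the response object above (the data and id field names are assumed from that OpenAI-compatible shape):

# Assumed list shape: {"object": "list", "data": [{"id": "...", ...}, ...]}
for model in response.json()["data"]:
    print(model["id"])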
