Supported Models


Explore all available models on GroqCloud.

Note: Production models are intended for use in your production environments. They meet or exceed our high standards for speed, quality, and reliability. Read more here.

| MODEL (MODEL ID) | SPEED (T/SEC) | PRICE PER 1M TOKENS | RATE LIMITS (DEVELOPER PLAN) | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE |
| --- | --- | --- | --- | --- | --- | --- |
| Meta Llama 3.1 8B (llama-3.1-8b-instant) | 560 | $0.05 input / $0.08 output | 250K TPM, 1K RPM | 131,072 | 131,072 | - |
| Meta Llama 3.3 70B (llama-3.3-70b-versatile) | 280 | $0.59 input / $0.79 output | 300K TPM, 1K RPM | 131,072 | 32,768 | - |
| Meta Llama Guard 4 12B (meta-llama/llama-guard-4-12b) | 1200 | $0.20 input / $0.20 output | 30K TPM, 100 RPM | 131,072 | 1,024 | 20 MB |
| OpenAI GPT OSS 120B (openai/gpt-oss-120b) | 500 | $0.15 input / $0.60 output | 250K TPM, 1K RPM | 131,072 | 65,536 | - |
| OpenAI GPT OSS 20B (openai/gpt-oss-20b) | 1000 | $0.075 input / $0.30 output | 250K TPM, 1K RPM | 131,072 | 65,536 | - |
| OpenAI Whisper (whisper-large-v3) | - | $0.111 per hour | 200K ASH, 300 RPM | - | - | 100 MB |
| OpenAI Whisper Large V3 Turbo (whisper-large-v3-turbo) | - | $0.04 per hour | 400K ASH, 400 RPM | - | - | - |
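The Whisper entries above are speech-to-text models billed per hour of audio rather than per million tokens, and whisper-large-v3 accepts files up to 100 MB. As a minimal sketch of a transcription request (the filename is a placeholder, and the OpenAI-compatible audio/transcriptions endpoint is assumed here):

import requests
import os

api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/audio/transcriptions"

headers = {"Authorization": f"Bearer {api_key}"}

# The uploaded file must stay within the model's max file size (100 MB here).
with open("meeting.m4a", "rb") as audio_file:
    response = requests.post(
        url,
        headers=headers,
        files={"file": audio_file},
        data={"model": "whisper-large-v3"},
    )

response.raise_for_status()
print(response.json()["text"])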

Systems are collections of models and tools that work together to answer a user query.


| MODEL (MODEL ID) | SPEED (T/SEC) | PRICE PER 1M TOKENS | RATE LIMITS (DEVELOPER PLAN) | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE |
| --- | --- | --- | --- | --- | --- | --- |
| Groq Compound (groq/compound) | 450 | - | 200K TPM, 200 RPM | 131,072 | 8,192 | - |
| Groq Compound Mini (groq/compound-mini) | 450 | - | 200K TPM, 200 RPM | 131,072 | 8,192 | - |


Discover how to build powerful applications with real-time web search and code execution.
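These systems are requested like regular chat models. The sketch below assumes they are called through Groq's OpenAI-compatible chat completions endpoint, with the model field set to one of the system IDs from the table:

import requests
import os

api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# The system decides internally whether to use tools such as web search
# or code execution while answering.
payload = {
    "model": "groq/compound",
    "messages": [
        {"role": "user", "content": "Summarize today's top AI news in two sentences."}
    ]
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])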

Note: Preview models are intended for evaluation purposes only and should not be used in production environments as they may be discontinued at short notice. Read more about deprecations here.

| MODEL (MODEL ID) | SPEED (T/SEC) | PRICE PER 1M TOKENS | RATE LIMITS (DEVELOPER PLAN) | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE |
| --- | --- | --- | --- | --- | --- | --- |
| Canopy Labs Orpheus Arabic Saudi (canopylabs/orpheus-arabic-saudi) | - | $40.00 per 1M characters | 50K TPM, 250 RPM | 200 | 50,000 | - |
| Meta Llama 4 Maverick 17B 128E (meta-llama/llama-4-maverick-17b-128e-instruct) | 600 | $0.20 input / $0.60 output | 300K TPM, 1K RPM | 131,072 | 8,192 | 20 MB |
| Meta Llama 4 Scout 17B 16E (meta-llama/llama-4-scout-17b-16e-instruct) | 750 | $0.11 input / $0.34 output | 300K TPM, 1K RPM | 131,072 | 8,192 | 20 MB |
| Meta Llama Prompt Guard 2 22M (meta-llama/llama-prompt-guard-2-22m) | - | $0.03 input / $0.03 output | 30K TPM, 100 RPM | 512 | 512 | - |
| Meta Prompt Guard 2 86M (meta-llama/llama-prompt-guard-2-86m) | - | $0.04 input / $0.04 output | 30K TPM, 100 RPM | 512 | 512 | - |
| Moonshot AI Kimi K2 0905 (moonshotai/kimi-k2-instruct-0905) | 200 | $1.00 input / $3.00 output | 250K TPM, 1K RPM | 262,144 | 16,384 | - |
| OpenAI Safety GPT OSS 20B (openai/gpt-oss-safeguard-20b) | 1000 | $0.075 input / $0.30 output | 150K TPM, 1K RPM | 131,072 | 65,536 | - |
| PlayAI TTS (playai-tts) | - | $50.00 per 1M characters | 50K TPM, 250 RPM | 10,000 | 8,192 | - |
| PlayAI TTS Arabic (playai-tts-arabic) | - | $50.00 per 1M characters | 50K TPM, 250 RPM | 10,000 | 8,192 | - |
| Alibaba Cloud Qwen3-32B (qwen/qwen3-32b) | 400 | $0.29 input / $0.59 output | 300K TPM, 1K RPM | 131,072 | 40,960 | - |
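The Llama 4 Scout and Maverick previews accept image input, which is what the 20 MB max file size refers to. A sketch of a vision request, assuming the OpenAI-compatible image_url message format and a placeholder image URL:

import requests
import os

api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# The image URL is a placeholder; the file it points to must respect
# the model's 20 MB max file size.
payload = {
    "model": "meta-llama/llama-4-scout-17b-16e-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])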

Deprecated models are no longer supported, or are scheduled to lose support in the future. See our deprecation guidelines and deprecated models here.

Hosted models are directly accessible through the GroqCloud Models API endpoint using the model IDs listed above. You can query the https://api.groq.com/openai/v1/models endpoint for a JSON list of all active models:

import requests
import os

# Read the API key from the environment rather than hard-coding it.
api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/models"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# GET returns a JSON list of all currently active models.
response = requests.get(url, headers=headers)
response.raise_for_status()

print(response.json())
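Since the endpoint mirrors the OpenAI models list format, the active model IDs can be pulled out of the response's data array. A short sketch continuing from the response object above (the data and id field names are assumed from that OpenAI-compatible shape):

# Assumed list shape: {"object": "list", "data": [{"id": "...", ...}, ...]}
for model in response.json()["data"]:
    print(model["id"])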
