240 tokens/s achieved by @GroqInc's custom chips on Llama 2 Chat (70B).

Artificial Analysis has independently benchmarked Groq’s API and now showcases Groq’s latency, throughput & pricing on ArtificialAnalysis.ai.

This represents a milestone for the application of custom silicon to large language models and AI.

Groq are serving a full-quality FP16 version of Llama 2 Chat (70B) with the model’s full 4k context window.

See full results here: artificialanalysis.ai/models/llama-2…
