Deploying Llama3 70B on AWS – GPU Requirement, Cost and Step-by-Step Guide (slashml.com)
Note that quantized versions of Llama3 70B can be run on CPU on a much cheaper server. I am personally using it via llama.cpp on a bare-metal 6-core Xeon with 128 GB of RAM for ~50 euros a month.
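For anyone curious what that kind of setup looks like in code, here is a minimal sketch using the llama-cpp-python bindings over llama.cpp. The model filename, context size, and thread count are assumptions; adjust them to whichever quantized GGUF you downloaded and to your own hardware.

    # A minimal sketch, assuming llama-cpp-python is installed and a 4-bit
    # GGUF has been downloaded locally (the filename below is hypothetical).
    from llama_cpp import Llama

    llm = Llama(
        model_path="Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",  # hypothetical local file
        n_ctx=4096,    # context window; shrink it to fit your RAM budget
        n_threads=6,   # matches the 6-core Xeon mentioned above
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize llama.cpp in one sentence."}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])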
Is inference speed an issue for you?
Sufficient for fluent conversation.
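If you want to gauge that on your own machine, a rough tokens-per-second check might look like the following sketch. It assumes the same llama-cpp-python setup and hypothetical model path as above; actual numbers will vary with quantization level, CPU, and memory bandwidth.

    # A quick tokens-per-second measurement, assuming llama-cpp-python and a
    # hypothetical local GGUF path.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",
                n_ctx=2048, n_threads=6, verbose=False)

    start = time.time()
    out = llm("Explain quantization in one paragraph.", max_tokens=128)
    elapsed = time.time() - start

    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {elapsed:.1f}s -> {n / elapsed:.2f} tok/s")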
Usually performance takes a hit with quantization. Are you getting quality responses?
Since Llama3, yes; the responses are quite satisfying.