Give Me FP32 or Give Me Death?
arxiv.org
People often don't understand why LLMs can be nondeterministic even with deterministic seeding, temperature, and sampling. This paper shows how bad it can get across different hardware and GPU hosts.
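For anyone wondering how this can happen at all: one commonly cited cause (my assumption here, not something quoted from the paper) is that floating-point addition is not associative, so if different hardware or kernels combine partial sums in a different order, the rounded result can differ. A minimal sketch in plain Python, where `sum_in_order` is just a hypothetical helper for illustration:

```python
def sum_in_order(values):
    """Add values strictly left to right, rounding after every step."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two different accumulation orders.
left_to_right = sum_in_order([1e16, 1.0, -1e16])  # (1e16 + 1.0) - 1e16
reordered = sum_in_order([1e16, -1e16, 1.0])      # (1e16 - 1e16) + 1.0

# The 1.0 is lost to rounding in the first order but survives in the second.
print(left_to_right, reordered)

# The same effect shows up with everyday values.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

With lower-precision formats there are fewer mantissa bits, so differences like this would be expected to be larger, and a small shift in two nearly tied logits can be enough to change which token greedy decoding picks.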
Q4 is plenty for me, I don't have the budget for FP32 lol.
If money wasn't a thing, I'd probably not be going above Q8.