Ask HN: How do AI labs set up their infrastructure to train large models?

3 points by true2octave 2 years ago · 0 comments · 1 min read

I have to do this at my company. So far I have seen Slurm-based cluster setups (V100s or H100s), a fast distributed file system, Docker for containers, and PyTorch with the DDP strategy.
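
For concreteness, here is a rough sketch of how the Slurm and DDP pieces seem to fit together, as far as I understand it. Slurm gives each task SLURM_PROCID, SLURM_NTASKS and SLURM_LOCALID, and PyTorch's env:// rendezvous reads RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT (LOCAL_RANK is a convention used by launchers such as torchrun). The function name ddp_env_from_slurm, the master_addr argument and the default port 29500 are my own assumptions for illustration, not something from a particular lab's setup.

```python
def ddp_env_from_slurm(slurm_env, master_addr, master_port=29500):
    """Translate Slurm per-task variables into the variables that
    PyTorch's env:// rendezvous expects.

    slurm_env   : mapping holding the Slurm variables (e.g. os.environ)
    master_addr : hostname of the rank-0 node (hypothetical argument; it is
                  often taken from the first host in the job's node list)
    master_port : any free TCP port agreed on by all ranks (assumed default)
    """
    return {
        # Global rank of this process across all nodes.
        "RANK": str(int(slurm_env["SLURM_PROCID"])),
        # Total number of processes in the job.
        "WORLD_SIZE": str(int(slurm_env["SLURM_NTASKS"])),
        # Rank of this process on its own node, typically used to pick a GPU.
        "LOCAL_RANK": str(int(slurm_env["SLURM_LOCALID"])),
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
    }
```

Inside a job launched with srun, each task would presumably call os.environ.update(ddp_env_from_slurm(os.environ, master_addr)) before torch.distributed.init_process_group(backend="nccl", init_method="env://").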

But I read somewhere that Kubernetes can also be used, and that Singularity is an alternative to Docker.

Where can I learn more about this?
