Run DeepSeek from fast NVMe drives
Testing extreme NVMe offload (4 x Gen5 x4) for DeepSeek R1 (github.com)

Because PCIe 5.0 x16 (~60 GB/s) is close to dual-channel DDR5 bandwidth, this is the cheapest method to run huge models. Code: https://github.com/BlinkDL/fast.c
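To get a feel for the numbers: a rough upper bound on decode speed is aggregate read bandwidth divided by the bytes of active weights streamed per token. A back-of-envelope sketch, where the per-drive throughput, active parameter count, and FP8 weight size are all assumptions rather than measurements:

    # Back-of-envelope: tokens/sec upper bound when streaming MoE weights
    # from NVMe. All numbers below are assumptions, not measurements.

    ssd_read_gbps = 4 * 14.0    # assumed: 4 x Gen5 x4 drives, ~14 GB/s each
    active_params = 37e9        # assumed: DeepSeek R1 activates ~37B params/token
    bytes_per_param = 1.0       # assumed: FP8 weights

    bytes_per_token = active_params * bytes_per_param
    tokens_per_sec = ssd_read_gbps * 1e9 / bytes_per_token
    print(f"~{tokens_per_sec:.1f} tokens/sec upper bound")  # ~1.5

Real throughput also depends on how much of the model stays cached in RAM and how well reads overlap with compute.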
Have you run any benchmarks yet? I'm interested in how many tokens/sec you can reach. Though in the end it should be more efficient to run the model on distributed server clusters.
DeepSeek's open-source inference code, while correct, may not be fully efficient. For example, MLA needs the right matrix-multiplication associativity order to be efficient.
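Concretely: with MLA the cached KV is a low-rank latent, so absorbing the key up-projection into the query side lets you score in the small latent space instead of decompressing every cached key first. A toy NumPy sketch of the idea; the shapes, names, and dimensions are illustrative, not DeepSeek's actual code:

    # Sketch: why multiplication order matters for MLA-style attention.
    import numpy as np

    d, r, T = 512, 64, 4096    # model dim, latent dim, cached tokens (toy sizes)
    rng = np.random.default_rng(0)
    Wq = rng.standard_normal((d, d))  # query projection (illustrative)
    Wk = rng.standard_normal((d, r))  # key up-projection from the latent
    h  = rng.standard_normal(d)       # current token's hidden state
    C  = rng.standard_normal((T, r))  # cached compressed KV latents

    # Naive order: decompress all T cached keys back to width d, then score.
    # Cost ~ T*r*d multiply-adds for the decompression alone.
    scores_naive = (C @ Wk.T) @ (Wq @ h)

    # Better order: absorb Wk into the query once, score directly in the
    # latent space. Cost ~ d*d + d*r for the query, then only T*r for scores.
    scores_fast = C @ ((Wq @ h) @ Wk)

    print(np.allclose(scores_naive, scores_fast))  # True: same math, fewer FLOPs

Since the decompression cost grows with the number of cached tokens while the absorbed-query cost does not, the gap widens as the context gets longer.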