Run Deepseek from fast NVMe drives

github.com

5 points by ironbound a year ago · 4 comments

ironbound (OP) a year ago

Testing extreme NVMe offload (4 x Gen5 x4) for DeepSeek R1. Because PCIe 5.0 x16 (~60 GB/s) is close to dual-channel DDR5 bandwidth, this is the cheapest method to run huge models. Code: https://github.com/BlinkDL/fast.c
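For readers who want to check the bandwidth comparison, here is a rough back-of-the-envelope sketch. It assumes the theoretical PCIe 5.0 rate (32 GT/s per lane, 128b/130b encoding) and, as an illustrative memory configuration not named in the post, dual-channel DDR5-4800 with a 64-bit bus per channel. Real drives and memory controllers deliver less than these peak numbers.

```python
def pcie_bandwidth_gbs(lanes, gt_per_s=32.0, encoding=128 / 130):
    """Theoretical one-direction PCIe bandwidth in GB/s.

    Defaults are the PCIe 5.0 figures: 32 GT/s per lane, 128b/130b encoding.
    """
    return lanes * gt_per_s * encoding / 8  # bits -> bytes


def ddr5_bandwidth_gbs(mt_per_s, channels=2, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (64-bit bus per channel)."""
    return mt_per_s * bus_bytes * channels / 1000


# Four Gen5 x4 drives use 16 lanes in total.
nvme_total = pcie_bandwidth_gbs(lanes=4 * 4)

# Assumption: DDR5-4800, chosen only as a common baseline speed.
ddr5_4800 = ddr5_bandwidth_gbs(4800)

print(f"4 x Gen5 x4 NVMe (theoretical): {nvme_total:.1f} GB/s")
print(f"Dual-channel DDR5-4800 (peak):  {ddr5_4800:.1f} GB/s")
```

Under these assumptions the two figures land in the same ballpark, which is the point the post is making.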

  • fspeech 10 months ago

    Do you have any benchmark runs yet? I am interested in knowing how many tokens/sec you can reach. In the end, though, it should be more efficient to run the model on distributed server clusters.
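    No benchmark is given in the thread, but a bandwidth-only ceiling can be estimated. The sketch below is an upper bound, not a measurement. It assumes every active weight is streamed from NVMe for each token (no RAM caching), about 37B activated parameters per token for DeepSeek R1 as reported in the model's documentation, and the theoretical ~63 GB/s for 16 PCIe 5.0 lanes. The function name and the quantization choices are illustrative.

```python
def max_tokens_per_sec(bandwidth_gbs, active_params_billion, bytes_per_param):
    """Bandwidth-bound ceiling on decode speed.

    Assumes each token requires reading all active weights once from
    storage and ignores compute time, KV cache traffic, and caching.
    """
    gb_per_token = active_params_billion * bytes_per_param
    return bandwidth_gbs / gb_per_token


BANDWIDTH_GBS = 63.0    # assumption: theoretical 16-lane PCIe 5.0 rate
ACTIVE_PARAMS_B = 37.0  # assumption: activated parameters per token

for label, bytes_per_param in [("8-bit", 1.0), ("4-bit", 0.5)]:
    ceiling = max_tokens_per_sec(BANDWIDTH_GBS, ACTIVE_PARAMS_B, bytes_per_param)
    print(f"{label} weights: at most {ceiling:.2f} tokens/sec")
```

    Keeping frequently used weights in RAM would raise the real number; drive and filesystem overheads would lower it.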

  • fspeech 10 months ago

    DeepSeek's open-source inference code, while correct, may not be fully efficient. For example, the MLA needs the right matrix multiplication order (matrix multiplication is associative, so the result is the same but the cost is not) to be efficient.
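    To illustrate why the multiplication order matters, here is a small FLOP-counting sketch. The shapes are hypothetical and are not DeepSeek's actual MLA dimensions; the point is only that `(A @ B) @ x` and `A @ (B @ x)` give the same result at very different cost when `x` is a single token vector.

```python
def matmul_flops(m, k, n):
    """Approximate FLOPs for an (m x k) @ (k x n) product."""
    return 2 * m * k * n


def cost_left_first(m, k, n, p):
    """Cost of (A @ B) @ X with A: m x k, B: k x n, X: n x p."""
    return matmul_flops(m, k, n) + matmul_flops(m, n, p)


def cost_right_first(m, k, n, p):
    """Cost of A @ (B @ X) with the same shapes."""
    return matmul_flops(k, n, p) + matmul_flops(m, k, p)


# Hypothetical shapes: a low-rank bottleneck (k) between two wide
# dimensions, applied to one token (p = 1).
m, k, n, p = 4096, 512, 4096, 1

left = cost_left_first(m, k, n, p)
right = cost_right_first(m, k, n, p)

print(f"(A @ B) @ x: {left:,} FLOPs")
print(f"A @ (B @ x): {right:,} FLOPs")
print(f"ratio: {left / right:.0f}x")
```

    For a per-token product, multiplying the vector first avoids forming the large intermediate matrix. When the weight product can be precomputed once and reused, the trade-off can go the other way.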
