ggml : x2 speed for WASM by optimizing SIMD by ngxson · Pull Request #11453 · ggml-org/llama.cpp


As Q6_K wasn't benchmarked, my guess is that it's only around 50% faster than scalar and won't get the 2-3 times improvement seen with the other quants.

@netrunnereve Surprisingly, q6_K doesn't differ that much from the other quants. Here is a timing benchmark on the master branch:

run 100 times, ta = q6_K, tb = q8_K
sum all elem = -6817.279785, time elapsed: 1232 ms
run 100 times, ta = q5_K, tb = q8_K
sum all elem = -6826.036133, time elapsed: 1997 ms
run 100 times, ta = q4_K, tb = q8_K
sum all elem = -7252.433594, time elapsed: 1263 ms
run 100 times, ta = q3_K, tb = q8_K
sum all elem = -6567.977051, time elapsed: 1750 ms
run 100 times, ta = q2_K, tb = q8_K
sum all elem = -6834.509766, time elapsed: 2126 ms

And with this PR:

run 100 times, ta = q6_K, tb = q8_K
sum all elem = -6817.279785, time elapsed: 677 ms
run 100 times, ta = q5_K, tb = q8_K
sum all elem = -6826.034180, time elapsed: 790 ms
run 100 times, ta = q4_K, tb = q8_K
sum all elem = -7252.431152, time elapsed: 670 ms
run 100 times, ta = q3_K, tb = q8_K
sum all elem = -6567.976074, time elapsed: 824 ms
run 100 times, ta = q2_K, tb = q8_K
sum all elem = -6834.509766, time elapsed: 772 ms
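
To make the comparison easier to read, here is a small sketch that turns the two logs above into per-quant speedup ratios. The timings are copied verbatim from the logs; the names `master_ms`, `pr_ms` and `speedups` are only illustrative and are not part of the benchmark tool.

```python
# Timings in ms for 100 runs, copied from the two benchmark logs above.
master_ms = {"q6_K": 1232, "q5_K": 1997, "q4_K": 1263, "q3_K": 1750, "q2_K": 2126}
pr_ms = {"q6_K": 677, "q5_K": 790, "q4_K": 670, "q3_K": 824, "q2_K": 772}


def speedups(before, after):
    """Return {quant: before / after} for every quant present in both runs."""
    return {quant: before[quant] / after[quant] for quant in before if quant in after}


if __name__ == "__main__":
    for quant, ratio in speedups(master_ms, pr_ms).items():
        print(f"{quant}: {ratio:.2f}x")
```

On these numbers the ratios come out to roughly 1.82x (q6_K), 2.53x (q5_K), 1.89x (q4_K), 2.12x (q3_K) and 2.75x (q2_K), so q6_K has the smallest gain but is still well above 50%.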

Btw, because I had too many failed attempts with q6_K, I just told the LLM to produce less optimal but more precise code (especially for the unpack part).

If the LLM were able to read a set of requirements, use tools, build and benchmark on its own, and automatically debug and iterate, then a lot of us would probably lose our jobs. And from what I'm seeing, a lot of that is going to happen sooner than we think 😢

I kinda disagree that a lot of us would probably lose our jobs. My POV is that if machines can help people do repetitive tasks, then we have more time to spend on planning and experimenting with new ideas. And it's not just LLMs; we have been doing this for decades: for example, thanks to compilers and interpreters, most of us no longer need to think about assembly code when writing a website.