Karpathy's llama2.c ported to pure Python

github.com

6 points by atairov 2 years ago · 10 comments

andy99 2 years ago

I made a Jupyter notebook "llama2.ipynb" from the Karpathy project: https://github.com/rbitr/llama2.ipynb

I didn't do a pure Python version; mine uses NumPy, and although I haven't benchmarked it, it runs the stories15M model much faster than 1.3 tok/sec on my 2018 MacBook. You should try swapping in NumPy matrix multiplication, or the @ operator (I actually don't know whether that's native or part of another package), for matmul and see what changes.
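
A minimal sketch of what that swap might look like (the pure-Python signature here is my assumption, not copied from llama2.py; matmul typically dominates inference time, so this one change should account for most of the speedup):

    import numpy as np

    # llama2.py-style pure-Python matmul (signature assumed):
    # W is a (d, n) matrix stored flat, out[i] = sum_j W[i*n+j] * x[j]
    def matmul(xout, x, w, n, d):
        for i in range(d):
            val = 0.0
            for j in range(n):
                val += w[i * n + j] * x[j]
            xout[i] = val

    # NumPy replacement: @ is native Python syntax (PEP 465, since 3.5);
    # NumPy supplies the implementation via __matmul__.
    def matmul_np(x, w, n, d):
        return np.asarray(w).reshape(d, n) @ np.asarray(x)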

  • atairovOP 2 years ago

    1.3 tok/sec is similar to the performance of my Python port, but I tested it on an M1 Max.

Bostonian 2 years ago

The llama2.py code defines its own accum, rmsnorm, and matmul. Why not use NumPy? A "pure Python" implementation that is much slower than one using NumPy is less interesting to me.
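
For reference, a sketch of what two of those helpers reduce to in NumPy (the pure-Python signatures are my guess at llama2.py's style, and the eps value follows llama2.c's 1e-5f):

    import numpy as np

    # Pure-Python versions (signatures assumed)
    def accum(a, b):                   # a += b, elementwise
        for i in range(len(a)):
            a[i] += b[i]

    def rmsnorm(x, weight, eps=1e-5):  # RMS normalization, then scale
        ss = sum(v * v for v in x) / len(x) + eps
        inv = ss ** -0.5
        return [weight[i] * inv * x[i] for i in range(len(x))]

    # NumPy equivalents are one-liners (a, b, x, weight as arrays):
    #   accum:   a += b
    #   rmsnorm: weight * x / np.sqrt(np.mean(x * x) + eps)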

  • atairovOP 2 years ago

    If your goal is to make it as fast as possible, then a Python implementation is certainly not the solution here. I think that's exactly why llama.cpp got so much attention.

behnamoh 2 years ago

I find these efforts impressive, but what is the value proposition here? (I'm not just talking about this fork, but about Karpathy's llama2.c as well.)

  • atairovOP 2 years ago

    Personally, the value for me was implementing the complex logic from a scientific paper in pure Python. It helps in understanding the essence of a cutting-edge AI technology. And it's quite fascinating that it takes only about 500 lines of core code to implement inference for such a complex system.
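
    To give a taste of what those ~500 lines contain, here is a simplified sketch of my own (not copied from llama2.py): single-head attention over a KV cache, where the real code also handles multiple heads, RoPE, and the feed-forward layers:

        import math

        # toy single-head attention at generation step `pos`
        def attention(q, k_cache, v_cache, pos, head_size):
            # dot-product score against every cached key, scaled by sqrt(d)
            att = [sum(q[i] * k_cache[t][i] for i in range(head_size))
                   / math.sqrt(head_size) for t in range(pos + 1)]
            # numerically stable softmax over the scores
            m = max(att)
            e = [math.exp(a - m) for a in att]
            s = sum(e)
            att = [a / s for a in e]
            # attention-weighted sum of the cached values
            out = [0.0] * head_size
            for t in range(pos + 1):
                for i in range(head_size):
                    out[i] += att[t] * v_cache[t][i]
            return out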

  • atairovOP 2 years ago

    Regarding the original llama2.c, I believe the value proposition is having a simple implementation that can run inference locally on a wide variety of platforms. What if we could run a fine-tuned Llama 7B on our phones?

    • brucethemoose2 2 years ago

      > What if we could run a fine-tuned Llama 7B on our phones?

      7B and 13B are already quite performant with mlc-llm (which uses an Apache TVM Vulkan/Metal backend). Llama.cpp has the potential to perform well too.

      These "single file" implementations are not meant to be optimized or feature rich, I dont think.

  • brucethemoose2 2 years ago

    It's educational. It shows how Llama works in a clear, concise, testable way.

  • westurner 2 years ago

    Writing one's own, and/or porting every line of code, has great value.
