Karpathy's llama2.c ported to pure Python (github.com)
I made a Jupyter notebook "llama2.ipynb" from the Karpathy project: https://github.com/rbitr/llama2.ipynb
I didn't do a pure Python version; mine uses NumPy, and although I haven't benchmarked it, it runs the stories15M model much faster than 1.3 tok/sec on my 2018 MacBook. You could try swapping in NumPy matrix multiplication, or the @ operator (native Python syntax since 3.5; NumPy implements it for arrays), in place of the hand-rolled matmul and see what changes.
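For illustration, here's a minimal sketch of what that swap might look like, assuming the port mirrors llama2.c's matmul(xout, x, w, n, d) convention with a flat row-major weight buffer (the names and signature here are illustrative, not taken from the repo):

    import numpy as np

    def matmul_pure(xout, x, w, n, d):
        # hand-rolled version: d * n Python-level loop iterations
        for i in range(d):
            val = 0.0
            for j in range(n):
                val += w[i * n + j] * x[j]
            xout[i] = val

    def matmul_np(xout, x, w, n, d):
        # NumPy version: reshape the flat weights and let @ do the work
        # (@ is native Python, PEP 465; NumPy implements it for ndarrays)
        w2d = np.asarray(w, dtype=np.float32).reshape(d, n)
        xout[:] = w2d @ np.asarray(x, dtype=np.float32)

The speedup comes from the @ call handing the whole dot product to NumPy's compiled backend instead of looping in the interpreter.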
1.3 tok/sec is similar to the performance of my Python port, though I tested on an M1 Max.
The llama2.py code defines its own accum, rmsnorm, and matmul. Why not use NumPy? A "pure Python" implementation that is much slower than one using NumPy is less interesting to me.
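For a sense of scale, rmsnorm is the kind of routine that collapses to a couple of NumPy lines. A sketch following the RMSNorm computation in llama2.c, with an assumed eps of 1e-5 (not the repo's actual code):

    import numpy as np

    def rmsnorm_np(x, weight, eps=1e-5):
        # normalize by the root-mean-square of x, then apply the learned gain
        x = np.asarray(x, dtype=np.float32)
        return np.asarray(weight, dtype=np.float32) * x / np.sqrt(np.mean(x * x) + eps)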
If your goal is to make it as fast as possible, then a Python implementation is certainly not the solution. I think that's exactly why llama.cpp got so much attention.
I find these efforts impressive, but what is the value proposition here? (I'm not just talking about this fork, but about Karpathy's llama2.c as well.)
For me personally, the value was implementing the complex logic from a scientific paper in pure Python. It helps in understanding the essence of a cutting-edge AI technology. And it's quite fascinating that it takes only about 500 lines of core code to implement inference for such a complex system.
Regarding the original llama2.c, I believe the value proposition is a simple implementation that can run inference locally on a wide variety of platforms. What if we could run a fine-tuned Llama 7B on our phones?
> What if we could run a fine-tuned Llama 7B on our phones?
7B and 13B are already quite performant with mlc-llm (which uses an Apache TVM Vulkan/Metal backend). Llama.cpp has the potential to perform well too.
These "single file" implementations are not meant to be optimized or feature rich, I dont think.
It's educational. It shows how Llama works in a clear, concise, testable way.
Writing one's own implementation and/or porting every line of code has great value.