Directly run and investigate Llama models locally with only PyTorch
There are other popular ways to invoke these models, such as Ollama and Hugging Face's general-purpose transformers package, but those hide the interesting details behind an API. Peel back the layers to poke, prod, and understand!