Show HN: GPULlama3.java Llama Compiled to PTX/OpenCL Now Integrated in Quarkus

23 points by mikepapadim 2 days ago · 6 comments

wget https://github.com/beehive-lab/TornadoVM/releases/download/v...
unzip tornadovm-2.1.0-opencl-linux-amd64.zip

# Replace <path-to-sdk> manually with the absolute path of the extracted folder
export TORNADO_SDK="<path-to-sdk>/tornadovm-2.1.0-opencl"
export PATH=$TORNADO_SDK/bin:$PATH
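
Note that the exports above only last for the current shell session. A minimal sketch for persisting them, assuming a bash setup (the /opt path is an assumed example, not from the post; substitute your actual extraction directory):

# Assumed install location -- replace with the real path of the extracted SDK
echo 'export TORNADO_SDK="/opt/tornadovm-2.1.0-opencl"' >> ~/.bashrc
echo 'export PATH="$TORNADO_SDK/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc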

# Check that the TornadoVM installation can see your accelerators
tornado --devices
tornado --version
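
If tornado --devices does not list your GPU, it usually means no OpenCL runtime is visible to the system. A quick sanity check, assuming the standard clinfo utility is installed (it is not part of the post's instructions):

# List the OpenCL platforms and devices known to the driver
clinfo -l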

# Navigate to the project directory
cd GPULlama3.java

# Source the project-specific environment paths
source set_paths

# Build the project using Maven (skip tests for faster build)
# mvn clean package -DskipTests
# or just run make:
make

# Run the model (make sure you have downloaded the model file first - see below)
./llama-tornado --gpu --verbose-init --opencl --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke"
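
The command above pins the OpenCL backend. The post title also mentions PTX; assuming the launcher accepts a --ptx flag analogous to --opencl (an assumption here, it is not shown in this excerpt), the NVIDIA-only PTX backend would be selected the same way:

# Assumed flag: select the PTX backend instead of OpenCL (NVIDIA GPUs only)
./llama-tornado --gpu --verbose-init --ptx --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke"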

mikepapadim (OP) 2 days ago

https://github.com/beehive-lab/GPULlama3.java

lostmsu 2 days ago

Does it support flash attention? Use tensor cores? Can I write custom kernels?

Update: found no evidence that it supports tensor cores, so it's going to be many times slower than implementations that do.
