About time to get back to the Dreamcast cluster project. When I left off, I had forked llama2.c to do pipeline-parallel inference w/ matmul accelerated using SH4 vector ops. Here it is running on 2 DCs. With only 32MB RAM between them, the very tiny model that fits outputs gibberish. 8 DCs will do TinyStories.
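
For anyone curious what those two pieces look like, here's a rough sketch in plain C. It is not the code from the fork. `stage_layers`, `LayerRange`, and `matmul4` are hypothetical names made up for illustration, and the even split of layers across consoles is an assumption about how pipeline stages might be assigned. The matmul keeps llama2.c's `W (d,n) @ x (n,) -> xout (d,)` layout and walks each row 4 floats at a time, which is the shape a 4-element inner-product instruction wants, but it uses no SH4 intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the contiguous block of transformer layers owned by one
   pipeline stage (one console). */
typedef struct {
    int first; /* index of the first layer this stage runs */
    int count; /* number of layers this stage runs */
} LayerRange;

/* Assumed policy: split n_layers as evenly as possible across n_stages.
   The first (n_layers % n_stages) stages each take one extra layer. */
LayerRange stage_layers(int n_layers, int n_stages, int stage) {
    assert(n_stages > 0 && stage >= 0 && stage < n_stages);
    int base = n_layers / n_stages;
    int extra = n_layers % n_stages;
    LayerRange r;
    r.count = base + (stage < extra ? 1 : 0);
    r.first = stage * base + (stage < extra ? stage : extra);
    return r;
}

/* W (d,n) @ x (n,) -> xout (d,), row-major weights as in llama2.c.
   The inner loop consumes 4 floats per step; a vectorized build could
   swap that step for a hardware 4-wide dot product. Portable C here. */
void matmul4(float *xout, const float *x, const float *w, int n, int d) {
    for (int i = 0; i < d; i++) {
        const float *row = w + (size_t)i * n;
        float acc = 0.0f;
        int j = 0;
        for (; j + 4 <= n; j += 4) {
            acc += row[j] * x[j] + row[j + 1] * x[j + 1]
                 + row[j + 2] * x[j + 2] + row[j + 3] * x[j + 3];
        }
        for (; j < n; j++) { /* leftover columns when n is not a multiple of 4 */
            acc += row[j] * x[j];
        }
        xout[i] = acc;
    }
}
```

In a setup like this each console would only load the weights for its own `LayerRange`, run those layers, and hand the activation vector to the next stage, which is why adding consoles raises the size of model that fits.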

