Show HN: "Be horse." – a diffusion language model on an M2 Air
boesch.dev
Be horse. Love this kind of experiment. Would the model perform better with word-level tokens? A friend of mine forked the repo and tried it with BPE (byte-pair encoding), and it noticeably improved performance.
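For anyone who hasn't looked at BPE before, here is a rough sketch of the idea: start from single characters and repeatedly merge the most frequent adjacent pair into a new token. This is not the code from the repo or from the fork; the names `learn_bpe`, `merge_pair`, and `encode` are made up for illustration, and real tokenizers add details (byte-level fallback, pre-tokenization) that are left out here.

```python
from collections import Counter


def merge_pair(tokens, a, b):
    """Replace every non-overlapping adjacent (a, b) with the single token a + b."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_bpe(text, num_merges):
    """Learn up to num_merges merge rules, starting from a character-level split.

    Returns the ordered list of merges and the tokenized training text.
    """
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            # No pair repeats, so further merges would not shorten anything.
            break
        merges.append((a, b))
        tokens = merge_pair(tokens, a, b)
    return merges, tokens


def encode(text, merges):
    """Tokenize new text by replaying the learned merges in order."""
    tokens = list(text)
    for a, b in merges:
        tokens = merge_pair(tokens, a, b)
    return tokens


corpus = "be horse. be horse. be horse."
merges, tokens = learn_bpe(corpus, num_merges=5)
print(merges)
print(tokens)
```

The payoff for a small model is shorter sequences: each merge removes at least two tokens from the training text, at the cost of a slightly larger vocabulary.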