torch.randperm isn't fully random, and it can bias your trillion-token training run 😱 Today we're releasing ModernBERT, a new SOTA encoder-only model series. In this thread, however, I'll share how torch.randperm (temporarily) put a wrench in the works (1/10)

