DiffusionBlocks: Training Neural Networks One Block at a Time
pub.sakana.ai
I do not understand.
How is this different from building smaller transformer layers, where each layer just denoises less?