Long context GPT-OSS fine-tuning
unsloth.aiHey HN! Just sharing some work we did to make gpt-oss finetuning use O(N) and not O(N^2) VRAM via Flex Attention + some bug fixes :)
Hey HN! Just sharing some work we did to make gpt-oss finetuning use O(N) and not O(N^2) VRAM via Flex Attention + some bug fixes :)