Long context GPT-OSS fine-tuning

4 points by danielhanchen 4 months ago · 1 comment

Hey HN! Just sharing some work we did to make gpt-oss fine-tuning use O(N) rather than O(N^2) VRAM in the sequence length N, via Flex Attention, plus some bug fixes :)
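
For a rough sense of why the O(N) vs O(N^2) difference matters at long context, here is a back-of-the-envelope sketch in plain Python. It is not the actual implementation: the head count, tile size, and dtype widths are illustrative assumptions, and the "blockwise" model (one score tile plus two running statistics per query row) is a simplification of how fused attention kernels generally avoid storing the full score matrix.

```python
# Back-of-the-envelope memory model, not a measurement.
# All parameters below (head count, tile size, dtype widths) are
# illustrative assumptions, not gpt-oss's real configuration.

def dense_scores_bytes(seq_len, num_heads, bytes_per_elem=2):
    """Bytes needed to materialize a full (heads, N, N) attention score matrix."""
    return num_heads * seq_len * seq_len * bytes_per_elem


def blockwise_scores_bytes(seq_len, num_heads, block=128,
                           bytes_per_elem=2, stat_bytes=4):
    """Bytes for a simplified blockwise scheme: one (block, block) score tile
    per head, plus two running statistics (row max, row sum) per query row."""
    tile = block * block * bytes_per_elem
    row_stats = 2 * seq_len * stat_bytes
    return num_heads * (tile + row_stats)


if __name__ == "__main__":
    heads = 64  # hypothetical head count, chosen only for illustration
    for n in (4_096, 16_384, 65_536):
        dense_gib = dense_scores_bytes(n, heads) / 2**30
        block_mib = blockwise_scores_bytes(n, heads) / 2**20
        print(f"N={n:>6}: dense scores ~{dense_gib:.1f} GiB, "
              f"blockwise ~{block_mib:.1f} MiB")
```

Doubling N quadruples the dense term but only adds a linear amount to the blockwise one, which is the gap the post is talking about.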
