Image Self Supervised Learning on a Shoestring
theadamcolton.github.io
I'm trying to train a variable-resolution ViT using I-JEPA. I'm currently topping out at about 30% on ImageNet-1k after training for 20 epochs (6 hours).
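For a rough sense of the throughput those numbers imply, here is a back-of-the-envelope sketch. It assumes each epoch is one full pass over the standard ImageNet-1k training split (1,281,167 images), which the post does not state; the function and constant names are purely illustrative.

```python
# Back-of-the-envelope throughput from the numbers quoted in the post.
# Assumption (not stated in the post): one epoch is one full pass over
# the standard ImageNet-1k (ILSVRC-2012) training split.
IMAGENET1K_TRAIN_IMAGES = 1_281_167


def images_per_second(
    epochs: int,
    hours: float,
    images_per_epoch: int = IMAGENET1K_TRAIN_IMAGES,
) -> float:
    """Average number of images processed per second over the whole run."""
    return epochs * images_per_epoch / (hours * 3600)


if __name__ == "__main__":
    rate = images_per_second(epochs=20, hours=6)
    print(f"{rate:.0f} images/sec")
```

Under that assumption, 20 epochs in 6 hours works out to roughly 1,186 images per second on average, including any evaluation or data-loading overhead counted in the 6 hours.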
It'd be cool to have some help and feedback. I'm on the right track to getting a really killer setup that is super fast to train, but it needs more evaluations and more tuning. Anyone interested?