EfficientSAM (yformer.github.io)

Excited to play with this more! Forked the repo and added the models into the repo itself (migrated from Dropbox): https://github.com/xetdata/EfficientSAM
So if I'm understanding this correctly:
The SAM paper from this past April (which enabled zero-shot segmentation on any image, seemingly generalizing even better than OpenAI's CLIP) used a ~600M-parameter ViT to generate image embeddings. And in order to make generating those same embeddings less computationally expensive, they replace that model with a much smaller ViT encoder that was pre-trained with a masked auto-encoder (MAE-style) objective?
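If I have that right, the pretraining (SAMI in the paper) is MAE-style: mask most of the patches, encode the visible ones with the small ViT, and train a decoder to reconstruct the frozen SAM ViT-H encoder's features at the masked positions (features rather than raw pixels, as I understand it). A minimal PyTorch sketch of that idea; the names, dimensions, and loss here are my assumptions, not the authors' code:

    # Conceptual sketch of MAE-style pretraining toward frozen SAM features
    # (my reading, not the authors' code; names and dims are made up).
    import torch
    import torch.nn as nn

    class TinyViTEncoder(nn.Module):
        # Stand-in for the small ViT being pretrained (also reused as the decoder).
        def __init__(self, dim=192, depth=4, heads=3):
            super().__init__()
            layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, depth)

        def forward(self, tokens):          # tokens: (B, N, dim)
            return self.blocks(tokens)

    def pretrain_step(student, decoder, mask_token, patch_tokens, teacher_feats, mask_ratio=0.75):
        # patch_tokens: embedded image patches; teacher_feats: frozen SAM ViT-H
        # features for the same patches (assumed projected to the student dim).
        B, N, D = patch_tokens.shape
        n_masked = int(N * mask_ratio)
        perm = torch.randperm(N)
        masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

        encoded = student(patch_tokens[:, visible_idx])        # encode visible patches only
        mask_tokens = mask_token.expand(B, n_masked, D)        # learnable placeholders
        # (a real MAE decoder also adds positional embeddings; omitted here)
        decoded = decoder(torch.cat([encoded, mask_tokens], dim=1))

        pred_masked = decoded[:, -n_masked:]                   # predictions for masked slots
        target = teacher_feats[:, masked_idx]                  # SAM features to reconstruct
        return nn.functional.mse_loss(pred_masked, target)

    # Toy usage with random tensors standing in for real data.
    student, decoder = TinyViTEncoder(), TinyViTEncoder()
    mask_token = nn.Parameter(torch.zeros(1, 1, 192))
    patches = torch.randn(2, 196, 192)      # 2 images, 14x14 patches each
    teacher = torch.randn(2, 196, 192)      # pretend frozen SAM features
    loss = pretrain_step(student, decoder, mask_token, patches, teacher)
    loss.backward()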
https://github.com/ChaoningZhang/MobileSAM was the previous attempt at reducing the size of the large image encoder used by SAM.
it's called efficient Sam and it appears to be onpar or better than fastsam but did I miss a memory or speed comparison?
The comparison is Figure 1 of the paper. I think the bubble size represents the number of parameters, which likely corresponds roughly to memory consumption.
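For a rough sense of scale, weight memory tracks parameter count pretty directly; a back-of-the-envelope sketch, assuming fp32 weights and approximate parameter counts (not figures from the paper's tables):

    # Back-of-the-envelope: parameter count -> weight memory, assuming fp32 (4 bytes/param).
    def weight_memory_mb(num_params, bytes_per_param=4):
        return num_params * bytes_per_param / 1e6

    print(weight_memory_mb(600e6))   # a ViT-H-scale encoder like SAM's: ~2400 MB
    print(weight_memory_mb(10e6))    # a ViT-Tiny-scale encoder:         ~40 MB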
Can't wait for the "everything everywhere all at once" function.
Is what?