Genshin Impact: Fine-tuning CLIP for anime search
colab.research.google.com
Wonder how this stacks up against ResNet-50 family classifiers trained with TensorFlow or ML.NET.
This is more related to search tasks, where we encode text and image pairs so that text can be used to search for images. ResNet can also serve as a backbone for search tasks such as content-based image search (reverse image search, i.e. searching for images with an image), but you need to remove the ResNet-50 classification head.
On the other hand, TensorFlow and ML.NET are machine learning frameworks. For this task you can use whichever one you prefer to build the model components.
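A minimal sketch of the search step itself, in plain Python so it runs without any framework. It assumes the embeddings have already been computed, either by CLIP's text and image encoders (text query) or by a ResNet-50 with its classification head removed (image query). The `search` helper, gallery ids, and toy 2-D vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, gallery, k=3):
    """Return the ids of the k gallery images most similar to query_vec.

    gallery maps image id -> embedding. The query and gallery embeddings are
    assumed to live in the same space (e.g. CLIP text/image towers, or a
    headless ResNet-50 for image-to-image search).
    """
    ranked = sorted(gallery, key=lambda img_id: cosine(query_vec, gallery[img_id]), reverse=True)
    return ranked[:k]


# Hypothetical toy gallery of precomputed embeddings.
gallery = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
top2 = search([1.0, 0.1], gallery, k=2)  # "a" first, then "b"
```

In practice the gallery is embedded once offline and only the query is encoded at search time. That is also why the classification head has to go: the ranking needs the feature vector, not class logits.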
Sets of ResNets can do search indexing. The similarity of two pictures can be inferred from the cosine similarity between the vectors you get by evaluating the set of ResNets on each picture.
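A sketch of that set-of-ResNets similarity, assuming each picture is represented as a list of feature vectors, one per model (the toy vectors are hypothetical, standing in for real ResNet outputs). Each model's output is L2-normalized before concatenation so that no model dominates just because its activations are larger. With that choice, the cosine of the concatenated vectors equals the plain average of the per-model cosines, so every model gets equal weight.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def fused_vector(per_model_outputs):
    """Concatenate the L2-normalized output of each model into one vector."""
    out = []
    for v in per_model_outputs:
        out.extend(l2_normalize(v))
    return out


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def picture_similarity(outputs_a, outputs_b):
    """Cosine of the fused vectors = mean of the per-model cosines."""
    return cosine(fused_vector(outputs_a), fused_vector(outputs_b))


# Hypothetical outputs of two models (2-D and 3-D features) for two pictures.
pic1 = [[1.0, 0.0], [0.0, 1.0, 0.0]]
pic2 = [[0.0, 1.0], [0.0, 2.0, 0.0]]
sim = picture_similarity(pic1, pic2)  # model cosines 0.0 and 1.0, so about 0.5
```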
[Run a numerical optimizer on the weights for the different ResNets' outputs for best results.]
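The note doesn't say which optimizer or objective to use. One possible choice, shown here purely as an assumption: take labeled picture pairs (match / no match), use each ResNet's cosine similarity for the pair as a feature, and fit one weight per model plus a bias with batch gradient descent on a logistic loss. The helper names and the labeled pairs are invented toy data in which model 0 is informative and model 1 is not.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_fusion_weights(pairs, lr=0.5, steps=2000, l2=0.01):
    """Fit one weight per model plus a bias by gradient descent on logistic loss.

    pairs: list of (per_model_cosines, label); label 1 means the two pictures
    match, 0 means they don't. per_model_cosines[i] is model i's cosine score.
    """
    m = len(pairs[0][0])
    n = len(pairs)
    w = [0.0] * m
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * m
        grad_b = 0.0
        for sims, label in pairs:
            err = sigmoid(sum(wi * si for wi, si in zip(w, sims)) + b) - label
            for i in range(m):
                grad_w[i] += err * sims[i]
            grad_b += err
        # Averaged gradient step; the L2 term keeps uninformative weights small.
        w = [wi - lr * (gw / n + l2 * wi) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fused_score(sims, w, b):
    """Weighted combination of per-model cosines, squashed to a 0..1 match score."""
    return sigmoid(sum(wi * si for wi, si in zip(w, sims)) + b)


# Hypothetical toy data: model 0 separates matches from non-matches,
# model 1 gives the same spread of scores to both classes.
pairs = [
    ([0.90, 0.5], 1), ([0.80, 0.2], 1), ([0.85, 0.9], 1),
    ([0.20, 0.5], 0), ([0.10, 0.9], 0), ([0.30, 0.2], 0),
]
w, b = fit_fusion_weights(pairs)
```

The fitted weights end up much larger for the informative model. Logistic regression is only one option; the same per-model weights could also be tuned with any other optimizer against a retrieval metric on a held-out set.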