Show HN: I made a Pinterest clone using SigLIP image embeddings

98 points by verse 2 years ago · 21 comments · 1 min read

Reader

Click an image to get similar images.

I crawled Tumblr and used SigLIP to get vector embeddings for many images.

When you click an image, it finds the most similar vector embeddings in the database, and returns the corresponding images.

yorwba 2 years ago

Sometimes there are duplicate results, e.g. https://mood-amber.vercel.app/images/0b733fc2-7093-4443-8872... has two copies of https://mood-amber.vercel.app/images/f920a599-bbd7-4805-3317... right next to each other. (The link UUID is the same, so I assume this is an issue with the search algorithm, not simply duplicate data that got scraped.)

lulzx 2 years ago

wucaworld 2 years ago

Very cool! How did you get the collage layout? I noticed images in each column don’t have the same size. I assume images get Centre cropped?

omeze 2 years ago

Cool! I haven’t tried SigLIP out yet but it seems to be the new hotness over CLIP… I just dont have a good project idea yet

Tiberium 2 years ago

Is there a repo, especially for training? I'd like to see how SigLIP performs on a dataset of only anime images.

jarebear6expepj 2 years ago

The the vision training models are available here: https://github.com/google-research/big_vision/tree/main which I am assuming, based on the research paper is what was used for the project.

gammalost 2 years ago

There are some interesting images there. Why are you not including the source of the images?

GamerAlias 2 years ago

Good stuff! Do you have any intuitive sense of whether SigLIP is particularly stronger than CLIP here? Also vector DB over Faiss index?

verseOP 2 years ago

I haven't done much testing or anything, but it seems to me that siglip "understands" what it's looking at more than CLIP
also no, I just put everything on Supabase and added pgvector. super easy:
https://supabase.com/docs/guides/database/extensions/pgvecto...
ReD_CoDE 2 years ago

qdrant doesn't support vector DB over Faiss index?
Also, pgvector or qdrant? which is better?

squam 2 years ago

Cool project! Thanks for sharing

Yenrabbit 2 years ago

Neat! How many images are in the dataset out of curiosity?

convolvatron 2 years ago

quite a bit, but surprisingly not

ijhuygft776 2 years ago

nice, we always need more clones and improvements.... hope you get traction.

I never click Pinterest links because the experience is too bad.

karolist 2 years ago

I use unpinterested extension in Chrome to remove pinterest from search results, I was annoyed so much at some point. Maybe they're SEO spam is more under control now, not sure.

Settings