Ask HN: Storing Images in PostgreSQL vs. Object Storage for Large Datasets

16 points by atif089 2 years ago · 3 comments · 1 min read


Is it advisable to store images directly in PostgreSQL for a dataset of 100 million records, each with a 200KB image, or should I use object storage with references from the start? My primary and only use case involves creating multimodal embeddings for search and relevance purposes.

throwaway38375 2 years ago

If you are storing 100 million images at 200KB each, that comes out at 20TB!
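A quick sanity check of that figure, as a minimal sketch. It assumes decimal units (1 TB = 10^9 KB) and counts only the raw image bytes, ignoring replication, filesystem overhead, and the embeddings themselves. The function name is just illustrative.

```python
# Back-of-the-envelope storage estimate (decimal units assumed).
NUM_IMAGES = 100_000_000
IMAGE_SIZE_KB = 200


def total_storage_tb(num_images: int, size_kb: int) -> float:
    """Return raw storage in terabytes, taking 1 TB = 10**9 KB."""
    return num_images * size_kb / 10**9


print(total_storage_tb(NUM_IMAGES, IMAGE_SIZE_KB))  # 20.0
```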

I would calculate the costs of something like S3 versus buying five 4TB HDDs and running a network file server.

You're going to save a ton of money hosting this yourself. I would go with two powerful used desktop PCs: one as the DB server and the other as the file server.

Store the images on the file server and store the image's path in the database server.
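Roughly, the path-in-the-database idea could look like the sketch below. Everything here is an assumption for illustration: the `/srv/images` root, the hash-based directory sharding, the `images` table, and the helper names. `sqlite3` stands in for PostgreSQL only so the snippet runs without a server, and the actual file write is left as a comment.

```python
import hashlib
import sqlite3
from pathlib import PurePosixPath


def image_path(image_bytes: bytes, root: str = "/srv/images") -> str:
    """Derive a sharded path from the content hash so no single directory gets huge."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return str(PurePosixPath(root) / digest[:2] / digest[2:4] / f"{digest}.jpg")


# sqlite3 is a stand-in for PostgreSQL here (assumption, for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE)"
)


def register_image(conn: sqlite3.Connection, image_bytes: bytes) -> int:
    """Record where the image lives; the database never holds the bytes."""
    path = image_path(image_bytes)
    # A real setup would write image_bytes to `path` on the file server here.
    cur = conn.execute("INSERT INTO images (path) VALUES (?)", (path,))
    conn.commit()
    return cur.lastrowid


row_id = register_image(conn, b"fake image bytes")
stored_path = conn.execute(
    "SELECT path FROM images WHERE id = ?", (row_id,)
).fetchone()[0]
print(stored_path)
```

Sharding by hash prefix is one common way to keep directory sizes manageable at 100 million files; a sequential ID split into path segments would work too.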

speedgoose 2 years ago

You could do quick tests using bytea, TOAST, or large objects.

But an object store may be more convenient overall.

When I did something similar, I stored the embeddings and the image UUID in a table, and the images in an object store with the same UUIDs as filenames. It was simpler to upload the images and make them available through a CDN.
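A minimal sketch of that shared-UUID layout, not the actual setup: the two dicts are hypothetical stand-ins for the object store bucket and the embeddings table, and the function names are made up for illustration.

```python
import uuid

# Hypothetical stand-ins: a real setup would use an object store bucket
# and a database table instead of in-memory dicts.
object_store = {}    # object key (UUID string) -> image bytes
embedding_rows = {}  # image UUID string -> embedding vector


def add_image(image_bytes, embedding):
    """Store the image and its embedding under one shared UUID."""
    image_id = str(uuid.uuid4())
    object_store[image_id] = image_bytes
    embedding_rows[image_id] = embedding
    return image_id


def fetch_image(image_id):
    """Look up the image bytes for a search hit using the same UUID."""
    return object_store[image_id]


image_id = add_image(b"fake image bytes", [0.1, 0.2, 0.3])
print(image_id, len(fetch_image(image_id)))
```

Because the UUID is both the table key and the object name, a search hit maps to its image without any extra lookup table.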

reactor 2 years ago

Use something like SeaweedFS or MinIO.
