Experiment with One Million Album Covers

As might be expected, the Internet Archive has lots of data in its virtual stacks. Besides the books, movies and stored webpages, there are datasets provided from the Internet at large or from individual contributors.

But datasets are just big clumps of data unless someone does something with them. Obviously we’re keeping these around no matter what (our current goal is “forever”), but without folks tinkering, experimenting and using the datasets, they’re just piles clogging up hard drives.

So, in the name of experimentation, we’ve gathered one million album cover images from a variety of sources and put them into this item. The total size is 148 gigabytes (!) of .JPG, .GIF and .PNG images. (There is a torrent on the item, giving you a more flexible way to download that amount of imagery.)

The albums are split somewhat arbitrarily by filename, with .TAR (tape archive) files for the letters a, b, c, etc. The goal here is experimentation – these have not been curated or thoroughly quality-checked, and differently-sized duplicates have not been removed. If you’re writing programs or doing analysis, these are the sorts of oddness or strangeness you should be aware of.
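Since the images arrive uncurated inside .TAR files, a quick survey of one archive before doing anything heavier may save some surprises. The following is a minimal sketch, not an official tool: the function name `survey_tar` and the example filename are hypothetical, and it assumes only that each archive is a regular tar file whose members carry image extensions. It tallies extensions and lists zero-byte members without extracting anything to disk.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def survey_tar(tar_path):
    """Tally file extensions and collect zero-byte members in one .tar archive.

    Hypothetical helper for illustration; it reads only the tar headers,
    so no image data is extracted or decoded.
    """
    extension_counts = Counter()
    empty_members = []
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            # Skip directories, links and other non-file entries.
            if not member.isfile():
                continue
            # Normalize case so ".JPG" and ".jpg" are counted together.
            suffix = PurePosixPath(member.name).suffix.lower()
            extension_counts[suffix] += 1
            if member.size == 0:
                empty_members.append(member.name)
    return extension_counts, empty_members


# Example use (the filename is an assumption, not taken from the item):
# counts, empties = survey_tar("a.tar")
# print(counts.most_common(), len(empties))
```

A header-only pass like this will not catch differently-sized duplicates of the same cover; that would need an image-level comparison, which is left as one of the experiments.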

(If you just want to play around a bit, there’s a link to a set of a mere 1200 album covers, for a total of 200 megabytes.)

We’ve included some suggestions for using the data, and some projects that might be interesting to get into, either as a hacking project or just because you’re learning computer science.

Let us know how it works for you!