New open-source datasets for music-based development

metabrainz.org

128 points by aerozol 3 years ago · 10 comments

aerozol (OP) 3 years ago

The MusicBrainz project by MetaBrainz has released its latest dataset, MusicBrainz Canonical Metadata. This dataset solves a number of problems involved in matching music to the correct entry in the massive MusicBrainz database. Until now it has been difficult to programmatically identify the main (canonical) release of an album or song. The new dataset addresses this for anyone interested in building their own music database, tagger application, or other music-related application.

You can find all the MetaBrainz datasets here: https://metabrainz.org/dataset…

The MusicBrainz database aims to collect all the metadata for all music that has ever been published. For popular albums and songs, which have been released many times, it can be hard to answer the question “which one is the main (canonical) entry?” Using the new dataset, a user can enter any release or recording MBID (MusicBrainz identifier), and match it to the canonical entry.
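For anyone wondering what "match it to the canonical entry" looks like in practice, here is a minimal sketch. It assumes the dataset ships a redirect-style CSV mapping non-canonical recording MBIDs to canonical ones; the column names (`recording_mbid`, `canonical_recording_mbid`) and the sample rows are hypothetical placeholders, not the dataset's actual schema or data.

```python
import csv
import io

# Hypothetical sample of a redirect file: each row maps a non-canonical
# recording MBID to its canonical recording MBID. Column names and values
# are placeholders, not the dataset's real schema or contents.
SAMPLE_REDIRECTS = """recording_mbid,canonical_recording_mbid
mbid-a,mbid-canon-1
mbid-b,mbid-canon-1
mbid-c,mbid-canon-2
"""


def load_redirects(csv_text: str) -> dict[str, str]:
    """Build an in-memory map from a recording MBID to its canonical MBID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["recording_mbid"]: row["canonical_recording_mbid"] for row in reader}


def canonical_mbid(redirects: dict[str, str], mbid: str) -> str:
    """Return the canonical MBID; an MBID with no redirect is assumed canonical already."""
    return redirects.get(mbid, mbid)


redirects = load_redirects(SAMPLE_REDIRECTS)
print(canonical_mbid(redirects, "mbid-b"))
print(canonical_mbid(redirects, "mbid-canon-2"))
```

Falling back to the input MBID when no redirect exists is a design assumption: it treats anything not listed as already canonical, which keeps the redirect table small.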

The tables included in the dataset contain all the string metadata needed to make effective use of it. Artist names, release names and recording names are all present, indexed against the MBIDs. This lowers the barrier to entry for music-based development considerably: anyone can now import the dataset into their favourite datastore and start looking up tracks.
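As a rough illustration of "import into your favourite datastore and start looking up tracks", the sketch below loads rows into an in-memory SQLite database using only Python's standard library. The table layout, column names and sample rows are hypothetical; the real files may be structured differently, so check the dataset's documentation before adapting this.

```python
import sqlite3

# Hypothetical flattened rows: (recording_mbid, artist_name, release_name, recording_name).
# Real column names and file layout may differ from this assumed shape.
SAMPLE_ROWS = [
    ("mbid-canon-1", "Example Artist", "Example Album", "Example Song"),
    ("mbid-canon-2", "Example Artist", "Example Album", "Another Song"),
]


def build_db(rows):
    """Load metadata rows into an in-memory SQLite table keyed by recording MBID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE canonical_metadata (
               recording_mbid TEXT PRIMARY KEY,
               artist_name    TEXT NOT NULL,
               release_name   TEXT NOT NULL,
               recording_name TEXT NOT NULL
           )"""
    )
    # Secondary index so lookups by artist + track name avoid a full table scan.
    conn.execute(
        "CREATE INDEX idx_artist_recording "
        "ON canonical_metadata (artist_name, recording_name)"
    )
    conn.executemany("INSERT INTO canonical_metadata VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup_by_name(conn, artist, recording):
    """Return recording MBIDs matching an exact artist/recording name pair."""
    cur = conn.execute(
        "SELECT recording_mbid FROM canonical_metadata "
        "WHERE artist_name = ? AND recording_name = ?",
        (artist, recording),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_db(SAMPLE_ROWS)
print(lookup_by_name(conn, "Example Artist", "Another Song"))
```

Exact string matching is the simplest possible approach here; a real tagger would likely need normalisation or fuzzy matching on names.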

The MetaBrainz Foundation offers a number of different datasets, often under the Creative Commons Zero (CC0) licence. These datasets can be used to build applications and databases, or to train machine learning models. MetaBrainz Foundation datasets power countless projects and work behind the scenes at many of today's largest tech companies, such as Microsoft, Google, and Amazon. All of the MetaBrainz Foundation datasets are available on the MetaBrainz datasets page. The MetaBrainz Foundation uses the new MusicBrainz Canonical Metadata dataset itself, primarily in the tagging application MusicBrainz Picard and the social music site ListenBrainz.

  • canadiantim 3 years ago

    Heads up: the MetaBrainz page you linked returns a 404 (page not found).

    This info is most awesome to know, though. Thank you!

  • have_faith 3 years ago

    Is this an AI summary post?

    • aerozol (OP) 3 years ago

      Hey, I was away for the long weekend, so a bit late…

      To answer your question, I’m 99.9% sure I’m not AI, just a derp who pastes in truncated URLs.

andy_ppp 3 years ago

It's such a great project in so many ways, and, unsurprisingly, not a single industry system uses MBIDs to look up tracks. It could be so useful if the industry just standardised on it and contributed back to this project.

wswope 3 years ago

I’ll bite: Anyone doing fun stuff on the personal-music-collection front with these datasets?

I’m a big fan of Picard (via beets), and the ListenBrainz dataset seems like a fun resource for finding new music… but despite being intrigued by the rest of the datasets, I have no clue what I’d ever do with them from a practical standpoint.