Ask HN: What dataset do you think is useful to you but not readily available?
We are a bootstrapped team building tools for data extraction. We are currently focusing on semi-structured data, which can be extracted without deep-learning-based software. So if there is some data that you need (and are willing to pay for) but that is not readily available, we might be able to help you. We are looking for different types of datasets that are actually useful to people, so that we can work towards a tool that can be used generally for data extraction. If you have such a dataset in mind, do let us know. Also, if you could share a website where we could find the semi-structured version of the dataset you need, it'd be really helpful.

Customer counts for ISPs worldwide, by ASN. There are approximations for some economies, from filings for federal regulations and stock exchange notices. There are yearly numbers for broadband, which are pretty hazy. I went to a meeting where China declared that 160m extra online users had been found that year. A huge amount of internet modelling and sampling would improve at scale if we knew this. I've discussed this with researchers in the field. Akamai, Google and Facebook have private information which is their secret sauce.

Can you share a link to one of the filings or notices you are referring to? Or it would be helpful if you could share what regulations are typically enforced on ISPs (in your region) and whether they publish customer count details in the public domain.

A clean dataset of YouTube music videos associated with MusicBrainz tags.

Hi! I am part of the team trying to build these tools. Have you checked out https://last.fm? I think it's possible to associate MusicBrainz data with last.fm data. Can you share more details on how you plan to use this data?

Sure. Couple of points:

* Last.fm has an API, but it's not great, and an API is different from a dataset. I want to have the data available in my own DB, not have to make requests for whatever I might need.
* You can't get YouTube links from the last.fm API; you have to crawl the track pages.
* The tags on last.fm are a folksonomy, not a fixed taxonomy like MusicBrainz.

I have used [music-map](https://www.music-map.com/) and last.fm together to make a [playlist generator](http://playlist.hallofbrightcarvings.com.au), but I'd like to be able to do the same kind of thing without having to crawl third-party sites.
I'd also like to know that there was a music database of listenable music that was kept up to date.