Finding Incestuous Royals with Wikipedia
github.com
This is super awesome - it's amazing all the things we can do with our equivalent of the library of Alexandria. Don't see enough projects using Wikipedia as a base layer, hopefully this continues!
Would also love for this to be adapted to fandoms and other types of wikis - might not need much modification.
Thank you so much!!! It's a project I've been wanting to have for a while, so I'm glad that other people like it.
Fanwikis are my next target - probably a bigger amount of interest, and I'm hopeful that there will be a similar amount of consistency. I'll also need to work out what the interesting relationships to trace would be for fandoms - or see if I can parameterize that, since I suspect it varies a lot.
I now understand a bit better why Wikipedia isn't that popular as a base layer - it's built by humans, for humans. That can make it a bit tricky to parse for computers. There's no consistency to naming, labeling, or even organisation. There are some cultural standards, but they're not binding and tend to vary between sections. For instance, children are labeled as 'children' for most people, but 'issue' for a lot of monarchs, and 'offspring' for mythological characters.
Yeah you're right, but it's still the biggest information resource we have. Some sort of effort towards making Wikipedia more machine readable - maybe through parsing or automated synonyms - would really go a long way there.
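For what it's worth, the synonym idea could be sketched roughly like this. It's only a minimal illustration, not how the project actually does it: the three labels come from the comment above, while `CHILD_LABELS`, `FIELD_RE`, `extract_fields` and the sample text are made-up names and data, and the regex only handles simple one-line `| label = value` infobox fields.

```python
import re

# Hypothetical synonym table. The three labels are the ones mentioned in the
# thread; using "children" as the canonical key is an arbitrary choice.
CHILD_LABELS = {"children": "children", "issue": "children", "offspring": "children"}

# Matches simple one-line "| label = value" infobox fields in raw wikitext.
FIELD_RE = re.compile(
    r"^[ \t]*\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)

def extract_fields(wikitext, synonyms=CHILD_LABELS):
    """Return {canonical_label: value} for non-empty fields listed in `synonyms`."""
    fields = {}
    for label, value in FIELD_RE.findall(wikitext):
        canonical = synonyms.get(label.lower())
        if canonical and value:
            # Keep the first value seen for each canonical label.
            fields.setdefault(canonical, value)
    return fields

# Made-up sample infoboxes, one per labeling convention.
monarch = "{{Infobox royalty\n| name = Example Monarch\n| issue = [[Example Heir]]\n}}"
myth = "{{Infobox deity\n| Offspring = [[Child A]], [[Child B]]\n| spouse =\n}}"

monarch_fields = extract_fields(monarch)
myth_fields = extract_fields(myth)
```

Both samples end up under the same `children` key, so the code that traces relationships would only need to know one label. A per-wiki synonym table like this might also be one way to parameterize it for fanwikis.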
I never thought of doing this for a fandom-style wiki. This would be fascinating for fictional works with huge lore and character sets like the Tolkien universe or the Star Wars universe etc.
That's my next addition planned! Good idea on Tolkien and Star Wars - I'd been thinking smaller scale (e.g. Harry Potter) but those are probably a lot more interesting, and also better test cases.
In lieu of scraping Wikipedia, could this project be sped up by downloading a copy of Wikipedia itself? It's not that big a file.
Oh wow, hadn't thought of that. Could be very possible! I'd say the main drawback is that it would probably require self-hosting it instead of making use of Binder to have it available for free.
But you're right, the speed is definitely the biggest problem. I'll take that idea seriously if I do anything else with this.
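If it helps, reading a local copy could look something like the sketch below. It assumes the download is a MediaWiki XML export (pages with a `title` and a revision `text`), and it streams pages one at a time so the whole file never has to sit in memory. `iter_pages`, `local_name` and the tiny `SAMPLE` document are hypothetical, made up just for illustration.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(xml_file):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, one page at a time."""
    title, text = None, None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text
        elif name == "page":
            yield title, text or ""
            title, text = None, None
            elem.clear()  # drop the finished page's children to limit memory use

# Tiny made-up export standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example Monarch</title>
    <revision><text>| issue = [[Example Heir]]</text></revision>
  </page>
  <page>
    <title>Empty Page</title>
    <revision><text></text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real compressed dump, the file object from `bz2.open(path, "rb")` could presumably be passed to `iter_pages` in place of the `BytesIO` here. That still leaves the self-hosting question open, since the dump has to live somewhere.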