Show HN: Predict country of origin from a name
letmeguesswhereyourefrom.com
Behind the scenes here: http://nxn.se/post/127065307170/let-me-guess-where-youre-fro... :)
Thanks for the write-up. We're taking a similar approach at my startup to classify business reviews. The breakthrough for us came when we split the reviews into sentences and did N-gram analysis at the sentence level. The challenge is that the most significant N-grams (those with N > 2, say) occur so rarely that there isn't much data to train on. Our current approach is to try to coax patterns out of the N-grams by replacing specific words with categories (e.g. "salesman was rude" and "manager was mean" both become "[employee]=[negative]"), so that rare phrases pool into one more frequent pattern. I do like the top 5 approach, and I'll see if I can work that into ours.
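For anyone curious what that looks like in practice, here is a minimal sketch of the idea, not our production code. The `CATEGORIES` lexicon, the function names, and the sample reviews are all made up for illustration, and the sentence splitter is deliberately naive. It keeps the full trigram (including "was") rather than collapsing it to "[employee]=[negative]".

```python
import re
from collections import Counter

# Hypothetical lexicon mapping specific words to coarse categories.
# A real one would be much larger and probably learned or curated.
CATEGORIES = {
    "salesman": "[employee]",
    "manager": "[employee]",
    "rude": "[negative]",
    "mean": "[negative]",
}


def sentences(text):
    """Naive sentence splitter: break on runs of '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokens(sentence):
    """Lowercase the sentence and keep alphabetic words (and apostrophes)."""
    return re.findall(r"[a-z']+", sentence.lower())


def abstract(toks):
    """Replace known words with their category placeholder."""
    return [CATEGORIES.get(t, t) for t in toks]


def ngrams(toks, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def count_patterns(reviews, n=3, use_categories=True):
    """Count n-grams per sentence, never crossing sentence boundaries."""
    counts = Counter()
    for review in reviews:
        for sent in sentences(review):
            toks = tokens(sent)
            if use_categories:
                toks = abstract(toks)
            counts.update(ngrams(toks, n))
    return counts


reviews = [
    "The salesman was rude. Parking was easy.",
    "The manager was mean!",
]

raw = count_patterns(reviews, use_categories=False)
pooled = count_patterns(reviews, use_categories=True)

print(raw[("salesman", "was", "rude")])              # 1
print(pooled[("[employee]", "was", "[negative]")])   # 2
```

Without the category step each trigram shows up once; with it, the two complaints land on the same pattern, which is the whole point when the raw N-grams are too sparse to train on.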
Very accurate for the names I've tried... impressive. What is the source? Wikipedia?
Yeah, I looked for lists of people by country.