Show HN: The Population Project
thepopulationproject.org
Two years ago, I turned 50. After a successful career as an entrepreneur, a business angel and a novelist, I set out to start a philanthropic venture under the following constraints:
- it had to be global.
- it had to be beautiful (in my eyes, at least).
- it had to be technology and stats driven.
I decided I would try to list the full name and date of birth of all humans alive. While some may find the concept pointless, I immediately knew I had struck gold:
- it was global and incredibly hard.
- it had an almost artistic quality to it, like an ever-changing installation.
- as a libertarian, I resent that states conduct censuses and then sit on the data.
- One billion people in the world aren't officially registered. At least someone would acknowledge their existence.
I created a non-profit called The Population Project. I would never make a dime off it, but at least my costs would be tax-deductible.
I then started researching lists of names online. I quickly adopted two principles. First, I would collect a minimal set of information: full name, birth date, and birth place. Second, I would only scrape public information, i.e. nothing behind a password.
After a few months, I realized I needed help from more experienced developers. I chose to work on 4D, a platform I had used in the past to develop my company's information system. It was a tough choice: 4D is not a leading player in the back-end world, but I figured the growth of API tooling would make language choice less critical.
The first iteration of our database was frustrating - way too slow to publish a website. I learned the power of incremental change, with each marginal improvement saving you a few percent of speed or space. I also got to implement concepts I had heard about but never implemented, such as mirroring, partitioning, or hash-indexing.
Then I hired a team of six data processors in Madagascar who clean up and process the lists found online. Lots of Python and Excel macros in their day-to-day. I have instilled in them an obsession with quality. A bad record will sit in our database forever. After trying dozens of tools, we've settled on Adobe Acrobat and Octoparse.
The final piece was the website. I lucked out in finding a strong team in Romania. They build with Next.js and deploy on Vercel. I gave them Wikipedia as the model to aim for. We/they haven't been able to match Wikipedia's simplicity. Our pages are too heavy. But I find the site user-friendly, pleasing to the eye and reasonably fast. We can and we will do better.
A word about privacy. Some people complain that because it publishes names and DOBs, the Population Project infringes on their privacy. We obviously don't see it that way.
- All our info is public. That DOB you find on the site is probably in the voter list of your state, a list that anyone can request or simply download.
- The info we publish is minimal. Basically, we say that you exist. No one will find anything about your race, religion, sexual preferences, job or income.
- We have adopted Wikipedia's privacy policy. We do not record your IP, unless you create or edit a record.
- We're using Matomo for our analytics. Great stuff. It's not free but they do not use your data like GA.
Why am I telling you all this? From the beginning, I've envisioned a three-step process:
1) Build the database and populate it with millions of Western profiles.
2) Launch the site, where anybody can create or edit records and share them with their family.
3) When we've reached critical mass (1B records?), start making deals with NGOs and governments, and venture into other alphabets.
We have just completed step 1. Step 2 is daunting as hell. I have grown a business but I have never grown a website. While I am ready to spend a bit of money on PR or SEO, I am not delusional: to reach the level of success we have in mind, we need this thing to go (somewhat) viral.
How do you do that?

This is 100% an ego project--count the number of "I" statements in your declaration above. Even in your stated reasons why you "struck gold" with this idea, the first three are solely about your effort and taste and resentment. The fourth is about your site "acknowledging the existence" of one billion unregistered people in the world. I'm sure their actual existence is already being acknowledged by others in their own community; they don't need some tech chucklehead adding a row into a database for that. It's laughably egotistical to think it matters one whit to someone whether your site has counted them for bragging rights. I see absolutely no value in the result for humanity and especially not for the downtrodden billion. You would have done more philanthropy had you instead donated the funds to literally any other non-profit organization. I'm surprised you were able to claim non-profit status for this venture. It's obvious you put a lot of work into this, and you're clearly capable of creating and executing a project. It even looks nice. So hopefully this can be a point of reflection as you decide what to do next with your remaining time on this planet.

You miss the point of free markets. This is exactly the questionably useful, ego-driven nonsense that is most beneficial for self-made rich people to engage in, for two reasons:
1. We may be too dumb to see the value in this, in which case we win big time when something that appears stupid ends up useful.
2. Every dollar spent on this is a dollar unavailable for their trust fund kids and/or grandkids to do even more likely useless and/or harmful things with.
Free markets create huge value for people who create goods and services that make others' lives better, to the point where it creates the secondary problem of trust fund kids who did not create anything having excess power from excess inherited wealth and doing excess harm with it due to incompetence.
This is the least destructive way to dissipate that effect. (We can't just tax it away: people will hide it and also be disincentivized to create value because they risk losing control of it at death.) Note: I am not being a dick here on purpose. I am dealing with this exact problem myself and so have thought about it a lot.

> After a successful career as an entrepreneur, a business angel and a novelist
Of course it's a personal project. Why undermine it? Let the person do their thing.

> start a philanthropic venture
> created a non-profit
Sure, people can do whatever they want, and this project seems mostly harmless (I can also see forced data collection/publication on individuals having negative consequences for some people). I collect crossword data myself. But I don't go around patting myself on the back for my philanthropic efforts and non-profit status. I mean come on.

I don't think I was patting myself on the back but you're definitely spanking. No need to be venomous. No, it's not an ego project. I'm not saying it will lift the Third World out of poverty but it's a genuinely disinterested endeavor. And I give to other charities, some of which you would probably approve of.

If your site becomes a spectacle/popular/viral, which I don't think it will, people will submit false data. I don't see how you can possibly deal with that.

That's what they said about Wikipedia and here it stands 20 years later, stronger than ever. Trolls will always exist but there are ways to keep them at bay; we're working on it.

True, only a single determined person with a botnet is more than enough to wreck it completely.

Our metrics are low enough at the moment that we would notice anyone trying to add thousands or even millions of records. But I hear you.

At some point you might want to consider "pivoting" to a gigantic database of all dead humans: there would be fewer data protection and privacy issues! I have sometimes wanted to trace people's ancestry, or, more often, the descendants (of the parents) of a person who died half a century ago. It's depressing how difficult it is to look these things up. Various companies try to sell access to public records, but I don't do this often enough to be interested in paying for a subscription.
With something like this you should really also publish exactly where the information came from.
There's a big difference between "an anonymous contributor supplied this" and "this comes from a database that we downloaded from whatever.gov.uk on this date and here's a copy of that database in case you want to check".
Some things that almost everyone is already aware of but I'll mention them anyway:
* The concepts of "first name" and "last name" only apply to some cultures.
* Most people have more than one name: women who change their name when they get married, middle names that may or may not get mentioned, names that are frequently abbreviated ("Kate" might be "Kate" or "Catherine" or ...), punctuation and diacritics that may be modified or omitted, Macdonald/McDonald/Mac Donald/..., various ways of transcribing the same name from a different alphabet, ...

Thank you for your comments. We're well aware of the pitfalls you're pointing out. Some of them can be avoided now, others will need a dose of AI down the road. For now, we log the data when we find it.
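To make the diacritics and Macdonald/McDonald/Mac Donald pitfalls above concrete, here is a minimal Python sketch of a loose matching key for Latin-alphabet names. It is only an illustration under stated assumptions, not how the project actually processes records: the function name `name_key` and the Mac/Mc folding rule are hypothetical, and nicknames (Kate/Catherine), married names and other alphabets are out of scope.

```python
import re
import unicodedata


def name_key(name: str) -> str:
    """Return a loose matching key for a Latin-alphabet name (hypothetical helper)."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop the combining marks so "José" and "Jose" compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Case-insensitive comparison; anything that is not a-z becomes a space.
    # Assumption: letters outside a-z that do not decompose (e.g. "ø") are lost here.
    lowered = stripped.casefold()
    tokens = re.sub(r"[^a-z ]", " ", lowered).split()

    # Fold the Mac/Mc variants into one spelling (an illustrative rule only).
    merged = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("mac", "mc") and i + 1 < len(tokens):
            # "mac donald" / "mc donald" -> "macdonald"
            merged.append("mac" + tokens[i + 1])
            i += 2
            continue
        if token.startswith("mc") and len(token) > 2:
            # "mcdonald" -> "macdonald"
            token = "mac" + token[2:]
        merged.append(token)
        i += 1
    return " ".join(merged)


if __name__ == "__main__":
    for raw in ["Macdonald", "McDonald", "Mac Donald", "José García"]:
        print(raw, "->", name_key(raw))
```

Two records whose keys match are only candidates for being the same person; a key like this would sit next to the name as originally logged, never replace it.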
Mentioning the sources is a tricky issue. Our philosophy is to say as little or as much about everyone. Barack Obama's record is no more developed than yours. Linking a record to Wikipedia or to the list of Minnesota's sex offenders would break that rule, and not in a good way in my opinion.

I didn't immediately understand the point you're making there because I don't think you'd ever need to use Wikipedia or a list of sex offenders as a source, but I think I see your point now: if for 98% of people the specified source is a government register of births then anyone who doesn't have that source mentioned will stick out and an astute reader will immediately infer that they were born in a place where the register of births is not easily accessible or they have changed their name or something like that. So mentioning the sources is, as you say, a tricky issue.

> We have just completed step 1. Step 2 is daunting as hell. I have grown a business but I have never grown a website.
I've seen incredibly successful 'website' backends that were nothing more than a Google Sheet behind a few white-label HTML forms that feed into the spreadsheet; the backend almost never makes the product 'viral'
the 'magic' comes from successfully answering the question that's on everyone's mind: "why should I give you the one precious resource I have (time), and how/why will you make the rest of the limited time I have (before I die) any better?" or, "what is the ROI on my invested time?"
for example, Sam Altman has been giving away 50 bucks as the answer, and some people are lining up
as an exercise, pretend you are a stranger hearing about your own product for the first time (or better, pay someone $100 to pitch your own idea to you), and ask yourself the big question: "why should _I_ (the consumer, not the creator) care about spending time here?"
also, check with a few PR companies about the name "Population Project"...

Thank you for taking the time (!) to articulate your thoughts.
We think people will come to our site from many angles:
- Do other people share my name, in my country and in the world?
- How rare is my first or last name?
- How old is that person I met? What's their birthday?
- What percentage of a country's population is accounted for, as a proxy for that country's digital footprint?
You will find these answers super quickly, without a great investment of time.
That being said, we're also looking for volunteers: people who find lists or create dozens of records. As I'm sure you know, Wikipedia runs on 300,000 volunteers - that's 1 volunteer for 10,000 visitors!
Wikipedia volunteers do it for the love of knowledge. I think we can find similar-minded people, probably among those currently active in genealogy.
One last note about our name: we're a proper 501(c)(3) and our name is trademarked, so I see little risk there.
Thank you again for your constructive remarks.

i get those angles; growing up in a country where no one can even pronounce my name, it felt healthy to know that there was a world full of jareks haha
it's not about trademark risk - you should sit down with someone and say the name "The Population Project" out loud, without saying anything else about the project, and ask them what they _think_ it means
Shakespeare was being ironic: _everything_ is in a name

Got it. Do you think The Population Project is a bad name? I usually get pretty good feedback about it.

i think to the uninitiated it might sound like you're proposing a plan to do something about the population, which has never been well-received :)
once you're married to a name, it's biologically impossible to see it the same way from the inside, that's why you need to look into an outsider's eyes IRL while you say the name
try something fun, like "The Name Game" haha

Have you considered restrictions on personal data collection due to regulations such as the GDPR? I'm curious how you are dealing with those, especially having to track all countries of the world. Sorry if this is mentioned on the website, I've had a brief look and I might have missed it

Thank you for your remark. We try to address the issue in the Methodology section but it's a complex question. Roughly speaking, GDPR was passed to prevent merchants from contacting potential consumers without their consent. Well, we don't have anything to sell, we don't collect phone numbers or email addresses and we will never contact anybody.

Found it, thanks for the directions.
> Ours is that we perform a task in the public interest.
You really really want to change this. It will never hold. You need a legal basis for this condition (lett. e of art. 6).
Not legal advice: a legitimate interest (lett. f, art. 6) looks more defensible.
> We will erase any individual’s personal data at their request within 30 days, according to GDPR’s article 17.
30 days is a very long time.
Also: I'm getting an internal server error, data collected: 0