Ask HN: What funding models exist for a search engine?

3 points by PingCo 20 days ago · 4 comments


Let's say I wanted to build a web scraper + search engine product that did not feature ads at all. What other funding models exist? How viable are they?

In particular, what if you sold anonymized data about trends/insights into consumer behavior to other businesses? At scale, is it conceivable that such a strategy would be profitable?

Under this model, the search results would be incentivized to be as accurate as possible. There would be incentives against influencing rankings to maximize profits (which differs from the classic display ads and the Google Ad Words models) because you would want your understanding of consumer behavior to be as untainted as possible.

Thoughts?

n1xis10t 19 days ago

I think the only other funding model (besides ads) that I’ve seen is a subscription, and I’ve only seen Kagi do that. It seems to work well for them, though; last time I checked they had something like 50,000 subscribers.

I kind of like your data idea, but I’m not sure how much of a market there would be for anonymous data like that. How anonymous would it be, and what kind of data would be collected? You would probably want to be careful not to let it turn out like the AOL Query Log.

I have thought that it might be fun to try monetizing a website by serving the user a chunk of javascript that does a small amount of bitcoin mining.

The nice thing about charging for access is that you can start with a low-bandwidth server, and then raise the price a little if there are too many users for the server to handle. I don’t know of a comparable way to “increase the data collection” if you need to lower the user count. Of course, the downside of charging money is that it’s a barrier for people who might want to try it.

As a side note, I would recommend using old data from the Common Crawl as a large part of the index. No one has more than a few billion pages from them indexed, and the data that I’ve experimented with from 2014 seemed high quality, with very little spam. A lot of the old links won’t exist anymore, and it would be nice to index that stuff. If you got all or most of their pages (they have around 80 billion unique ones, I believe), it would be like a full-text search engine for the Wayback Machine (which maybe almost existed at one point; see this article: https://archive.org/details/search-timeline).
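For anyone curious how you'd actually pull old pages out of the Common Crawl, here is a minimal sketch. It assumes Common Crawl's public CDX index server at index.commoncrawl.org (one index per crawl, with IDs like `CC-MAIN-2014-10` — check the server for the real list); each result line is a JSON record whose `filename`/`offset`/`length` fields point into a WARC archive, so you can fetch individual pages with ranged requests instead of downloading whole crawl files:

```python
import json
from urllib.parse import urlencode

# Assumed crawl ID -- Common Crawl publishes one CDX index per crawl.
CRAWL_ID = "CC-MAIN-2014-10"

def cdx_query_url(domain, crawl_id=CRAWL_ID):
    """Build a CDX index query URL listing every capture under a domain."""
    params = urlencode({"url": f"{domain}/*", "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"

def parse_cdx_line(line):
    """Each response line is a JSON object; filename/offset/length
    locate the raw record inside a (gzipped) WARC file."""
    rec = json.loads(line)
    return rec["filename"], int(rec["offset"]), int(rec["length"])
```

Fetching the query URL with any HTTP client returns newline-delimited JSON, and the byte ranges let you issue ranged GETs against Common Crawl's public data bucket for just the pages you want to index.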

I’m always excited to see people start up search engine projects, is there a way for me to follow your progress?

  • ccgreg 19 days ago

    > and the data that I’ve experimented with from 2014 seemed high quality

    That's because it's from the blekko search engine.

    • n1xis10t 18 days ago

Sounds like blekko had a larger impact on the early URLs than I thought. Out of curiosity, do you remember how large blekko’s index was at its peak?

      • ccgreg 18 days ago

        The largest index we had was 4 billion, which is tiny. Our crawl frontier was much larger.
