
Show HN: Community-driven, blind comparison of search engine results

searchbench.xyz

1 point by maltsev 11 days ago · 0 comments

Hi HN,

I built Search Bench as a small experiment to compare search engines without showing which engine produced which results. It was inspired by the idea behind the LLM Arena, but applied to search.

How it works:

1. You enter a query.

2. You see two result sets side-by-side (search engine names hidden).

3. You pick which is better, or mark them as similar.

Methodology

- Each vote is a pairwise comparison (a tie counts as half a win for each engine).

- Ratings use a Bradley–Terry model with iteratively updated ability scores, normalized so their geometric mean is 1 (see the first sketch after this list).

- Final scores are log-scaled (1500 + 400 * log10(ability)), like Elo but derived from the Bradley–Terry model.

- Pair selection is adaptive, prioritizing under-sampled search engines and close matchups via an uncertainty × closeness weighting (see the second sketch below).
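
The rating bullets above translate to a short computation. Here's a minimal sketch in Python, assuming the standard iterative (minorization–maximization) update for Bradley–Terry abilities; the function names and the zero-win floor are my own, while the 1500 + 400 * log10(ability) display formula comes straight from the post:

```python
import math
from collections import defaultdict

def fit_bradley_terry(votes, iters=200):
    """Fit Bradley-Terry abilities from pairwise votes.

    votes: list of (engine_a, engine_b, result), where result is
           1.0 if a won, 0.0 if b won, and 0.5 for a tie.
    Returns {engine: ability}, normalized to a geometric mean of 1.
    """
    wins = defaultdict(float)    # fractional win totals (a tie adds 0.5 to each side)
    games = defaultdict(float)   # comparison counts per unordered pair
    engines = set()
    for a, b, result in votes:
        engines.update((a, b))
        wins[a] += result
        wins[b] += 1.0 - result
        games[frozenset((a, b))] += 1.0

    ability = dict.fromkeys(engines, 1.0)
    for _ in range(iters):
        new = {}
        for i in engines:
            # MM update: wins_i divided by the sum over opponents of
            # n_ij / (ability_i + ability_j)
            denom = sum(
                n / (ability[i] + ability[j])
                for pair, n in games.items() if i in pair
                for j in pair if j != i
            )
            # small floor keeps an engine with zero recorded wins at a finite ability
            new[i] = max(wins[i], 1e-6) / denom if denom else ability[i]
        # renormalize so the geometric mean of abilities stays at 1
        log_mean = sum(math.log(p) for p in new.values()) / len(new)
        ability = {e: p / math.exp(log_mean) for e, p in new.items()}
    return ability

def display_score(ability):
    """Elo-like display score from the post: 1500 + 400 * log10(ability)."""
    return 1500 + 400 * math.log10(ability)
```

Feeding it the vote log as (engine_a, engine_b, result) tuples yields abilities whose geometric mean is 1; display_score then maps them onto the familiar 1500-centered scale.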

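For the adaptive pair selection, the post only specifies an "uncertainty × closeness" weighting, so the concrete choices below (inverse vote count as the uncertainty term, distance of the expected win probability from 0.5 as the closeness term, and the pick_pair name itself) are assumptions of mine, not the site's actual code:

```python
import random

def pick_pair(engines, counts, ability, rng=random):
    """Sample the next matchup, favoring under-sampled pairs and close ones.

    engines: list of engine names
    counts:  {frozenset((a, b)): comparisons recorded so far}
    ability: {engine: Bradley-Terry ability}
    """
    pairs, weights = [], []
    for idx, a in enumerate(engines):
        for b in engines[idx + 1:]:
            n = counts.get(frozenset((a, b)), 0)
            uncertainty = 1.0 / (1.0 + n)         # fewer votes -> more weight
            # expected win probability for a under Bradley-Terry
            p = ability[a] / (ability[a] + ability[b])
            closeness = 1.0 - 2.0 * abs(p - 0.5)  # 1 for even matchups, 0 for lopsided
            pairs.append((a, b))
            weights.append(uncertainty * max(closeness, 0.05))  # floor avoids all-zero weights
    return rng.choices(pairs, weights=weights, k=1)[0]
```
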
This definitely isn't an objective ranking: queries and voters are self-selected, results vary by context, and what counts as “better” depends on the person. Right now, the dataset is small (≈200 comparisons, mostly from me), so I'm especially interested in seeing:

- Whether results change with more independent voters.

- Whether there's a real quality signal at scale, or if most differences disappear once brand bias is removed.

If you have a minute, comparing a few queries yourself would be very helpful! I'd also appreciate critique, especially around statistical validity, bias sources, aggregation methods, or ways this could be gamed or misinterpreted.
