Massive Yandex code leak reveals Russian search engine’s ranking factors


Yandex, the fourth-ranked search engine by volume, purportedly employs several former Google staffers. Yandex tracks many of Google’s ranking factors, which are identifiable in its code, and competes heavily with Google. Google’s Russian division recently filed for bankruptcy after losing its bank accounts and payment services. Buraks notes that the first factor in Yandex’s list of ranking factors is “PAGE_RANK,” which is seemingly tied to the foundational algorithm created by Google’s co-founders.

As detailed by Buraks (in two threads), Yandex’s engine favors pages that:

  • Aren’t too old
  • Have a lot of organic traffic (unique visitors) and less search-driven traffic
  • Have fewer numbers and slashes in their URL
  • Have optimized code rather than “hard pessimization,” with a “PR=0”
  • Are hosted on reliable servers
  • Happen to be Wikipedia pages or are linked from Wikipedia
  • Are hosted on, or linked from, higher-level pages on a domain
  • Have keywords in their URL (up to three)
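
To make the URL-related items in that list concrete, here is a minimal Python sketch of how such signals could be computed. It is purely illustrative and is not taken from the leaked code: the function name `url_features`, the feature names, and the substring-matching rule are all assumptions, and the cap of three keyword hits simply mirrors the "up to three" wording above.

```python
from urllib.parse import urlparse


def url_features(url: str, query_terms: list[str]) -> dict[str, int]:
    """Compute toy URL signals (hypothetical, not Yandex's real factors)."""
    path = urlparse(url).path
    path_lower = path.lower()

    # Digits and slashes in the path; per the list, fewer is better.
    digits = sum(ch.isdigit() for ch in path)
    slashes = path.count("/")

    # Distinct query terms found in the path, capped at three (assumed rule).
    matched = {t.lower() for t in query_terms if t and t.lower() in path_lower}

    return {
        "digits": digits,
        "slashes": slashes,
        "keyword_hits": min(len(matched), 3),
    }


features = url_features(
    "https://example.com/blog/2022/01/yandex-leak-ranking",
    ["yandex", "leak", "ranking", "blog"],
)
print(features)
```

How a real engine weights signals like these is not described here; the sketch only shows that they are cheap to derive from the URL alone.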

You can search and click through all the factors on Rob Ousbey’s compiled search tool. You might notice that nearly 1,000 of the ranking factors have the tag “TG_DEPRECATED,” and more than 200 are listed as “TG_UNUSED.” Because the code is from February 2022 and was grabbed in July 2022, Yandex’s search has certainly changed since. But the leak provides a rare look into how search rankings are put together at a site that serves one of the world’s largest countries.

Yandex previously saw its search engine code walk out the door in 2015, when a former employee tried to sell it on the black market for $28,000 to fund his own startup. The surprisingly low figure for the core code of Yandex’s main product suggested he was unaware of its real value. That employee received a two-year suspended prison sentence, and the code was never seen publicly.