Since the beginning of 2013, Yandex has been working on an algorithm that detects spammy links and distinguishes them from organic ones. The goal of the algorithm is to help users tell high-quality websites apart from “artificial” ones and to find out which pages are most trustworthy.
The first stage was to compile a list of sites carrying automatically generated spammy links and then filter those sites out of search results. At present, almost 20 million sites have been added to this list. More than half of them (56%) are in Russian, and 24% are in English.
In 2014, the list was integrated into a large-scale project for filtering spam out of search results.
The main purpose of the algorithm developed by Yandex is to identify and filter out SEO spam sites in search results.
Yandex uses its link-spam detection algorithm to improve the credibility of its search results, letting users see which sites are unreliable.
The algorithm takes a graph-theory approach, in which links from “spam” sites are identified as running more “parallel” to links from organic sites than to one another. In addition, it treats a link “from a reliable site” as having fewer such parallel links, as well as fewer links from entirely unknown sources (for example, sources that started linking only yesterday).
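The text does not say how “parallel” links or “unknown sources” are defined, so what follows is only a minimal sketch under assumed definitions. It scores the set of links pointing at one page by the share that are either parallel (here, hypothetically, links whose anchor text is also used by a different source domain) or come from an unknown source (here, a source domain first seen at most `fresh_days` days ago). The `Link` class, `suspicion_score`, and both definitions are illustrative assumptions, not Yandex’s method.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Link:
    """One hyperlink pointing at the page being scored (hypothetical model)."""
    source_domain: str
    anchor_text: str
    source_first_seen: date  # date the source domain was first observed


def suspicion_score(inbound: list[Link], today: date, fresh_days: int = 1) -> float:
    """Share of inbound links that are 'parallel' or come from unknown sources.

    Assumed definitions, for illustration only:
    - parallel: another source domain links with the same anchor text;
    - unknown source: the source domain was first seen <= fresh_days ago.
    """
    if not inbound:
        return 0.0
    # Map each normalized anchor text to the set of domains that use it.
    domains_by_anchor: dict[str, set[str]] = {}
    for link in inbound:
        key = link.anchor_text.strip().lower()
        domains_by_anchor.setdefault(key, set()).add(link.source_domain)

    suspicious = 0
    for link in inbound:
        key = link.anchor_text.strip().lower()
        parallel = len(domains_by_anchor[key]) > 1
        fresh = (today - link.source_first_seen).days <= fresh_days
        if parallel or fresh:
            suspicious += 1
    return suspicious / len(inbound)


# Example: two sites share a commercial anchor, and one source appeared yesterday.
today = date(2013, 6, 10)
links = [
    Link("a.example", "buy cheap widgets", date(2010, 1, 1)),
    Link("b.example", "buy cheap widgets", date(2011, 1, 1)),
    Link("c.example", "t.example homepage", date(2009, 1, 1)),
    Link("d.example", "review", date(2013, 6, 9)),
]
print(suspicion_score(links, today))  # 0.75
```

A higher score means more of the page’s inbound links look coordinated or brand-new. A real system would more likely weight these signals inside a larger model than simply count them.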
The algorithm was introduced in June 2013 and made public in December 2013.
In 2014, Yandex removed more than 20 million spammy links from search results. It removed another 20 million in 2015, 16 million in 2016, and about 17 million in 2017. Yandex updates the list regularly; it is built from 10 billion pages, which are analyzed daily across 2 billion searches.
Spam Word Combinations
Yandex’s algorithm identifies a link as “spam” when it appears in search results for queries built from certain word combinations.
These are words that can appear on any web page but occur most often on spam sites.
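As a rough illustration of the word-combination idea, the sketch below learns word pairs that occur far more often on known spam pages than on normal pages, then flags a link whose page shows up for several queries containing those pairs. The function names, the use of adjacent word pairs, the add-one smoothing, and the thresholds (`min_ratio`, `min_count`, `min_queries`) are all hypothetical choices, not details published by Yandex.

```python
import re
from collections import Counter


def word_pairs(text: str) -> set[tuple[str, str]]:
    """Set of adjacent lowercase word pairs (bigrams) in a text."""
    words = re.findall(r"\w+", text.lower())
    return set(zip(words, words[1:]))


def spam_pairs(spam_pages, normal_pages, min_ratio=3.0, min_count=2):
    """Word pairs found on at least `min_count` spam pages and at least
    `min_ratio` times more often (per page) on spam than on normal pages."""
    spam_counts = Counter(p for page in spam_pages for p in word_pairs(page))
    normal_counts = Counter(p for page in normal_pages for p in word_pairs(page))
    result = set()
    for pair, count in spam_counts.items():
        spam_rate = count / len(spam_pages)
        # Add-one smoothing: a pair never seen on normal pages avoids division by zero.
        normal_rate = (normal_counts[pair] + 1) / (len(normal_pages) + 1)
        if count >= min_count and spam_rate / normal_rate >= min_ratio:
            result.add(pair)
    return result


def flag_link(queries, pairs, min_queries=2):
    """Flag a link whose page ranks for at least `min_queries` queries
    containing one of the learned spam word pairs."""
    hits = sum(1 for q in queries if word_pairs(q) & pairs)
    return hits >= min_queries


spam_pages = [
    "buy cheap pills online now",
    "cheap pills fast delivery buy cheap pills",
    "best casino bonus buy cheap pills",
]
normal_pages = [
    "the weather is cheap today",
    "how to buy a bicycle",
    "pills for headache explained by doctors",
]
pairs = spam_pairs(spam_pages, normal_pages)
print(sorted(pairs))  # [('buy', 'cheap'), ('cheap', 'pills')]
print(flag_link(["buy cheap pills", "cheap pills review", "weather today"], pairs))  # True
```

Note that “cheap” and “pills” each appear on normal pages too; only their combination is characteristic of the spam sample, which mirrors the point that these words can appear anywhere.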
In addition, Yandex uses another method to identify links that do not meet its quality standards. This method analyzes pages on the same domain as the page carrying the spammy links, together with the pages those links point to.
This process checks every page indexed by Yandex and counts the hyperlinks pointing to it from outside its domain (that is, from other domains).
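The counting step can be sketched as follows, assuming the link graph is available as (source URL, target URL) pairs. For simplicity, the hypothetical `external_inlinks` function treats two URLs as being on the same domain only when their host names match exactly; a production system would more likely compare registrable domains (for example, using the Public Suffix List), so that `www.example.com` and `example.com` count as one domain.

```python
from collections import Counter
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Host name of a URL (urlsplit already lowercases it); '' if there is none."""
    return urlsplit(url).hostname or ""


def external_inlinks(links):
    """For each target URL, count links to it whose source is on a different host.

    `links` is an iterable of (source_url, target_url) pairs.
    """
    counts = Counter()
    for source, target in links:
        if host(source) != host(target):
            counts[target] += 1
    return counts


links = [
    ("https://blog.example.org/post", "https://shop.example.com/item"),
    ("https://news.example.net/a", "https://shop.example.com/item"),
    ("https://shop.example.com/", "https://shop.example.com/item"),  # internal, ignored
    ("https://news.example.net/a", "https://shop.example.com/other"),
]
print(external_inlinks(links)["https://shop.example.com/item"])  # 2
```

Using a `Counter` means pages with no external links simply read as 0, which keeps later thresholding code free of missing-key checks.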
The Spam Domain List
Yandex does not publish any information from its internal databases about sites with suspicious links and does not disclose it to anyone.
According to Yandex, the list of spammy domains is updated automatically every three days.
It is possible to use this information to block a domain entirely.
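A client that consumes such a list might look like the minimal sketch below. Blocking “a domain entirely” is modeled here as blocking the listed domain and all of its subdomains, and the three-day refresh interval mirrors the update schedule described above. The `DomainBlocklist` class and its caller-supplied `fetch` function are hypothetical; since Yandex does not publish its list, `fetch` only stands in for whatever source the domains come from.

```python
import time
from urllib.parse import urlsplit

REFRESH_SECONDS = 3 * 24 * 60 * 60  # refresh every three days


class DomainBlocklist:
    """Blocks every URL on a listed domain, including all of its subdomains.

    `fetch` (hypothetical, caller-supplied) returns the current spam domains;
    `clock` is injectable so the refresh logic can be tested.
    """

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch
        self._clock = clock
        self._domains = set()
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= REFRESH_SECONDS:
            self._domains = {d.strip(".").lower() for d in self._fetch()}
            self._loaded_at = now

    def is_blocked(self, url):
        self._refresh_if_stale()
        labels = (urlsplit(url).hostname or "").strip(".").split(".")
        # Check the host and each parent: www.spam.example, spam.example, example.
        return any(".".join(labels[i:]) in self._domains for i in range(len(labels)))


blocklist = DomainBlocklist(lambda: {"spam.example"})
print(blocklist.is_blocked("http://www.spam.example/page"))  # True
print(blocklist.is_blocked("https://notspam.example/"))  # False
```

Matching whole dot-separated labels, rather than using a plain suffix check, keeps `notspam.example` from being caught by an entry for `spam.example`.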
In addition, Yandex has an integrated system that blocks spammy sites in other search services, such as Google and Bing. It is also used by antivirus programs and security solutions.