How does Yandex search work?


Before your site can appear in search results, Yandex has to discover it. It does this using robots.

A robot is a system that crawls site pages and loads them into its database. Yandex has many robots. Saving pages to the database and processing them further with algorithms is called indexing. The loaded data is used to generate search results. This data is updated regularly, which may affect the site's ranking.

There are several stages before a site appears in search results:

Stage 1. Crawling the site

Stage 2. Loading and processing (indexing) the data

Stage 3. Creating a database of the pages that can be included in the search results

Stage 4. Generating search results

Stage 1. Crawling the site

The robot determines which sites to crawl and how often, as well as how many pages to crawl on each of them.

When crawling, the robot takes into account the list of pages it already knows about. This list is based on data that robots collect continuously: the appearance of new links, content updates on previously downloaded pages, and page availability. Robots track a page as long as:
  • A link to it is placed on your own site or a third-party site.
  • The page is not prohibited for indexing in the robots.txt file.
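You can check whether a robots.txt file prohibits a particular URL with Python's standard `urllib.robotparser` module. This is a minimal sketch: the robots.txt contents, the URLs, and the `is_allowed` helper are hypothetical. The standard-library parser implements only basic prefix matching, not Yandex-specific extensions such as `*` and `$` wildcards or the `Clean-param` directive, so treat its answer as an approximation of how the Yandex robot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /admin/
Disallow: /cart

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a sequence of lines instead of downloading it
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/catalog/item", "/admin/settings", "/private/page"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "YandexBot", url))
```

With these rules, YandexBot is blocked from `/admin/` but may fetch `/private/`: a group that names the robot replaces the `User-agent: *` group rather than adding to it.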

When the robot tries to load a site page, the server responds with an HTTP status code:

  • 200 OK: The robot will crawl the page.
  • 3XX: The robot needs to crawl the page that is the redirect target. Learn more about handling redirects.
  • 4XX and 5XX: A page with one of these codes won't be included in the search. If the page was already in the search before the robot crawled it, it will be removed from the search.

To prevent a page from dropping out of the search, configure the server to respond with the 429 (Too Many Requests) code. The robot will continue to access the page and check the response code. This can be useful if a site page is displayed incorrectly because of problems with the CMS. After the error is fixed, change the server response back.

Note. If the page responds with the 429 code for a long time, this indicates that the server is having difficulty handling the load. This can reduce the site's crawl rate.
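The 429 approach above can be sketched with Python's standard `http.server` module. In this toy example a hypothetical `MAINTENANCE_MODE` switch makes every GET request return 429 while the CMS is broken, and you set it back to `False` once the error is fixed. It is an illustration only, not a production setup; in practice you would configure the same behavior in your web server or CMS.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical switch: True while the CMS renders pages incorrectly,
# False again once the error is fixed.
MAINTENANCE_MODE = True


class MaintenanceAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if MAINTENANCE_MODE:
            # 429 Too Many Requests instead of another 4XX/5XX code,
            # so the page is not dropped from the search.
            status, body = 429, b"<p>Temporarily unavailable</p>"
        else:
            status, body = 200, b"<p>Normal page content</p>"
        self.send_response(status)
        if status == 429:
            # Retry-After is an optional standard header for 429 (RFC 6585);
            # this sketch assumes nothing about how Yandex treats it.
            self.send_header("Retry-After", "3600")
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Create a local server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), MaintenanceAwareHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```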

Stage 2. Loading and processing (indexing) the data

The robot determines what the page contains and saves it to its database. To do this, it analyzes the page content, for example:
  • The contents of the Description meta tag, the title element, and the Schema.org markup, which can be used to generate a page snippet.
  • The noindex directive in the robots meta tag. If it's found, the page won't be included in the search results.
  • The rel="canonical" attribute, which indicates the address you prefer to show in the search results for a group of pages with the same content.
  • Text, images, and videos. If the robot determines that the content of several pages matches, it may treat them as duplicates.
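To see which of the elements listed above a page actually contains, you can extract them with Python's standard `html.parser` module. This is a minimal sketch: the `IndexingSignalParser` class is hypothetical, it checks only the title, the Description meta tag, a noindex directive in the robots meta tag, and rel="canonical", and it is not how the Yandex robot itself parses pages.

```python
from html.parser import HTMLParser
from typing import Optional


class IndexingSignalParser(HTMLParser):
    """Collects a few page elements that the robot is described as analyzing."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description: Optional[str] = None
        self.noindex = False
        self.canonical: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attr.get("name", "").lower()
            content = attr.get("content", "")
            if name == "description":
                self.description = content
            elif name == "robots":
                directives = {d.strip().lower() for d in content.split(",")}
                self.noindex = self.noindex or "noindex" in directives
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>...</title>
        if self._in_title:
            self.title += data
```

Usage: create a parser, call `feed()` with the page HTML, then `close()`, and read the `title`, `description`, `noindex`, and `canonical` attributes.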
Useful tools
  • Troubleshooting — Helps check the quality of a site and fix errors, if any.
  • Crawl statistics — Shows which pages the robot has crawled and how often it accesses the site.
  • How to reindex a site — Allows you to report a new page on the site or an update of a page already included in the search.

Stage 3. Creating a database of the pages that can be included in the search results

Based on the information collected by the robot, algorithms determine which pages can be included in the search results. They take into account a variety of ranking and indexing factors when making the final decision. For example, pages with indexing disabled and duplicate pages won't be included in the database.
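The text doesn't describe how Yandex decides that pages are duplicates. As a deliberately simple stand-in, the sketch below hashes each page's normalized text and groups pages with identical hashes. It catches only exact duplicates after collapsing whitespace and case, not near-duplicates, and the `content_fingerprint` and `group_duplicates` helpers are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict) -> list:
    """Map {url: text} to a list of URL groups that share identical content."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```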

A page may contain original, well-structured text and still not be added to the database if the algorithm determines that the page is highly unlikely to be seen in the search results, for example, because of low user demand or high competition on the topic.

Useful tools
  • Pages in search — Helps you track the status of site pages, for example, HTTP response status codes or duplicate pages.
  • Site security — Provides information about violations and infected files.

To find out if a site subdomain appears in the search results, subscribe to notifications.

FAQ

HTTP/2 version support

If your site uses HTTP/2, the Yandex robot still indexes it over HTTP/1.1. This doesn't cause conflicts with your server settings. Using HTTP/2 doesn't affect the crawl speed or the site's position in Yandex search results.

The page description in the snippet differs from the content in the description meta tag
The page description in the search results is based on the text that is most relevant to the search query. This can be the content of the Description meta tag or the text placed on the page. For more information, see Displaying the site title and description in search results.
Search results show links to internal site frames
When an internal frame page is loaded in the browser on its own, check (for example, with a script on that page) whether the parent frame with the navigation is open. If it isn't, open it.
My server doesn't provide Last-Modified

Your website will still be indexed even if your server doesn't return Last-Modified dates for its documents. However, keep the following in mind:

  • The date won't be displayed in the search results next to your website pages.

  • The robot won't know if a website page has been updated since it was last indexed. Modified pages will be indexed less often because the number of pages that the robot gets from a website each time is limited.
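If you do want to provide Last-Modified, the header carries an HTTP date, and the usual companion mechanism is a conditional request: the client sends If-Modified-Since and the server answers 304 Not Modified when nothing has changed (standard HTTP, RFC 9110). The text above doesn't say whether the Yandex robot sends If-Modified-Since, so this is a general sketch, and the `last_modified_header` and `conditional_status` helpers are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_header(modified_at: datetime) -> str:
    """Format an aware datetime as an HTTP date, e.g. 'Fri, 01 Mar 2024 12:00:00 GMT'."""
    return format_datetime(modified_at.astimezone(timezone.utc), usegmt=True)


def conditional_status(modified_at: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the page hasn't changed since the client's date, else 200.

    `modified_at` must be timezone-aware.
    """
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # malformed date: ignore the header and send the full page
    if since.tzinfo is None:  # "-0000" offsets parse as naive datetimes
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore microseconds.
    if modified_at.replace(microsecond=0) <= since:
        return 304
    return 200
```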

How does encoding affect indexing?
The encoding used on the site doesn't affect indexing. If your server doesn't specify the encoding in the Content-Type header, the Yandex robot detects it on its own.
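To see which encoding a server declares, you can read the charset parameter of its Content-Type header. This minimal sketch uses the standard `email.message.Message` class to parse the header value; `declared_charset` is a hypothetical helper, and a `None` result corresponds to the case above where the robot detects the encoding itself.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the lowercased charset from a Content-Type value, or None if absent."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()
```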
Can I manage reindexing frequency with the Revisit-After directive?
No. The Yandex robot ignores it.
Does Yandex index a site on a foreign domain?
Yes. Websites containing pages in Russian, Ukrainian, and Belarusian are indexed automatically. Sites in English, German, and French are indexed if they might be interesting to users.
How does a large number of URL parameters and the URL length affect indexing?

A large number of parameters or nested directories in a URL, or an overly long URL, may interfere with indexing.

A URL can be up to 1024 characters long.
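A quick way to audit your own URLs is to measure them with Python's `urllib.parse`. In this sketch only the 1024-character limit comes from the text; the text gives no thresholds for parameters or nesting, so the hypothetical `url_report` helper just reports those counts for you to judge.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_URL_LENGTH = 1024  # limit stated in the FAQ above


def url_report(url: str) -> dict:
    """Report length, query-parameter count, and directory depth of a URL."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return {
        "length": len(url),
        "within_limit": len(url) <= MAX_URL_LENGTH,
        # keep_blank_values counts parameters like "z=" as well
        "query_params": len(parse_qsl(parts.query, keep_blank_values=True)),
        "path_depth": len(segments),
    }
```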

Does the robot index GZIP archives?
Yes, the robot indexes archives in GZIP format (GNU ZIP compression).
Does the robot index anchor URLs (#)?

The Yandex robot doesn't index anchor URLs of pages, except for AJAX pages (URLs with the #! characters). For example, the http://example.com/page/#title page won't be added to the robot's database. Instead, the robot will index http://example.com/page/ (the part of the URL before the # character).
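The rule above can be approximated with Python's `urllib.parse.urldefrag`, which splits off everything from the # character onward. This is a sketch under an assumption: the hypothetical `url_for_index` helper simply keeps #! URLs whole, since the text doesn't describe how the robot processes AJAX URLs beyond saying they are an exception.

```python
from urllib.parse import urldefrag


def url_for_index(url: str) -> str:
    """Return the URL the robot would index according to the FAQ rule."""
    # AJAX URLs with "#!" are the stated exception; kept whole here.
    if "#!" in url:
        return url
    # Otherwise drop the fragment: /page/#title -> /page/
    return urldefrag(url).url
```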

How does the robot index paginated pages?
The robot ignores the rel attribute with the prev and next values. This means that paginated pages can be indexed and included in search without any restrictions.

If pages don't appear in the search results for a long time or have been excluded from them, send examples of such pages using the feedback form.