Site map

Links that let people move from one page to another are the backbone of the Internet, and the Yandex search robot works by following links and analyzing them. Almost all documents known to Yandex entered the search database because the robot followed links to them; only a few were added manually by webmasters using the “New site notification” form. If your site contains documents that no other pages link to, the Yandex robot will never learn that they exist, and they will not be included in the search. This is why it is important that your site pages link to each other. Here are several recommendations on structuring your site:

  1. Maintain a clear link structure on the site. Each document should belong to a specific section. Make sure visitors can reach every document through a simple link defined by the HTML <A> tag: <a href=...>...</a>. In general, the time it takes the Yandex robot to index an internal page depends, among other things, on its nesting level: the deeper a page is located, the longer it may take before it is included in the index.

    When linking one document to another on your site, keep one more thing in mind. The main page is usually the one visitors arrive at, because it is much easier to remember your site name (domain name) than the URL of a particular internal page, which may be quite complicated. Site navigation should let users grasp the site structure quickly and find the documents they need easily, so that a visitor who cannot find what they were looking for does not leave the site disappointed.

  2. Use a sitemap. For large sites with many pages, we recommend creating a Sitemap file and adding it in the corresponding section of the Yandex.Webmaster service, or including a link to it in the robots.txt file. This helps the search robot index and analyze the documents published on your site (a minimal sitemap example is shown after this list).

  3. Restrict indexing of auxiliary pages. Numerous duplicate pages, site search results, visit statistics, and other similar pages put unnecessary load on the robot and get in the way of indexing the important content. Such pages have no value for a search engine because they provide no unique information to users. We recommend disallowing such pages in the robots.txt file (see the example after the note below). If you do not exclude them from indexing, it may turn out that the auxiliary pages, which are added or updated regularly, are indexed well, while updates to the important information on the main pages go unnoticed by the robot.

  4. Give each page a unique address (URL). Ideally, the URL should give some idea of the page content; using transliteration in page addresses also helps the robot understand what a page is about. For example, a single glance at the URL http://download.yandex.ru/company/experience/searchconf/Searchconf_Algoritm_MatrixNet_Gulin.pdf already tells the search robot a lot about the document: it is downloadable, it is most likely in PDF format, and it is probably relevant to queries containing the keyword “MatrixNet”, and so on.

  5. Make the links leading to other sections of the site text links, so that the robot gets more information about the documents they point to (see the link example after this list).

  6. Check that symlinks are configured correctly, so that following links within the site cannot produce URLs that grow indefinitely. Pages whose path contains multiple repetitions of the same token, for example site.com/page/page/page/page/, may be excluded from indexing.
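
To illustrate points 1 and 5, a plain text link gives the robot readable anchor text describing the target document, while an image-only link does not. The page path, image path, and anchor text below are made up for the example.

  <!-- A descriptive text link: the anchor text tells the robot what the target page is about -->
  <a href="/articles/how-indexing-works.html">How Yandex indexes sites</a>

  <!-- An image-only link carries no anchor text; if you must use one, at least provide alt text -->
  <a href="/articles/how-indexing-works.html"><img src="/img/arrow.png" alt="How Yandex indexes sites"></a>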
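
A Sitemap file is an XML document that lists the URLs you want the robot to know about. Below is a minimal sketch; example.com, the page paths, and the date are placeholders, and the optional lastmod element simply records when a page last changed. The file can also be referenced from robots.txt with the Sitemap directive, as shown in the last lines.

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>http://example.com/</loc>
      <lastmod>2012-01-15</lastmod>
    </url>
    <url>
      <loc>http://example.com/articles/how-indexing-works.html</loc>
    </url>
  </urlset>

  # In robots.txt, point the robot to the file:
  Sitemap: http://example.com/sitemap.xml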

Note. Use the robots.txt file to prohibit indexing of pages that are not intended for users.
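
For example, a robots.txt file in the site root could disallow site search results and visit statistics pages while leaving the rest of the site open for indexing. The /search/ and /stats/ paths are placeholders; adjust them to wherever such pages actually live on your site.

  User-agent: *
  Disallow: /search/
  Disallow: /stats/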
