Using robots.txt

Robots.txt is a text file that contains site indexing parameters for search engine robots.

Yandex supports the Robots Exclusion Protocol with advanced features.

When crawling a site, the Yandex robot loads the robots.txt file. If the latest request to the file shows that a site page or section is prohibited, the robot won't index it.
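
For example, a minimal file that prohibits a single hypothetical section could look like this (the /private/ path is a placeholder):

User-agent: * # rules apply to all robots
Disallow: /private/ # pages under /private/ won't be indexed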

The robot assumes that the site content is accessible if:
  • The file size exceeds 32 KB.

  • Robots.txt is missing or isn't a plain text file.

  • The file is unavailable: the HTTP status code returned in response to the robot's request is something other than 200 OK. You can check the server response using the Server response check tool.
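
In these cases the robot behaves roughly as if the file were explicitly permissive; an empty Disallow value prohibits nothing:

User-agent: *
Disallow: # empty value: nothing is prohibited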

  1. Recommendations on the content of the file
  2. Using Cyrillic characters
  3. How do I create robots.txt?
  4. FAQ

Recommendations on the content of the file

Yandex supports the following directives:

Directive - What it does
User-agent * - Indicates the robot to which the rules listed in robots.txt apply.
Disallow - Prohibits indexing site sections or individual pages.
Sitemap - Specifies the path to the Sitemap file that is posted on the site.
Clean-param - Indicates to the robot that the page URL contains parameters (like UTM tags) that should be ignored when indexing it.
Allow - Allows indexing site sections or individual pages.
Crawl-delay - Specifies the minimum interval (in seconds) for the search robot to wait after loading one page, before starting to load another. We recommend using the crawl speed setting in Yandex.Webmaster instead of the directive.

* Mandatory directive.

You'll most often need the Disallow, Sitemap, and Clean-param directives. For example:

User-agent: * # specify the robots that the directives are set for
Disallow: /bin/ # prohibits links from the shopping cart
Disallow: /search/ # prohibits links to the site's embedded search pages
Disallow: /admin/ # prohibits links from the admin panel
Sitemap: http://example.com/sitemap # specify the path to the site's Sitemap file for the robot
Clean-param: ref /some_dir/get_book.pl # tells the robot to ignore the ref parameter in URLs for /some_dir/get_book.pl
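
Allow can be combined with Disallow when part of a prohibited section should remain indexable, and Clean-param accepts several parameters joined with &. A sketch, assuming hypothetical /catalog/ and /news/ paths:

User-agent: *
Disallow: /catalog/ # prohibits the whole catalog section
Allow: /catalog/public/ # keeps the public subsection indexable (the longer, more specific rule applies)
Clean-param: utm_source&utm_medium /news/ # ignore these UTM tags when indexing pages under /news/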

Robots from other search engines and services may interpret these directives differently.

Note. The robot is case-sensitive when matching substrings (file names, paths, robot names) but ignores case in directive names.

Using Cyrillic characters

Cyrillic characters are not allowed in the robots.txt file or in server HTTP headers.

For domain names, use Punycode. For page addresses, use the same encoding as that of the current site structure.

Example of the robots.txt file:

#Incorrect:
User-agent: Yandex
Disallow: /корзина
Sitemap: сайт.рф/sitemap.xml

#Correct:
User-agent: Yandex
Disallow: /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0
Sitemap: http://xn--80aswg.xn--p1ai/sitemap.xml

How do I create robots.txt?

  1. Create a file named robots.txt in a text editor and fill it in.
  2. Check the file in Yandex.Webmaster.
  3. Place the file in your site's root directory.
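
A minimal starter file for step 1 might look like this (the paths and sitemap URL are placeholders):

User-agent: *
Disallow: /admin/ # example: prohibit the admin panel
Sitemap: https://example.com/sitemap.xml # replace with your site's sitemap location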

FAQ

The site or individual pages are prohibited in robots.txt but still appear in the search

As a rule, after you set a ban on indexing using any of the available methods, pages are excluded from the search results within two weeks. You can speed up this process.

The “Server responds with redirect to /robots.txt request” error occurs on the “Site diagnostics” page in Yandex.Webmaster.

For the robot to take the robots.txt file into account, the file must be located in the site's root directory and return an HTTP 200 OK response. The indexing robot doesn't support robots.txt files hosted on other sites.

You can check the server's response and the accessibility of robots.txt to the robot using the Server response check tool.

If your robots.txt file redirects to another robots.txt file (for example, when moving a site), add the redirect target site to Yandex.Webmaster and verify your rights to manage that site.