How do I prevent pages that aren't needed in search results from being indexed?

Sites typically have pages that aren't intended for visitors. These include:

  • Service pages that contain technical information or are used to manage the site.
  • Pages with content that is only useful to authorized users, such as online stores' checkout pages or user profiles.
  • Drafts, placeholder pages, and other pages not ready for publication.

Why you should hide these pages from indexing

The number of such pages is comparable to, and may even exceed, the number of pages the site owner might want to attract visitors to. As a result:

  • Crawling these pages with the indexing bot would put unnecessary load on the site.
  • The indexing bot would take longer to reach your landing pages.
  • If such pages are included in the search index, they may compete with landing pages and other important pages and confuse users.

How to check in Yandex Webmaster which unwanted pages are crawled

In Yandex Webmaster, you can check which of your site's pages the Yandex indexing bot crawled without adding them to the index.

  1. Go to Yandex Webmaster, select Indexing → Searchable pages, and open the Excluded pages tab.
  2. Click the filter icon and select Low-value or low-demand page.

The list that appears may include pages you want to attract users to. You can improve such pages to make them more likely to be included in search results.

The list may also contain pages that were crawled by the indexing bot despite not being intended for users. We recommend hiding them from the bot.

How to hide pages from the indexing bot

Server response with a 4xx HTTP status code

For pages that require authorization and contain personal information, such as a delivery address, phone number, or payment details, configure the server to respond with the "403 Forbidden" HTTP status code. For deleted pages, use "404 Not Found" or "410 Gone".
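
As a rough sketch, these responses could be configured on an nginx server like this (the paths /old-page.html and /drafts/ are hypothetical; for authorization-protected pages, the 403 response is usually returned by your application itself when the visitor, or bot, isn't logged in):

# Respond with 410 Gone for a page that was permanently deleted
location = /old-page.html {
    return 410;
}
# Respond with 404 Not Found for a removed section
location /drafts/ {
    return 404;
}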

"Noindex" directive for the "robots" meta tag or "X-Robots-Tag" HTTP header

The noindex directive prohibits indexing the page text. The page won't be included in the search results.

To use this directive:

  • Configure the X-Robots-Tag HTTP header for a specific URL on your site's server (see the configuration sketch after this list). The response will then look like this:
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
X-Robots-Tag: noindex
  • Alternatively, place the robots meta tag with the noindex directive inside the head element of the page's HTML code.
<html>
    <head>
        <meta name="robots" content="noindex" />
    </head>
    <body>...</body>
</html>
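
As a minimal sketch of the first option, assuming an nginx server, the header could be set like this (the /order.html path is only an example; adapt it to your site):

# Send the X-Robots-Tag header for the checkout page
location = /order.html {
    add_header X-Robots-Tag "noindex";
}

To verify, request the page with curl -I https://example.com/order.html and check that the X-Robots-Tag header appears in the response.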

"Disallow" directive in the robots.txt file

The Disallow directive prohibits crawling certain sections or pages of a site. You can use it to hide from indexing bots pages that are used for site management, contain private information, or display site search results.

For example:

User-agent: Yandex
Disallow: /admin # Prohibit crawling for pages and sections with URLs starting with /admin
Disallow: /order.html # Prohibit crawling for the checkout page
Disallow: *?s= # Prohibit crawling for site search results and pages with URLs containing ?s=

Learn more about the robots.txt file and Disallow directive.


How to check in Yandex Webmaster if a page is blocked from indexing

To verify that the pages listed in robots.txt are correctly blocked from indexing, go to Tools → Robots.txt analysis, specify the page URLs in the Check if URLs are allowed field, and run the check. If everything is correct, the check results will indicate that the URLs are blocked by the Disallow directive.

To find out if any of the available methods properly blocks a page from indexing bots, use the Indexing → Page status check tool. Run a check of the page URL. The page's status in search is displayed under Page version in the database → Status in search. If the page is hidden from indexing bots, it will have the status "The page is unknown to the robot".

How to remove pages from search results

To remove pages from search, you can use:

  • The Disallow directive in the robots.txt file.
  • The HTTP status code 404, 403, or 410.
  • The robots meta tag with the noindex directive.

The page will be removed from the search engine database within a week after the indexing bot discovers your instructions.

With Yandex Webmaster, you can initiate the removal process without waiting for the scheduled indexing bot crawl. To do this, go to Tools → Remove pages from search results and specify the URL of a page, or a prefix if you want to exclude a group of pages.
