Sites typically have pages that aren't intended for visitors. This includes:
Service pages containing technical information or intended for managing the site.
Pages with content that is only useful to authorized users, such as online stores' checkout pages or user profiles.
Drafts, placeholder pages, and other pages not ready for publication.
Why you should hide these pages from indexing
The number of such pages is comparable to, and may even exceed, the number of pages the site owner might want to attract visitors to. As a result:
Crawling these pages with indexing bots would place unnecessary load on the site.
It would take the indexing bot longer to crawl your landing pages.
If such pages are included in the search index, they may compete with landing pages and other important pages and confuse users.
How to check in Yandex Webmaster which unwanted pages are crawled
In Yandex Webmaster, you can check which of your site's pages the Yandex indexing bot crawled but did not add to the index.
Go to Yandex Webmaster, select Indexing → Searchable pages, and open the Excluded pages tab.
Filter the list by the Low-value or low-demand page status.
The list that appears may include pages you want to attract users to. You can improve such pages to make them more likely to be included in search results.
The list may also contain pages that were crawled by the indexing bot despite not being intended for users. We recommend hiding them from the bot.
How to hide unwanted pages from indexing
For pages that require authorization and contain personal information, such as a delivery address, phone number, or payment details, configure the server to respond with the HTTP code "403 Forbidden". For deleted pages, use "404 Not Found" or "410 Gone".
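As an illustration, here is a minimal sketch using Python's standard library of how a server might return these codes. The paths and the is_authorized() check are hypothetical; real sites usually configure this in the web framework or in the web server itself.

from http.server import BaseHTTPRequestHandler, HTTPServer

DELETED_PATHS = {"/old-promo.html"}           # pages that no longer exist
PRIVATE_PREFIXES = ("/profile", "/checkout")  # pages that require authorization

def is_authorized(request_handler):
    # Placeholder: a real check would validate a session cookie or token.
    return False

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in DELETED_PATHS:
            self.send_response(410)  # 410 Gone for deleted pages
            self.end_headers()
        elif self.path.startswith(PRIVATE_PREFIXES) and not is_authorized(self):
            self.send_response(403)  # 403 Forbidden for private pages
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Public page")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()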
The Disallow directive in robots.txt prohibits crawling of specific sections or pages of a site. You can use it to hide from indexing bots pages intended for site management, pages containing private information, and pages displaying site search results.
For example:
User-agent: Yandex
Disallow: /admin # Prohibit crawling for pages and sections with URLs starting with /admin
Disallow: /order.html # Prohibit crawling for the checkout page
Disallow: *?s= # Prohibit crawling for site search results and pages with URLs containing ?s=
Learn more about the robots.txt file and Disallow directive.
How to check in Yandex Webmaster if a page is blocked from indexing
To verify that the pages listed in robots.txt are correctly blocked from indexing, go to Tools → Robots.txt analysis, specify the page URLs in the Check if URLs are allowed field, and run the check. If everything is correct, the check results will indicate that the URLs are blocked by the Disallow directive.
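As a local sanity check alongside the Robots.txt analysis tool, you can also test URLs against prefix-based Disallow rules with Python's standard urllib.robotparser. The sketch below reuses the example rules shown earlier; example.com is a placeholder domain, and wildcard rules such as *?s= are not supported by this module, so verify those in the Webmaster tool.

from urllib.robotparser import RobotFileParser

rules = """
User-agent: Yandex
Disallow: /admin
Disallow: /order.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ("https://example.com/admin/settings",
            "https://example.com/order.html",
            "https://example.com/catalog/shoes"):
    verdict = "allowed" if parser.can_fetch("Yandex", url) else "blocked"
    print(url, "->", verdict)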
To find out whether a page is properly blocked from indexing bots by any of these methods, use the Indexing → Page status check tool. Enter the page URL and run the check. The page's search status is displayed on the Page version in the database → Status in search tab. If the page is hidden from indexing bots, its status will be "The page is unknown to the robot".
The page will be removed from the search engine database within a week after the indexing bot discovers your instructions.
With Yandex Webmaster, you can initiate the removal process without waiting for the scheduled indexing bot crawl. To do this, go to Tools → Remove pages from search results and specify the URL of a page, or a prefix if you want to exclude a group of pages.
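Before submitting a removal request, you can quickly confirm that the page already returns one of the blocking codes mentioned above (403, 404, or 410). A minimal sketch with Python's standard library, using a placeholder URL:

import urllib.error
import urllib.request

def status_code(url):
    # Send a HEAD request; 4xx/5xx responses raise HTTPError.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code

url = "https://example.com/order.html"  # placeholder URL
code = status_code(url)
print(url, "->", code, "(blocking code)" if code in (403, 404, 410) else "(not a blocking code)")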