Sites typically have pages that aren't intended for visitors. This includes:
Service pages containing technical information or intended for managing the site.
Pages with content that is only useful to authorized users, such as online stores' checkout pages or user profiles.
Drafts, placeholder pages, and other pages not ready for publication.
Why you should hide these pages from indexing
The number of such pages is comparable to, and may even exceed, the number of pages the site owner might want to attract visitors to. As a result:
Crawling these pages with indexing bots would place unnecessary load on the site.
It would take the indexing bot longer to crawl your landing pages.
If such pages are included in the search index, they may compete with landing pages and other important pages and confuse users.
How to check in Yandex Webmaster which unwanted pages are crawled
In Yandex Webmaster, you can check which of your site's pages the Yandex indexing bot crawled but did not add to the index.
Go to Yandex Webmaster, select Indexing → Searchable pages, and open the Excluded pages tab.
Filter the list by the Low-value or low-demand page status.
The list that appears may include pages you want to attract users to. You can improve such pages to make them more likely to be included in search results.
The list may also contain pages that were crawled by the indexing bot despite not being intended for users. We recommend hiding them from the bot.
How to hide unwanted pages from indexing
For pages that require authorization and contain personal information, such as a delivery address, phone number, or payment details, configure the server to respond with the HTTP code "403 Forbidden". For deleted pages, use "404 Not Found" or "410 Gone".
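As an illustration, here is a minimal sketch using Python's standard library of how a server might return these codes. The paths and the is_authorized() check are hypothetical; real sites usually configure this in the web framework or in the web server itself.

from http.server import BaseHTTPRequestHandler, HTTPServer

DELETED_PATHS = {"/old-promo.html"}           # pages that no longer exist
PRIVATE_PREFIXES = ("/profile", "/checkout")  # pages that require authorization

def is_authorized(request_handler):
    # Placeholder: a real check would validate a session cookie or token.
    return False

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in DELETED_PATHS:
            self.send_response(410)  # 410 Gone for deleted pages
            self.end_headers()
        elif self.path.startswith(PRIVATE_PREFIXES) and not is_authorized(self):
            self.send_response(403)  # 403 Forbidden for private pages
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Public page")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()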
The Disallow directive in robots.txt prohibits crawling of specific sections or pages of a site. You can use it to hide from indexing bots pages intended for site management, pages containing private information, and pages displaying site search results.
For example:
User-agent: Yandex
Disallow: /admin # Prohibit crawling for pages and sections with URLs starting with /admin
Disallow: /order.html # Prohibit crawling for the checkout page
Disallow: *?s= # Prohibit crawling for site search results and pages with URLs containing ?s=
Learn more about the robots.txt file and Disallow directive.
How to check in Yandex Webmaster if a page is blocked from indexing
To verify that the pages listed in robots.txt are correctly blocked from indexing, go to Tools → Robots.txt analysis, specify the page URLs in the Check if URLs are allowed field, and run the check. If everything is correct, the check results will indicate that the URLs are blocked by the Disallow directive.
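As a local sanity check alongside the Robots.txt analysis tool, you can also test URLs against prefix-based Disallow rules with Python's standard urllib.robotparser. The sketch below reuses the example rules shown earlier; example.com is a placeholder domain, and wildcard rules such as *?s= are not supported by this module, so verify those in the Webmaster tool.

from urllib.robotparser import RobotFileParser

rules = """
User-agent: Yandex
Disallow: /admin
Disallow: /order.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ("https://example.com/admin/settings",
            "https://example.com/order.html",
            "https://example.com/catalog/shoes"):
    verdict = "allowed" if parser.can_fetch("Yandex", url) else "blocked"
    print(url, "->", verdict)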
To find out whether a page is properly blocked from indexing bots by any of these methods, use the Indexing → Page status check tool. Enter the page URL and run the check. The page's search status is displayed on the Page version in the database → Status in search tab. If the page is hidden from indexing bots, its status will be "The page is unknown to the robot".
The page will be removed from the search engine database within a week after the indexing bot discovers your instructions.
With Yandex Webmaster, you can initiate the removal process without waiting for the scheduled indexing bot crawl. To do this, go to Tools → Remove pages from search results and specify the URL of a page, or a prefix if you want to exclude a group of pages.
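Before submitting a removal request, you can quickly confirm that the page already returns one of the blocking codes mentioned above (403, 404, or 410). A minimal sketch with Python's standard library, using a placeholder URL:

import urllib.error
import urllib.request

def status_code(url):
    # Send a HEAD request; 4xx/5xx responses raise HTTPError.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code

url = "https://example.com/order.html"  # placeholder URL
code = status_code(url)
print(url, "->", code, "(blocking code)" if code in (403, 404, 410) else "(not a blocking code)")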