Answers to questions regarding indexing

What is a page duplicate?

Page duplicates are pages on a site that have identical content but different URLs.

For example:

  • http://site.com and http://site.com/index.php/,
  • http://site.ru/page/ and http://site.ru/page.

If both pages are indexed by Yandex's robot, the indexing system combines them into a group of duplicates. Only one of the pages is listed in search results.

There are many reasons why duplicate pages appear:

  • natural reasons (for example, if a page with a product description on an online store is available in several different categories on the site);
  • issues with incorrect site structure.

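To make sure the page you want is the one listed in search results, we recommend that you clearly indicate it for the Yandex robot, for example by setting up a 301 redirect from the duplicate to the main page or by specifying the canonical URL.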

My site has moved (the URL changed). What should I do?

Let the robot know about your new site using this form: Report a New Site. If the pages on the old and new sites are exactly the same, configure the old server so that any request for an old URL returns a 301 status code (“Moved Permanently”) with the Location header pointing to the corresponding URL on the new site. If the old site has been shut down, you can speed up its removal from the index using this form: Remove URL.
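The redirect described above can be sketched as a small Python helper. This is only an illustration, not a required implementation: the host name new-site.example and the function name are hypothetical, and the sketch assumes the new site keeps the same paths as the old one.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host name of the new site; replace it with your own.
NEW_HOST = "new-site.example"


def redirect_to_new_site(old_url, new_host=NEW_HOST):
    """Return (status, headers) for a permanent redirect to the new site.

    The path and query string are kept, so every old page points to the
    page with the same path on the new host.
    """
    parts = urlsplit(old_url)
    target = urlunsplit(
        (parts.scheme or "http", new_host, parts.path or "/", parts.query, "")
    )
    return 301, {"Location": target}


status, headers = redirect_to_new_site("http://old-site.example/page/?id=1")
print(status, headers["Location"])
```

In practice the same rule is usually set in the web server configuration rather than in application code.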

You overloaded my server. Please stop!

You can use the robots.txt file to change how the robot behaves. In it, tell the Yandex robot not to request scripts that put a heavy load on your server, or use the Crawl-delay directive to set a minimum interval between requests.
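As a rough illustration, the Python sketch below holds an example robots.txt and checks it with the standard urllib.robotparser module. The script path /cgi-bin/heavy-report and the 2-second delay are made-up values, and the standard-library parser only approximates how a real robot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content. The disallowed path and the delay value
# are hypothetical; adjust them to your own site.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /cgi-bin/heavy-report
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The heavy script is closed to the robot, other pages stay open.
heavy_allowed = parser.can_fetch(
    "Yandex", "http://site.com/cgi-bin/heavy-report?year=2010"
)
catalog_allowed = parser.can_fetch("Yandex", "http://site.com/catalog/")
delay = parser.crawl_delay("Yandex")

print(heavy_allowed, catalog_allowed, delay)
```

Checking the file this way before publishing it helps catch a rule that accidentally blocks pages you want indexed.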

You tried to download sensitive information from our server. What should I do?

The robot takes links from other pages, which means that some other page has links to sensitive parts of your site. You can password protect them or tell the Yandex robot to ignore them in the robots.txt file. Either way, the robot will stop downloading sensitive information.

How do I protect myself from fake robots mimicking Yandex robots?

To protect yourself against fake robots, use a filter based on reverse DNS lookups. This method is preferred over managing access by IP address because it is not affected by changes within Yandex's internal networks.
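A reverse DNS filter could look roughly like the Python sketch below. Several details are assumptions rather than part of this article: the list of trusted host-name suffixes should be confirmed against Yandex's current documentation, the forward lookup that confirms the host name resolves back to the same address is an extra safeguard added here, and the function name is hypothetical.

```python
import socket

# Assumption: host-name suffixes treated as belonging to Yandex robots.
# Confirm the actual list before relying on it.
TRUSTED_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_yandex_robot(ip, reverse_lookup=None, forward_lookup=None):
    """Check an IPv4 address with a reverse lookup plus forward confirmation.

    The lookup functions can be replaced (for example in tests); by default
    they use the system resolver.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (
        lambda host: socket.gethostbyname_ex(host)[2]
    )
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(TRUSTED_SUFFIXES):
            return False
        # The host name must resolve back to the address that made the request.
        return ip in forward_lookup(host)
    except OSError:
        # Failed lookups are treated as "not verified".
        return False
```

DNS lookups are slow, so a real filter would normally cache the result for each address.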

Is it a problem that my server isn't providing last-modified values? I tried to set it up, but I couldn't make it work.

Your site will still be indexed even if your server doesn't provide last-modified document dates. However, you should keep in mind the following:

  • search results will not list a date next to pages from your site;

  • most users will not see your site if they sort search results by date;

  • the robot cannot tell if a site page has been updated since it was last indexed. Modified pages will therefore be indexed less often, given that the number of pages the robot gets from a site each time it stops by is limited.
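If you want to provide the date from application code, the Python sketch below shows one way to build the Last-Modified value and to answer a conditional request. It assumes that the client sends an If-Modified-Since header and that modification times are timezone-aware datetime objects; the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def last_modified_header(modified_at):
    """Format a timezone-aware datetime as an HTTP date."""
    return formatdate(modified_at.timestamp(), usegmt=True)


def status_for(if_modified_since, modified_at):
    """Return 304 if the page has not changed since the given date, else 200."""
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unreadable date: send the full page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        if modified_at.replace(microsecond=0) <= since:
            return 304
    return 200


changed = datetime(1994, 11, 6, 8, 49, 37, tzinfo=timezone.utc)
print(last_modified_header(changed))
```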

My server doesn't issue encoding, is that a problem? I tried to set it up, but I couldn't make it work.

The Yandex robot does a good job determining document encoding by itself, so it isn't a problem for site indexing if it's missing in server headers.

My site uses frames. Yandex displays links to internal site frames in its search results. What should we do if all our navigation is unavailable because it's in a different frame?

You can try using JavaScript to solve the problem. When a page loads, check whether the parent frame with the navigation is open. If it isn't, open it.

There is too much traffic going back and forth between my web server and your robot. Are compressed page downloads supported?

Yes, they are. The Yandex search robot sends this header with every page request: “Accept-Encoding: gzip,deflate”. This means you can set up your web server to reduce the traffic between it and our robot. However, keep in mind that compressing content increases the CPU load on your server, which can cause problems if the server is already overloaded. In supporting gzip and deflate, the robot follows RFC 2616, section 3.5.
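The Python sketch below illustrates the server side of this exchange: it compresses a response only when the request offers gzip. It is a simplified example with a hypothetical function name; it ignores quality values such as “gzip;q=0”, and most sites would enable compression in the web server settings instead.

```python
import gzip


def encode_body(body, accept_encoding):
    """Return (payload, extra_headers) for a response body given as bytes."""
    # Split "gzip,deflate" into individual encoding names.
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    # The client did not offer gzip: send the body unchanged.
    return body, {}


page = b"<html><body>" + b"<p>catalog item</p>" * 200 + b"</body></html>"
payload, headers = encode_body(page, "gzip,deflate")
print(len(page), len(payload), headers)
```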

Your robot is trying to use broken links to load pages from my site. Why?

The robot takes links from other pages, which means that at least one page links to your site with a broken URL. You may have changed the structure of your site, leaving links on other web pages outdated.

What does the robot do for pages with a redirect? What if I use a refresh directive?

When the Yandex robot receives a response whose header indicates that a given URL is a redirect (3xx codes), it adds the redirect's target URL to its review list. If the redirect is permanent (a 301 code, or the page contains a refresh directive), the old URL is excluded from the review list.
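The rule above can be modeled in a few lines of Python. This is only a toy model of the described behavior, not the robot's actual code: the function name, the set-based review list, and the has_refresh flag are all invented for the illustration.

```python
def apply_redirect(review_list, url, status, target, has_refresh=False):
    """Return a new review list after the robot has seen a redirect from url."""
    updated = set(review_list)
    if 300 <= status < 400 or has_refresh:
        # Any redirect adds its target to the review list.
        updated.add(target)
        if status == 301 or has_refresh:
            # A permanent redirect also drops the old URL.
            updated.discard(url)
    return updated


urls = {"http://site.com/old"}
print(apply_redirect(urls, "http://site.com/old", 301, "http://site.com/new"))
print(sorted(apply_redirect(urls, "http://site.com/old", 302, "http://site.com/new")))
```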

My page is periodically left out of search results. What's the problem?

If the robot sometimes gets an error when contacting a page (for example, because of unstable hosting), it removes that page from search results until its next successful contact.

Can I manage reindexing frequency using a Revisit-After directive?

No. The Yandex robot will ignore it.

Which data transfer protocols are supported for indexing?

Right now, Yandex supports two protocols: HTTP and HTTPS.

How do I tell the robot that it should index pages with or without a forward slash at the end of the URL?

Our robot treats a URL with a “/” at the end and the same URL without it as two different pages. If the pages contain identical content, set up a 301 redirect from one to the other (for example, in the .htaccess file) or indicate the canonical URL.
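As an illustration of the redirect option, the Python sketch below picks the version with a trailing slash as the main one and redirects the other version to it. The choice of direction, the function name, and the rule that skips paths that look like files (such as /file.html) are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url):
    """Return (301, headers) if the URL lacks a trailing slash, else None."""
    parts = urlsplit(url)
    path = parts.path
    last_segment = path.rsplit("/", 1)[-1]
    # Leave URLs that already end in "/" or that point to a file.
    if path.endswith("/") or "." in last_segment:
        return None
    target = urlunsplit(parts._replace(path=path + "/"))
    return 301, {"Location": target}


print(trailing_slash_redirect("http://site.ru/page"))
print(trailing_slash_redirect("http://site.ru/page/"))
```

Whichever version you choose, use it consistently in internal links so the robot does not keep finding the other one.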

Why is the robot contacting non-existent pages/subdomains on my site?

It probably found links to them somewhere and tried to index them. To make sure the robot indexes only the right pages of your site, non-existent subdomains should be unavailable and non-existent pages should return a 404 status code.
