Duplicate pages are pages on a site that have identical content but different URLs.
If both pages are indexed by Yandex's robot, the indexing system combines them into a group of duplicates. Only one of the pages is listed in search results.
Duplicate pages can appear for many reasons.
To make sure your page is listed in search results, we recommend that you clearly indicate it for the Yandex robot. Here's how you can do that:
Let the robot know about your new site using this form: Report a New Site. If the pages on the old and new sites are identical, configure the old site's server to respond to requests for old URLs with a 301 (“Moved Permanently”) redirect whose Location header points to the corresponding URL on the new site. If the old site has been shut down, you can speed up its removal from the index using this form: Remove URL.
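On an Apache server, the redirect described above can be set up in the old site's .htaccess file. This is a minimal sketch assuming Apache with mod_rewrite enabled; the new domain name is a placeholder:

```apache
# .htaccess on the OLD site: send every request to the same path
# on the new site with a permanent (301) redirect.
# "new-site.example" is a placeholder for your actual new domain.
RewriteEngine On
RewriteRule ^(.*)$ https://new-site.example/$1 [R=301,L]
```

With this rule in place, a request for any old URL receives a 301 response with the Location header pointing at the new site, which is exactly what the robot needs to transfer the pages.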
The robot follows links from other pages, which means that some other page links to sensitive parts of your site. You can password-protect them or tell the Yandex robot to ignore them in the robots.txt file. Either way, the robot will stop downloading the sensitive information.
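A robots.txt rule blocking a directory looks like this (the paths are placeholders; substitute the sections of your site you want to hide):

```
User-agent: Yandex
Disallow: /admin/
Disallow: /private/
```

Remember that robots.txt only tells well-behaved robots not to crawl those URLs; it is not access control, so genuinely sensitive content should also be password-protected.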
To protect yourself against fake robots, use a reverse DNS lookup filter: reverse-resolve the requesting IP address to a host name, verify that the name belongs to Yandex, then forward-resolve that name and confirm it maps back to the same IP. This method is preferable to managing access by IP address, because IP-based rules break whenever Yandex's internal network ranges change.
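The check can be sketched in Python. This is an illustration, not Yandex's own tooling: the domain suffixes are the ones commonly documented for Yandex crawlers, and the resolver arguments are injectable only so the logic can be exercised without network access.

```python
import socket

def is_yandex_robot(ip,
                    reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                    forward=socket.gethostbyname):
    """Verify a claimed Yandex robot with a reverse-then-forward DNS check.

    1. Reverse-resolve the IP address to a host name.
    2. Check that the name ends in a Yandex crawler domain.
    3. Forward-resolve the name and confirm it maps back to the same IP.
    """
    try:
        host = reverse(ip)
    except OSError:
        return False
    if not host.endswith((".yandex.ru", ".yandex.net", ".yandex.com")):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False
```

The forward lookup in step 3 is what defeats spoofing: anyone can point reverse DNS for their IP at a yandex.com name, but only Yandex controls the forward records for those names.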
Your site will still be indexed even if your server doesn't provide last-modified document dates. However, you should keep in mind the following:
search results will not list a date next to pages from your site;
most users will not see your site if they sort search results by date;
the robot cannot tell whether a page has changed since it was last indexed. Modified pages will therefore be re-indexed less often, since the number of pages the robot fetches from a site on each visit is limited.
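If you control the server, emitting a correctly formatted Last-Modified header is straightforward. A minimal sketch in Python, using the standard library's HTTP-date formatter (the timestamp is an arbitrary example):

```python
from email.utils import formatdate

def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an HTTP-date (RFC-style),
    suitable for a Last-Modified response header."""
    return formatdate(timestamp, usegmt=True)

# Example: http_date(1700000000) -> 'Tue, 14 Nov 2023 22:13:20 GMT'
```

In practice you would pass the page's actual modification time (for a static file, its mtime) rather than a fixed number.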
The Yandex robot does a good job of determining a document's encoding on its own, so a missing charset in the server headers is not a problem for indexing.
Yes, they are. The Yandex search robot sends the header “Accept-Encoding: gzip,deflate” with every page request, so you can configure your web server to reduce the traffic between it and the robot. Keep in mind, however, that serving compressed content increases the CPU load on your server, which can cause problems if the server is already overloaded. In supporting gzip and deflate, the robot adheres to RFC 2616, section 3.5.
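On the server side, the negotiation looks roughly like this. A sketch in Python; `compress_body` is a hypothetical helper, not a real server API:

```python
import gzip

def compress_body(body: bytes, accept_encoding: str):
    """Gzip-compress a response body when the client advertises support.

    Returns (payload, value_for_Content_Encoding_header_or_None).
    """
    if "gzip" in accept_encoding.lower():
        return gzip.compress(body), "gzip"
    return body, None
```

When the robot sends “Accept-Encoding: gzip,deflate”, the gzip branch is taken and the response carries a “Content-Encoding: gzip” header; clients without that header get the body unchanged.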
The robot follows links from other pages, which means that a link to your site on some other page is broken. You may have changed your site's structure, leaving links on other web pages outdated.
When the Yandex robot receives a response whose headers indicate that the URL is a redirect (a 3xx code), it adds the redirect's target URL to its review list. If the redirect is permanent (a 301 code, or the page contains a refresh directive), the old URL is excluded from the review list.
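The behavior can be illustrated with a small sketch; this is a hypothetical model of the logic described above, not Yandex's implementation:

```python
def handle_redirect(url, status, location, review_list):
    """Update a crawl review list for a redirect (3xx) response.

    The redirect target is added to the list; on a permanent
    redirect (301), the old URL is also dropped.
    """
    if 300 <= status < 400 and location:
        if location not in review_list:
            review_list.append(location)  # target URL gets reviewed
        if status == 301 and url in review_list:
            review_list.remove(url)       # permanent move: old URL excluded
    return review_list
```

Note the asymmetry: a temporary redirect (302) keeps both URLs in the list, while a permanent one (301) leaves only the target.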
If the robot occasionally gets an error when contacting a page (for example, due to unstable hosting), it removes that page from the list until the next successful contact.
No. The Yandex robot will ignore it.
Right now, Yandex supports two protocols: HTTP and HTTPS.
For our robot, URLs with a trailing “/” and URLs without one are different pages. If they contain identical content, set up a 301 redirect from one to the other (you can use the .htaccess settings for this) or indicate the canonical URL.
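One common recipe for the redirect, assuming an Apache server with mod_rewrite (adjust the pattern to your site's structure):

```apache
# .htaccess: 301-redirect URLs without a trailing slash to the
# slash version, skipping real files such as images and stylesheets.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]
```

Alternatively, leave both URLs reachable and add a tag such as `<link rel="canonical" href="https://example.com/page/">` to the page's head so the robot knows which version to index.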
The robot probably found links to them somewhere and tried to index them. Non-existent subdomains and pages should be unavailable or return a 404 code so that the robot indexes only the right pages of your site.
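On nginx, for example, a catch-all server block can return 404 for any host name you have not explicitly configured (a sketch; place it alongside your real server blocks):

```nginx
# Catch-all for unknown subdomains: any request whose Host header
# does not match a configured server_name gets a 404.
server {
    listen 80 default_server;
    server_name _;
    return 404;
}
```

With this in place, made-up subdomains resolve to a 404 instead of serving a copy of your main site, so the robot drops them.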