Using the YAN robot to detect the theme of site content
The YAN robot regularly crawls sites on the Advertising Network and, based on the content of each site, determines its theme for contextual ad display.
When a site is not accessible to the ad robot, contextual ads become less relevant to your site theme, which in turn reduces your revenue.
About the Yandex Advertising Network robot
The Yandex Advertising Network robot is named YandexDirect. In User-Agent format, the robot that indexes pages of sites participating in the Yandex Advertising Network is represented as follows:
Mozilla/5.0 (compatible; YandexDirect/3.0)
Blocking the User-agent: Yandex robot in the robots.txt file may result in all Yandex robots being blocked, including the YAN robot. To make sure that the YAN robot crawls your site, the beginning of the robots.txt file in the root directory must have the following entry:
User-Agent: YandexDirect
Disallow:
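The effect of the entry above can be approximated locally before publishing a robots.txt file. The following is a minimal sketch that uses Python's standard urllib.robotparser module. It reflects how Python's parser interprets the rules, which is only an approximation of what the YAN robot does. The helper name can_crawl, the two sample files, and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

AD_ROBOT = "YandexDirect"  # robot name given in the text


def can_crawl(robots_txt: str, url: str, agent: str = AD_ROBOT) -> bool:
    """Return True if Python's robots.txt parser lets `agent` fetch `url`.

    Hypothetical helper: this mirrors Python's parser, not the YAN robot itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Only the general Yandex rule is present: the ad robot is blocked as well.
BLOCKED = """\
User-agent: Yandex
Disallow: /
"""

# The YandexDirect entry comes first, followed by the general Yandex rule.
OPEN_TO_AD_ROBOT = """\
User-Agent: YandexDirect
Disallow:

User-agent: Yandex
Disallow: /
"""

PAGE = "https://example.com/articles/1"  # hypothetical page address

if __name__ == "__main__":
    print(can_crawl(BLOCKED, PAGE))
    print(can_crawl(OPEN_TO_AD_ROBOT, PAGE))
    print(can_crawl(OPEN_TO_AD_ROBOT, PAGE, agent="Yandex"))
```

In this sketch the YandexDirect entry is placed first because Python's parser applies the first entry whose name matches the robot. An empty Disallow: value is read as "nothing is disallowed".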
Check your site's accessibility to the YAN robot
With the Check a site's accessibility to the YAN robot tool, YAN partners can check whether their site pages are accessible for indexing by the YandexDirect robot. The check is based on the rules written in the robots.txt file.
This tool lets you find out whether pages of a site were needlessly closed to indexing due to errors in the robots.txt file (for example, if it was necessary to block a site from the search robot and make it accessible only to the ad robot but the rule was incorrectly written).
The tool is simple to use: enter the address or list of addresses to check in the interface. If any of them are blocked from indexing by the ad robot, the system displays a message to that effect and, in some cases, suggests ways to resolve the problem.
Crawling speed of the YAN robot
You can manage the speed at which the YAN robot crawls your site by using the Crawl-delay directive in the robots.txt file.
The Crawl-delay directive determines how long the robot pauses before it loads each successive page of a site. If the robots.txt file or the directive in it is absent, the minimum pause duration is 2 seconds. This pause duration provides optimal indexing speed for most sites without creating excessive loads on their servers or hosting services. For example, it lets the YAN robot fully index a site consisting of several thousand pages within one day.
Setting the Crawl-delay variable to less than two seconds is not recommended. Setting Crawl-delay to more than two seconds makes sense only if the YAN robot creates a noticeable load on the site and interferes with its normal operation. Keep in mind that a Crawl-delay value that is too high may reduce the quality of ads and, consequently, reduce your site's revenue.
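To get a feel for the trade-off, the pause can be converted into a rough upper bound on pages crawled per day. The sketch below reads the Crawl-delay value with Python's standard urllib.robotparser module and falls back to the 2-second pause mentioned above when no directive is found. The function names and the sample file are assumptions. The estimate ignores page load time, and Python's parser only recognizes whole-number Crawl-delay values.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

DEFAULT_PAUSE_SECONDS = 2   # pause stated in the text when no directive is present
SECONDS_PER_DAY = 24 * 60 * 60


def effective_pause(robots_txt: str, agent: str = "YandexDirect") -> int:
    """Return the Crawl-delay that applies to `agent`, or the default pause.

    Hypothetical helper: it reports what Python's parser finds in the file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay: Optional[int] = parser.crawl_delay(agent)
    return DEFAULT_PAUSE_SECONDS if delay is None else delay


def pages_per_day(pause_seconds: int) -> int:
    """Rough upper bound on pages fetched in a day with the given pause."""
    if pause_seconds <= 0:
        raise ValueError("pause must be positive")
    return SECONDS_PER_DAY // pause_seconds


SAMPLE = """\
User-Agent: YandexDirect
Crawl-delay: 5
Disallow:
"""  # hypothetical robots.txt content

if __name__ == "__main__":
    for text in ("", SAMPLE):
        pause = effective_pause(text)
        print(pause, pages_per_day(pause))
```

With a 2-second pause the bound is 43,200 pages per day, consistent with a site of several thousand pages being indexed within one day. With a 5-second pause it drops to 17,280.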
Tragic content
Yandex deems it unethical to display ads on pages with tragic content. We use a special filter to search pages' text for phrases that indicate tragic content, and such pages may be flagged as tragic content. However, for news feeds of “mass media” sites, the tragic content indication may be ignored.