Using the Yandex Advertising Network robot to detect the theme of site content

The Yandex Advertising Network robot regularly crawls sites on the Advertising Network and, based on the content of each site, determines its theme for contextual ad display.

When a site is not accessible to the ad robot, contextual ads become less relevant to your site theme, which in turn reduces your revenue.

About the Yandex Advertising Network robot

The name of the Yandex Advertising Network robot is YandexDirect. In User-Agent format, the robot that indexes pages of participating sites on the Yandex Advertising Network is represented as follows:

Mozilla/5.0 (compatible; YandexDirect/3.0)

Attention. Blocking the User-agent: Yandex robot in the robots.txt file may result in all Yandex robots being blocked, including the Yandex Advertising Network robot.

To make sure that the Yandex Advertising Network robot crawls your site, the beginning of the robots.txt file in the root directory must have the following entry:

User-Agent: YandexDirect

Check your site's accessibility to the Yandex Advertising Network robot

Yandex Advertising Network partners can open Yandex Webmaster and go to Tools → Robots.txt analysis to check whether their site pages can be indexed by the YandexDirect robot. The check is based on the rules written in the robots.txt file.

This tool lets you find out whether pages of a site were needlessly closed to indexing due to errors in the robots.txt file (for example, if it was necessary to block a site from the search robot and make it accessible only to the ad robot but the rule was incorrectly written).

The tool is simple to use. Paste the contents of the robots.txt file or choose a website to check. If it turns out that pages are closed to indexing by the ad robot, the system displays a corresponding message and, in some cases, suggests ways to resolve the problem.
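
A similar check can be approximated locally before uploading a new robots.txt file. The sketch below uses Python's standard urllib.robotparser module on a sample file that closes the site to all robots except YandexDirect, the scenario mentioned above. The sample rules, the example.com URL, and the is_allowed helper are assumptions made for illustration. Python's parser is not Yandex's parser and may interpret some rules differently, so the Robots.txt analysis tool remains the authoritative check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: open to the ad robot, closed to every other robot.
# An empty "Disallow:" value means nothing is disallowed for that robot.
SAMPLE_ROBOTS_TXT = """\
User-agent: YandexDirect
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url.

    Illustrative helper (not part of any Yandex tool): it parses the text
    in memory instead of downloading the file from a site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/articles/1.html"  # placeholder URL
print(is_allowed(SAMPLE_ROBOTS_TXT, "YandexDirect", page))  # True
print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", page))  # False
```

If the first call returned False for a real robots.txt file, that would suggest a rule is closing the page to the ad robot.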

Crawling speed of the Yandex Advertising Network robot

You can manage the speed at which the Yandex Advertising Network robot will crawl your site by using the Crawl-delay directive in the robots.txt file.

The Crawl-delay directive determines how long the robot will pause before it loads each successive page of a site. If the robots.txt file or the directive in it is absent, the minimum pause duration is 2 seconds. This pause duration provides optimal indexing speed for most sites without creating excessive loads on their servers or hosting services. For example, it lets the Yandex Advertising Network robot fully index a site consisting of several thousand pages within one day.
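
As a rough check of these figures, the pause length sets an upper bound on how many pages the robot can load in a day. The sketch below is only an estimate under a simplifying assumption: it ignores the time needed to download each page, so real throughput is lower. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def max_pages_per_day(crawl_delay_seconds: float) -> int:
    """Upper bound on pages loaded per day with a fixed pause between pages.

    Hypothetical helper: assumes every page loads instantly, so only the
    pause between pages is counted.
    """
    if crawl_delay_seconds <= 0:
        raise ValueError("crawl delay must be positive")
    return int(SECONDS_PER_DAY // crawl_delay_seconds)


for delay in (0.5, 2, 10):
    print(f"Crawl-delay {delay}: at most {max_pages_per_day(delay)} pages per day")
```

With a 2-second pause the bound is 43,200 pages per day, which is consistent with a site of several thousand pages being indexed within one day. A 10-second pause lowers the bound to 8,640 pages per day.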

Tip. For large sites, we recommend that you set the Crawl-delay directive to less than two seconds. Setting Crawl-delay to more than two seconds makes sense only if the Yandex Advertising Network robot creates a noticeable load on the site and interferes with its normal operation.

Keep in mind that a Crawl-delay value that is too high may reduce the quality of ads and, consequently, reduce your site's revenue.

Tragic content

Yandex deems it unethical to display ads on pages with tragic content. We use a special filter to search pages' text for phrases that indicate tragic content, and such pages may be flagged as tragic content. However, for news feeds of “mass media” sites, the tragic content indication may be ignored.