Sampling

Yandex Metrica generates reports using detailed session and user data. Calculating indicators for large data volumes can require a considerable amount of time and resources. For this reason, Yandex Metrica may only use a portion of the available data for its reports, which is known as sampling. This way, Yandex Metrica can maintain a high report generation speed.

What is sampling?

Sampling is a statistical method used in data processing in which conclusions about the whole data set are drawn from a subset of it, known as a sample.

Suppose you want to know how many direct visits the site received. You can count them in 1/10 of all sessions and multiply the result by 10 to get the approximate total. You get the answer roughly 10 times faster, but it is an estimate rather than an exact value.
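The arithmetic in this example can be sketched in Python. This is an illustration on synthetic data, not Yandex Metrica's implementation: the function name estimate_total, the fixed random seed, and the 30% share of direct visits are all assumptions made for the example.

```python
import random

def estimate_total(flags, fraction, seed=0):
    """Estimate how many sessions match, using only a random fraction of them."""
    rng = random.Random(seed)
    sample_size = max(1, round(len(flags) * fraction))
    sample = rng.sample(flags, sample_size)
    # Scale the count observed in the sample back up to the full data set.
    return sum(sample) * len(flags) / sample_size

# Synthetic data: 100,000 sessions, 30% of which are direct visits.
sessions = [i % 10 < 3 for i in range(100_000)]
exact = sum(sessions)
approx = estimate_total(sessions, fraction=0.1)
print(f"exact: {exact}, estimated from 10% of sessions: {approx:.0f}")
```

The estimate touches only a tenth of the sessions, which is where the speed-up comes from, and it differs from the exact count by a small random error.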

Yandex Metrica sampling mechanism

The sampling algorithm selects data evenly across the website's audience, so the sampled report keeps the same correlations and attribute distributions as the full report.
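One common way to select data evenly across an audience is to hash each user identifier and keep the users whose hash falls below the sampling share. The sketch below illustrates that general technique only. It is an assumption that it resembles what Yandex Metrica does, and the names in_sample and user-N are hypothetical.

```python
import hashlib

def in_sample(user_id: str, share: float) -> bool:
    """Deterministically decide whether a user falls into the sample."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    position = int.from_bytes(digest[:8], "big") / 2**64
    return position < share

users = [f"user-{i}" for i in range(20_000)]
kept = [u for u in users if in_sample(u, 0.1)]
print(f"kept {len(kept)} of {len(users)} users")
```

Because the decision depends only on the user identifier, all sessions of a given user are either kept or dropped together, and a user kept at a 10% share is also kept at any larger share.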

Note

  • Sampling is used only when creating analytical reports in Yandex Metrica. The original data is not deleted or altered.
  • Sampling is not applied in the “Yandex Direct” group reports in Yandex Metrica.
  • Audience segments are created and saved using 100% of the data in the report.
  • Full data is displayed in Yandex Direct reports.
  • Sampling does not affect ad performance.

When can sampling be applied in Yandex Metrica?

Sampling can be applied when generating reports both in the web interface and in the API.

Yandex Metrica can apply sampling when the source data for a request exceeds 500,000 sessions (or 2 million pageviews in the corresponding reports). The sampling ratio is determined dynamically so that the report includes as much data as the available computing resources allow.

How to control sampling

You can adjust the sampling level of your reports using the Sample setting.

If you increase the sample size, the report may take longer to generate or may not generate at all. To ensure your report can be loaded, Yandex Metrica may restrict manual sample size increases if the source data contains more than 500,000 sessions.

Note

This restriction has been in effect since September 2023.
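For reports requested through the API, the sampling level is usually set with a request parameter. The sketch below assumes the Reporting API endpoint shown in API_URL, an accuracy query parameter, and the sampled and sample_share response fields. Treat all of these names, as well as the counter ID 12345, as assumptions to verify against the API reference. The code only builds a URL and reads a response dictionary; it does not send any requests.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Reporting API; verify it against the API reference.
API_URL = "https://api-metrika.yandex.net/stat/v1/data"

def build_report_url(counter_id, metrics, date1, date2, accuracy="full"):
    """Build a report request URL that asks for a given accuracy."""
    params = {
        "ids": counter_id,
        "metrics": ",".join(metrics),
        "date1": date1,
        "date2": date2,
        # Assumed values: "low", "medium", "high", "full", or a share from 0 to 1.
        "accuracy": accuracy,
    }
    return f"{API_URL}?{urlencode(params)}"

def describe_sampling(response):
    """Summarize the sampling fields of a decoded JSON response (assumed names)."""
    if not response.get("sampled", False):
        return "report built on 100% of the data"
    share = response.get("sample_share")
    if share is None:
        return "report is sampled (share not reported)"
    return f"report built on {share:.1%} of the data"

url = build_report_url(12345, ["ym:s:visits"], "2024-01-01", "2024-01-31")
print(url)
print(describe_sampling({"sampled": True, "sample_share": 0.1}))
```

Requesting full accuracy does not guarantee an unsampled report for large data volumes, so it is worth checking the sampling fields in every response.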

How to get reports for 100% of your data

Reduce the report period

Sampling is applied automatically when a request exceeds the limit on the amount of source data. To keep the request within 500,000 sessions, reduce the report period.

For example, if your website receives about 100,000 sessions per month, the sampling ratios for different report periods will be as follows:

Report period     Total sessions in the original sample     Percentage of data used to generate the report
One month         100,000                                   100%
Five months       500,000                                   100%
Six months        600,000                                   83.3%
Twelve months     1,200,000                                 41.7%
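The percentages in the example above follow from dividing the 500,000-session limit by the total number of sessions in the period. The sketch below reproduces them under that simplified assumption. In practice the ratio is determined dynamically, so real reports may use a different share. The function names data_share and max_unsampled_months are hypothetical.

```python
SESSION_LIMIT = 500_000  # threshold stated in this article

def data_share(total_sessions, limit=SESSION_LIMIT):
    """Share of data used, assuming the sample is capped at the limit."""
    if total_sessions <= limit:
        return 1.0
    return limit / total_sessions

def max_unsampled_months(monthly_sessions, limit=SESSION_LIMIT):
    """Longest whole-month period that stays within the limit."""
    return limit // monthly_sessions

for months in (1, 5, 6, 12):
    total = months * 100_000
    print(f"{months:>2} months: {total:>9,} sessions, {data_share(total):.1%}")

print("longest period without sampling:", max_unsampled_months(100_000), "months")
```

The same calculation shows how short a report period needs to be for a given traffic level: at 100,000 sessions per month, any period of up to five months stays within the limit.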

Become a Yandex Advertising Network partner

YAN partners can generate any report based on 100% of the data if their volume of visible impressions is consistently at least 10 million per month.
