Sampling
Yandex Metrica generates reports using detailed session and user data. Calculating indicators for large data volumes can require a considerable amount of time and resources. For this reason, Yandex Metrica may only use a portion of the available data for its reports, which is known as sampling. This way, Yandex Metrica can maintain a high report generation speed.
What is sampling?
Sampling is a statistical method used in data processing where general observations about all the data are drawn from its subset, known as a sample.
Suppose you want to know the number of direct hits to the site. You can count the direct hits in 1/10 of all sessions and multiply the result by 10 to get the approximate total. You get the answer about 10 times faster, but it is an estimate rather than an exact count.
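The arithmetic above can be sketched in a few lines of Python. This is only an illustration on synthetic data, not Yandex Metrica's actual algorithm: the function name `estimate_total`, the 30% share of direct hits, and the use of simple random sampling are all assumptions made for the example.

```python
import random


def estimate_total(sessions, is_match, fraction=0.1, seed=0):
    """Estimate how many sessions satisfy is_match using a random sample.

    Hypothetical helper: counts matches in a sample of the sessions,
    then scales the count back up to the full data set.
    """
    if not sessions:
        return 0.0
    rng = random.Random(seed)
    sample_size = max(1, round(len(sessions) * fraction))
    sample = rng.sample(sessions, sample_size)
    matches = sum(1 for s in sample if is_match(s))
    # Scale by the inverse of the sampled share (10x for a 1/10 sample).
    return matches * len(sessions) / sample_size


# Synthetic traffic: 100,000 sessions, roughly 30% of them direct hits.
data_rng = random.Random(42)
sessions = ["direct" if data_rng.random() < 0.3 else "other"
            for _ in range(100_000)]

exact = sessions.count("direct")
approx = estimate_total(sessions, lambda s: s == "direct")
print(f"exact: {exact}, estimated from a 1/10 sample: {approx:.0f}")
```

The estimate only looks at a tenth of the sessions, so it is cheaper to compute, and on data of this size it typically lands within a few percent of the exact count.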
Yandex Metrica sampling mechanism
The sampling algorithm selects data evenly across the website’s audience, so the sampled report maintains the same correlations and attribute distributions as the full report.
Note
- Sampling is used only when creating analytical reports in Yandex Metrica. The original data is not deleted or altered.
- Sampling is not applied in the “Yandex Direct” group reports in Yandex Metrica.
- Audience segments are created and saved using 100% of the data in the report.
- Full data is displayed in Yandex Direct reports.
- Sampling does not affect ad performance.
When can sampling be applied in Yandex Metrica?
Sampling can be applied when generating reports both in the web interface and in the API.
Yandex Metrica can apply sampling when the source data for a request exceeds 500,000 sessions (or 2 million pageviews in pageview-based reports). The sampling ratio is determined dynamically so that the report includes as much data as the required computational resources allow.
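As a rough way to check whether a request falls into the range where sampling can apply, the thresholds above can be written as a small helper. This is a sketch only: the function name `sampling_may_apply` and the `unit` parameter are hypothetical, and exceeding a threshold means sampling may be applied, not that it will be.

```python
# Thresholds quoted in the text above.
SESSION_LIMIT = 500_000
PAGEVIEW_LIMIT = 2_000_000


def sampling_may_apply(count, unit="sessions"):
    """Return True if the source data is large enough for sampling to be possible.

    Hypothetical helper: `unit` is "sessions" for session-based reports
    and "pageviews" for pageview-based reports.
    """
    limits = {"sessions": SESSION_LIMIT, "pageviews": PAGEVIEW_LIMIT}
    if unit not in limits:
        raise ValueError(f"unknown unit: {unit!r}")
    return count > limits[unit]


print(sampling_may_apply(450_000))                       # below the session limit
print(sampling_may_apply(2_500_000, unit="pageviews"))   # above the pageview limit
```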
How to control sampling
You can adjust the sampling level of your reports using Sample.
If you increase the sample size, the report may take longer to generate or may not generate at all. To ensure your report can be loaded, Yandex Metrica may restrict manual sample size increases if the source data contains more than 500,000 sessions.
Note
This restriction applies starting September 2023.
How to get reports for 100% of your data
Reduce the report period
- Sampling is automatically applied when you exceed the limit on the amount of raw data in the request. You can adjust the request to include no more than 500,000 sessions by reducing the report period.
For example, if your website has a monthly traffic of about 100,000 sessions, the sampling ratios for different report periods will be as follows:
| Report period | Total sessions in the original sample | Percentage of data used to generate the report |
| --- | --- | --- |
| One month | 100,000 | 100% |
| Five months | 500,000 | 100% |
| Six months | 600,000 | 83.3% |
| Twelve months | 1,200,000 | 41.7% |
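The percentages in this example follow from dividing the 500,000-session limit by the total number of sessions in the period. The sketch below reproduces that arithmetic; `data_share` is a hypothetical helper, and because the actual sampling ratio is determined dynamically, real reports may use a different share.

```python
SESSION_LIMIT = 500_000  # session threshold quoted in the text


def data_share(monthly_sessions, months, limit=SESSION_LIMIT):
    """Return (total sessions, share of data used) for a report period.

    Assumes the share is simply limit / total once the limit is exceeded,
    which matches the example table but is not a documented formula.
    """
    total = monthly_sessions * months
    share = min(1.0, limit / total) if total else 1.0
    return total, share


for months in (1, 5, 6, 12):
    total, share = data_share(100_000, months)
    print(f"{months:>2} months: {total:>9,} sessions, {share:.1%} of data")
```

For a site with about 100,000 sessions per month, any period of up to five months stays within the limit, so the report is built on all of the data.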
Become a Yandex Advertising Network partner
- The ability to generate any report based on 100% data is available to YAN partners if the volume of visible impressions consistently amounts to at least 10 million per month.