Statistical accuracy of data

Yandex Metrica provides many numbers on every aspect of how your site works, and each of them is accurate in terms of how it is calculated. For analyzing site performance, however, a correctly calculated number is not always a meaningful one. For example, if a single visitor came to the site and viewed pages for 20 minutes, it is technically true that the average session duration on the site is 20 minutes. Common sense tells us otherwise: we can't draw conclusions about a site from a single session.

Another example: you need to determine which pages of the site are most often the entry pages for sessions that result in conversions. This seems easy. You open the Entry pages report, select the desired goal, and sort the report by conversions.

The report contains many pages that had only one user who nevertheless completed the goal. The conversion rate for sessions that started from these pages is 58% or higher. Yet it is obvious that these pages have no value for analysis. You can try setting a minimum number of page views, such as at least 100.

But you can see that 100 isn't enough: there are pages that were viewed over 100 times, but all within a single session (one in which the goal was completed). This could be caused by pages refreshing automatically or by web crawlers. In any case, these rows prevent you from seeing the interesting data in the report, so it would be better to get rid of them. You could raise the minimum to 1,000 page views.

This would remove the extraneous rows from the report. However, the appropriate limit depends on the selected report period, so you would have to set it again for each period.

This example shows that what matters for website analysis is not the calculated conversion rate but the true one: the rate you would observe over a very large number of sessions. For 1,000 sessions, the difference between the true conversion rate and the calculated one will be small. For a single session, however, the true conversion rate might be anywhere from a very small number to 100%.
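To make this concrete, here is a minimal sketch that estimates the range in which the true conversion rate plausibly lies, given the number of sessions and conversions. It uses the Wilson score interval with z = 1.96 (roughly 95% confidence). This is a standard textbook method chosen for illustration; it is an assumption and not necessarily the calculation that Yandex Metrica performs. The function name and the sample figures are hypothetical.

```python
from math import sqrt


def wilson_interval(conversions: int, sessions: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a conversion rate.

    z = 1.96 corresponds to roughly 95% confidence (an assumption
    made for this illustration).
    """
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    rate = conversions / sessions
    denom = 1 + z**2 / sessions
    center = (rate + z**2 / (2 * sessions)) / denom
    half = z * sqrt(rate * (1 - rate) / sessions + z**2 / (4 * sessions**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# One session with one conversion: the calculated rate is 100%,
# but the plausible range for the true rate is very wide.
single = wilson_interval(1, 1)

# 580 conversions in 1,000 sessions: the range is narrow around 58%.
many = wilson_interval(580, 1000)

print(f"1 session:      {single[0]:.1%} to {single[1]:.1%}")
print(f"1,000 sessions: {many[0]:.1%} to {many[1]:.1%}")
```

With a single converting session, the interval runs from roughly 21% to 100%, while with 1,000 sessions it narrows to about 55% to 61%.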

Mathematical statistics methods allow us to calculate how many sessions are needed in order to say with confidence (for example, with 95% probability) that the calculated conversion value does not significantly differ from the true value (for example, it deviates by less than 5%).
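As an illustration of such a calculation, the sketch below estimates the number of sessions needed using the normal approximation to the binomial distribution. It assumes that "deviates by less than 5%" means a relative deviation (5% of the rate itself, not 5 percentage points). Both the method and that interpretation are assumptions, and the function name is hypothetical; the exact formula used by Yandex Metrica is not stated here.

```python
from math import ceil
from statistics import NormalDist


def required_sessions(rate: float, confidence: float = 0.95,
                      max_relative_deviation: float = 0.05) -> int:
    """Estimate how many sessions are needed for the calculated rate to stay
    within max_relative_deviation of the true rate at the given confidence.

    Derived from z * sqrt(p * (1 - p) / n) <= deviation * p, which gives
    n >= z^2 * (1 - p) / (p * deviation^2).
    """
    if not 0 < rate < 1:
        raise ValueError("rate must be strictly between 0 and 1")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z**2 * (1 - rate) / (rate * max_relative_deviation**2))


print(required_sessions(0.58))  # conversion rate around 58%
print(required_sessions(0.05))  # a much rarer goal needs far more sessions
```

Under these assumptions, a conversion rate of 58% needs roughly 1,100 sessions, and the lower the conversion rate, the more sessions are required for the same relative precision.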

Yandex Metrica can calculate this automatically and hide the report rows for which there is not enough data to say with confidence that the displayed value is close to the theoretical true value. To do this, use the Hide statistically insignificant data option.
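The idea behind this kind of filtering can be sketched as follows. The example keeps a row only if the half-width of its Wilson score interval is small relative to the calculated rate. The criterion, the function name, and the sample rows are all hypothetical and serve only to illustrate the principle; the actual rule applied by the Hide statistically insignificant data option may differ.

```python
from math import sqrt
from statistics import NormalDist


def is_significant(conversions: int, sessions: int,
                   confidence: float = 0.95, max_deviation: float = 0.05) -> bool:
    """Return True if the interval half-width is at most max_deviation
    of the calculated rate (an assumed, illustrative criterion)."""
    if sessions <= 0 or conversions <= 0:
        return False
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    rate = conversions / sessions
    # Half-width of the Wilson score interval.
    half = z * sqrt(rate * (1 - rate) / sessions + z**2 / (4 * sessions**2))
    half /= 1 + z**2 / sessions
    return half / rate <= max_deviation


# Hypothetical report rows.
rows = [
    {"page": "/promo", "sessions": 1, "conversions": 1},
    {"page": "/catalog", "sessions": 5000, "conversions": 2900},
    {"page": "/blog", "sessions": 120, "conversions": 70},
]

visible = [r["page"] for r in rows if is_significant(r["conversions"], r["sessions"])]
print(visible)
```

Here only the row with 5,000 sessions remains. The rows with 1 and 120 sessions are hidden because their calculated rates could still differ noticeably from the true ones. Unlike a fixed page view limit, this criterion does not need to be adjusted when the report period changes.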

Filtering is applied to the values in the column that the report is currently sorted by. You can also change the filtering thresholds, that is, the 95% probability and 5% deviation mentioned above.