We always strive to make web analytics as flexible as possible, which is why Yandex.Metrica allows you to transmit an unlimited amount of data and manage the sampling rate. Now you can also get raw data from Yandex.Metrica for any time period and use it to solve complex analytical tasks, or import it into other analytics systems. Our new programming interface, the Logs API, makes it possible to download raw data.
How raw data differs from aggregated data
The aggregated data that you see in the Metrica interface or download using the reporting API is calculated for a particular group of sessions. For example, the Time on Site metric might be calculated for all click-throughs from some type of traffic source, all sessions from a certain region, or all sessions from a tablet.
These calculations are all based on raw data: records of individual sessions or pageviews. The Logs API transmits a table of these records. Each record contains useful information from Yandex.Metrica, including detailed info on ad performance in Yandex.Direct, granular ecommerce data, the user's country and city, and various technical information about the session (the browser and mobile phone model used, etc.).
Why you may need raw data
It is, of course, easier to work with aggregated data: you see statistics for various performance indicators, and all you have to do is draw conclusions from them. Raw data, however, is essential when you need statistics that aren't available in the reports.
Here are some examples of how to use non-aggregated data:
Create complex sales funnels
To study the flow leading to a sale in detail, you can track the history of click-throughs to your site for each user separately and highlight patterns that are important for your business. For example, you might want to study the intervals between sessions during which users complete target actions, or what channels usually bring customers in at each stage of the funnel.
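As a rough illustration of the interval analysis described here, the sketch below groups sessions by user and measures the hours between consecutive sessions up to the first conversion. The record layout (client ID, session start, goal flag) and the sample values are hypothetical, not actual Logs API field names.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical session records: (client_id, session_start, goal_reached).
sessions = [
    ("u1", "2017-03-01 10:00:00", False),
    ("u1", "2017-03-03 10:00:00", True),
    ("u2", "2017-03-02 09:00:00", False),
    ("u1", "2017-03-02 22:00:00", False),
]

def gaps_before_conversion(records):
    """Per converting user, hours between consecutive sessions up to the first conversion."""
    by_user = defaultdict(list)
    for client_id, started, converted in records:
        when = datetime.strptime(started, "%Y-%m-%d %H:%M:%S")
        by_user[client_id].append((when, converted))

    result = {}
    for client_id, visits in by_user.items():
        visits.sort(key=lambda v: v[0])  # chronological order
        path = []
        for when, converted in visits:
            path.append(when)
            if converted:
                break
        else:
            continue  # this user never converted, so skip them
        result[client_id] = [
            (later - earlier).total_seconds() / 3600
            for earlier, later in zip(path, path[1:])
        ]
    return result

gaps = gaps_before_conversion(sessions)
```

With the sample data, only user u1 converts, after gaps of 36 and 12 hours between sessions.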
Build custom attribution models
There are three standard attribution models in Yandex.Metrica: first click, last click, and last significant (non-direct) click. By working with raw data, you can create other types of models and analyze in detail how different marketing channels impact the conversion rate. For example, you might track how banner ads affect sales in cases where display ads brought users to your site at least once, but weren't the first or the last traffic source.
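One possible custom model is linear attribution, which splits each conversion equally among every source on the path. The sketch below shows that model alongside a counter for the "assisting" case mentioned above, where a source appears only in the middle of a path. The paths and source names are made-up sample data, and linear attribution is just one illustrative choice, not a model provided by Yandex.Metrica.

```python
from collections import defaultdict

# Hypothetical converting paths: the ordered traffic sources of each buyer's sessions.
paths = [
    ["ad", "display", "organic"],
    ["display", "direct"],
    ["organic"],
]

def linear_attribution(converting_paths):
    """Split one conversion equally among all sources on each path."""
    credit = defaultdict(float)
    for path in converting_paths:
        if not path:
            continue  # nothing to credit on an empty path
        share = 1.0 / len(path)
        for source in path:
            credit[source] += share
    return dict(credit)

def assisted_conversions(converting_paths, source):
    """Count paths where `source` appears mid-path but is neither first nor last."""
    return sum(
        1
        for path in converting_paths
        if source in path[1:-1] and path[0] != source and path[-1] != source
    )

credit = linear_attribution(paths)
display_assists = assisted_conversions(paths, "display")
```

Here `display_assists` counts only the first path, since in the second path the display ad was the first source.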
Combine data from different sources
Raw Yandex.Metrica data can be added to your data from other systems. For example, you can bring all your ad spending statistics together, or supplement Yandex.Metrica data with data from your CRM.
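A minimal sketch of such a join, assuming the session records and the CRM share a common customer identifier. The `client_id` key, the row layout, and the revenue figures are all hypothetical.

```python
# Hypothetical session rows, assumed to be in chronological order.
metrica_sessions = [
    {"client_id": "u1", "source": "ad"},
    {"client_id": "u2", "source": "organic"},
    {"client_id": "u1", "source": "direct"},
]

# Hypothetical CRM export: total order revenue per customer.
crm_revenue = {"u1": 250.0}

def revenue_by_first_source(sessions, revenue_by_client):
    """Credit each customer's CRM revenue to the source of their first session."""
    first_source = {}
    for row in sessions:
        # setdefault keeps the earliest source seen for each client.
        first_source.setdefault(row["client_id"], row["source"])

    totals = {}
    for client_id, source in first_source.items():
        revenue = revenue_by_client.get(client_id, 0.0)  # no CRM record means no revenue
        totals[source] = totals.get(source, 0.0) + revenue
    return totals

totals = revenue_by_first_source(metrica_sessions, crm_revenue)
```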
Monitor discrepancies in statistics
Sometimes figures in other analytics systems don't match Yandex.Metrica's numbers. This is usually due to different calculation methods. By analyzing raw logs, you can figure out how each system processes data and choose the approach that better serves your tasks.
How to work with the Logs API
Raw data is transmitted in the standard TSV (tab-separated values) format, which can easily be imported into most database management systems. This includes ClickHouse, a free open-source solution that powers Yandex.Metrica itself. ClickHouse can process complex queries in real time, is easily configured, and doesn't require large computational resources. Furthermore, fresh data can be automatically loaded into ClickHouse using a script developed by the Yandex.Metrica team.
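If you just want to inspect a downloaded file before loading it into a database, a tab-separated file with a header row can be read with Python's standard library. This is only a sketch: the column names and sample rows below are placeholders (the real field names are listed in the Logs API documentation), and it assumes that fields are not wrapped in quotes.

```python
import csv
import io

# Placeholder sample; a real export would be opened with open(path, newline="").
sample_tsv = (
    "visit_id\tstart_time\tsource\n"
    "1\t2017-03-01 10:00:00\tad\n"
    "2\t2017-03-02 09:30:00\torganic\n"
)

def read_tsv(stream):
    """Read tab-separated rows into dictionaries keyed by the header line."""
    # QUOTE_NONE treats quote characters as ordinary data (an assumption about the export).
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)

rows = read_tsv(io.StringIO(sample_tsv))
sources = [row["source"] for row in rows]
```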
The Logs API documentation, a detailed description of the data schema, and the script for loading data into ClickHouse are available on the technologies site.