How to measure and improve Search quality
Yandex makes targeted efforts to improve Search. Every change in the algorithm that generates search results must satisfy the Search development principles.
Search development principles
The objective of Search is to provide the user with full, useful, and relevant information in a way that lets them perform their tasks quickly and easily.
Search results are fully generated by machine-learning algorithms to ensure unbiased ranking and data presentation. Search results cannot be reordered manually.
To find the most relevant pages, Search performs automatic analysis of the query, page content, and the history of user interaction with these pages, as well as the language, location, relationships between different pages, and many other factors. The quality of machine-learning ranking algorithms is monitored based on metrics that are calculated automatically.
Ranking algorithms are trained on two kinds of input: automatically collected data on user interactions with search results, and evaluations by assessors who manually annotate different SERPs. Assessor evaluations are also used to check how well the search results generated by the algorithm match the queries in terms of quality and relevance. Assessors’ impartiality is ensured by a system of controls covering recruitment, training, instructions, and tools. The instructions explain how to evaluate a document’s quality and relevance to the query and are designed to keep evaluations objective. To compensate for the effect of individual biased evaluations, the same items are always evaluated by several assessors, so evaluations overlap. Moreover, evaluations are never used in ranking directly; they only serve as training data for the ranking algorithms.
All changes to the ranking system are made by modifying the ranking algorithms, which rules out manual intervention. Every change is traceable, has a specific person responsible for it, and is automatically checked against metrics built on assessor evaluations (Proxima) and on user interactions with search results elements (Proficit).
Yandex makes it possible to find information from the many sources available for indexing. The completeness of information indexing is a top priority, and indexing is performed uniformly for each type of source. Indexed content can only be removed from search results if pages are classified as search spam (spamdexing), can be harmful to the user, or violate applicable laws.
The objective of Search is to save users’ time by providing information in the form that is most convenient to use. The usefulness of individual results and of the search results as a whole is assessed by analyzing user interactions with them. The format and completeness of data presentation are determined by the likelihood of meeting the user’s objective and by the type of information presented; they do not depend on the specific data source.
How changes in Search algorithms are evaluated
Potential changes in Search algorithms are evaluated based on two metrics:
- Proxima — a page quality metric that is calculated based on the evaluation of pages from the database (index) assembled by Yandex and other data quality signals.
- Proficit — a metric of search results’ usability that is calculated based on user interactions with Search.
Yandex widely uses the services of professionals (assessors) to evaluate the quality of search results. They evaluate individual sites and other elements of search results in terms of quality and relevance. Assessors’ evaluations do not affect search results directly. Rather, they help to determine whether a particular change in the ranking algorithm is appropriate.
Changes that are deemed appropriate are then checked in an online experiment. In such an experiment, users are randomly divided into two groups: one group gets the new functionality, while the other gets the current version of Search. After enough data is gathered, the Yandex team decides whether users like the proposed change. This conclusion is based on search quality metrics, with Proficit as the primary one.
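The experiment flow can be pictured with a minimal sketch, assuming a deterministic hash-based split into two groups and a simple comparison of group means. The bucket names, the experiment identifier, and the simulated per-user scores below are illustrative assumptions, not part of Yandex’s actual pipeline.

```python
import hashlib
import random
import statistics

def assign_bucket(user_id: str, experiment: str) -> str:
    """Deterministically split users into two equal groups (illustrative)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

# Hypothetical per-user values of an online metric (a Proficit-like score),
# simulated here only to make the sketch runnable.
random.seed(0)
scores = {"control": [], "treatment": []}
for i in range(10_000):
    bucket = assign_bucket(f"user-{i}", "new-ranking-formula")
    base = random.gauss(0.50, 0.10)
    uplift = 0.01 if bucket == "treatment" else 0.0  # assumed effect of the change
    scores[bucket].append(base + uplift)

# Compare the groups; a real pipeline would also run a proper significance test.
for bucket, values in scores.items():
    print(bucket, len(values), round(statistics.mean(values), 4))
delta = statistics.mean(scores["treatment"]) - statistics.mean(scores["control"])
print("observed uplift:", round(delta, 4))
```

In practice the decision also depends on how the change moves several metrics at once, not only on a single observed uplift.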
Proxima
Proxima is a page quality metric built upon assessors’ evaluations and additional signals. Proxima considers many aspects so that the algorithm can distinguish page quality more precisely (an illustrative sketch follows the list), including:
- page relevance to the query (including expert evaluations from various subject-matter specialists);
- the likelihood of meeting the user’s objective on the particular page and site;
- quality, usefulness, and uniqueness of the content;
- balance between useful and intrusive content;
- additional signals on the quality of content and the credibility of the author on complex topics (like health care, legal and financial services, etc.);
- how convenient the content is for the user to consume.
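As a rough illustration only, a metric of this kind can be thought of as a combination of per-page signals. The sketch below assumes a simple weighted sum; the signal names, the weights, and the `proxima_like_score` function are hypothetical and do not describe Yandex’s actual formula.

```python
from dataclasses import dataclass

@dataclass
class PageSignals:
    # Hypothetical signal values in [0, 1]; not Yandex's real feature set.
    relevance: float           # relevance of the page to the query
    task_completion: float     # likelihood the user's objective is met on the page
    content_quality: float     # quality, usefulness, and uniqueness of the content
    intrusiveness: float       # share of intrusive content (ads, pop-ups, etc.)
    author_credibility: float  # extra signal for complex topics (health, legal, finance)
    usability: float           # how convenient the content is to consume

# Illustrative weights; the real metric combines far more signals.
WEIGHTS = {
    "relevance": 0.35,
    "task_completion": 0.25,
    "content_quality": 0.15,
    "intrusiveness": -0.10,    # penalty: intrusive content lowers the score
    "author_credibility": 0.10,
    "usability": 0.15,
}

def proxima_like_score(signals: PageSignals) -> float:
    """Toy aggregate of page-quality signals (illustrative only)."""
    return sum(WEIGHTS[name] * getattr(signals, name) for name in WEIGHTS)

page = PageSignals(relevance=0.9, task_completion=0.8, content_quality=0.7,
                   intrusiveness=0.3, author_credibility=0.6, usability=0.8)
print(round(proxima_like_score(page), 3))
```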
More details and examples are available in our stories about Proxima from the 8th, 9th, and 10th Webmaster workshops. Our team continues to refine the metric by adding new signals that measure page quality and how well pages help users meet their objectives.
Proficit
The objective of Proficit is to measure the search results’ usefulness (how fast the user’s objective can be met).
Proficit takes into account the quality and quantity of user interactions with individual search results and with other elements of the search results page. Data presentation formats with higher predicted Proficit values are selected when generating search results.
The set of elements on the search results page and their mutual arrangement are chosen to maximize the predicted Proficit and Proxima values for the page as a whole. The predicted value is calculated for various combinations of elements, and the elements are arranged in the combination with the highest predicted Proficit; any other arrangement would most likely yield a lower overall Proficit for the search results. If the actual Proficit of particular search results differs from the predicted value, the algorithm takes this into account during retraining. A simplified sketch of this selection follows.
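One simplified way to picture this selection is to enumerate candidate layouts of SERP elements and keep the one with the highest predicted value. In the sketch below, the element names and the `predicted_proficit` stand-in model are assumptions made for illustration, not the real predictor.

```python
from itertools import permutations

# Hypothetical SERP elements; the real set and the real predictor are far richer.
ELEMENTS = ["organic_results", "quick_answer", "map_block", "video_carousel"]

def predicted_proficit(layout: tuple[str, ...]) -> float:
    """Toy stand-in for a model that predicts Proficit for a given layout."""
    # Assumption: higher positions contribute more, and a quick answer is
    # most useful for this particular (imaginary) query type.
    usefulness = {"quick_answer": 0.9, "organic_results": 0.8,
                  "map_block": 0.4, "video_carousel": 0.3}
    return sum(usefulness[element] / (position + 1)
               for position, element in enumerate(layout))

best_layout = max(permutations(ELEMENTS), key=predicted_proficit)
print(best_layout, round(predicted_proficit(best_layout), 3))
```

Exhaustive enumeration is only feasible for a toy example; a production system would rely on a learned model and efficient search over candidate layouts.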
Rules for measuring Proficit and the major factors affecting it
Proficit (of search results as a whole and individual elements) is calculated according to the following rules:
- Each user interaction with an element of search results is deemed successful or unsuccessful.
- If the interaction is deemed successful (for example, the user made a query, navigated to an external site, and did not return to Search for a long time), the metric increases.
- In contrast, if the interaction is deemed unsuccessful (for example, the user returned to the search results page after a short time, had to redo the query to find the necessary information, or met their objective using another element), the metric decreases. The greatest penalty is applied to the most visible elements: those that occupy a large area of the search results page or positions that attract the most user attention.
- If search results in their entirety fail to meet the user’s objective and the query had to be redone, the metric is decreased as well.
The metric considers both clicks on search results elements (for example, links, business profiles, and phone numbers) and click-free interactions (for example, when the user finds the necessary information directly in the search results). Both can be part of a successful interaction that increases the value of the metric. A simplified sketch of these rules is shown below.
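These rules can be sketched roughly as follows. The event fields, the dwell-time threshold, and the visibility-based penalty are illustrative assumptions, not the actual definition of Proficit.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    element: str          # which SERP element the user interacted with
    clicked: bool         # click or click-free interaction
    dwell_seconds: float  # time spent before returning to the results page
    requery: bool         # whether the user had to redo the query afterwards
    visibility: float     # share of SERP area the element occupies, in [0, 1]

LONG_DWELL = 120.0  # assumed threshold for a "successful" visit, in seconds

def proficit_delta(event: Interaction) -> float:
    """Toy contribution of one interaction to a Proficit-like metric."""
    successful = not event.requery and (
        not event.clicked or event.dwell_seconds >= LONG_DWELL
    )
    if successful:
        return 1.0
    # Unsuccessful interactions are penalized more for prominent elements.
    return -1.0 * (0.5 + event.visibility)

session = [
    Interaction("organic_result_1", clicked=True, dwell_seconds=15,
                requery=True, visibility=0.3),
    Interaction("quick_answer", clicked=False, dwell_seconds=0,
                requery=False, visibility=0.2),
]
print(round(sum(proficit_delta(event) for event in session), 2))
```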
Page updated: March 22, 2024.