About 20% of user queries on Yandex are ambiguous. A query like [apple] might refer to either a fruit or a consumer electronics company. Likewise, someone searching for [pizza] might want either a restaurant that offers delivery or a recipe. The spectrum of potential user intents is broad, and so is the spectrum of potential search engine responses. Picking the right response can prove almost impossible if users do not specify what they are looking for.
Yandex has developed and implemented a new search technology, appropriately named Spectrum, which allows the search engine to return a whole spectrum of search results matching different user intents, based on how frequently users search with each of them.
Spectrum is based on query statistics. The system analyses users’ searches and identifies objects such as personal names, films, books or cars. Each object is then classified into one or more categories. For example, in the search query [panadol dosage] the medicine’s brand name ‘Panadol’ is categorized as ‘medicine’, while the search term [casablanca] is classified into both the ‘city’ and the ‘film’ categories. Spectrum currently uses about 60 pre-defined categories, and the number is growing.
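To make the categorization step concrete, here is a minimal sketch of multi-label object categorization. It assumes a hand-written dictionary named OBJECT_CATEGORIES (hypothetical) that maps single-word objects to categories; Spectrum mines this knowledge from query statistics, and its actual data structures and object-recognition methods are not described here.

```python
# Hypothetical dictionary of known objects and their categories. In Spectrum this
# knowledge comes from query statistics; here it is hard-coded for illustration.
OBJECT_CATEGORIES: dict[str, set[str]] = {
    "panadol": {"medicine"},
    "casablanca": {"city", "film"},
    "apple": {"fruit", "company"},
}


def categorize_query(query: str) -> dict[str, list[str]]:
    """Map each known single-word object in the query to its sorted categories."""
    found: dict[str, list[str]] = {}
    for token in query.lower().split():
        categories = OBJECT_CATEGORIES.get(token)
        if categories:
            found[token] = sorted(categories)
    return found


print(categorize_query("panadol dosage"))  # {'panadol': ['medicine']}
print(categorize_query("casablanca"))      # {'casablanca': ['city', 'film']}
```

An object can belong to several categories at once, which is exactly what makes a query like [casablanca] ambiguous. Multi-word objects (for example, film titles) would need phrase matching rather than this single-token lookup.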
Categorization helps the system discriminate between meanings when processing search queries and return results based on user intent.
Each category has a range of search intents, that is, the purposes with which users look for something. For example, people searching for an object in the ‘product’ category may want to buy it or to read customer reviews, so the search intents for this category include ‘buy’, ‘reviews’ and ‘feedback’. A category may have anywhere from two or three search intents to dozens of them.
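The category-to-intent mapping can be pictured as a nested table: each category lists its intents, and each intent is signalled by certain query words. The sketch below assumes a hypothetical table CATEGORY_INTENTS with made-up trigger words; it only shows how an explicit intent could be read off a query, not how Yandex actually builds or uses its intent lists.

```python
# Hypothetical intent vocabulary: for each category, the intents users have and
# the query words assumed to signal each intent. Not Yandex's actual data.
CATEGORY_INTENTS: dict[str, dict[str, set[str]]] = {
    "product": {
        "buy": {"buy", "price", "order"},
        "reviews": {"reviews", "review"},
        "feedback": {"feedback", "opinions"},
    },
    "medicine": {
        "dosage": {"dosage", "dose"},
        "side effects": {"side", "effects"},
    },
}


def detect_intents(query: str, category: str) -> list[str]:
    """Return the intents of `category` whose trigger words occur in the query."""
    words = set(query.lower().split())
    intents = CATEGORY_INTENTS.get(category, {})
    return sorted(name for name, triggers in intents.items() if words & triggers)


print(detect_intents("panadol dosage", "medicine"))      # ['dosage']
print(detect_intents("iphone price reviews", "product"))  # ['buy', 'reviews']
print(detect_intents("iphone", "product"))                # []
```

The last call is the interesting case: the query names an object but states no intent, so the search engine has to fall back on how often users typically search for that object with each intent.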
Based on the category of the search term, what users normally want to know about this object, what information about it is available online and so on, Spectrum estimates the percentage of users who search for this object with each potential intent. The search engine uses these percentages to rank results for ambiguous queries: Spectrum calculates what share of the results should address each intent behind the same query, and the results are ranked so that their spectrum matches the spectrum of intents. This allows Yandex to maximize the chance that users find exactly what they were looking for, even if they did not type it in the search box but only had it in mind.
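The idea of aligning the spectrum of results with the spectrum of intents can be sketched in two steps: turn per-intent query counts into shares, then fill result slots so each intent gets a proportional number of positions. The counts below are hypothetical, and the slot-filling rule (a greedy Sainte-Laguë-style quotient, share / (2 × slots taken + 1)) is a swapped-in proportional-allocation technique chosen for illustration; Yandex has not said that Spectrum uses it.

```python
from collections import Counter


def intent_shares(intent_counts: Counter) -> dict[str, float]:
    """Convert raw per-intent query counts into fractions that sum to 1."""
    total = sum(intent_counts.values())
    return {intent: count / total for intent, count in intent_counts.items()}


def blend_results(candidates: dict[str, list[str]],
                  shares: dict[str, float], k: int) -> list[str]:
    """Fill up to k slots so the mix of intents tracks the intent shares.

    At each step, take the next result from the intent with the highest
    quotient share / (2 * slots_already_taken + 1) that still has results left.
    """
    queues = {intent: list(docs) for intent, docs in candidates.items()}
    taken: Counter = Counter()
    ranked: list[str] = []
    while len(ranked) < k:
        available = [i for i in queues if queues[i] and shares.get(i, 0) > 0]
        if not available:
            break  # no candidates left for any intent with a non-zero share
        best = max(available, key=lambda i: shares[i] / (2 * taken[i] + 1))
        ranked.append(queues[best].pop(0))
        taken[best] += 1
    return ranked


pizza_counts = Counter({"delivery": 60, "recipe": 40})  # hypothetical log counts
shares = intent_shares(pizza_counts)
candidates = {
    "delivery": ["delivery-1", "delivery-2", "delivery-3"],
    "recipe": ["recipe-1", "recipe-2", "recipe-3"],
}
print(blend_results(candidates, shares, k=5))
# ['delivery-1', 'recipe-1', 'delivery-2', 'recipe-2', 'delivery-3']
```

With a 60/40 split and five slots, the page gets three delivery results and two recipes, interleaved so that the minority intent still appears near the top rather than being pushed to the end of the list.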
Search query analysis performed by Spectrum is fully automated. Using more than two thousand processor cores, Spectrum analyses about ten billion search queries in each run. To keep its database up to date, the system repeats this analysis several times a week.
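Counting query statistics at this scale is typically done by splitting the log into shards, counting each shard independently and merging the partial counts. The sketch below shows that map-and-merge shape on a toy log; the shard layout and function names are hypothetical, and the thread pool only illustrates the structure (CPU-bound counting in CPython would need separate processes or a cluster to use many cores).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_shard(shard: list[str]) -> Counter:
    """Map step: count normalized queries in one shard of the query log."""
    # Lower-case and collapse whitespace so "Pizza  delivery" matches "pizza delivery".
    return Counter(" ".join(query.lower().split()) for query in shard)


def count_log(shards: list[list[str]], workers: int = 4) -> Counter:
    """Run the map step on every shard, then merge (reduce) the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_shard, shards))
    total: Counter = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


shards = [
    ["pizza recipe", "Pizza  delivery"],
    ["pizza delivery", "casablanca film"],
]
print(count_log(shards).most_common(1))  # [('pizza delivery', 2)]
```

Because each shard is counted independently, rerunning the whole job several times a week only requires adding shards for new log data and re-merging.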
In addition to search log statistics, Spectrum also uses information from reference sources and encyclopedias, such as Wikipedia. This helps the search engine to recognize new objects, learn about new meanings that do not fit any of the existing categories and add new categories.
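One way to picture how a reference source could extend the system is to compare what the reference says about an object with what the system already knows. The sketch below is an assumption-laden illustration: both dictionaries and the function find_updates are hypothetical, and real encyclopedia data would first have to be parsed into object-to-meaning mappings.

```python
def find_updates(known: dict[str, set[str]],
                 reference: dict[str, set[str]],
                 existing_categories: set[str]):
    """Compare a reference source against known objects.

    Returns (new objects, new meanings per object, meanings that match
    no existing category and are therefore candidate new categories).
    """
    new_objects = set(reference) - set(known)
    new_meanings: dict[str, set[str]] = {}
    for obj, meanings in reference.items():
        missing = meanings - known.get(obj, set())
        if missing:
            new_meanings[obj] = missing
    new_categories = set().union(*new_meanings.values()) - existing_categories
    return new_objects, new_meanings, new_categories


known = {"casablanca": {"city"}, "panadol": {"medicine"}}
reference = {  # hypothetical facts extracted from an encyclopedia
    "casablanca": {"city", "film"},
    "panadol": {"medicine"},
    "spectrum": {"search technology"},
}
objects, meanings, categories = find_updates(known, reference, {"city", "film", "medicine"})
print(objects)     # {'spectrum'}
print(categories)  # {'search technology'}
```

Here the reference source teaches the system a new meaning of a known object (Casablanca as a film), a new object, and a meaning that fits none of the existing categories.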