Yandex Search aims to give users search results in the simplest, most understandable structured form. Users should be able to find what they need with as little effort as possible, even if they don't yet know exactly what that is. Search queries often refer to real-world objects: people, cities, animals, movies, books, events, and so on. For such queries, the results can include object cards with structured information.
Our team writes algorithms for compiling a database of objects, building links between them, and preparing this data for search results.
Examples of tasks:
- Extract data about objects from unstructured text or websites
- Merge identical objects from different sources into a single card
- Determine the type and main characteristics of objects
- Select suitable images and text descriptions for objects
- Find the main user intents associated with objects
- Determine the essential components of each object
- Deliver important edits to objects to Search quickly
We use Python for offline data processing and C++ in the query-serving runtime. We rely heavily on Yandex's internal big data tools, YT and YQL. We also use BERT and YALM models trained in-house at Yandex. For example, one of the first applications of YALM in Yandex Search was automatically generating subheadings for object cards.