Yandex company blog

Yandex Data Factory Opens for Business

9 December 2014, 09:06

As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality.

Albert Einstein

A search engine is all about very big data and very advanced mathematics. For more than 17 years, we at Yandex have been developing and implementing technologies and algorithms that pick, from a billion pages on the internet, the one that answers a web user’s question or solves their problem.

The technologies that power our search are based on machine learning, an approach that automates decision-making. Our core machine learning technology, MatrixNet, not only decides from previous experience whether a certain piece of information is a good answer to a user’s question, but does so from a relatively limited amount of experience.

Now that we feel our technologies can be put to use in spheres other than internet search, we are ready to offer them for a wider range of applications.

Today, at the LeWeb innovation conference in Paris, we’re cutting the red ribbon for Yandex Data Factory, our new B2B service for corporate and enterprise clients who want to use our machine-learning technologies to turn the large volumes of data they possess into hands-on business tools and, by doing so, increase sales, cut costs, optimise processes, prevent losses, forecast demand, and develop new or improve existing methods of audience targeting.

We first branched out of our natural realm through our collaboration with CERN on their Large Hadron Collider beauty (LHCb) experiment. For this project we trained MatrixNet to search for specific types of particle collisions, or events, among thousands of terabytes of information about these events registered by the LHCb detector. Yandex provided the LHCb researchers with instant access to the details of any specific event.

The success of this project gave us reason to believe it could be repeated in other areas. Any industry that produces large amounts of data and is focused on business goals could benefit from our expertise and our MatrixNet-based technologies: personalisation of search suggestions, recommendations or search results; image or speech recognition; road traffic monitoring and prediction; word form prediction and ranking for machine translation; and demographic profiling for audience targeting.

Prior to today's announcement, we ran pilot projects for about a year, designing experimental custom-made solutions for clients all over the world. Most of these projects used existing data to train a MatrixNet-based model, which was then applied to new data. Depending on the client's goal, the model would generate suggestions for buying a specific product or predict with a high degree of accuracy, from the behaviour of thousands or millions of shoppers with similar behaviour patterns, exactly which product would be bought.
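To make the "train on existing data, apply to new data" workflow concrete, here is a minimal sketch in Python. It is not MatrixNet: as a deliberately simple stand-in, it predicts a purchase by looking at the most similar past shoppers. The function names, the behaviour features and the sample data are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length behaviour vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit(history):
    """'Training' in this stand-in just stores (behaviour, product) pairs."""
    return list(history)


def predict(model, behaviour, k=3):
    """Return the product bought most often by the k most similar shoppers."""
    nearest = sorted(
        model, key=lambda row: cosine(row[0], behaviour), reverse=True
    )[:k]
    votes = Counter(product for _, product in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical existing data: page visits per shopper as
# [sports pages, kitchen pages, toy pages], plus what they bought.
history = [
    ([9, 1, 0], "running shoes"),
    ([8, 0, 1], "running shoes"),
    ([1, 9, 0], "blender"),
    ([0, 8, 2], "blender"),
    ([1, 0, 9], "board game"),
]

model = fit(history)            # step 1: learn from data that already exists
new_shopper = [7, 2, 0]         # step 2: apply the model to new data
prediction = predict(model, new_shopper)
print(prediction)
```

A production model would learn from far more shoppers and far richer behaviour signals, but the two steps stay the same: fit once on historical data, then score each new record as it arrives.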

Using this machine-learning technique, we helped one of the leading European banks increase their sales by matching each of their products that needed upselling with the best communication channel for each customer. By applying MatrixNet to behaviour data on a few million of the bank’s clients, we created a model that could predict the net present value of communicating a product to a specific client via a specific channel. This model was then applied to the bank’s new data to generate personalised product recommendations for each client, each paired with a communication channel and ranked by potential net present value. Preliminary results of the first wave of the bank’s marketing campaign, which was run on three million clients, were used to fine-tune the original model, which, in turn, was used in the second wave on a much larger number of the bank’s customers. The resulting sales increase beat the increase forecast by the bank’s own analysts by 13%.
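The ranking step of such a campaign can be sketched as follows. This is only an illustration of the idea, assuming a model has already produced a predicted value for every client, product and channel combination; the `rank_offers` function, the client names and the numbers are hypothetical and are not the bank's data.

```python
from collections import defaultdict


def rank_offers(predicted_npv):
    """Turn model scores into a ranked offer list per client.

    predicted_npv maps (client, product, channel) to a predicted value.
    For every client and product the best channel is kept, and the
    client's products are then sorted by that value, highest first.
    """
    best = {}  # (client, product) -> (channel, value)
    for (client, product, channel), value in predicted_npv.items():
        key = (client, product)
        if key not in best or value > best[key][1]:
            best[key] = (channel, value)

    ranked = defaultdict(list)
    for (client, product), (channel, value) in best.items():
        ranked[client].append((product, channel, value))
    for offers in ranked.values():
        offers.sort(key=lambda offer: offer[2], reverse=True)
    return dict(ranked)


# Hypothetical model output for two clients.
scores = {
    ("alice", "credit card", "email"): 12.0,
    ("alice", "credit card", "phone"): 30.0,
    ("alice", "deposit", "email"): 18.0,
    ("alice", "deposit", "sms"): 5.0,
    ("bob", "credit card", "sms"): 4.0,
    ("bob", "deposit", "phone"): 9.5,
}

offers = rank_offers(scores)
for client, client_offers in offers.items():
    print(client, client_offers)
```

Keeping only the best channel per product means each client sees a product at most once, through the channel where the predicted value is highest.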

The same machine-learning approach, together with our own data and expertise in geolocation, helped a road and traffic management agency make its accident predictions 30 times more accurate. To enable the agency to take measures to prevent road traffic accidents, we provided one-hour forecasts for traffic jams, as well as real-time alerts for high-risk traffic conditions, and visualised potential congestion on interactive maps. Using MatrixNet, we first trained predictive formulas on our own user-generated content (UGC) about almost 40,000 road accidents and 5bn speed tracks mined over 2.5 years, complemented by information provided by the agency: traffic information (i.e., the number of cars passing through a given road segment at any given time), information about road conditions (type of surface, number of lanes, gradient, etc.), and weather information. These formulas were then applied to larger data sets, and a predictive system for road traffic accidents was developed and deployed in the agency’s situation rooms.

Currently, we’re continuing to work on about 20 projects in various stages of completion across the globe. In essence, we're continuing to experiment, but this time we know in which direction, or rather in which directions, we are to move. While the majority of our potential partners, as well as data, come from finance, telecommunications, retail, logistics, utilities, and even the new-fangled 'smart cities', anyone who has data and a business goal can discover new opportunities brought about by mathematics. No matter what industry your business is in, mathematics will work for you. Despite what Einstein said.
