Yandex proudly announces the creation of its new Machine Intelligence and Research (MIR) division. The MIR division will function as a centralized, cross-functional unit to accelerate innovation and unify our core machine learning technologies. It will also transfer cutting-edge research from our various research teams into Yandex products and services. Yandex has tapped Misha Bilenko to head the new division, which brings together teams focusing on AI-centered technologies, including machine learning, computer vision, speech technologies and machine translation.
From speech-to-speech translation to virtual assistants that chat with people and use cameras to see, the MIR division offers amazing opportunities for synthesis and cross-pollination across Yandex’s machine learning, computer vision, speech and translation technologies. By bringing together team members working on these core technologies, the MIR division will improve Yandex’s machine learning and natural language processing capabilities, enhancing its products and services and ultimately delivering a better experience to consumers and businesses.
Under Misha Bilenko’s guidance, the unified division will be able to integrate its top research findings across all Yandex products and services. Misha joins Yandex after 10 years at Microsoft, where he led the Machine Learning Algorithms team in the Cloud and Enterprise division, following a career in the Machine Learning Group at Microsoft Research. Misha brings a unique blend of leadership skills, research expertise and machine learning knowledge to Yandex. His leadership will be instrumental as the MIR division expands Yandex’s research efforts to experiment with new projects and pursue longer-term goals in building the next generation of intelligent products and services.
Yandex unveiled a new service for businesses that advertise their products on the company’s websites and on websites in the Yandex Advertising Network at its annual e-marketing conference, Yac/m. The new service, Yandex.Audience, allows companies to use their own customer information to segment audiences for hyper-targeted advertising, and to target ads at existing groups of customers to boost upselling campaigns, improve retention and increase average spend.
After uploading customer information, such as email addresses, telephone numbers or device IDs, to Yandex.Audience, an advertiser receives anonymised IDs identifying their customers among visitors on Yandex’s pages and the YAN websites. These people can now be personally targeted with offers relevant to their previous customer experience – a pair of shoes matching the bag they bought a week ago, or a special loyalty program to recapture lost customers. The same data can be used to identify lookalike audiences – groups of people who exhibit characteristics similar to those of the existing customers and are likely to be interested in the offers that the existing customers were interested in – and target ads to them.
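The announcement does not say how lookalike audiences are computed. As a rough illustration only, one common approach is to average the profiles of existing customers and then select other users whose profiles point in a similar direction by cosine similarity. This sketch assumes each person is described by a numeric feature vector. The features, the `lookalikes` helper and the 0.9 threshold are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Average profile of the seed audience (existing customers)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lookalikes(seed, candidates, threshold=0.9):
    """Hypothetical helper: IDs of candidates whose (made-up) feature
    vectors are close to the seed audience's average profile."""
    center = centroid(seed)
    return sorted(
        cid for cid, vec in candidates.items() if cosine(center, vec) >= threshold
    )
```

Production systems usually rely on trained models rather than a single centroid. The centroid simply keeps the idea visible in a few lines.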
Yandex.Audience is available in English and Russian. To start creating hyper-targeted ad campaigns, an advertiser needs to sign in to their account with Yandex.Direct, Yandex’s auction-based contextual advertising service, and upload their customer data as a .txt or .csv file. The IDs returned by Yandex.Audience cannot identify any individual user, but they can be used to deliver personally targeted ads.
Medium- to large-scale businesses, such as retailers, banks, car dealers and insurance companies, that possess reasonably large amounts of customer data and are striving to convert customers will appreciate this service most. Hyper-targeted advertising requires at least 1,000 records. There is no upper limit on the number of records that can be uploaded to the service, nor on the number of audience types or hyper-targeted advertising campaigns.
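Before uploading, an advertiser might normalise and deduplicate identifiers and check the 1,000-record minimum. The sketch below assumes a CSV export with a header row and an `email` column. That layout and the helper names are illustrative assumptions, not Yandex.Audience's documented file format.

```python
import csv
import io

MIN_RECORDS = 1000  # minimum audience size stated in the announcement

def load_customer_ids(text, column="email"):
    """Parse a CSV export (assumed layout: header row with an 'email'
    column), normalise values to lowercase and drop blanks and duplicates."""
    ids = set()
    for row in csv.DictReader(io.StringIO(text)):
        value = (row.get(column) or "").strip().lower()
        if value:
            ids.add(value)
    return ids

def ready_for_upload(ids):
    """True if the deduplicated audience meets the 1,000-record minimum."""
    return len(ids) >= MIN_RECORDS
```

Deduplicating before counting matters here. A file with 1,000 lines but repeated addresses would otherwise appear to meet the minimum.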
In addition to hyper-targeting, Yandex.Audience will soon provide tools for marketing analysis. Crypta, Yandex's proprietary behaviour analytics technology, which can identify web users’ interests, age, gender, family status, and even whether they have a car or a pet based on their online behaviour, will soon be added to the service. With this technology, advertisers will be able to use social and demographic statistics about their audiences to plan their marketing strategies.
Yandex has built its personalised content recommendation technology, Zen, into Yandex Browser on all platforms in 24 countries and 15 languages. Based on the latest developments in artificial intelligence research, Zen uses the company’s vast global web index to pick stories, images, videos and other content for each individual user and offer it to them right in the new tab of Yandex Browser.
The intelligent content discovery feed in Yandex Browser delivers recommendations based on the user’s location, browsing history, and viewing history and preferences in Zen, among hundreds of other factors. Zen uses natural language processing and computer vision to understand the textual and visual content of the pages the user has viewed, liked or disliked, so that it can offer content they are likely to enjoy. Disco, Yandex’s recommendation technology based on the company’s machine learning algorithm MatrixNet, helps Zen choose which suggestions to offer the user at any given point in time. Because Zen is designed to identify and cater to the user’s long-term personal interests, it also delivers content not directly related to their immediate preferences. The more the user interacts with Zen, the better the chances that they will see serendipitously interesting content.
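Disco and MatrixNet are proprietary, and the post does not describe how Zen combines its factors. The following is a deliberately simplified, hypothetical sketch. Each candidate item carries made-up factor scores, and a weighted sum ranks them. One slot can optionally be reserved for a topic outside the user's immediate preferences, as a crude way to mimic the serendipity described above. The function name, factor names and weights are all illustrative.

```python
def rank_feed(items, weights, k=3, explore_topic=None):
    """Rank items by a weighted sum of (hypothetical) factor scores.

    items: list of dicts with "id", "topic" and "factors" ({name: score}).
    If explore_topic is given, the last of the k slots goes to the best
    item from that topic, as a crude stand-in for serendipitous content."""
    def score(item):
        return sum(weights.get(f, 0.0) * v for f, v in item["factors"].items())

    ranked = sorted(items, key=score, reverse=True)
    if explore_topic is None:
        return [it["id"] for it in ranked[:k]]
    main = [it for it in ranked if it["topic"] != explore_topic][: k - 1]
    extra = [it for it in ranked if it["topic"] == explore_topic][:1]
    return [it["id"] for it in main + extra]
```

With `explore_topic="art"`, a low-scoring art item still gets one slot in the feed instead of being ranked out entirely.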
‘With all the vast amount of information available on the internet, something genuinely interesting isn’t easy to come by. Zen helps solve this problem,’ says Victor Lamburt, head of Yandex Zen. ‘It points each user to what’s interesting specifically to them. This is the future for all web browsers: providing a personal internet experience and helping people discover something new.’
The infinite, personally targeted content feed in Yandex Browser gives web users an opportunity to discover things they appreciate but would not have found otherwise. To start exploring this new internet experience, all one needs to do is download Yandex Browser and give Zen some browsing history to work with. Alternatively, liking or disliking a few websites on Zen’s start page will help it understand your preferences at the outset. Users can also change the type or topic of the content they are offered later on by choosing to see more or less of similar content, or by blocking specific sources altogether.
Zen first appeared as an experimental feature in Yandex’s launcher app for Android in Mexico and Brazil in 2015. Since then, the average time users spend viewing Zen’s recommended content has grown from 5 minutes to 20 minutes as of May 2016. Zen is currently available in both Yandex Launcher and Yandex Browser for iPhone, Android mobile devices, and Windows PCs and laptops.
Yandex’s personal content recommendation technology can also be easily integrated into third-party mobile applications, such as browsers or launcher apps, and offers great monetisation potential for OEMs, app developers and mobile carriers.
Yandex has equipped its browser with built-in domain name system (DNS) protection technology to safeguard all users of Yandex Browser against DNS spoofing. This is the first time a browser has come with DNS security technology built in.
Yandex Browser’s built-in active security system Protect provides a comprehensive anti-fraud defense against the majority of currently existing cyber-threats. It automatically checks all downloaded files for viruses, warns users about dangerous websites, and protects their passwords when using public networks.
DNSCrypt, Yandex Browser’s newly added line of defence, is a protocol that authenticates communications between a browser requesting the IP address of a website and the DNS server that supplies this address. Provided by the renowned DNS security company OpenDNS, the protocol will now work directly in the browser, without users having to purchase, download or activate a separate security product.
According to industry experts, DNS spoofing (in which the website you requested is replaced with a fraudulent one somewhere on the server side) and router hijacking (in which malware changes your router's DNS settings) affect millions of modems and routers worldwide.
Now, instead of going to an unknown DNS resolver, all requests made through Yandex Browser will go straight to one of 80 secure and fast DNS servers owned by Yandex in multiple locations around the world. In addition to using a verified DNS resolver, the DNSCrypt protocol encrypts communications between the browser and the server, so intercepted requests cannot be read or tampered with.
Yandex Browser with DNSCrypt is available for Windows and OS X. To start using the browser's DNS protection, turn on DNSCrypt encryption in the settings.
The option to choose which DNS resolver Yandex Browser communicates with will become available in the near future.
Machine learning is Yandex's core technology. We’ve long been using it in almost all of our services: to answer users’ search queries, for machine translation, ad targeting, personal recommendations, and plotting routes on maps, among others. Since last year, our MatrixNet machine learning algorithm has also been used to optimise business processes in real enterprises; we opened Yandex Data Factory for this purpose.
Today we announce yet another application of machine learning in a new field for us — weather forecasting. For this we have developed our own forecasting technology Meteum, which will now be used in the web service and mobile application Yandex.Weather available for iOS and Android.
Basic weather forecasts are traditionally constructed using the Navier-Stokes equations. Models for describing weather are extremely complex, as they depend on a multitude of factors. Programs for calculating them consist of hundreds of thousands of lines of code and run on huge supercomputers. Even so, they still make mistakes, so their forecasts need to be fine-tuned. Besides that, because traditional calculations are so complex and resource-intensive, forecasts are made for relatively large regions and cities. Constructing a precise forecast for, say, a small village would require taking into account a large number of local factors, such as solar radiation, phase transitions of water vapour, or thermal radiation from the soil. Doing this with traditional methods is hardly less resource-intensive than doing it for a large city, while far fewer people use such a forecast.
Machine learning makes it possible to collate a large volume of historical data about forecasts and actual weather, identify the causes of forecasting errors and correct them. This is quicker and easier, as it doesn’t require factoring in the laws of nature for each new forecast. Instead, it corrects traditional mathematical models and localises the forecast down to a specific latitude and longitude. That’s exactly what Meteum does.
Our new technology uses traditional meteorological models to process the initial data, and works with the intermediate results using MatrixNet, Yandex’s machine learning technology. To calculate the weather, Meteum constantly compares forecasts with actual weather conditions, more than 140,000 times a day. To learn about current weather conditions, we use meteorological station data as well as information from other sources that indirectly indicate the situation, about 9 terabytes of data every day. One of these sources is our users, who can report discrepancies between forecasts and real weather conditions via the app. The more data we receive from them, the more precise Meteum’s forecasts will become.
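Meteum applies MatrixNet to this correction task. The sketch below substitutes ordinary least squares, a much simpler technique, to illustrate learning from past forecast errors. It fits `observed = a * forecast + b` from historical pairs separately for each location. A systematic local bias, such as a village that is always 2 degrees warmer than forecast, is therefore learned and removed. All function names and data are hypothetical.

```python
def fit_correction(forecasts, observed):
    """Least-squares fit of observed = a * forecast + b over historical pairs.
    Falls back to a pure bias correction if all forecasts are identical."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observed) / n
    var = sum((f - mean_f) ** 2 for f in forecasts)
    if var == 0:
        return 1.0, mean_o - mean_f
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observed))
    a = cov / var
    return a, mean_o - a * mean_f

def fit_by_location(history):
    """history: {location: [(forecast, observed), ...]} -> {location: (a, b)}."""
    return {loc: fit_correction(*zip(*pairs)) for loc, pairs in history.items()}

def correct(forecast, a, b):
    """Apply a learned correction to a raw model forecast."""
    return a * forecast + b
```

Fitting per location is what lets a cheap correction localise a coarse model. A location with enough history gets its own adjustment without rerunning the physical model.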
Meteum calculates a new forecast each time a user consults Yandex.Weather on their desktop or mobile device. It determines the user’s location and shows them a fresh forecast for precisely that location. The user can choose another place and time to see what the weather will be like around their office in an hour, or whether it might rain when they head out of town in the evening.
Meteum currently works in 36 regions of Russia, with the possibility of expanding to other regions or countries.
Our job is to make life easier for everyone, which sometimes means simply showing people opportunities they have never thought of before. We are always on the lookout for new possibilities for our users on any platform or device.
When the Telegram instant messaging service launched its open Bot Platform last June, new possibilities came up in the form of fun and efficient chat bots that are easy to implement on the service. When chatting with someone online, it's much easier to ask a chat bot a straightforward question and receive an instant answer without leaving the chat than to switch to a browser window and look for the information. Chat bots are good at providing simple, factual information, such as the weather forecast for the coming weekend, current traffic conditions, or the definition of a new word.
To help users enjoy their chats even more, some of our developers have built chat bots based on Yandex services using the Telegram Bot API. Most of these bots provide information relevant to our users in Russia, Ukraine, Belarus, Turkey and Kazakhstan, but the ImageSearch bot and the Yandex Translator bot could be useful to anyone, including people outside our current key markets.
ImageSearch instantly offers the user an image or photograph from the Yandex.Images service in response to a keyword request on Telegram. Just type what you want to see and ImageSearch retrieves a picture, which can be shared with other chatters in a couple of clicks. To see something else for the same request, just type “/more”. Add the ImageSearch bot to your chat and communicate with your friends using pictures, GIFs or memes the bot finds on the internet.
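The `/more` behaviour described above needs only a little per-chat state. This is a hypothetical sketch of the command logic alone. `search` stands in for a Yandex.Images lookup, and sending replies through the Telegram Bot API is left out. The class and its names are therefore illustrative rather than the actual bot's code.

```python
class ImageSearchBot:
    """Hypothetical command logic for an ImageSearch-style bot.

    `search` is any callable returning a list of image URLs for a query;
    a real bot would query Yandex.Images and reply via the Telegram Bot
    API, both omitted here."""

    def __init__(self, search):
        self.search = search
        self.state = {}  # chat_id -> (query, results, index of next result)

    def handle(self, chat_id, text):
        text = text.strip()
        if text == "/more":
            if chat_id not in self.state:
                return "Type what you want to see first."
            query, results, i = self.state[chat_id]
        else:
            # Any other text is treated as a new keyword request.
            query, results, i = text, self.search(text), 0
        if i >= len(results):
            return f"No more pictures for '{query}'."
        self.state[chat_id] = (query, results, i + 1)
        return results[i]
```

Keeping state per `chat_id` means `/more` in one chat never advances another chat's results.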
Yandex Translator helps you translate words or phrases and talk to anyone in any of the languages currently available for translation based on Yandex’s technology. With Yandex Translator each member of an international chat can type in their own language and the chat bot will automatically translate their words into the language of their interlocutor.
Telegram currently offers thousands of chat bots for every taste and every problem. To help other chat bot developers who share our goal of making life easier and more fun for everyone, we have created a free analytics tool, Botan. Based on AppMetrica, Yandex’s free app tracking and analytics tool, Botan lets chat bot developers get to know their audience better, including gathering information about specific audience segments and learning which bot commands are most popular with certain groups of people.
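Botan's own API is not described here. The following is therefore only an assumed illustration of the kind of report mentioned: counting which commands are most popular within each audience segment, given a log of `(segment, command)` pairs. The function name and log format are hypothetical.

```python
from collections import Counter, defaultdict

def top_commands(events, n=1):
    """events: iterable of (segment, command) pairs, e.g. one per message.
    Returns the n most frequent commands for each audience segment."""
    counts = defaultdict(Counter)
    for segment, command in events:
        counts[segment][command] += 1
    return {seg: [cmd for cmd, _ in c.most_common(n)] for seg, c in counts.items()}
```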
Yandex is rolling out a revamped version of its mobile app analytics platform – now under the name “AppMetrica”. The new platform features a powerful mobile ad tracking solution in addition to the pre-existing features – user analytics and crash reports. Now AppMetrica covers all key domains for marketers, publishers and developers – and they can access it completely for free and without any limits, in real-time mode and with a single SDK.
We first released our analytics tool almost two years ago as Yandex.Metrica for Apps. It was our response to the lack of good user analytics solutions on the market: we had to create our own to learn how the mobile apps published by Yandex were performing. Then we thought it might be of interest to other people around the world, and opened it up to everyone, for free.
Since 2013 we’ve been getting requests from marketers and developers who love the way we do it; AppMetrica currently processes nearly one billion in-app events every day for the apps connected to the service. However, users need more, and we got clear signals from our in-house mobile marketers who track Yandex’s app user acquisition. The main problem is that they had to spend up to 15% of their budgets on mobile ad measurement tools alone, which is quite a lot even for us ☺. Another issue is that product analysts and managers couldn’t easily use detailed traffic source segmentation in analytics tools, because the two are separate products, usually developed by different providers. Finally, using different tools requires integrating multiple SDKs, so project teams have to spend more time on development and testing. We spent almost two years solving these issues to turn AppMetrica into a fully-fledged, integrated, professional mobile analytics and tracking platform.
The new AppMetrica provides detailed ad campaign reporting. Users can drill down to analyse how well different creatives and ad placements perform, see breakdowns by tracking link parameters, and get user engagement reports through cohort analysis with retention and event conversion rates, which give real insight into traffic quality. AppMetrica is integrated out of the box with the most popular mobile ad networks, including AdColony, InMobi, Millennial Media, Vungle and many others. We keep expanding the list of ad networks, and users can also manually integrate the traffic sources they need and set up postbacks in a few easy steps.
The new platform helps improve re-engagement using state-of-the-art deep linking technology. Hardcore marketers can pull raw data from AppMetrica via its API to build custom in-house reports or use the data in their proprietary software. They will soon get even more options to improve conversion: we are now working on exporting data from AppMetrica to popular retargeting and lookalike platforms.
AppMetrica works with Android, iOS and Windows Phone apps, and game developers will also enjoy our Unity plugin. AppMetrica is available for free and starts providing reports just a few minutes after an app with the integrated SDK is rolled out.
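Cohort retention by traffic source can be computed from install and activity logs. This minimal sketch assumes each user has one install day and one traffic source. It counts a user as retained on day N if they were active exactly N days after installing. AppMetrica's own definitions may differ, and the data layout and function name are assumptions.

```python
from collections import defaultdict

def retention_by_source(installs, activity, day_offsets=(1, 7)):
    """installs: {user_id: (traffic_source, install_day)};
    activity: iterable of (user_id, day), with days as integers.
    Returns {source: {offset: share of that source's users who were active
    exactly `offset` days after their own install}}."""
    active = defaultdict(set)  # user_id -> set of days with activity
    for user, day in activity:
        active[user].add(day)
    cohorts = defaultdict(list)  # traffic source -> [(user_id, install_day)]
    for user, (source, day) in installs.items():
        cohorts[source].append((user, day))
    return {
        source: {
            off: sum(1 for u, d in members if d + off in active[u]) / len(members)
            for off in day_offsets
        }
        for source, members in cohorts.items()
    }
```

Comparing, say, day-7 retention across sources is one simple way to judge traffic quality beyond raw install counts.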
With more than 85% of the smartphone market in Russia, Android continues to be the country’s favourite operating system. The great variety of Android devices priced at up to $200 has, among other things, contributed to Android’s popularity with Russians, who are happy to sacrifice some of their smartphone’s technical specifications, such as memory space, for affordability. With limited storage, ‘There’s an app for that’ doesn’t really work for many budget smartphone users.
To make life easier for owners of budget Android devices, we now offer them an all-in-one application for their everyday needs, from current weather, currency exchange rates and traffic conditions to what’s on at the cinema around the corner or the shortest route to the nearest bank or restaurant. We have reissued our search app for Android to give our users one-tap access to our key services on their mobile devices.
The new Yandex app has expanded its functions beyond search to include quick access to email, news, maps, city navigation, taxi booking, or any other service available in Yandex’s product range. To use any of these services one doesn’t even need to have a corresponding app on their phone – the refurbished search app will take them to the mobile version of the service at Yandex.ru. The new Yandex app allows the owners of low-budget Android-based smartphones to enjoy the full mobile experience without having to compromise anything.
First announced in 2011, the Yandex search app now has a weekly user audience of over three million. The new app can be downloaded from Google Play. The users of Yandex.Search in Russia, Ukraine, Belarus and Kazakhstan will be upgraded to the new Yandex app when they update their current version.
Wouldn’t we all like to think that the world we live in is more or less stable? Isn’t there a certain pleasure in being sure that our feet will be pulled to the ground as firmly tomorrow as they are today? Isn’t it reassuring to know that the cup of tea we’ve just put on our desk won’t instantly disappear and reappear at the bottom of the sea on the other side of the planet, having traveled along its diameter in a straight line? In classical physics, Newton’s laws give us this reassurance. These laws bestow predictability on objects and events as they exist or happen in our reality, on a macroscopic level. On a microscopic level, in particle physics, Fermi’s interaction theory, for instance, postulates that the laws of physics remain the same even after a particle undergoes a substantial transformation.
In 1964, however, it became apparent that this isn’t always the case. James Cronin and Val Fitch showed, by examining the decay of subatomic particles called kaons, that a reaction run in reverse does not necessarily retrace the path of the original reaction. This discovery opened a pathway to the theory of electroweak interaction, which in turn gave rise to the theory we all now know as the Standard Model of particle physics.
Although the Standard Model is currently the most convenient paradigm to live with, it doesn’t explain a number of phenomena, such as gravity or dark matter. Other theories compete very actively for the leading role in describing the laws of nature most accurately and comprehensively. To succeed, they have to provide evidence of something that happens outside the limits of the Standard Model. A promising place to look for this kind of evidence is the decay of a charged lepton (the tau lepton) into three lighter leptons (muons), which have a characteristic, flavour, that differs from the same characteristic of their ‘mother’ particle. According to the Standard Model, the probability of this decay is vanishingly low, but it can be much higher in other theories.
One experiment at CERN, LHCb, aims to find this τ → 3μ decay. How is it going to find it? By searching for statistically significant anomalies in an unthinkably large amount of data. How can it find statistically significant anomalies in an unthinkably large amount of data? By using algorithms, which can be trained to separate signal (lepton decays) from background (anything else, really) better than humans can. The problem, however, is not only to find these lepton decays, but to find them in statistically significant numbers. If the Standard Model is correct, τ → 3μ decays are so rare that they are below experimental sensitivity.
To come up with a more sensitive and scale-appropriate solution that would help physicists find evidence of the tau lepton decaying into three muons at a statistically significant level, Yandex and CERN’s LHCb experiment have launched a contest for the best algorithm. The contest, called ‘Flavours of Physics’, starts on July 20th, and the deadline for code submissions is October 12th. It is co-organised with the Yandex School of Data Analysis, an associate member of the LHCb collaboration, and Yandex Data Factory, Yandex's big data analytics division, and is hosted on Kaggle, a website for predictive modeling and analytics competitions. The winning team or participant will claim a cash prize of $7,000, with $5,000 and $3,000 awarded to the first and second runners-up. An additional prize, an opportunity to participate in an LHCb workshop at the University of Zurich plus $2,000 provided by Intel, will be given to the creator of the algorithm that proves most useful to the LHCb experiment. The data used in this contest will consist of both simulated and real data, acquired in 2011 and 2012, that were used for the τ → 3μ decay analysis in the LHCb experiment.
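Once a trained classifier assigns each event a score, physicists typically choose a cut on that score. As an illustration only, and not the contest's metric, the sketch below scans thresholds and maximises the rough figure of merit s/sqrt(b). Here s and b are the numbers of signal and background events passing the cut. Thresholds that leave no background are skipped because the ratio is undefined there. The function name and scores are hypothetical.

```python
import math

def best_cut(signal_scores, background_scores):
    """Scan thresholds on a classifier output and return
    (threshold, s / sqrt(b)) maximising that figure of merit, where s and b
    count signal and background events with score >= threshold."""
    best = (None, 0.0)
    for t in sorted(set(signal_scores)):
        s = sum(1 for x in signal_scores if x >= t)
        b = sum(1 for x in background_scores if x >= t)
        if b == 0:
            continue  # s / sqrt(b) is undefined without background; skip
        fom = s / math.sqrt(b)
        if fom > best[1]:
            best = (t, fom)
    return best
```

A better classifier pushes signal and background scores further apart. That allows a cut which keeps more signal per unit of background, which is why the choice of algorithm matters for sensitivity.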
Contest participants can build on the algorithm provided by the Yandex School of Data Analysis and Yandex Data Factory to make an algorithm of their own.
The metric for evaluating the algorithms submitted to this contest is very similar to the one physicists use to evaluate the significance of their results, but it is much simpler and more robust thanks to the collective effort of the Yandex School of Data Analysis and LHCb specialists, who adapted procedures routinely used in the LHCb experiment specifically for this contest. We expect this metric to help scientists choose algorithms they could use on the data that will be collected in the LHCb experiment in 2015, and in a wide range of other experiments.
Finding the tau lepton decay might take us out of the comfort zone of the Standard Model, but it may just as well open the door to extra dimensions, shed light on dark matter, and finally explain how gravity works on a quantum level.
Collisions as seen within the LHCb experiment's detector (Image: LHCb/CERN)
The State Duma Committee on Information Policy and Communications has discussed a bill that would require search engine operators, at the request of individuals and without a court order, to delete from their search results hyperlinks to illegal or unreliable information, or even to reliable information about events that happened three or more years ago.
Internet search is our core business. In more than 15 years in this market, we have put colossal human and financial investments into our search engine, first and foremost, to offer our users search results that are complete, unbiased and useful. If this bill is passed in its current form, a search engine based on these principles will be difficult or even impossible to develop. That is why we feel it is important for us to offer commentary on this bill.
According to its authors, this bill enables any individual to control the distribution of unreliable or outdated personal information on the internet. In principle, this gives people a right based on one of the most basic human rights: the right to privacy, including the right to control access to information about oneself. Unfortunately, the procedure proposed in this bill does not stop information from being distributed online, and it contradicts the basic principles of law and current legislation.
The current law does not permit limiting a person's right to access reliable information. The Constitution of the Russian Federation guarantees everyone the right to freely seek, obtain, transfer, produce and disseminate information by any lawful means (Article 29). The Federal Act ‘On Information, Information Technologies, and Information Protection’ also stipulates an individual’s or organization’s right to search and obtain any information in any form from any source (Article 8). This is exactly what a search engine does – searches for information available through any public source. This bill ignores the right to search for information.
The limitations introduced by this bill reflect an imbalance between private and public interests. The need to seek and obtain information often falls within the public interest and concerns public figures, whose actions can have an impact on the general public or on people's private lives. This bill impedes people's access to important and reliable information, or makes it impossible to obtain. If this bill is passed, information about a clinic or a doctor, a school or a teacher one is considering may be impossible to find.
In addition, the procedure this bill introduces for requesting that a search engine remove hyperlinks opens the door to numerous opportunities for misuse, as it doesn't require any evidence or justification. A search engine, on the other hand, is required to delete an undefined number of hyperlinks to indeterminate web pages. This loophole can very conveniently be used by unscrupulous businesses to undermine their rivals, or by criminals to facilitate fraud.
But even if we assume that adequate information can be equated with inadequate or illegal information in terms of the right to be searched for, one question remains: who will examine the information being searched for and decide whether it is legal, adequate, relevant or reliable? The bill assigns this role to search engines, handing the functions of the courts or law enforcement agencies to individual commercial organizations. Failure to fulfil this role is punished with penalties and litigation.
This bill also ignores the basic principles of information technology and information search. It gives any person the right to request that a search engine operator stop providing hyperlinks to web pages that contain specific information, but it does not require this person to say which hyperlinks should be removed. All they have to do is provide the information to which they want hyperlinks removed. Instead of deleting hyperlinks to specific web pages from search results, a search engine is expected to stop retrieving a piece of information for any search terms and regardless of its location on the internet. To make this feasible, a search engine operator would have to find all pages containing this information that might appear anywhere in the search results for any search term a human mind could come up with. This step alone would take an eternity. The next steps would require a search engine operator to verify that these pages do contain the information in question, and then confirm that this information is indeed inadequate or more than three years old. Clearly, this is an impossible task.
Even though the list of this bill's flaws could go on, there is little point in discussing them all when the stipulated procedure itself contradicts the law and is technically impossible.
The current bill is much less well thought through than the Court of Justice of the European Union's decision in Google Spain v AEPD and González (C-131/12), which has been widely criticized and to which the Russian bill has often been likened.
The links to be removed from search results mandated by the ruling of the Court of Justice of the European Union are specific, lead to specific information and appear on a narrow class of search terms. Hyperlink erasure is also considered on a case-by-case basis to make sure it does not limit access to important information or alter the balance between private and public interests.