Yandex Announces CatBoost, a New Open-Source Machine Learning Library

Yandex, a technology company that builds intelligent products and services powered by machine learning, announced today that it is open-sourcing CatBoost, a new machine learning library based on gradient boosting.

Gradient boosting is a machine learning technique that builds a predictive model as an ensemble of simple models, typically decision trees. The models are added one at a time, and each new model is trained to correct the errors of the ensemble built so far, so prediction accuracy improves step by step. CatBoost was developed to support a wide variety of data formats out of the box. It is particularly powerful for data sets that contain categorical attributes, such as user IDs or other variables with a defined set of possible values, yielding accuracy unmatched by other machine learning algorithms. It is well-equipped to handle the complexity of business problems such as detecting fraud, predicting customer engagement and ranking recommended items. CatBoost can also be applied across a range of industries to tasks such as weather forecasting, industrial process optimization and improving the efficiency of particle physics research.
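
For readers new to the technique, the following is a minimal sketch of generic gradient boosting for regression with squared-error loss. It is not CatBoost's implementation and does not use the CatBoost API. The function names (fit_stump, boost, predict, mse), the toy data and the learning rate are all assumptions made for illustration. Each round fits a one-split decision tree (a "stump") to the current residuals and adds a scaled-down copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals on a single numeric feature."""
    best = None
    # Try every distinct value except the largest as a threshold,
    # so that both sides of the split are always non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean


def boost(xs, ys, rounds=50, lr=0.3):
    """Build an additive model: a constant plus `rounds` scaled stumps."""
    base = sum(ys) / len(ys)          # start from the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + lr * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, lr, stumps


def predict(model, x):
    """Sum the constant and every stump's scaled contribution for one input."""
    base, lr, stumps = model
    return base + sum(lr * (l if x <= t else r) for t, l, r in stumps)


def mse(model, xs, ys):
    """Mean squared error of the model on the given data."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: a noisy three-level step function.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 4.9, 5.1]

model = boost(xs, ys)
print("training MSE, constant only:", mse(boost(xs, ys, rounds=0), xs, ys))
print("training MSE, 50 rounds:   ", mse(model, xs, ys))
```

The training error falls as rounds are added because each stump is fitted to what the previous ones left unexplained. Production libraries add much more on top of this idea, including handling of categorical features and safeguards against overfitting.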

CatBoost delivers highly accurate results even in situations where there is relatively little data. Deep learning frameworks typically require training on massive amounts of data and work best with sensory data such as images, audio or text. CatBoost, by contrast, works well with relatively small data sets across a variety of data types, including sensory, transactional and historical data, and it supports a wide range of data formats, including inputs provided by deep learning models.

CatBoost is the successor to MatrixNet, the machine learning algorithm that is widely used within Yandex for numerous ranking tasks, weather forecasting and making recommendations. Over the coming months, CatBoost will be rolled out across many of Yandex's products and services. Users of our Yandex.Weather service, for example, will soon see even more precise minute-to-minute hyperlocal forecasting to help them better plan for quick weather changes.

In addition to its future application in Yandex products and services, CatBoost is also used in the LHCb experiment at CERN, the European Organization for Nuclear Research. “The state-of-the-art algorithm developed using Yandex's CatBoost has been deployed in LHCb to improve the performance of our particle identification subsystems,” said Marianna Fontana and Donal Hill, coordinators of the particle identification project in LHCb. “CatBoost will improve how efficiently we can identify charged particles, providing greater accuracy in the selection of our data.”

For 20 years, Yandex has been pioneering innovation in machine learning and artificial intelligence to help consumers and businesses better navigate the online and offline world. We are thrilled to now share this machine learning expertise with the worldwide community of data scientists and engineers. “Today marks a significant milestone for Yandex in our contributions to the open source community, but it is far from the end of our journey,” said Misha Bilenko, Head of Machine Intelligence and Research at Yandex. “By making CatBoost available as an open-source library, we hope to enable data scientists to achieve top results with least effort, catalyze future innovation, and ultimately define a new standard of excellence in machine learning.”

Contacts:
Media Relations
Melissa McDonald, Matvey Kireev
Phone: +7 495 739-70-00
E-mail: pr@yandex-team.ru
