Crowdsourcing breaks large processes down into bite-sized chunks that can then be delegated. One standard example is labeling images gathered to train an ML model. Every such process has two sides: the client, who needs gigabytes’ worth of images labeled, and the people willing to do simple work for relatively small amounts of money. Crowd platforms are the marketplaces that bring the two together.
Our internal crowdsourcing department helps Yandex services get their work done. We label data, handle testing, provide sales and user support, moderate content, and prepare texts, graphics, and designs. Doing that work well takes additional infrastructure, and building it is our team’s responsibility.
For example, when external users contact the support service, their requests have to be handled by a set deadline. That entails a guarantee that we’ll have operators available at any given time. One of our services, Crew, is a workforce management (WFM) system that forecasts loads and automatically generates shift schedules while also managing groups of people: user profiles, role structures, skills, and so on.
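To make the link between a load forecast and a shift schedule concrete, here is a minimal sketch of a workload-based staffing estimate. It is not how Crew works internally; the function name, the average handling time, and the occupancy cap are all assumptions made for illustration, and production WFM systems typically use queueing models such as Erlang C instead.

```python
import math


def operators_needed(forecast_requests, avg_handle_seconds, max_occupancy=0.8):
    """Estimate how many operators each hour needs to clear its forecast load.

    forecast_requests: expected number of requests per hour (hypothetical input).
    avg_handle_seconds: assumed average time an operator spends on one request.
    max_occupancy: assumed share of each hour an operator spends on requests.
    """
    if not 0 < max_occupancy <= 1:
        raise ValueError("max_occupancy must be in the range (0, 1]")

    # Seconds of request handling one operator can deliver per hour.
    capacity_seconds = 3600 * max_occupancy

    staffing = []
    for requests in forecast_requests:
        workload_seconds = requests * avg_handle_seconds
        # Round up: a fractional operator still means one more person on shift.
        staffing.append(math.ceil(workload_seconds / capacity_seconds))
    return staffing


# Hypothetical forecast for three consecutive hours.
hourly_forecast = [30, 120, 90]
print(operators_needed(hourly_forecast, avg_handle_seconds=300))
```

A schedule generator would then assign people with the right skills to shifts so that every hour has at least the estimated number of operators.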
Operators on staff aren’t enough on their own: the support service also needs a system for storing and processing user queries. That’s handled by Sansara, an omnichannel communication platform for customer service, sales, and more. With a single interface and data model, it lets us work with users across all popular communication channels (phone, chat, email, and social media) and store user profiles (information about how they don’t like to be called, for instance). And given how many questions the support service gets about standard situations, we’re building on Sansara to develop a knowledge base for the support service’s operators and chat bots, as well as infrastructure for automating query processing.
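The idea of one data model shared by every channel can be sketched as follows. This is an illustration only, not Sansara’s actual schema: the class names, fields, and the reading of the profile note as a contact preference are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Channel(Enum):
    PHONE = "phone"
    CHAT = "chat"
    EMAIL = "email"
    SOCIAL = "social"


@dataclass
class UserProfile:
    """Hypothetical profile shared by all channels."""
    user_id: str
    # Channels the user has asked us not to use (assumed field).
    blocked_channels: set = field(default_factory=set)

    def allows(self, channel):
        return channel not in self.blocked_channels


@dataclass
class Message:
    """One incoming query, stored the same way whatever channel it came from."""
    user_id: str
    channel: Channel
    text: str


def reply_channel(profile, incoming, fallbacks=(Channel.EMAIL, Channel.CHAT)):
    """Pick a channel for the reply, honoring the user's stored preferences."""
    for candidate in (incoming.channel, *fallbacks):
        if profile.allows(candidate):
            return candidate
    return None  # No permitted channel: leave the decision to an operator.


profile = UserProfile("u-1", blocked_channels={Channel.PHONE})
message = Message("u-1", Channel.PHONE, "Where is my order?")
print(reply_channel(profile, message))
```

Because every message carries the same fields regardless of origin, the routing logic above never needs channel-specific branches.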
Finally, whether it’s customer support, sales, testing, or labeling, the crowd solutions we offer rely on a number of internal services, and each of them stores its own set of data. Crowd DWH is our answer to the problem of analyzing all that data. It is infrastructure that quickly creates a data lake for any crowd service, and we plan to use it for aggregations, alerts, and so on.
Our stack: