Content moderation

  1. Before you start
  2. Creating a project
  3. Adding a task pool
  4. Uploading tasks
  5. Adding a training
  6. Starting a pool and getting results

This project template is suited to content moderation, where you need to check text for compliance with a set of rules.

This template helps with tasks such as:

  • Moderation of comments and nicknames on the forum.
  • Checking ads on the site, product reviews in the store, messages in social networks.
  • Checking for the presence of a brand or company name.
More examples:
  • News. You have a news site where visitors leave comments on the news. Analyze the comments and decide whether to display them on the site.
  • Social networks. Classify social media posts by several attributes.
  • Text features. Evaluate the emotion and tone of the comment.
  • Toxicity of comments. Evaluate if the comment is toxic.
  • Toxic comments markup. For each comment, select the toxicity level corresponding to its content.
  • Value of the text (if the text is useful). Determine if the message contains spam.
  • Online stores: content moderation. Determine which of the suggested attribute values is the best for a specific product.
  • Moderation of comments complained about. Evaluate the comments that other users of the social service consider unacceptable and decide if they should be banned.
  • Moderation of messages. Check the comments for insults, law violations, spam, and advertising.
  • Moderation of sport comments. Specify if the comment meets the moderation rules in a particular area or service.
  • Meaningfulness of information. Check comments from a variety of sources for meaningfulness and mark them up.
  • Markup of Yandex.Market comments.

For example, if you have a blog and want to moderate comments on a new post that attracted a lot of negative feedback, check the comments for rule violations: insults, violations of the law, spam, and advertising.

Tip.

Run the project in the Sandbox first. This helps you avoid making mistakes and spending money on a task that isn't working right.

Example of a prepared task

Before you start

If you have a complex project, register in the sandbox and create a project there. There you can:

  1. Test the project settings as a performer.
  2. Transfer them to the production version.

This helps you avoid making mistakes and spending unnecessary money on a task that doesn't work.

Creating a project

In the project, you define what the task will look like for the performer.

  1. Click + Create a project and choose the Content moderation template.

  2. Enter a clear name and write a short description for the project. Performers will see this in the task list.

  3. Write short and clear guidelines (see the recommendations).
    Note. This tutorial shows how to create a task interface in the HTML/JS/CSS editor. You can also try creating a task interface in the Template Builder.
  4. Define which objects you are going to pass to the performers and which ones you want to receive from them in response. To do this, add input and output fields in the Specifications block.
    What are input and output data?

    Input data is the types of objects the performer receives to complete the task. In this template, you need text. In other tasks, it can be a picture or geographical coordinates.

    Output data is types of objects that you receive after the task is completed. For this template, it is one of the two response options. If the performer chooses the second response, a list of checkboxes opens — the performer should choose appropriate options from them. In other tasks, the output data can contain entered text or an uploaded file, for example.

    Learn more about input and output data fields.

    In this case they are:

    • Input data: the comment field for the comment to be reviewed.
    • Output data: the quality string that records the chosen response option. If the second option is chosen, the output also includes flags for the type of violation.
  5. Create the task interface in the HTML block. It describes how the task elements should be arranged in the task.

    The HTML interface uses standard HTML tags and special components in double (or triple, as for the comment field) curly brackets for input and output data fields.

    
    {{{comment}}}
    
    {{field type="radio" name="quality" value="OK" size="L" label="All is OK" hotkey="1" class="yes"}}
    {{field type="radio" name="quality" value="BAD" size="L" label="There are violations" hotkey="2" class="no"}}
    
    {{field type="checkbox" name="advertising" label="Ads or spam" hotkey="q"}}
    {{field type="checkbox" name="nonsense" label="Nonsense" hotkey="w"}}
    {{field type="checkbox" name="insult" label="Insults" hotkey="e"}}
    {{field type="checkbox" name="law_violation" label="Violation of law" hotkey="r"}}
    {{field type="checkbox" name="profanity" label="Obscenities" hotkey="t"}}
    
    This describes the following task design:
    • The text of the comment to check at the top.
    • Two radio buttons. The chosen value is recorded in the quality field.
    • Five checkboxes that appear if you select the second radio button. The chosen checkbox options are written to the fields with the corresponding names in the result.
  6. Click Preview to view your task.

    The project preview window shows a single task with standard data. You can define the number of tasks to show on the page later.

  7. Save the project by clicking Finish editing.
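The output fields and the interface described above imply a simple contract for each response. The Python sketch below is illustrative only: the field names come from the template in this section, but treating the checkbox fields as booleans and requiring at least one violation flag on a BAD verdict are assumptions, not the platform's own validation. The function name `check_response` is hypothetical.

```python
# Field names are taken from the template above; everything else is illustrative.
VERDICTS = {"OK", "BAD"}
VIOLATION_FLAGS = ("advertising", "nonsense", "insult", "law_violation", "profanity")


def check_response(response: dict) -> list[str]:
    """Return a list of problems found in one performer response (empty if none)."""
    problems = []
    verdict = response.get("quality")
    if verdict not in VERDICTS:
        problems.append(f"unexpected quality value: {verdict!r}")
    # Assumption: checkbox fields arrive as booleans and may be absent when unchecked.
    flagged = [name for name in VIOLATION_FLAGS if response.get(name)]
    if verdict == "OK" and flagged:
        problems.append("violation flags set on an OK verdict: " + ", ".join(flagged))
    if verdict == "BAD" and not flagged:
        problems.append("BAD verdict without any violation flag")
    return problems


print(check_response({"quality": "OK"}))
print(check_response({"quality": "BAD", "insult": True}))
print(check_response({"quality": "BAD"}))
```

A check like this can be run over downloaded results to spot responses that contradict the interface logic.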

Adding a task pool

A pool is a set of paid tasks sent out for completion at the same time.

  1. On your new project page, click Add pool.
  2. Give the pool any convenient name and description. They are visible only to you. Performers see only the project name and description.
  3. Set the price per task page (for instance, $0.02).
    What is a task page?

    A page can contain one or several tasks. If the tasks are simple, you can add 10-20 tasks per page. Don't make pages too long, because long pages load slowly for performers.

    Performers get paid for completing the whole page.

    The number of tasks on the page is set when uploading tasks.

    What is the fair price for a task page?

    The general pricing rule: the more time the performer spends on the task, the higher the price should be.

    Register in Toloka as a performer, find out how much other requesters pay, and see examples of cost for different types of tasks.

  4. Add Filters to select performers. If the instructions, the task interface, and the comments themselves are in Russian, use the "Russian-speaking performers" set. If you plan to analyze comments in English or another language, add the matching language filter, for example "language = English".
  5. Set up Quality control. Quality control rules allow you to filter out inattentive performers. You can also set up quality control in the project. You won't need to review the assignments.

    Typical settings for the content moderation task:

    Fast responses
    Add a block and specify the following values:

    A performer who completes a task page in less than 5 seconds will be blocked and won't be able to complete your tasks for 10 days.

    Tip. How do I determine the fast response time?

    Complete your task and record the time. If you ban users for one fast response, then set a minimal time. If you do it after several fast responses, increase the time slightly.

    Control tasks
    Specify the following values:

    A performer who gives more than 40% of incorrect responses will be blocked and won't be able to complete tasks in this project for 10 days.

    Additionally, configure:

    Captcha
    Example of the rule configuration.

    A performer who has entered a captcha at least 5 times and answered correctly in less than 60% of cases is banned and can't complete your tasks for 10 days.

    Majority vote
    Examples of the Majority vote rule configuration. Choose appropriate actions and parameters.
  6. Configure overlap or incremental relabeling:

    • Overlap is the number of performers who complete the same task. For content moderation tasks, 3-5 is an appropriate value. In this case, it makes sense to use Aggregation of results to check the reliability of responses.
    • Incremental relabeling. It will help you optimize your budget for getting the most reliable responses. Example of settings.

      For this parameter to work, you need to load tasks using Smart mixing.

  7. In the Speed/quality ratio block, you can leave the settings unchanged.
  8. Set the Time allowed for completing a task page. It should be long enough to read the guidelines and wait for the task data to load. For example, 150 seconds.
  9. Save the pool.
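The price per task page and the overlap together determine roughly what a pool will cost. The Python sketch below is a back-of-the-envelope estimate, assuming every page is completed exactly `overlap` times and ignoring platform fees, control tasks, and incremental relabeling, none of which are quantified in this tutorial. The function name is hypothetical.

```python
import math


def estimate_pool_cost(num_tasks: int, tasks_per_page: int,
                       price_per_page: float, overlap: int) -> float:
    """Rough cost of a pool: pages x overlap x price per page (fees not included)."""
    pages = math.ceil(num_tasks / tasks_per_page)
    return round(pages * overlap * price_per_page, 2)


# 1,000 comments, 10 per page, $0.02 per page, overlap of 3
print(estimate_pool_cost(1000, 10, 0.02, 3))  # 6.0
```

Raising the overlap from 3 to 5 in this example raises the estimate proportionally, which is the trade-off incremental relabeling is meant to soften.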

Uploading tasks

Prepare your own task file. Check out the example in a demo TSV file. You can find it on the pool page. At the top-left of the page, there are links to TSV files with regular, control, and training tasks.
  1. Click Upload. In the window that opens, you can also download a sample TSV file by clicking Sample file for uploading tasks.
    What is TSV?
    A TSV file is a spreadsheet stored as a text file, with columns separated by tabs.
    Work with it in a text or spreadsheet editor and save it in the required format. More about working with a TSV file. CSV is a similar format, but you must use a TSV file for uploading.
    Note. The file must be saved in UTF-8 encoding.
  2. Add input data to the file. The header of the input data column contains the word INPUT. Put the comments you want to check in this column and leave the other columns empty.

    This is the beginning of the file with the tasks for checking comments:

  3. Load the tasks and choose Smart mixing.
    What is smart mixing?
    Smart mixing is the task distribution logic that places tasks of different types on the same page. For example, one control task per three main tasks. If you have a lot of comments, set one control task per 9-10 regular comments on the page.
  4. Mark up control tasks.
    • Click Mark up → Create control tasks.
      Note. If you selected something else instead of smart mixing, click Edit. If this button is missing, delete the file and upload it again.
    • Add the correct responses in the control tasks. There should be at least as many control tasks as you specified in the smart mixing settings.

    • Go back to the pool or project page from the menu bar at the top. The uploaded and marked up tasks will be saved.
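A task file like the one described in this section can also be generated programmatically. The Python sketch below writes a tab-separated, UTF-8 encoded file with a single input column. The header `INPUT:comment` is an assumption based on the `comment` input field of this project, the sample comments and the file name `tasks.tsv` are made up, and the result should be compared with the sample file from the pool page before uploading.

```python
import csv

# Made-up comments used only to illustrate the file layout.
comments = [
    "Great article, thanks for sharing!",
    "Buy cheap watches at example.com",
]


def write_tasks_tsv(path: str, texts: list[str]) -> None:
    """Write one task per row into a tab-separated UTF-8 file."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["INPUT:comment"])  # assumed header name
        for text in texts:
            # Tabs and line breaks inside a comment would break the column layout,
            # so collapse all whitespace runs into single spaces.
            writer.writerow([" ".join(text.split())])


write_tasks_tsv("tasks.tsv", comments)
```

Collapsing whitespace changes the comment text slightly. If the exact formatting matters for moderation, a different escaping strategy would be needed.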

Adding a training

A training pool is a set of unpaid training tasks where the performer learns to answer correctly. Training tasks contain the correct answer and a hint shown if the performer gives the wrong answer.

Tip. Write clear instructions. Criteria for good and bad comments differ between resources, so explain to the performers exactly what they should check in the tasks and how.
  1. Open the project page, go to the Training tab and click Add training.

  2. Give a name to the training pool and set the time for task page completion.
  3. Save the pool by clicking Create training.
  4. Get the Sample upload file or edit the one you used for uploading the main pool tasks.
    Note. TSV files for all project pools have the same structure.
  5. Add comments to include in training in the TSV file.
  6. Upload the file and specify the number of tasks on the page. For example, 2. This number must not exceed the number of tasks per page in the linked pool.
  7. Click Upload and enter the number of training tasks on the page.
  8. Click Add.
  9. Click Mark up → Create training tasks. Next, add correct answers and hints for all the uploaded tasks (see "Why do I need markup?").
  10. After the file is uploaded, open the Preview and check that the tasks are displayed correctly.
  11. Link the training to the main pool:

    • Open the main pool.
    • Click Edit.
    • Choose the name of the training you just created.
  12. Set the Level required to 70. This means that the main pool will be available to performers who made mistakes in no more than 30% of the training tasks.

  13. Click Save.
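The Level required threshold can be read as a simple percentage check. A minimal Python sketch, assuming the training result is the percentage of training tasks answered correctly (the function name is hypothetical, and the platform's actual skill calculation may differ):

```python
def passes_training(correct: int, total: int, level_required: int = 70) -> bool:
    """True if the share of correct training answers meets the required level."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * correct / total >= level_required


print(passes_training(7, 10))  # True: exactly 30% mistakes
print(passes_training(6, 10))  # False: 40% mistakes
```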

Learn more about creating a pool with training.

Starting a pool and getting results

  1. Start the training pool first, then the regular pool. From the pool page, click the start button. From the project page, click the start button next to the pool name.
  2. Track the completion of tasks in the Pool statistics section. If you created a project in the sandbox, you can test it by yourself.
  3. When the pool is fully completed, start aggregation of results. Open the menu next to the Download results button and select Dawid-Skene aggregation model.

    In the TSV file with aggregated responses, the CONFIDENCE field shows the confidence of each aggregated response as a percentage. It helps you understand how reliable the comment evaluation is. Learn more about aggregation.

  4. Track the aggregation progress on the Operations page, which you can open next to the Download results button. When the process is completed, click Download.
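Once the aggregated file is downloaded, low-confidence rows can be singled out for manual review. The Python sketch below assumes the column names `INPUT:comment`, `OUTPUT:quality`, and `CONFIDENCE:quality`, and a confidence written as a percentage such as `98.5%`. Check the header of your own file and adjust the names. The 80% threshold and the sample rows are arbitrary examples.

```python
import csv
import io


def low_confidence_rows(tsv_text: str, threshold: float = 80.0) -> list[dict]:
    """Return rows whose confidence (in percent) is below the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        # Assumed format: a number optionally followed by a percent sign.
        confidence = float(row["CONFIDENCE:quality"].rstrip("%"))
        if confidence < threshold:
            rows.append(row)
    return rows


# Made-up sample in the assumed layout
sample = (
    "INPUT:comment\tOUTPUT:quality\tCONFIDENCE:quality\n"
    "Nice post\tOK\t99.1%\n"
    "You are all idiots\tBAD\t64.0%\n"
)
for row in low_confidence_rows(sample):
    print(row["INPUT:comment"], row["OUTPUT:quality"], row["CONFIDENCE:quality"])
```

Rows returned by this filter are candidates for a second look or for relabeling with a higher overlap.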

Troubleshooting

How do I show two different versions of the text to performers?

If you pass the texts as input data, upload two different tasks to the pool: pass Text 1 in the INPUT: <input field name> field of Task 1, and pass Text 2 in the same field of Task 2.

If the text is in the HTML block of the task template, then clone the project. To limit a performer to doing only one task in your project, use the Submitted responses rule. You can assign a skill or ban the performer after they submit one response.

I can't upload files from Yandex.Disk

If images, audio or video from Yandex.Disk don't appear in the instructions or on the task page, make sure you connected Yandex.Disk correctly and uploaded the files.

How to create a task where the performer has to view a video from Yandex.Disk

To create such a task, take the video markup template as a basis.

To host your videos on Yandex.Disk, connect Yandex.Disk and set up the project.

Why can't my task for selecting objects in an image display images from Yandex.Disk?
The problem is in your task template. Make sure that:
  • In the project, the input field where you pass the file link has the "string" type.
  • The component in the task template uses the "proxy" expression.
  • The format of relative links in the TSV file with tasks is correct: <unique name>/<file path and name>.
For detailed instructions and videos, see the page Using files from Yandex.Disk.

Frequent mistakes when connecting to Yandex.Disk and uploading files

  • The input data field in the project settings has the link type. Choose the string type instead.
  • The TSV file contains absolute links to the task files. Use relative links in the format <unique name>/<path and file name>. For example: yadisk/image1.jpg or yadisk/photos/image1.png.
  • Photos from Yandex.Disk are used in the task instructions in the mobile app. To display the photos in the instructions, use only direct links.
  • Files are deleted or aren't located in the Yandex.Disk folder that the link leads to.
  • The OAuth token isn't active. Update the token on the External Services Integration page.
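The link-format mistake in this list can be caught before uploading by checking the links in the task file. A hedged Python sketch: it only tests that a link is relative and has the `<unique name>/<path and file name>` shape. The function name is hypothetical, `yadisk` is the example prefix used in this section, the `example.com` URL is made up, and the check neither contacts Yandex.Disk nor verifies that the file exists.

```python
from urllib.parse import urlparse


def is_relative_disk_link(link: str) -> bool:
    """True if the link looks like <unique name>/<path and file name>."""
    parsed = urlparse(link)
    if parsed.scheme or parsed.netloc or link.startswith("/"):
        return False  # absolute URL or absolute path
    prefix, _, path = link.partition("/")
    # Both parts must be present, and the path must end in a file name.
    return bool(prefix) and bool(path) and not path.endswith("/")


for link in ["yadisk/image1.jpg", "yadisk/photos/image1.png",
             "https://example.com/image1.jpg", "image1.jpg"]:
    print(link, is_relative_disk_link(link))
```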
To display files from Yandex.Disk (images, audio files, videos) to the performer:
  1. Link Yandex.Disk in your profile.
  2. Set the string type for the input data field.
  3. Insert a file link using the proxy component.

Detailed instructions

Files load too slowly from Yandex.Disk. How do I speed up the loading process?

Try the recommendations on this page or contact Yandex.Disk support.