Sentiment analysis and content moderation

  1. Before you start
  2. Creating a project
  3. Adding a task pool
  4. Uploading tasks
  5. Adding a training
  6. Starting a pool and getting results

This project template works well for content moderation, when you need to check texts for compliance with guidelines.

Use this template when you need to:

  • Moderate comments and nicknames on a forum.
  • Check ads on a site, product reviews in a store, or messages in social networks.
  • Check for the presence of a brand or company name.
More types of content
  • News. You have a news site where visitors leave comments on the news. Analyze the comments and decide whether to display them on the site.
  • Social networks. Classify social media posts by several attributes.
  • Text features. Evaluate the emotion and tone of the comment.
  • Toxicity of comments. Evaluate if the comment is toxic.
  • Toxic level of comments. For each comment, select the toxicity level corresponding to its content.
  • Value of the text (if the text is useful). Determine if the message contains spam.
  • Online stores: content moderation. Determine which of the suggested attribute values is the best for a specific product.
  • Moderation of comments complained about. Evaluate the comments that other users of the social service consider unacceptable and decide if they should be banned.
  • Moderation of messages. Check comments for offensive language, illegal content, spam, and advertising.
  • Moderation of sport comments. Specify if the comment meets the moderation rules in a particular area or service.
  • Meaningfulness of information. Check comments from a variety of sources for meaningfulness and label them.
  • Yandex.Market comments.

For example, if you have a blog and you want to moderate comments on a new post that collected a lot of negative feedback, check the text comments for violation of the rules: insults, violations of the law, spam and advertising.

Tip.

Run the project in the Sandbox first. This helps you avoid making mistakes and spending money on a task that isn't working right.

Example of a prepared task

Before you start

If you have a complex project, register in the sandbox and create a project there. There you can:

  1. Test the project settings as a performer.
  2. Transfer them to the production version.

This helps you avoid making mistakes and spending unnecessary money on a task that doesn't work.

Creating a project

In the project, you define what the task will look like for the performer.

  1. Click + Create a project and choose the Sentiment analysis and content moderation template.

  2. Provide general information:

    1. Enter a clear name and a short description for the project. Performers will see this in the task list.

    2. Optionally add a Private comment.

    3. Click Save.

  3. Edit the task interface:

    Note. This tutorial shows how to create a task interface in the HTML/JS/CSS editor. You can also try creating a task interface in Template Builder.
    1. Define which objects you are going to pass to the performers and which one you want to receive from them in response. To do this, add input and output fields in the Specifications block.
      What are input and output data?

      Input data is the types of objects the performer receives to complete the task. In this template, it is text. In other tasks, it can be an image or geographical coordinates.

      Output data is the types of objects that you receive after the task is completed. For this template, it is one of two response options. If the performer chooses the second option, a list of checkboxes opens, and the performer selects the ones that apply. In other tasks, the output data can contain entered text or an uploaded file, for example.

      Learn more about input and output data fields.

      In this case they are:

      • Input data: comment field, text for checking.
      • Output data: the quality string field, which records the response selected for the question “Are there any violations in the text?”, and other fields for the types of violations. You can use this list of fields or customize it for your tasks.

      Create the task interface in the HTML block. It describes how the task elements should be arranged in the task.

      The HTML interface uses standard HTML tags and special components in double (or triple, as for the comment field) curly brackets for input and output data fields.

      
      {{{comment}}}
      
      {{field type="radio" name="quality" value="OK" size="L" label="Everything is fine" hotkey="1" class="yes"}}
      {{field type="radio" name="quality" value="BAD" size="L" label="Violations found" hotkey="2" class="no"}}
      
      {{field type="checkbox" name="advertising" label="Ads or spam" hotkey="q"}}
      {{field type="checkbox" name="nonsense" label="Nonsense" hotkey="w"}}
      {{field type="checkbox" name="insult" label="Insults" hotkey="e"}}
      {{field type="checkbox" name="law_violation" label="Illegal content" hotkey="r"}}
      {{field type="checkbox" name="profanity" label="Profanity" hotkey="t"}}
      

      This notation describes the following task design:

      • The text of the comment to check at the top.
      • Two radio buttons. The chosen value is recorded in the quality field.
      • Five checkboxes that appear if you select the second radio button. Each selected checkbox is written to the output field with the corresponding name.
    2. Click to see the performer's view of the task.

      Note. The project preview shows one task with standard data. You can define the number of tasks to show on the page later.
    3. Save the changes.

  4. Write instructions for performers:

    1. Write short and clear guidelines (see the recommendations). Describe what needs to be done and give examples.

      You can prepare the instructions in HTML format, then copy and paste them into the editor. Click <> to switch to HTML mode.

    2. Click Finish.
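
The output fields defined in the task interface above can be modelled as a small record, which is useful when you later check downloaded responses. The sketch below is illustrative only: the function name `validate_response` and the consistency rules (violation flags are expected only together with a `BAD` verdict) are assumptions made for this example, not documented Toloka behavior. The field names are taken from the template.

```python
# Field names mirror the template above; the validation rules are assumptions.
QUALITY_VALUES = {"OK", "BAD"}
VIOLATION_FIELDS = ("advertising", "nonsense", "insult", "law_violation", "profanity")


def validate_response(response):
    """Return a list of problems found in one performer response (a dict)."""
    problems = []
    quality = response.get("quality")
    if quality not in QUALITY_VALUES:
        problems.append(f"unexpected quality value: {quality!r}")
    # Collect the checkbox fields that were ticked.
    flagged = [name for name in VIOLATION_FIELDS if response.get(name)]
    if quality == "OK" and flagged:
        problems.append("violations flagged on an OK response: " + ", ".join(flagged))
    if quality == "BAD" and not flagged:
        problems.append("BAD response without any violation type")
    return problems
```

For example, `validate_response({"quality": "BAD", "insult": True})` returns an empty list, while a `BAD` response with no checkbox selected is reported as a problem.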

Adding a task pool

A pool is a set of paid tasks sent out for completion at the same time.

  1. On your new project page, click Add pool.
  2. Give the pool any convenient name and description. They are visible only to you. The performer sees only the project name and description.
  3. Set the price per task suite (for instance, $0.02).
    What is a task suite?

    A task suite can contain one or several tasks that are shown on the same page. If the tasks are simple, you can add 10-20 tasks per suite. Don't make pages too long because it slows down loading speed for performers.

    Performers get paid for completing the entire task suite.

    The number of tasks per suite is set when uploading tasks.

    What is a fair price for a task suite?

    The general pricing rule is simple: the more time the performer needs to complete the task, the higher the price should be.

    Register in Toloka as a performer, find out how much other requesters pay, and see examples of cost for different types of tasks.

  4. Add Filters to select performers. If the instructions, the task interface, and the comments themselves are in Russian, use the “Russian-speaking performers” set. If you plan to analyze comments in English or another language, add the corresponding language filter, for example “language = English”.
  5. Set up Quality control. Quality control rules allow you to filter out inattentive performers. You can also set up quality control in the project. You won't need to review the assignments.

    Typical settings for the content moderation task:

    Fast responses
    Add a block and specify the following values:

    A performer who completes a task suite in less than 5 seconds will be suspended and won't be able to access your tasks for 10 days.

    Tip. How do I determine the fast response time?

    Complete your task and record the time. If you ban users for one fast response, then set a minimal time. If you do it after several fast responses, increase the time slightly.

    Control tasks
    Specify the following values:

    A performer who gives incorrect responses to more than 40% of control tasks will be blocked and won't be able to complete tasks in this project for 10 days.

    Additionally, configure:

    Captcha
    Example of the rule configuration.

    A performer who has entered a captcha at least 5 times and has fewer than 60% correct answers is banned and can't complete your tasks for 10 days.

    Majority vote
    Examples of the Majority vote rule configuration. Choose appropriate actions and parameters.
  6. Configure normal or dynamic overlap:

    • Overlap is the number of performers who complete the same task. For content moderation tasks, 3-5 is an appropriate value. In this case, it makes sense to use Aggregation of results to check the reliability of responses.
    • Dynamic overlap (incremental relabeling, IRL). It will help you optimize your budget for getting the most reliable responses. Example of settings.

      For this parameter to work, you need to load tasks using Smart mixing.

  7. In the Speed/quality ratio block, you can leave the settings unchanged.
  8. Set the Time allowed for completing a task suite. It should be long enough to read the guidelines and wait for the task data to load. For example, 150 seconds.
  9. Save the pool.
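
The quality control rules from step 5 can be read as simple threshold checks. The sketch below is a plain-Python illustration, not Toloka's implementation: the function name `triggered_rules`, the rule labels, and the `fast_limit` parameter (how many fast task suites trigger the rule) are assumptions. The 5-second, 40%, 5-captcha, and 60% thresholds come from the example settings above.

```python
def triggered_rules(suite_seconds, control_results, captcha_results,
                    min_seconds=5, fast_limit=1):
    """Return the names of the rules that would fire for one performer.

    suite_seconds   -- completion time in seconds of each submitted task suite
    control_results -- list of True/False per control task (True = correct)
    captcha_results -- list of True/False per captcha (True = solved)
    """
    fired = []

    # Fast responses: count task suites completed in under min_seconds.
    fast = sum(1 for seconds in suite_seconds if seconds < min_seconds)
    if fast >= fast_limit:
        fired.append("fast_responses")

    # Control tasks: more than 40% incorrect responses.
    if control_results:
        wrong_pct = 100 * control_results.count(False) / len(control_results)
        if wrong_pct > 40:
            fired.append("control_tasks")

    # Captcha: at least 5 captchas entered and less than 60% correct.
    if len(captcha_results) >= 5:
        correct_pct = 100 * captcha_results.count(True) / len(captcha_results)
        if correct_pct < 60:
            fired.append("captcha")

    return fired
```

Running your own timings through a check like this before you launch can help you choose a fast-response threshold that doesn't ban careful performers.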

Uploading tasks

Download the sample upload file. You can find it on the pool page: at the top-left of the page, there are links to TSV files with regular, control, and training tasks. Use the appropriate sample to prepare your own file with tasks.

  1. Click Upload. In the window that opens, you can also download a sample TSV file by clicking Sample file for uploading tasks.
    What is TSV?
    A TSV file is a spreadsheet in the form of a text file with columns separated by tabs.
    You can edit it in a text or spreadsheet editor and then save it in the required format. More about working with a TSV file. The CSV format is similar to TSV, but you should use a TSV file for uploading.
    Note. The file must be saved in UTF-8 encoding.
  2. Add input data to it. The header of the input data column contains the word INPUT. Put the comments you want to check in this column. Leave the other columns empty.

    This is the beginning of the file with the tasks for checking comments:

  3. Load the tasks and choose Smart mixing.
    What is smart mixing?
    Smart mixing is the task distribution logic that places tasks of different types on the same page, for example, one control task per three main tasks. If you have a lot of comments, set one control task per 9-10 regular comments on the page.
  4. Mark up control tasks.
    • Click Mark up → Create control tasks.
      Note. If you selected something else instead of smart mixing, click Edit. If this button is missing, delete the file and upload it again.
    • Add the correct responses in the control tasks. Create at least as many control tasks as the smart mixing settings above require.

    • Go back to the pool or project page from the menu bar at the top. The uploaded and marked up tasks will be saved.
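
The file preparation in step 2 can also be scripted. The sketch below builds the TSV text from a list of comments, fills only the input column, and leaves the other columns empty. It is an illustration under assumptions: the function names are made up for this example, and the header shown (`INPUT:comment`, `GOLDEN:quality`, `HINT:text`) is hypothetical, so copy the real column names from the sample file of your pool.

```python
def build_tasks_tsv(comments, header, input_column):
    """Return TSV text with one task per comment.

    header       -- column names copied from the sample upload file
    input_column -- the column that receives the comment text
    """
    if input_column not in header:
        raise ValueError(f"{input_column!r} is not in the header")
    lines = ["\t".join(header)]
    for comment in comments:
        # Tabs and line breaks inside a comment would break the column layout,
        # so collapse all whitespace runs into single spaces.
        text = " ".join(comment.split())
        row = [text if column == input_column else "" for column in header]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


def save_tasks_tsv(path, tsv_text):
    """Write the file in UTF-8, as the upload requires."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(tsv_text)


# Hypothetical header; take the real one from your sample file.
header = ["INPUT:comment", "GOLDEN:quality", "HINT:text"]
tsv_text = build_tasks_tsv(["Great post!", "Buy cheap\twatches\nhere"], header, "INPUT:comment")
```

Calling `save_tasks_tsv("tasks.tsv", tsv_text)` then produces a file that you can upload on the pool page.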

Adding a training

A training pool is a set of unpaid training tasks where the performer learns to answer correctly. Training tasks contain correct responses and a hint shown if the performer answers incorrectly.

Tip. Write clear instructions. The criteria for good and bad comments differ from one resource to another, so explain to the performers exactly what they should check in the tasks and how.
  1. Open the project page, go to the Training tab and click Add training.

  2. Give a name to the training pool and set the time for completing a task suite.
  3. Save the pool by clicking Create training.
  4. Get the Sample upload file or edit the one you used for uploading the main pool tasks.
    Note. TSV files for all project pools have the same structure.
  5. Add comments to include in training in the TSV file.
  6. Upload the file and specify the number of tasks on the page. For example, 2. This number must not exceed the number of tasks per page in the linked pool.
  7. Click Upload and enter the number of training tasks on the page.
  8. Click Add.
  9. Click Mark up → Create training tasks. Next, add correct answers and hints for all the uploaded tasks.
  10. After the file is uploaded, open the Preview and check that the tasks are displayed correctly.
  11. Link training.

    • Open the main pool.
    • Click Edit.
    • Choose the name of the training you just created.
  12. Set the Level required to 70. This means that the main pool will be available to performers who made mistakes in no more than 30% of the tasks in the training pool.

  13. Click Save.
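
The Level required setting from step 12 can be read as a percentage threshold. The sketch below assumes that the training level is simply the share of training tasks answered correctly. The function names are hypothetical, and the way Toloka actually calculates the level may differ.

```python
def training_level(correct, total):
    """Percentage of training tasks answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * correct / total


def can_access_main_pool(correct, total, level_required=70):
    """True if the performer reaches the required training level."""
    return training_level(correct, total) >= level_required
```

With 10 training tasks, a performer who answers 7 correctly reaches the level of 70 and gets access, while 6 correct answers is not enough.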

Learn more about creating a pool with training.

Starting a pool and getting results

  1. Start the training pool first, then the regular pool. To do this, click the start button on the pool page or next to the pool name on the project page.
  2. Track the completion of tasks in the Pool statistics section. If you created a project in the sandbox, you can test it by yourself.
  3. When the pool is fully completed, start aggregation of results. Open the menu next to the Download results button and choose Dawid-Skene aggregation model.

    In the TSV file with aggregated responses, the CONFIDENCE field shows the confidence in each aggregated response as a percentage. It helps you understand how reliable the comment evaluation is. Learn more about aggregation.

  4. Track the aggregation progress on the Operations page (next to the Download results button). When the process is completed, click Download.
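
To get a feel for what aggregation does with overlapping responses, here is a minimal sketch of plain majority voting. It is not the Dawid-Skene model mentioned above, which additionally estimates how reliable each performer is. The function name `majority_vote` and the use of the vote share as a rough confidence value are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def majority_vote(responses):
    """Aggregate overlapping responses.

    responses -- iterable of (task_id, label) pairs, one per performer response
    Returns {task_id: (winning_label, share_of_votes)}.
    If labels are tied, the label that was seen first wins.
    """
    votes = defaultdict(Counter)
    for task_id, label in responses:
        votes[task_id][label] += 1

    result = {}
    for task_id, counter in votes.items():
        label, count = counter.most_common(1)[0]
        result[task_id] = (label, count / sum(counter.values()))
    return result


# Three performers per comment (overlap = 3); the task IDs are made up.
responses = [
    ("c1", "OK"), ("c1", "OK"), ("c1", "BAD"),
    ("c2", "BAD"), ("c2", "BAD"), ("c2", "BAD"),
]
aggregated = majority_vote(responses)
```

Comments with a low vote share, such as `c1` here, are the ones worth reviewing manually or sending for additional labeling.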

Troubleshooting

How do I show two different versions of the text to performers?

If you pass texts in the input data, upload two different tasks to the pool: pass Text 1 in the INPUT: <input field name> field of Task 1, and pass Text 2 in the same field of Task 2.

If the text is in the HTML block of the task template, then clone the project. To limit a performer to doing only one task in your project, use the Submitted responses rule. You can assign a skill or ban the performer after they submit one response.

I can't upload files from Yandex.Disk

If images, audio or video files from Yandex.Disk don't appear in the instructions or on the task suite, make sure you connected Yandex.Disk correctly and uploaded the files.

How do I create a task where the performer has to view a video from Yandex.Disk?

To create such a task, take the video markup template as a basis.

To host your videos on Yandex.Disk, connect Yandex.Disk and set up the project.

Why can't my task for selecting objects in an image display images from Yandex.Disk?
The problem is in your task template. Make sure that:
  • In the project, the input field where you pass the file link has the “string” type.
  • The component in the task template uses the "proxy" expression.
  • The format of relative links in the TSV file with tasks is correct: <unique name>/<file path and name>.
For detailed instructions and videos, see the page Using files from Yandex.Disk.
Frequent mistakes when connecting to Yandex.Disk and uploading files
  • The Input data field in the project settings has the link type. You should choose the string type.
  • The TSV file contains absolute references to the task files. You need to insert a link <unique name>/<path and file name>. For example: yadisk/image1.jpg or yadisk/photos/image1.png.
  • Photos from Yandex.Disk are used in the task instructions in the mobile app. To display the photos in the instructions, use only direct links.
  • Files are deleted or aren't located in the Yandex.Disk folder that the link leads to.
  • The OAuth token isn't active. Update the token on the External Services Integration page.
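
The link format from the list above can be checked before you upload the TSV file. The sketch below is an assumption-laden illustration: the function name `check_disk_link` and its messages are hypothetical, and it only tests the `<unique name>/<path and file name>` shape of the link, not whether the file actually exists on Yandex.Disk.

```python
def check_disk_link(link, unique_name):
    """Return None if the link looks like '<unique name>/<path and file name>',
    otherwise return a short description of the problem."""
    # Absolute references (with a scheme or a leading slash) are not accepted.
    if "://" in link or link.startswith("/"):
        return "absolute link"
    prefix, separator, path = link.partition("/")
    if not separator or prefix != unique_name:
        return f"link must start with {unique_name}/"
    if not path or path.endswith("/"):
        return "missing file name"
    return None
```

For example, `check_disk_link("yadisk/photos/image1.png", "yadisk")` returns `None`, while a link that starts with `https://` is reported as an absolute link.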
To display files from Yandex.Disk (images, audio files, videos) to the performer:
  1. Link Yandex.Disk in your profile.
  2. Set the string type for the input data field.
  3. Insert a file link using the proxy component.

Detailed instructions

Files load too slowly from Yandex.Disk. How do I speed up the loading process?

Try the recommendations on this page or contact Yandex.Disk support.