Image classification

  1. Create a project
  2. Add a task pool
  3. Upload tasks
  4. Set up quality control
  5. Add training
  6. Start the pool and get the results

Classification projects are intended for multiple-choice tasks, such as moderating content or grouping images by category.

You may need additional projects for your task, such as dataset pre-check or checking performers' responses. Learn more about this in Designing the solution architecture.

Suppose you have a set of cat photos and want to split them into several groups according to the cat's mood. Create a task where a performer sees a photo and chooses one of three responses. The performer can also mark whether they like the photo.
Tip.

Run the project in the Sandbox first. This helps you avoid making mistakes and spending money on a task that isn't working right.

Example of a prepared task

To run tasks and get responses:

Create a project

The project defines what the task will look like for a performer.

  1. Choose a template:

    1. Click + Create project.

    2. Choose the Image classification template.
  2. Provide general information:

    1. Enter a clear name and a short description for the project. Performers will see this in the task list.

    2. Optionally add a Private comment.
    3. Click Save.
  3. Edit the task interface:

    Note. This tutorial shows how to create a task interface in the HTML/JS/CSS editor. You can also try creating a task interface in the Template Builder.
    1. Define which objects you are going to pass to the performers and receive from them in response. To do this, add input and output fields in the Data specification section.

      What are input and output data?

      Input data are the objects passed to the performer to complete the task. For example, this could be text, an image, or geographic coordinates.

      Output data are the objects you receive after the task is completed. For example, this could be one of several response options, typed text, or an uploaded file.

      Learn more about input and output data fields.

      In this case they are:

      • image: the input data field for the image link.
      • Output data fields:
        • like (boolean): the checkbox answer.
        • result (string): the value of the selected radio button.

      Create the task interface in the HTML block. It describes how the task elements should be arranged in the task.

      You can use standard HTML tags and special expressions in double curly brackets for input and output data fields.

      {{img src=image width="100%" height="400px"}}
      
      {{field type="radio" name="result" value="OK" label="Good" hotkey="1"}}
      {{field type="radio" name="result" value="BAD" label="Bad" hotkey="2"}}
      {{field type="radio" name="result" value="404" label="Loading error" hotkey="3"}}
      <br>
      {{field type="checkbox" name="like" label="Do you like the photo?" hotkey="q"}}

      This notation describes the following task design:

      • The picture at the link passed in the image field.
      • Three radio buttons, and the chosen option is output to the result field.
      • A checkbox, with the value (true or false) output to the like field.

      Leave the CSS and JavaScript blocks unchanged.

    2. Click to see the performer's view of the task.

      Note. The project preview shows one task with standard data. You can define the number of tasks to show on the page later.
    3. Click Save.
  4. Write instructions for performers:

    1. Write short and clear guidelines (see the recommendations). Describe what needs to be done and include examples.

      You can prepare instructions in HTML format, then copy and paste them into the editor. Click <> to switch to HTML mode.

    2. Click Finish.
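If you post-process downloaded results, it can help to check each response against the output fields defined in the task interface. The following is a minimal Python sketch, assuming each response arrives as a dictionary keyed by output field name, with result as a string and like as a boolean; validate_response is a hypothetical helper, not part of any Toloka API.

```python
from typing import Any

# Allowed values of the "result" radio buttons, taken from the template above.
RESULT_VALUES = {"OK", "BAD", "404"}


def validate_response(response: dict[str, Any]) -> list[str]:
    """Return a list of problems in one task response (empty if it looks valid).

    Assumption: the response is a dict keyed by output field name.
    """
    problems = []
    result = response.get("result")
    if result not in RESULT_VALUES:
        problems.append(f"result must be one of {sorted(RESULT_VALUES)}, got {result!r}")
    like = response.get("like")
    # The checkbox outputs true or false, so anything else is suspicious.
    if not isinstance(like, bool):
        problems.append(f"like must be a boolean, got {like!r}")
    return problems


print(validate_response({"result": "OK", "like": True}))
print(validate_response({"result": "GOOD", "like": "yes"}))
```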

Add a task pool

A pool is a set of paid tasks sent out for completion at the same time.

  1. Open the project and click Add pool.
  2. Give the pool any convenient name. The name is visible only to you; performers only see the project name.
  3. Set the price per task suite (for instance, $0.02).
    What is a task suite?

    A page can contain one or several tasks. If the tasks are simple, you can add 10-20 tasks per page. Don't make pages too long, because long pages load slowly for performers.

    Performers get paid for completing the whole page.

    The number of tasks on the page is set when uploading tasks.

    What is the fair price for a task suite?

    The general rule of pricing: the more time the performer spends completing the task, the higher the price should be.

    You can register in Toloka as a performer and find out how much other requesters pay for tasks, or see examples of cost for different types of tasks.

  4. Set the Time allowed for completing a task suite. It should be long enough to read the guidelines and wait for task data to download (for example, 600 seconds).
  5. Set Overlap, which is the number of performers who complete the same task. For classification tasks, 3 is enough.
  6. Add Filters to select performers. To make your task available only to English-speaking users, filter by language and by the country detected from the phone number.
  7. Save the pool.
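The price, page size, and overlap together determine roughly how much a pool costs in rewards. This is a rough Python sketch, assuming every task on a page is a main task (control tasks mixed into pages are ignored) and leaving out any platform fee; estimate_pool_cost is a hypothetical helper, not a Toloka feature.

```python
import math


def estimate_pool_cost(num_tasks: int, tasks_per_suite: int,
                       price_per_suite: float, overlap: int) -> float:
    """Rough reward budget for a pool, excluding any platform fee.

    Each task suite is completed by `overlap` performers, and each of them
    is paid `price_per_suite`.
    """
    suites = math.ceil(num_tasks / tasks_per_suite)
    return round(suites * overlap * price_per_suite, 2)


# 1000 images, 10 tasks per page, $0.02 per suite, overlap 3:
# 100 suites x 3 performers x $0.02
print(estimate_pool_cost(1000, 10, 0.02, 3))  # 6.0
```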

Upload tasks

Prepare your own task file. You can use the demo TSV file as an example: at the top left of the pool page, there are links to TSV files with regular, control, and training tasks.

  1. Click Upload. In the window that opens, you can also download a sample TSV file by clicking Sample file for uploading tasks.

    What is TSV?
    A TSV file presents a table as a text file in which columns are separated by tabs.
    You can work with it in either a spreadsheet editor or a text editor, then save it in the required format. More about working with a TSV file. The CSV format is similar to TSV, but you must use a TSV file for uploading.
    Note. Before uploading the file, make sure it is saved in UTF-8 encoding.
  2. Add input data in it. The header of the input data column contains the word INPUT. Leave the other columns empty.
  3. Upload the tasks using Smart mixing and enter the number of tasks per page. For example: 9 main tasks and 1 control task.
    What is smart mixing?
    Smart mixing randomly generates pages with tasks so that no performer gets the same task twice.
  4. Add control tasks. To do this, click the Edit button and give the correct responses for several tasks.
    Note.

    If you selected something else instead of smart mixing, click Edit. If this button is missing, delete the file and upload it again.

    What are the control tasks?

    Control tasks are tasks with the correct response known in advance. They are used to track the quality of performers' responses: the response you provided is compared to the performer's response, and if they match, the performer answered correctly.

    Control tasks should make up at least 1% of the total number of tasks. This means that for 1000 tasks you should add at least 10 control tasks.

    More about control tasks.
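If you generate the task file with a script rather than by hand, Python's standard csv module can write tab-separated output. This sketch assumes the input column header follows the INPUT:<field name> pattern from the demo file, with image as the field name from the project specification; the example URLs and the helper names are hypothetical. It also computes the minimum number of control tasks for the 1% share, treating the share as a fraction of all tasks.

```python
import csv
import io
import math


def write_tasks_tsv(image_urls, out):
    """Write one task per image URL under an INPUT:image column header (assumed format)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["INPUT:image"])
    for url in image_urls:
        writer.writerow([url])


def min_control_tasks(total_tasks, percent=1):
    """Smallest whole number of control tasks that is at least `percent`% of total_tasks."""
    return math.ceil(total_tasks * percent / 100)


buf = io.StringIO()
write_tasks_tsv(["https://example.com/cat1.jpg", "https://example.com/cat2.jpg"], buf)
print(buf.getvalue(), end="")
print(min_control_tasks(1000))  # 10
```

When writing to a real file, open it with encoding="utf-8" and newline="" so that it is saved in UTF-8, as required for uploading.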

Set up quality control

Quality control rules allow you to filter out inattentive performers. You can configure quality control both in the project and in the pool.

Attention.

Quality control settings configured in the project are applied to all project pools, so you can't change them in just one of the pools.

When you clone a project, its quality control settings aren't transferred.

    Go to pool editing (the Edit button in the upper-right corner of the page) and click Add Quality Control Rule.

    You can copy quality control settings from another pool. To do this, click Copy settings from in the Users filter section.

  1. Add the Control tasks section and specify the following values:

    This means that a performer whose control task responses are more than 40% incorrect will be blocked for five days and won't be able to complete tasks in this project.

  2. Add a restriction for Fast responses.

    The Minimum time per page value depends on the number of tasks on the page. It takes 2-4 seconds to identify the cat's mood, so a page with 10 tasks may take 20-40 seconds to complete.

    A performer can make an accidental mistake once in a while, but after 2-3 repeated mistakes you can temporarily ban the performer.

    Specify the following values:

    This means that a performer who completes two task suites in less than 20 seconds each will be blocked for 10 days and won't be able to complete your tasks.
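Both rules boil down to threshold checks. The following Python sketch mirrors them only as described in this tutorial: more than 40% incorrect control responses leads to a 5-day ban, and two or more suites completed in under 20 seconds each lead to a 10-day ban. It is an illustration, not how Toloka evaluates rules; the real rules have further settings (for example, how many recent responses to take into account) that are not modeled here, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ban:
    reason: str
    days: int


def check_control_tasks(incorrect: int, total: int,
                        max_incorrect_percent: int = 40,
                        ban_days: int = 5) -> Optional[Ban]:
    """Ban if the share of incorrect control responses exceeds the limit."""
    if total == 0:
        return None
    # Integer comparison avoids floating-point issues exactly at the 40% boundary.
    if incorrect * 100 > max_incorrect_percent * total:
        return Ban("control tasks", ban_days)
    return None


def check_fast_responses(suite_seconds: list[float],
                         min_seconds: float = 20,
                         max_fast_suites: int = 2,
                         ban_days: int = 10) -> Optional[Ban]:
    """Ban if at least `max_fast_suites` suites took less than `min_seconds` each."""
    fast = sum(1 for seconds in suite_seconds if seconds < min_seconds)
    if fast >= max_fast_suites:
        return Ban("fast responses", ban_days)
    return None


print(check_control_tasks(3, 5))
print(check_fast_responses([15, 18, 40]))
```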

Add training

Create a training pool:

  1. Open the project page.

  2. Go to the Training tab.

  3. Click the Add training button.

  4. Fill in the training settings fields.

    You can use the Retry after field to set up repeated training.
  5. Click Save training.
After you create a training pool:
  1. Get the task template (TSV) or edit the one you used for uploading the main pool tasks.
    Note. TSV files for all project pools have the same structure.
  2. Add links to images for the training tasks in the TSV file.
  3. Upload the file and specify the number of tasks on the page. For example, 10. This number must not exceed the number of tasks per page in the main pool.
  4. Click Upload and enter the number of training tasks on the page.
  5. Click Add.
  6. Click Mark up and then Create training tasks. Next, add correct answers and hints for all the uploaded tasks.
  7. After the file is uploaded, open the Preview and check that the tasks are displayed correctly.
  8. Open the main pool, link the training pool to it, and set Level required to 55. This means that the main pool will be available to performers who made no more than 45% mistakes in the training pool.

    To link the training pool, go to the main pool editing mode and select your training pool in the Training parameter drop-down list.
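The Level required check compares training accuracy with the threshold. This sketch assumes the level is the percentage of correct responses in the training pool; can_access_main_pool is a hypothetical name. Comparing integers instead of percentages avoids rounding problems exactly at the boundary.

```python
def can_access_main_pool(correct: int, total: int, level_required: int = 55) -> bool:
    """True if the share of correct training responses is at least level_required percent."""
    if total == 0:
        return False
    # correct / total >= level_required / 100, rewritten without division.
    return correct * 100 >= level_required * total


# 11 of 20 correct is 55% (45% mistakes), so access is granted.
print(can_access_main_pool(11, 20))  # True
print(can_access_main_pool(10, 20))  # False
```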

Learn more about creating a pool with training.

Start the pool and get the results

  1. Start the pool by clicking .
  2. Track the completion of tasks in the Pool statistics section.
  3. When the pool is completed, launch aggregation of results. To do this, find the Download results button and click  → Dawid-Skene aggregation model next to it.

    Each task is completed by several performers (see Overlap), so their responses need to be aggregated into a single result per task. Learn more about aggregation.

  4. Track the aggregation progress on the Operations page. When the process is completed, click Download.
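To show what the Dawid-Skene model does with overlapping responses, here is a minimal textbook expectation-maximization (EM) version in plain Python. It starts from vote shares (a soft majority vote), then alternates between estimating a confusion matrix for each performer and re-estimating the probable true label of each task, so that reliable performers get more weight. This is an illustrative sketch, not Toloka's implementation: the initialization, the smoothing constant, and the fixed number of iterations are assumptions.

```python
from collections import defaultdict


def dawid_skene(responses, n_iter=50, smoothing=0.01):
    """Aggregate (task, performer, label) triples; return {task: most probable label}."""
    labels = sorted({label for _, _, label in responses})
    performers = {p for _, p, _ in responses}
    votes_by_task = defaultdict(list)
    for task, performer, label in responses:
        votes_by_task[task].append((performer, label))

    # Initial posteriors: the share of votes each label received (soft majority vote).
    posterior = {
        task: {k: sum(1 for _, given in votes if given == k) / len(votes) for k in labels}
        for task, votes in votes_by_task.items()
    }

    for _ in range(n_iter):
        # M-step: class priors and a smoothed confusion matrix per performer,
        # where confusion[p][true][given] estimates P(p answers `given` | true label).
        prior = {k: sum(post[k] for post in posterior.values()) / len(posterior)
                 for k in labels}
        confusion = {p: {k: {g: smoothing for g in labels} for k in labels}
                     for p in performers}
        for task, votes in votes_by_task.items():
            for performer, given in votes:
                for k in labels:
                    confusion[performer][k][given] += posterior[task][k]
        for p in performers:
            for k in labels:
                row_total = sum(confusion[p][k].values())
                for g in labels:
                    confusion[p][k][g] /= row_total

        # E-step: posterior probability of each true label for every task.
        for task, votes in votes_by_task.items():
            scores = {}
            for k in labels:
                score = prior[k]
                for performer, given in votes:
                    score *= confusion[performer][k][given]
                scores[k] = score
            total = sum(scores.values())
            posterior[task] = {k: scores[k] / total for k in labels}

    return {task: max(post, key=post.get) for task, post in posterior.items()}


responses = [
    ("t1", "w1", "OK"), ("t1", "w2", "OK"), ("t1", "w3", "BAD"),
    ("t2", "w1", "BAD"), ("t2", "w2", "BAD"), ("t2", "w3", "BAD"),
    ("t3", "w1", "404"), ("t3", "w2", "404"), ("t3", "w3", "OK"),
    ("t4", "w1", "OK"), ("t4", "w2", "OK"), ("t4", "w3", "OK"),
]
print(dawid_skene(responses))
```

Unlike a plain majority vote, this model can outvote a majority when the minority performers have a much better track record, which is why it is useful when performer quality varies.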

Troubleshooting

Uploading files from Yandex.Disk
I can't upload files from Yandex.Disk

If images, audio or video from Yandex.Disk don't appear in the instructions or on the task suite, make sure you connected Yandex.Disk correctly and uploaded the files.

How do I create a task where the performer has to view a video from Yandex.Disk?

To create such a task, take the video markup template as a basis.

To host your videos on Yandex.Disk, connect Yandex.Disk and set up the project.

Why can't my task for selecting objects in an image display images from Yandex.Disk?
The problem is in your task template. Make sure that:
  • In the project, the input field where you pass the file link has the “string” type.
  • The component in the task template uses the "proxy" expression.
  • The format of relative links in the TSV file with tasks is correct: <unique name>/<file path and name>.
For detailed instructions and videos, see the page Using files from Yandex.Disk.
Frequent mistakes when connecting to Yandex.Disk and uploading files
  • The Input data field in the project settings has the link type. You should choose the string type.
  • The TSV file contains absolute links to the task files. Instead, insert a relative link <unique name>/<path and file name>. For example: yadisk/image1.jpg or yadisk/photos/image1.png.
  • Photos from Yandex.Disk are used in the task instructions in the mobile app. To display the photos in the instructions, use only direct links.
  • Files are deleted or aren't located in the Yandex.Disk folder that the link leads to.
  • The OAuth token isn't active. Update the token on the External Services Integration page.
To display files from Yandex.Disk (images, audio files, videos) to the performer:
  1. Link Yandex.Disk in your profile.
  2. Set the string type for the input data field.
  3. Insert a file link using the proxy component.

Detailed instructions
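A quick check before uploading can catch the absolute-link mistake listed above. This Python sketch assumes a valid value has the form <unique name>/<path and file name>, with no URL scheme and no leading slash; is_valid_relative_link is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_valid_relative_link(link: str) -> bool:
    """Check that a TSV value looks like <unique name>/<path and file name>."""
    # Reject absolute URLs (https://...) and absolute paths (/...).
    if urlparse(link).scheme or link.startswith("/"):
        return False
    name, sep, path = link.partition("/")
    # Need a non-empty unique name, a slash, and a file path that isn't a bare folder.
    return bool(name) and bool(sep) and bool(path) and not path.endswith("/")


for value in ["yadisk/image1.jpg", "https://disk.yandex.ru/i/abc", "image1.jpg"]:
    print(value, is_valid_relative_link(value))
```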

Files load too slowly from Yandex.Disk. How do I speed up the loading process?

Try the recommendations on this page or contact Yandex.Disk support.


Why doesn't the task preview show my images from Yandex.Disk?
The problem is in your task template. Make sure that:
  • In the project, the input field where you pass the file link has the “string” type.
  • The component in the task template uses the "proxy" expression.
  • The format of relative links in the TSV file with tasks is correct: <unique name>/<file path and name>.

Detailed instructions.

How do I embed multiple images using links to Yandex.Disk?
To add images using links to Yandex.Disk:
  1. Use a link such as /api/proxy/yadisk/image1.jpg.
  2. In the requester profile settings, go to External Services Integration → Proxy settings.
  3. Set up integration with external services.

    Learn more about using files from Yandex.Disk.
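If you have many files, the proxy links can be generated from their relative paths. This sketch assumes that yadisk is the unique name configured in Proxy settings, as in the example link above, and that the links go into HTML img tags (for example, in a task interface); the helper names are hypothetical.

```python
from html import escape


def proxy_link(relative_path: str, prefix: str = "/api/proxy/") -> str:
    """Turn '<unique name>/<path and file name>' into a proxy link."""
    return prefix + relative_path.lstrip("/")


def img_tags(relative_paths: list[str]) -> str:
    """One <img> tag per file, each pointing at its proxy link."""
    return "\n".join(
        f'<img src="{escape(proxy_link(path))}" width="100%">' for path in relative_paths
    )


print(img_tags(["yadisk/image1.jpg", "yadisk/photos/image2.png"]))
```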

Why do I see a syntax error when I upload a task where a user has to view an image and write feedback?

The error might occur if the expected input type is URL, but a string is received.

There may be two reasons:
  • The input field has the "link" type.
  • The pool was created for an outdated project version. It means that the pool was created before you changed the input field type.
I have a task for photo classification. When there are more than 5 photos on the page, why does Toloka split them across 2 pages?

Toloka will split the links to images in the uploaded file into pages depending on the method you specified when uploading the TSV file. For more information about the three upload methods, see the Guide.

How do I make an image expand to its maximum size on click?

Add the real-size=true and screenshot=true parameters to the component that inserts the image.

Do I need to convert all the images in the task to the same size or can they be different?
You can use different image sizes.