Uploading tasks to a pool

Restriction. You can add up to one million tasks to the pool. To upload more tasks, create another pool.
To upload a TSV file with tasks to a pool:
  1. On the pool page, click Upload.
  2. Choose how to divide tasks into task suites (pages to show users). Currently, there are three methods to place tasks: By empty row, Set manually, and Smart mixing.
    How to distribute tasks as suites

    Characteristic | By empty row and Set manually | By empty row and Set manually (keep task order) | Smart mixing | Smart mixing (keep task order)
    To generate task suites, tasks are taken in the order of rows (from top to bottom) in the uploaded file | Yes | Yes | No | Yes
    Tasks are mixed within a suite | No | No | Yes | Yes
    Task suites are distributed to performers in the same order | No | Yes | Yes | Yes
    Within identical task suites, control tasks are the same for all performers | Yes | Yes | No | Yes

    To learn more about how to group tasks in suites, see below.

    Ways to group tasks in suites
    By empty row

    Divide the tasks into suites yourself in the TSV file. To do this, add an empty line after each task suite in the file.

    Set manually
    Enter the number of tasks per suite. Task suites are formed from the tasks in the order they are placed in the TSV file.
    Smart mixing

    Specify how many tasks of each type should be in each task suite. For example, 8 main tasks, 1 training task, and 1 control task. If necessary, specify the minimum number of tasks of each type in the additional settings. If there aren't enough main tasks and the Assign partial page option is set, the performer is given an incomplete task suite. Note that even in an incomplete suite, the full specified number of control and training tasks must be present.

    Attention. If you upload a file via “Smart mixing”, you won't be able to use other ways of task distribution on the pages in this pool.

    This method is useful if the created pool:


    Smart mixing and keeping the task order
    • Tasks are divided into lists by task type: regular, control, or training.
    • Task suites are generated using these lists. The number of tasks of the given type that you specified in the settings is added from each list. By default, tasks are randomly selected.

      If the Keep task order option is enabled, tasks are added in the same order as they were listed in the source TSV file. This takes into account the overlap: the task that goes first will be assigned until it reaches the desired overlap.

    • Tasks in the suite are mixed up when the page is shown to the performer.

    After uploading the tasks with smart mixing you will be able to mark up tasks and set selective majority vote checking.

    Setting overlap

    If you upload tasks from the Toloka interface, infinite overlap is set automatically for control and training tasks, so that there is enough to mark up all main tasks.

    You can set the overlap via the Toloka API.

    If you used Set manually, you can find the number of tasks per suite in the pool settings, although some suites may be incomplete. If you uploaded tasks in a different way, you can check how they're grouped into suites in the Toloka interface for requesters: on the pool page, click Download all tasks. You can also check task distribution across suites using the Toloka API.

    Note. Set the number of tasks per suite depending on the complexity and time allocated for a task. We recommend that you distribute them so that each task suite takes no more than five minutes to complete. The performers are paid for completing a full task suite. The amount they get is specified in the pool settings.
  3. Click the Upload file button and choose the file. To put different types of tasks in a pool, you can upload them in separate files. You can also add tasks to existing ones by uploading a separate file. Note that tasks from separate files are mixed on the same pages only if Smart mixing is selected. For example, if you selected Set manually and uploaded a file with main tasks and then a file with control tasks, you'll get separate pages for each type of task.
  4. Wait for the result. If you get a processing error, it means that the data file is not formatted correctly. For example, there are unnecessary tabs in the file or some lines, headers, or quotes are missing. In this case, click Cancel, correct the mistakes, and then upload the file again.
  5. Click Add.

  6. View the result by clicking the Preview button.

To delete all the tasks in the pool, click Delete.
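
The three grouping methods from step 2 can be sketched in Python as follows. This is an illustrative model only, not Toloka's implementation: the function names, the (type, payload) task representation, and the type labels are hypothetical. Partial pages are not modelled, and control and training tasks are consumed once here rather than reused with infinite overlap.

```python
import random
from collections import defaultdict


def group_by_empty_row(lines):
    """'By empty row': an empty line closes the current suite."""
    suites, current = [], []
    for line in lines:
        if line.strip() == "":
            if current:
                suites.append(current)
                current = []
        else:
            current.append(line)
    if current:
        suites.append(current)
    return suites


def group_manually(tasks, per_suite):
    """'Set manually': cut tasks into suites in file order.

    The last suite may be incomplete.
    """
    return [tasks[i:i + per_suite] for i in range(0, len(tasks), per_suite)]


def smart_mix(tasks, counts, keep_order=False, rng=None):
    """'Smart mixing': build suites from per-type lists.

    tasks  -- list of (task_type, payload) tuples in file order
    counts -- tasks of each type per suite, e.g. {"main": 8, "control": 1}
    """
    rng = rng if rng is not None else random.Random()
    by_type = defaultdict(list)
    for task_type, payload in tasks:
        by_type[task_type].append(payload)
    if not keep_order:
        # Default behaviour: tasks are selected randomly from each list.
        for task_list in by_type.values():
            rng.shuffle(task_list)
    suites = []
    # Stop as soon as any list cannot supply its full quota.
    while all(len(by_type[t]) >= n for t, n in counts.items()):
        suite = []
        for t, n in counts.items():
            suite.extend((t, by_type[t].pop(0)) for _ in range(n))
        rng.shuffle(suite)  # tasks are mixed when the page is shown
        suites.append(suite)
    return suites
```

For example, group_manually(list(range(1, 11)), 4) yields two full suites and one incomplete suite containing tasks 9 and 10.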

How do I save the task order?

Keeping the task order while ignoring overlap

If you need performers to receive task suites in the same order as they appear in the uploaded TSV file, use the Keep task order option. The option works differently depending on the method for distributing tasks across suites:

  • By empty row and Set manually. Performers get task suites one after another: page 1 first, then page 2, page 3, and so on. Tasks within suites also go one after another, and all performers see the same sequence.
  • Smart mixing. The algorithm generates suites so that performers get tasks in the order they are listed in the TSV file. Only the task suites are distributed in order; the tasks within each suite are mixed on the page.

To use this option in your project, turn on the Keep task order option in the Parameters settings when creating a new pool.

Note. Keeping the order of tasks is useful if you need to quickly reach the overlap to monitor the majority vote or maintain the sequence of questions in a survey or training.
  • By default, this option is disabled (set to No). In this case, both the task suites and the tasks inside the suites will be given to the performers in random order.

    For example, if you upload 20 tasks in the TSV file to the pool (in order from the 1st to the 20th) and set four tasks per suite, the tasks will be distributed to the performers in the following way:

    Performer | Task suite number | Order of tasks on the page
    1 | 1 | 3, 2, 4, 1
    2 | 5 | 17, 20, 18, 19
    1 | 3 | 12, 9, 11, 10
    3 | 2 | 7, 8, 6, 5
    2 | 4 | 16, 13, 15, 14
    3 | 3 | 11, 12, 10, 9
    ... | ... | ...


  • If the option is enabled (set to Yes), the tasks are given to the performer page by page in the same order as they are in the TSV file. The tasks within the page are shuffled.

    For example, like in the previous case, tasks are loaded in the pool in order (from the 1st to the 20th), four tasks per suite. But in this case, the performers will get suites in the same order as in the upload file, with tasks shuffled inside each suite on the page:

    Performer | Task suite number | Order of tasks on the page
    1 | 1 | 1, 4, 3, 2
    2 | 1 | 3, 4, 1, 2
    1 | 2 | 6, 5, 7, 8
    3 | 1 | 2, 1, 4, 3
    2 | 2 | 8, 5, 7, 6
    3 | 2 | 5, 8, 6, 7
    ... | ... | ...
Note. In the pool preview, suites and tasks are shuffled because the task order isn't preserved in the preview. However, when you start the pool, task suites will be issued to each performer in the specified order.
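
The difference between the two example tables above can be reproduced with a small simulation. This is a sketch under simplifying assumptions: it models a single performer, ignores overlap, and the function name pages_for_performer is hypothetical.

```python
import random


def pages_for_performer(suites, keep_task_order, rng):
    """Return (suite number, tasks on the page) in the order they are issued."""
    order = list(range(len(suites)))
    if not keep_task_order:
        rng.shuffle(order)  # option disabled: suites are issued in random order
    pages = []
    for idx in order:
        tasks = list(suites[idx])
        rng.shuffle(tasks)  # tasks on the page are shuffled in both examples
        pages.append((idx + 1, tasks))
    return pages


# 20 tasks uploaded in order, four tasks per suite.
tasks = list(range(1, 21))
suites = [tasks[i:i + 4] for i in range(0, len(tasks), 4)]

pages = pages_for_performer(suites, keep_task_order=True, rng=random.Random(1))
for number, page in pages:
    print(number, page)
```

With keep_task_order=True the suite numbers always come out as 1, 2, 3, 4, 5, while the order of tasks on each page varies with the random seed.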
Task order accounting for overlap

If you set an overlap to more than one and turn on the Keep task order option, each subsequent task suite is distributed to interested users only after there are enough users who submitted the suite that was already assigned (in other words, after it reaches full overlap).

In this case, if the user already completed one task suite from the pool or there is a new interested user, they will get the next suite that isn't in progress yet, even if the previous one didn't reach full overlap.

If a user refuses the issued task suite, it will be given to another user — either someone else who is interested in the pool, or an available user who accepts the task.

For example, if overlap is set to 3:

Performer | Task suite number | Overlap reached | Note
1 | 1 | 1 | Interested users received page 1
2 | 1 | 2 |
1 | 2 | 1 | A performer completed task suite 1 and got task suite 2, although task suite 1 didn't reach full overlap
3 | 1 | 3 | Full overlap of task suite 1
3 | 2 | 1 | The user who took the task refused to complete task suite 2
4 | 2 | 2 | The interested user received task suite 2 right away because there is already a full overlap for task suite 1, and the user who took it refused to perform task suite 2
1 | 3 | 1 | A performer completed task suite 2 and got task suite 3, although suite 2 didn't reach full overlap
2 | 2 | 3 | Full overlap of task suite 2
5 | 3 | 1 | Interested user refused to complete task suite 3
2 | 3 | 2 | The user who accepted it received task suite 3 because the interested user refused to complete it
3 | 3 | 3 | Full overlap of task suite 3
... | ... | ... | ...

You can also set the order of tasks in the Toloka API. To do this, use the shuffle_tasks_in_task_suite parameter. If true, the task order within a suite is random. If false, tasks keep the order in which they were uploaded. The default is true, meaning that tasks are shuffled within the suite.
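
As a rough illustration, the fragment below builds the JSON that would carry this setting. Only the parameter name shuffle_tasks_in_task_suite comes from the text above; nesting it under a mixer_config object is an assumption that should be checked against the Toloka API reference before use.

```python
import json

# Hypothetical pool settings fragment. The "mixer_config" wrapper is an
# assumption; only "shuffle_tasks_in_task_suite" is named in this article.
pool_settings = {
    "mixer_config": {
        "shuffle_tasks_in_task_suite": False,  # keep the uploaded task order
    }
}

print(json.dumps(pool_settings, indent=2))
```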

Skill

If you added the majority vote quality control rule, when all completed task suites have reached full overlap, the performer will be assigned a skill using majority vote. For example, if overlap 3 is set in the pool settings, the skill is calculated after each of these suites reaches overlap 3, not after the performer completes 3 suites.

Troubleshooting

Pool settings
How many tasks should be in a suite?

The number of tasks depends on how difficult and time-consuming the tasks are. Keep the size reasonably small. Large task suites are unpopular, partly because they are inconvenient for performers (for example, if the internet connection is unstable).
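
One way to turn this into a number is to start from the recommendation given earlier on this page that a task suite should take no more than five minutes. The helper below is a simple arithmetic sketch; the function name and the idea of measuring an average time per task are assumptions for illustration.

```python
def tasks_per_suite(seconds_per_task, suite_budget_seconds=300):
    """Largest number of tasks that fits in the time budget (at least 1)."""
    return max(1, suite_budget_seconds // seconds_per_task)


print(tasks_per_suite(30))  # 300 // 30 = 10 tasks per suite
print(tasks_per_suite(45))  # 300 // 45 = 6 tasks per suite
```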

Errors when uploading tasks in the pool
How do I view the processing log?
To view the processing log, click More on uploading errors. The processing log is written in JSON format. Objects inside result correspond to line numbers in the uploaded file. Lines that were processed with an error have the status "success": false.
Tip. To work with a large log conveniently, copy it to a text editor.
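
For a large log, a short script can list the failed lines. The sketch below assumes that result is a JSON object keyed by line number and that each entry carries a "success" flag, as described above. The exact log layout may differ, so treat the structure and the function name failed_lines as assumptions.

```python
import json


def failed_lines(log_text):
    """Return (line key, entry) pairs whose "success" flag is false."""
    log = json.loads(log_text)
    result = log.get("result", {})
    # Assumption: "result" is an object keyed by line number.
    # A plain list of entries is handled as well.
    items = result.items() if isinstance(result, dict) else enumerate(result)
    return [(key, entry) for key, entry in items if entry.get("success") is False]


# Hypothetical log with one successful line and one failed line.
sample = '{"result": {"0": {"success": true}, "1": {"success": false}}}'
for key, entry in failed_lines(sample):
    print(key, entry)
```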
Errors in column headers

If the column headings are incorrect, the whole file is rejected. Otherwise, Toloka specifies the number of tasks with processing errors.

Processing errors table

Error:
"parsing_error_of": "https://tlk.s3.yandex.net/wsdm2020/photos/2d5f63a3184919ce7e3e7068cf93da4b.jpg\t\t",
"exception_msg": "the nameMapping array and the sourceList should be the same size (nameMapping length = 1, sourceList size = 3)"
Overview: Extra tabs. If the TSV file contains more \t column separators after the data or the link than the number of columns set in the input data, you will get an error message. For example, if 1 column is defined in the input data and two more tabs (\t\t) are added in the TSV file after the link, you get 3 columns, 2 of which are extra.
How to fix: Remove the extra column separators (in the example above, both \t characters).

Error:
"exception_msg": "the nameMapping array and the sourceList should be the same size (nameMapping length = 4, sourceList size = 6)"
Overview: The number of fields in the header and in the row doesn't match.
How to fix: Make sure that:
  • The number of tabs in the file structure is correct.
  • String values with tab characters are enclosed in quotation marks " ".

Error:
"code": "VALUE_REQUIRED", "message": "Value must be present and not equal to null"
Overview: The value is missing for a required input field.
How to fix: Make sure that columns with required input data fields are filled.

Error:
"code": "INVALID_URL_SYNTAX", "message": "Value must be in valid url format"
Overview: Invalid data in a “link” (“url”) field.
How to fix: Make sure that:

Error:
"exception_msg": "unexpected end of file while reading quoted column beginning on line 2 and ending on line 4"
Overview: Unpaired quotation mark in a string.
How to fix: Check that all quotation marks are escaped.
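
Some of the errors above can be caught locally before uploading. The following is a minimal sketch using Python's csv module. It checks only the field count per row and empty required values; it does not detect unpaired quotation marks or invalid URLs. The function name check_tsv and the column names in the example are hypothetical.

```python
import csv
import io


def check_tsv(text, required=()):
    """Return a list of (line number, problem) for common upload errors."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, None)
    if header is None:
        return [(1, "file is empty")]
    for row in reader:
        line_no = reader.line_num
        if not row:
            continue  # an empty row separates task suites
        if len(row) != len(header):
            problems.append(
                (line_no, f"expected {len(header)} fields, got {len(row)}")
            )
            continue
        for name, value in zip(header, row):
            if name in required and value == "":
                problems.append((line_no, f"missing value for {name}"))
    return problems


# One input column, but the first data row ends with two extra tabs.
sample = "INPUT:image\nhttps://example.com/a.jpg\t\t\n\nhttps://example.com/b.jpg\n"
print(check_tsv(sample))
```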

How do I know how many tasks a performer will see on the page?

You can specify the number of tasks on the page when you upload your tasks to the pool. For more information about distributing tasks across pages, see this article.

How do I create the task file properly so that there are no errors?

In the file with the main tasks, the columns with the INPUT headers must be filled out. You can see those headers if you download a sample file from the pool.

If you are creating control tasks, fill out the GOLDEN columns with the correct responses.

If you are creating a training task, you also need to fill in the HINT:text column. For the main tasks you don't need any columns other than INPUT, so feel free to delete them.

The file format must be TSV, and the encoding must be UTF-8.

For more information about creating the file, see the Guide. If there are errors during the upload, look up the error description on this page.
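
As an illustration of these rules, the sketch below generates a small task file with one main, one control, and one training task. The field names image and result, the example URLs, and the hint text are hypothetical; use the headers from the sample file downloaded from your pool. The csv module automatically puts values that contain tab characters in quotation marks.

```python
import csv
import io

# Hypothetical field names: "image" (input) and "result" (correct response).
header = ["INPUT:image", "GOLDEN:result", "HINT:text"]
rows = [
    ["https://example.com/1.jpg", "", ""],                      # main task
    ["https://example.com/2.jpg", "cat", ""],                   # control task
    ["https://example.com/3.jpg", "dog", "Look at\tthe ears"],  # training task
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)

tsv_text = buffer.getvalue()
tsv_bytes = tsv_text.encode("utf-8")  # the file must be saved as UTF-8
print(tsv_text)
```

To save the result, write tsv_bytes to a file opened in binary mode, for example tasks.tsv.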

What is the maximum number of tasks per page?

It depends on the task. Technically, you can use as many tasks as you want.

But users are reluctant to take lengthy tasks. They'd rather do 10 tasks that take one minute each than one task that takes 10 minutes.

In addition, if you use a large number of tasks on the page, there might be issues with uploading the files to be labeled. This problem might occur with images.

The third thing to consider is quality control and assignment review. If you use recompletion of assignments from banned users, you should split the task into smaller parts so that fewer assignments are recompleted. You are more likely to meet your budget this way.

The same task appeared on different pages

The same task may appear on different pages if:

  • Dynamic overlap is used (incremental relabeling, IRL). As an example, let's say there were 5 tasks on a page. For 4 of them, responses coincided and the common response was counted as correct. The fifth task was mixed into another set because it didn't get into the final response and it needs to be “reassessed”.
  • Different tasks have different overlap. Tasks with higher overlap will be additionally shown in sets with the other remaining tasks in the pool.
  • If a quality control rule changes a task's overlap, it will appear in a different set.
Why haven't I received assignments since I launched my first project, and all the uploaded assignments are marked as "Training"?

Check the hint field. For the main tasks, this field must be empty.
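
To find such rows before uploading, you can classify each row by the columns that are filled. The rule below is inferred from the descriptions on this page (a filled hint means training, a filled GOLDEN column means control) and is not Toloka's actual logic. The function name task_type and the column names in the example are hypothetical.

```python
def task_type(row):
    """Guess how a row will be treated, based on which columns are filled."""
    if (row.get("HINT:text") or "").strip():
        return "training"
    golden_filled = any(
        (value or "").strip()
        for name, value in row.items()
        if name.startswith("GOLDEN:")
    )
    return "control" if golden_filled else "main"


row = {"INPUT:image": "https://example.com/1.jpg", "GOLDEN:result": "", "HINT:text": "hint"}
print(task_type(row))  # training: the hint column is filled
```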

Why do I see a syntax error when I upload a task where a user has to view an image and write feedback?

The error might occur if the expected input type is URL, but a string is received.

There may be two reasons:
  • The input field has the "link" type.
  • The pool was created for an outdated project version. It means that the pool was created before you changed the input field type.
How do I specify smart mixing settings in the interface when uploading a file?

Smart mixing settings are specified for the file rather than for the pool.

The settings specified during the first file upload are applied to all the files that are uploaded to this pool later on.

What is the right time limit for the task completion?
Try completing the tasks yourself, and ask your colleagues and friends to complete them. Find the average completion time and add 50% to it.
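
The calculation is simple enough to script. In this sketch the function name and the sample timings are made up for illustration.

```python
from statistics import mean


def suggested_time_limit(completion_seconds):
    """Average completion time plus 50%, rounded to whole seconds."""
    return round(mean(completion_seconds) * 1.5)


# Hypothetical timings from three test runs, in seconds.
print(suggested_time_limit([200, 240, 280]))  # average 240 s, limit 360 s
```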
What is the difference between "task" and "task_suite"?

A task is a single item for the performer to complete. A task suite is a page with tasks. The performer gets paid per task suite.

How do I upload the file with the accepted assignments back to Toloka for projects with non-automatic acceptance? Where do I find the format of the upload data?

Use the button Upload review results to upload your file. You can see the format here.

Assignments are reviewed in a TSV file.