Uploading tasks to a pool
- On the pool page, click Upload.
- Choose how tasks are placed on the page. Currently, there are three methods to place tasks: By empty row, Set manually, and Smart mixing.
How to distribute tasks by page
Characteristics/upload type | By empty row and Set manually | By empty row and Set manually (keep task order) | Smart mixing | Smart mixing (keep task order) |
---|---|---|---|---|
To generate pages, tasks are taken in the order of rows (from top to bottom) in an uploaded file | Yes | Yes | No | Yes |
Tasks are mixed within a page | No | No | Yes | Yes |
Pages are distributed to performers in the same order | No | Yes | Yes | Yes |
Within identical pages, control tasks are the same for all performers | Yes | Yes | No | Yes |
For more information about how to distribute tasks by page, see below.
- By empty row: Divide the tasks into pages yourself in the TSV file. To do this, add an empty line after each task page in the file.
- Set manually: Enter the number of tasks per page. Task pages are formed from the tasks in the order they are placed in the TSV file.
- Smart mixing: Specify how many tasks of each type should be on the page. For example, 8 main tasks, 1 training task, and 1 control task. If necessary, specify the minimum number of tasks for each type in additional settings. If there aren't enough main tasks and the Assign partial page option is set, the performer is given an incomplete page. Please note that the number of control and training tasks in this case must be complete.
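For the By empty row method, the TSV file can be generated rather than edited by hand. The following is a minimal sketch, not an official tool: it writes a header row and then inserts an empty row after each page of tasks. The `INPUT:image` column name and the example.com links are assumptions for illustration; use the headers that match the input data fields of your project.

```python
import csv
import io


def write_pages_tsv(header, pages):
    """Return TSV text with an empty row after each page of tasks."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for page in pages:
        for row in page:
            writer.writerow(row)
        buf.write("\n")  # the empty row marks the end of a task page
    return buf.getvalue()


# Hypothetical data: two pages, with two tasks and one task respectively.
pages = [
    [["https://example.com/1.jpg"], ["https://example.com/2.jpg"]],
    [["https://example.com/3.jpg"]],
]
tsv_text = write_pages_tsv(["INPUT:image"], pages)
print(tsv_text)
```

Using the `csv` module instead of joining strings by hand means that values containing tabs or quotation marks are quoted automatically.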
Attention. If you upload a file via “Smart mixing”, you won't be able to use other ways of task distribution on the pages in this pool.
Ways to distribute tasks by page
If you used Set manually, you can find out the number of tasks per page in the pool settings. But some pages may be incomplete. If you uploaded tasks in a different way, you can check how they're distributed by page in the Yandex.Toloka interface for requesters. To do this, on the pool page, click files → Download all tasks. You can also check task distribution by page using the Yandex.Toloka API.
Note. Set the number of tasks on the page depending on the complexity and time allocated for a task. We recommend that you distribute them so that each task page takes no more than five minutes to complete. The performers are paid for completing a full task page. The amount they get is specified in the pool settings.
- Click the Upload file button and choose the file. To put different types of tasks in a pool, you can upload them in separate files. You can also add tasks to existing ones as a separate file. Please note that this upload option will only work if Smart mixing is set. For example, if you selected Set manually, after uploading a file with main tasks and then a file with control tasks, you'll get separate pages with these types of tasks.
- Wait for the result. If you get a processing error, it means that the data file is not formatted correctly. For example, there are unnecessary tabs in the file or some lines, headers, or quotes are missing. In this case, click Cancel, correct the mistakes, and then upload the file again.
Click Add.
View the result by clicking the Preview button.
To delete all the tasks in the pool, click Delete.
How do I save the task order?
By default, the Keep task order option is disabled (set to No). In this case, both the task pages and the tasks inside the pages are given to the performers in random order.
For example, if you upload 20 tasks in the TSV file to the pool (in order from the 1st to the 20th) and set four tasks per page, the tasks will be distributed to the performers in the following way:
Performers | Task page number | Order of tasks on the page |
---|---|---|
1 | 1 | 3, 2, 4, 1 |
2 | 5 | 17, 20, 18, 19 |
1 | 3 | 12, 9, 11, 10 |
3 | 2 | 7, 8, 6, 5 |
2 | 4 | 16, 13, 15, 14 |
3 | 3 | 11, 12, 10, 9 |
... | ... | ... |
If the option is enabled (set to Yes), the tasks are given to the performer page by page in the same order as they are in the TSV file. The tasks within the page are shuffled.
For example, like in the previous case, tasks are loaded in the pool in order (from the 1st to the 20th), four tasks per page. But in this case, the performers will get pages in the same order as in the upload file, with tasks shuffled inside each page:
Performers | Task page number | Order of tasks on the page |
---|---|---|
1 | 1 | 1, 4, 3, 2 |
2 | 1 | 3, 4, 1, 2 |
1 | 2 | 6, 5, 7, 8 |
3 | 1 | 2, 1, 4, 3 |
2 | 2 | 8, 5, 7, 6 |
3 | 2 | 5, 8, 6, 7 |
... | ... | ... |
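The two examples above can be modelled with a short simulation. This is only an illustrative sketch of the behaviour described in this section for a single performer, not Toloka's actual algorithm. The function names and the `keep_task_order` flag are hypothetical.

```python
import random


def make_pages(tasks, per_page):
    """Split tasks into consecutive pages of per_page tasks each."""
    return [tasks[i:i + per_page] for i in range(0, len(tasks), per_page)]


def assign(tasks, per_page, keep_task_order, rng):
    """Return (page number, task order) pairs as one performer might see them."""
    pages = make_pages(tasks, per_page)
    order = list(range(len(pages)))
    if not keep_task_order:
        rng.shuffle(order)  # pages are handed out in random order
    result = []
    for idx in order:
        page = pages[idx][:]
        rng.shuffle(page)  # tasks are shuffled inside each page in both modes
        result.append((idx + 1, page))
    return result


rng = random.Random(0)  # fixed seed so that repeated runs give the same output
tasks = list(range(1, 21))
ordered = assign(tasks, 4, True, rng)
shuffled = assign(tasks, 4, False, rng)
print(ordered)
print(shuffled)
```

In both modes a page always contains the same four tasks. Only the sequence of pages and the order of tasks inside each page change.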
If you need the performers to receive task pages in the same order as they are in the uploaded TSV file, use the Keep task order option. It works differently depending on the method for distributing tasks on pages:
- By empty row and Set manually: performers get task pages one after another (page 1 first, then pages 2 and 3, and so on). Tasks within pages will also go one after another and all performers will see the same sequence.
- Smart mixing: the algorithm generates pages so that performers get tasks in the order they are listed in the TSV file. Note that only task pages will be distributed in order, while the tasks within the pages will be mixed.
To use this option in your project, turn on the Keep task order option in the Parameters settings when creating a new pool.
- Skill: If you added the majority vote quality control rule, once all completed pages have reached full overlap, a performer will be assigned a skill by majority vote. For example, if overlap 3 is set in the pool settings, the skill is calculated after each of these pages reaches overlap 3, not after the performer completes 3 pages.
If you set an overlap to more than one and turn on the Keep task order option, each subsequent page is distributed to interested users only after there are enough users who submitted the page that was already assigned (in other words, after it reaches full overlap).
In this case, if the user already completed one pool page or there is a new interested user, they will get the next page that isn't in progress yet, even if the previous one didn't reach full overlap.
If a user refuses the issued task page, it will be given to another user — either someone else who is interested in the pool, or an available user who accepts the task.
For example, if overlap is set to 3:
Performers | Task page number | The overlap value achieved | Note |
---|---|---|---|
1 | 1 | 1 | Interested users received page 1 |
2 | 1 | 2 | |
1 | 2 | 1 | A performer completed page 1 and got page 2, although page 1 didn't reach full overlap yet |
3 | 1 | 3 | Full overlap of page 1 |
3 | 2 | 1 | The user who took the task refused to complete page 2 |
4 | 2 | 2 | The interested user received page 2 straight away, since there is already a full overlap for page 1, and the user who took it refused to perform page 2 |
1 | 3 | 1 | A performer completed page 2 and got page 3, although page 2 didn't reach full overlap yet |
2 | 2 | 3 | Full overlap of page 2 |
5 | 3 | 1 | Interested user refused to complete page 3 |
2 | 3 | 2 | The user who submitted to the pool before received page 3, since the interested user refused to complete it |
3 | 3 | 3 | Full overlap of page 3 |
... | ... | ... | ... |
You can also set the order of tasks in the Yandex.Toloka API. To do this, use the shuffle_tasks_in_task_suite parameter. If true, the task order within a page is random. If false, the order in which tasks were uploaded is kept. The default is true, meaning that tasks are shuffled within the page.
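As a rough illustration, the fragment below builds a pool settings payload with shuffling turned off. It is a sketch under assumptions: it assumes that shuffle_tasks_in_task_suite sits in the pool's mixer_config object next to the per-type task counts, and the other field names and values are illustrative. Check the Yandex.Toloka API reference for the exact structure before sending a request.

```python
import json

# Assumed layout: the flag is placed in the pool's mixer_config object.
# The task counts mirror the "8 main, 1 training, 1 control" example above.
pool_settings = {
    "mixer_config": {
        "real_tasks_count": 8,
        "golden_tasks_count": 1,
        "training_tasks_count": 1,
        # False keeps tasks on a page in the order they were uploaded.
        "shuffle_tasks_in_task_suite": False,
    }
}

payload = json.dumps(pool_settings, indent=2)
print(payload)
```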
What's next
- If you haven't yet marked up control and training tasks in the TSV file, mark up the tasks in the interface.
Troubleshooting
The number of tasks depends on how difficult and time-consuming the tasks are. Don't make task pages too large. They are unpopular, partly because they are inconvenient for performers (for example, if the internet connection is unstable).
- Errors in column headers: If the column headings are incorrect, the whole file is rejected. Otherwise, Toloka specifies the number of tasks with processing errors.
- Processing errors table

Error | Description | How to fix |
---|---|---|
"parsing_error_of": "https://tlk.s3.yandex.net/wsdm2020/photos/2d5f63a3184919ce7e3e7068cf93da4b.jpg\t\t", "exception_msg": "the nameMapping array and the sourceList should be the same size (nameMapping length = 1, sourceList size = 3)" | Extra tabs. If the TSV file contains more \t column separators after the data or the link than the number of columns set in the input data, you will get an error message. For example, if 1 column is defined in the input, and two more \t\t tabs are added in the TSV file after the link, you get 3 columns, 2 of which are extra. | Remove the extra column separators. In the above example, remove both \t\t characters. |
"exception_msg": "the nameMapping array and the sourceList should be the same size (nameMapping length = 4, sourceList size = 6)" | The number of fields in the header and in the row doesn't match. | Make sure that the number of tabs in the file structure is correct and that string values with tab characters are enclosed in quotation marks " ". |
"code": "VALUE_REQUIRED", "message": "Value must be present and not equal to null" | The value for a required input field is not specified. | Make sure that columns with required input data fields are filled. |
"code": "INVALID_URL_SYNTAX", "message": "Value must be in valid url format" | Invalid data in the URL field. | Make sure that links start with the http://, https:// or www prefix. When you upload the file from Yandex.Disk by relative link, make sure that the data type is set to string for the input data fields. |
"exception_msg": "unexpected end of file while reading quoted column beginning on line 2 and ending on line 4" | The string includes an unpaired quotation mark. | Check that all quotation marks are escaped. |
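Some of the errors listed above can be caught locally before uploading. The checker below is a minimal sketch, not part of Toloka: it compares the number of fields in every row with the header, checks that required columns are filled, and checks the URL prefix. The `check_tsv` function, its parameters, and the `INPUT:image` column name are hypothetical.

```python
import csv
import io


def check_tsv(text, required=(), url_fields=()):
    """Return a list of (line number, message) problems found in TSV text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    problems = []
    for row in reader:
        line_no = reader.line_num
        if not row:
            continue  # empty rows separate task pages
        if len(row) != len(header):
            problems.append(
                (line_no, f"expected {len(header)} fields, got {len(row)}")
            )
            continue
        values = dict(zip(header, row))
        for name in required:
            if not values[name]:
                problems.append((line_no, f"{name}: value required"))
        for name in url_fields:
            value = values[name]
            if value and not value.startswith(("http://", "https://", "www")):
                problems.append((line_no, f"{name}: invalid URL syntax"))
    return problems


# Hypothetical file: line 2 has two extra tabs, line 4 has an unsupported prefix.
sample = "INPUT:image\nhttps://a.example/1.jpg\t\t\n\nftp://x\n"
problems = check_tsv(sample, required=("INPUT:image",), url_fields=("INPUT:image",))
for line_no, message in problems:
    print(line_no, message)
```

The sketch does not detect unpaired quotation marks, because the `csv` reader in its default mode reads such a field to the end of the file without raising an error.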
result match the line number of the uploaded file. Lines that were processed with an error have the status "success": false.
The same task may appear on different pages if:
- The project uses incremental relabeling. As an example, let's say there were 5 tasks on a page. For 4 of them, responses coincided and the common response was counted as correct. The fifth task was mixed into another set because it didn't get into the final response and it needs to be “reassessed”.
- Different tasks have different overlap. Tasks with higher overlap will be additionally shown in sets with the other remaining tasks in the pool.
- A quality control rule changes a task's overlap. In this case, the task will appear in a different set.