Creating a TSV file with tasks

Tasks are uploaded to the pool in a TSV file.

To download a sample TSV file for your project, open the pool page and click Upload. In the window that opens, click the sample file link (File example for task uploading (tsv) or Sample file for uploading).

If you need to upload different task types to a pool, upload multiple TSV files, each file containing a different task type.

Structure of the TSV file

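The first line of the file contains the column headers:
  • INPUT:<field name> for input data.
  • GOLDEN:<field name> for the correct responses in control and training tasks.
  • HINT:text for the hints in training tasks.

Task type depends on which fields are filled in:
  • Main task: fill in only the columns with the INPUT header.
  • Control task: also fill in the GOLDEN columns with the correct responses.
  • Training task: also fill in the GOLDEN columns and the HINT:text column.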


The columns with required input data fields must be filled. The other columns can be deleted if they are empty.
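As a minimal sketch, the header and row rules above can be reproduced with Python's standard csv module. The column names INPUT:image and INPUT:title and the example.com URLs are hypothetical placeholders; take the real headers from the sample file for your project. The sketch builds a file with main tasks only, so it contains only INPUT columns.

```python
import csv
import io

# Hypothetical headers: copy the real ones from your project's sample file.
HEADERS = ["INPUT:image", "INPUT:title"]

# Main tasks: only the INPUT columns are filled in.
main_tasks = [
    {"INPUT:image": "https://example.com/cat1.jpg", "INPUT:title": "Fat cats"},
    {"INPUT:image": "https://example.com/cat2.jpg", "INPUT:title": "Red cats"},
]


def build_tsv(headers, rows):
    """Serialize task rows to TSV text, starting with the header line."""
    buffer = io.StringIO()
    # extrasaction="raise" (the default) rejects keys that are not in headers,
    # which catches misspelled column names before upload.
    writer = csv.DictWriter(buffer, fieldnames=headers, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_tsv(path, text):
    """Write the TSV text in UTF-8; newline="" keeps the line endings unchanged."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


tsv_text = build_tsv(HEADERS, main_tasks)
```

For example, save_tsv("tasks.tsv", tsv_text) writes the file to disk. Control or training tasks would go into a separate file whose headers also include the GOLDEN columns and, for training tasks, HINT:text.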

Working with the TSV file

Popular spreadsheet editors can import and export data in TSV format.

You can prepare the data in a spreadsheet and then save it in TSV format:
  1. Create a spreadsheet with the appropriate headers or copy them from the sample TSV file.
  2. Add data for the tasks.
  3. Copy the entire table and paste it into a plain text editor (such as Notepad on Windows or TextEdit on macOS).
  4. Save the file in UTF-8 encoding with the .tsv extension.

The maximum file size is 100 MB.
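Before uploading, you can check the extension, the size limit, and the encoding locally. This is a minimal sketch with hypothetical function names; it treats 100 MB as 100,000,000 bytes, a conservative assumption because the exact definition of the limit is not stated here.

```python
MAX_SIZE_BYTES = 100 * 1000 * 1000  # assumption: conservative reading of "100 MB"


def check_tsv_bytes(file_name, data):
    """Return a list of problems with the file name, size, and encoding."""
    problems = []
    if not file_name.lower().endswith(".tsv"):
        problems.append("the file name should have the .tsv extension")
    if len(data) > MAX_SIZE_BYTES:
        problems.append(f"the file is {len(data)} bytes, above the 100 MB limit")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"the file is not valid UTF-8 (first bad byte at offset {exc.start})")
    return problems


def check_tsv_file(path):
    """Read a file from disk and check it with check_tsv_bytes."""
    with open(path, "rb") as f:
        return check_tsv_bytes(path, f.read())
```

An empty list from check_tsv_file("tasks.tsv") means none of these checks found a problem; it does not validate the headers or the rows.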

Escaping strings

To include multiple paragraphs or tab characters in a field of the string type:
  • Double every straight quotation mark (") in the text.

    Don't escape other quotation marks (« » and “ ”).

  • Enclose the whole field in straight quotation marks (" ").

Unescaped quotation marks are removed when processing the TSV file.

Escaping examples (input data → result):
  • "Task in ""Toloka""" → Task in "Toloka"
  • "Task in «Toloka»" → Task in «Toloka»
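The two escaping rules can be sketched as a small Python helper (the name escape_tsv_string is hypothetical). As a sanity check, the sketch parses the escaped line back with Python's csv module, which uses the same doubled-quote convention; it is an assumption that this matches the Toloka parser in every edge case.

```python
import csv
import io


def escape_tsv_string(value):
    """Double every straight quotation mark and enclose the whole field in quotes."""
    return '"' + value.replace('"', '""') + '"'


# Typographic quotes (« » and “ ”) are left unchanged.
examples = ['Task in "Toloka"', "Task in «Toloka»", "Line 1\nLine 2\twith a tab"]
escaped = [escape_tsv_string(v) for v in examples]

# Read the escaped fields back with a standard TSV parser.
line = "\t".join(escaped) + "\n"
parsed = next(csv.reader(io.StringIO(line), delimiter="\t"))
```

The third example shows why enclosing matters: inside the quotes, the line break and the tab stay part of the value instead of ending the row or the column.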

Escaping data in JSON format

To load data into a field of the json type:

  1. Double every straight quotation mark (") in the JSON object.

  2. Enclose the whole field in quotation marks (" ").

Escaping example

Input data:
"[{""type"":""polygon"",""data"":[{""x"":0.24173,""y"":0.25118},{""x"":0.31327,""y"":0.24896},{""x"":0.31327,""y"":0.32453},{""x"":0.27576,""y"":0.34898},{""x"":0.23061,""y"":0.32564}]}]"

Result:
[{"type":"polygon","data":[{"x":0.24173,"y":0.25118},{"x":0.31327,"y":0.24896},{"x":0.31327,"y":0.32453},{"x":0.27576,"y":0.34898},{"x":0.23061,"y":0.32564}]}]
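The two JSON steps can be sketched in Python with the standard json module (the function names are hypothetical). Compact separators keep the output close to the example; this is only a stylistic choice, since whitespace between JSON tokens is insignificant.

```python
import json


def escape_json_field(obj):
    """Serialize obj as compact JSON, double its quotation marks, and enclose it in quotes."""
    text = json.dumps(obj, ensure_ascii=False, separators=(",", ":"))
    return '"' + text.replace('"', '""') + '"'


def unescape_json_field(field):
    """Reverse the escaping: strip the enclosing quotes and undouble the inner ones."""
    return json.loads(field[1:-1].replace('""', '"'))


# The first two points of the polygon from the escaping example.
polygon = [{"type": "polygon",
            "data": [{"x": 0.24173, "y": 0.25118}, {"x": 0.31327, "y": 0.24896}]}]
field = escape_json_field(polygon)
```

Generating the field from a Python object, rather than editing quotes by hand, avoids the unpaired quotation marks that cause upload errors.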

Escaping the JSON array data

To load an array of JSON objects into a single field:

  1. Double every straight quotation mark (") in the JSON objects.

  2. Inside each object, put a backslash (\) before every comma, including commas inside string values. Don't escape the commas that separate the objects.

  3. Enclose the whole field in quotation marks (" ").

Escaping example

Input data:
"{""url"":""https://fotki-kotikov.ru/1""\,""img"":""https://fotki-kotikov.ru/cat1.jpg""\,""description"":""Just fat cats\, because nothing can be more beautiful.""\,""title"":""Fat cats""},{""url"":""https://fotki-kotikov.ru/2""\,""img"":""https://fotki-kotikov.ru/cat2.jpg""\,""description"":""Just red cats\, as everyone knows that red cats bring money.""\,""title"":""Red cats""},{""url"":""https://fotki-kotikov.ru/3""\,""img"":""https://fotki-kotikov.ru/cat3.jpg""\,""description"":""Yes\, it will be just sleeping cats""\,""title"":""Sleeping cats""}"

Result:
{"url":"https://fotki-kotikov.ru/1","img":"https://fotki-kotikov.ru/cat1.jpg","description":"Just fat cats, because nothing can be more beautiful.","title":"Fat cats"},{"url":"https://fotki-kotikov.ru/2","img":"https://fotki-kotikov.ru/cat2.jpg","description":"Just red cats, as everyone knows that red cats bring money.","title":"Red cats"},{"url":"https://fotki-kotikov.ru/3","img":"https://fotki-kotikov.ru/cat3.jpg","description":"Yes, it will be just sleeping cats","title":"Sleeping cats"}
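A minimal Python sketch of the three array steps (the function name is hypothetical): each object is serialized separately, every comma inside it gets a backslash, the objects are joined with plain commas, and then the quote escaping is applied to the whole field.

```python
import json


def escape_json_array_field(objects):
    """Escape a list of JSON objects for one TSV field.

    Inside each object every comma gets a backslash; the commas between
    objects stay unescaped. Then quotation marks are doubled and the
    whole field is enclosed in quotes.
    """
    parts = []
    for obj in objects:
        text = json.dumps(obj, ensure_ascii=False, separators=(",", ":"))
        parts.append(text.replace(",", "\\,"))
    joined = ",".join(parts)
    return '"' + joined.replace('"', '""') + '"'


# The first object from the escaping example.
doc_object = {
    "url": "https://fotki-kotikov.ru/1",
    "img": "https://fotki-kotikov.ru/cat1.jpg",
    "description": "Just fat cats, because nothing can be more beautiful.",
    "title": "Fat cats",
}
field = escape_json_array_field([doc_object])
```

The order of the two escaping passes doesn't matter here, because the backslash pass only touches commas and the quote pass only touches quotation marks.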

Troubleshooting

Uploading tasks to a pool
How many tasks should be in a suite?

The number of tasks depends on how difficult and time-consuming the tasks are. Keep the size reasonably small. Large task suites are unpopular, partly because they are inconvenient for performers (for example, if the internet connection is unstable).

Errors when uploading tasks to the pool
How do I view the processing log?
To view the processing log, click More on uploading errors. The processing log is in JSON format. Each object inside result corresponds to a line of the uploaded file, identified by its line number. Lines that were processed with an error have the status "success": false.
Tip. To work with a large log conveniently, copy it to a text editor.
Errors in column headers

If the column headers are incorrect, the whole file is rejected. Otherwise, Toloka reports the number of tasks with processing errors.

Processing errors

Error message:
"parsing_error_of": "https://tlk.s3.yandex.net/wsdm2020/photos/2d5f63a3184919ce7e3e7068cf93da4b.jpg\t\t",
"exception_msg": "the nameMapping array and the sourceList should be the same size (nameMapping length = 1, sourceList size = 3)"

Overview: Extra tabs. The row contains more \t column separators after the data or the link than the number of columns defined in the input data. For example, if one column is defined in the input and two more tabs (\t\t) follow the link in the TSV file, the row has three columns, two of which are extra.

How to fix: Remove the extra column separators. In this example, remove both \t characters.

Error message:
"exception_msg": "the nameMapping array and the sourceList should be the same size (nameMapping length = 4, sourceList size = 6)"

Overview: The number of fields in the header and in the row doesn't match.

How to fix: Make sure that:
  • The number of tabs in the file structure is correct.
  • String values with tab characters are enclosed in quotation marks (" ").

Error message:
"code": "VALUE_REQUIRED", "message": "Value must be present and not equal to null"

Overview: The value is missing for a required input field.

How to fix: Make sure that the columns with required input data fields are filled in.

Error message:
"code": "INVALID_URL_SYNTAX", "message": "Value must be in valid url format"

Overview: Invalid data in the URL field.

How to fix: Make sure that the values in this field are valid URLs.

Error message:
"exception_msg": "unexpected end of file while reading quoted column beginning on line 2 and ending on line 4"

Overview: Unpaired quotation mark in a string.

How to fix: Check that all quotation marks are escaped.
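Several of these errors (extra tabs, a field count that doesn't match the header, an unpaired quotation mark) can be caught locally before uploading. This sketch uses Python's csv module as a stand-in parser, and the function name is hypothetical; it is an assumption that csv treats quotes exactly as the Toloka parser does, so treat its output as a hint rather than a guarantee.

```python
import csv
import io


def find_bad_rows(tsv_text):
    """Return (line_number, field_count) for rows whose width differs from the header.

    line_number is the physical line where the row ends. With strict=True,
    an unpaired quotation mark raises csv.Error ("unexpected end of data").
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", strict=True)
    header = next(reader)
    problems = []
    for row in reader:
        if len(row) != len(header):
            problems.append((reader.line_num, len(row)))
    return problems
```

For example, with the header INPUT:image and a row that ends with \t\t, the function reports that row with three fields, which corresponds to the nameMapping length = 1, sourceList size = 3 message.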

How do I know how many tasks a performer will see on the page?

You can specify the number of tasks on the page when you upload your tasks to the pool. For more information about distributing tasks across pages, see this article.

How do I upload the file with the accepted assignments back to Toloka for projects with non-automatic acceptance? Where do I find the format of the upload data?

To upload your file, click Upload review results. You can see the format here.

Assignments are reviewed in a TSV file.

Why haven't I received any assignments since I launched my first project, and why are all the uploaded assignments marked as "Training"?

Check the hint field. For the main tasks, this field must be empty.

How do I create the task file properly so that there are no errors?

In the file with the main tasks, the columns with the INPUT headers must be filled out. You can see those headers if you download a sample file from the pool.

If you are creating control tasks, fill out the GOLDEN columns with the correct responses.

If you are creating a training task, you also need to fill in the HINT:text column. For the main tasks you don't need any columns other than INPUT, so feel free to delete them.

The file format must be TSV, and the encoding must be UTF-8.

For more information about creating the file, see the Guide. If there are errors during the upload, look up the error description on this page.

Why do I see a syntax error when I upload a task where a user has to view an image and write feedback?

The error might occur if the expected input type is URL, but a string is received.

There may be two reasons:
  • The input field has the "link" type.
  • The pool was created for an outdated project version. It means that the pool was created before you changed the input field type.
What is the maximum number of tasks per page?

It depends on the task. Technically, you can put as many tasks on a page as you want.

But users are reluctant to take lengthy tasks. They'd rather do 10 tasks that take one minute each than one task that takes 10 minutes.

In addition, if you put many tasks on a page, there may be problems loading the files to be labeled, especially images.

The third thing to consider is quality control and assignment review. If you use recompletion of assignments from banned users, you should split the task into smaller parts so that fewer assignments are recompleted. You are more likely to meet your budget this way.

How do I specify smart mixing settings in the interface when uploading a file?

Smart mixing settings are specified for the file rather than for the pool.

The settings specified during the first file upload are applied to all the files that are uploaded to this pool later on.

What is the right time limit for the task completion?
Try completing the tasks yourself. Ask your colleagues and friends to complete them. Find out the average completion time and add 50% to it.
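The rule of thumb is simple arithmetic: with measured times of 40, 60, and 80 seconds, the average is 60 seconds, and adding 50% gives 90 seconds. A tiny Python helper (hypothetical name) that does the same and rounds up to whole seconds:

```python
import math
from statistics import mean


def suggested_time_limit(completion_times_sec):
    """Average measured completion time plus 50%, rounded up to whole seconds."""
    return math.ceil(mean(completion_times_sec) * 1.5)


# Hypothetical measurements from three test runs, in seconds.
print(suggested_time_limit([40, 60, 80]))  # 90
```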
What is the difference between "task" and "task_suite"?

A task is a single task item. A task suite is a page with tasks. The performer is paid for each task suite.

The same task appeared on different pages

The same task may appear on different pages if:

  • The project uses incremental relabeling. For example, suppose there were 5 tasks on a page. For 4 of them, the responses matched and the common response was counted as correct. The fifth task didn't get a final response, so it was mixed into another set to be reassessed.
  • Different tasks have different overlap. Tasks with higher overlap are shown again in sets with the other remaining tasks in the pool.
  • A quality control rule changed a task's overlap. In this case, the task appears in a different set.
TSV file
When I generate a TSV file with links to images on Yandex.Disk, the images are not displayed. Why?

You can read about connecting Yandex.Disk here.

The project template must contain something like this:

<img src={{proxy img}} width="400">

Here, img is an input field of the string type.

Use the example.jpg file for testing. You can find its URL under Profile → External Services Integration.

Why does the preview display all the photos from the TSV file at once?

You must use a separate row for each task in your TSV file. For more information, see here.

When you create a pool, the pool will have settings for the number of tasks per page.

How do I add multiple "known_solutions" to a TSV file with a training task?

You can't use the interface to upload the tasks with multiple correct responses to the pool. You can only use the API for that.

Where is my TSV file added if I upload it to the running pool?

If the Keep task order option is enabled, labeling of the new tasks starts after the previously uploaded tasks are taken by users. If this option is disabled, the tasks aren't guaranteed to be assigned in the order they were uploaded.

How do I properly structure my TSV file used for data upload if there is JSON data among the input?

All the values are written to the same column. Make sure to escape quotes. For more information about escaping quotes in JSON format, see the Guide.

How do I write an array to an input TSV file?

The array of strings in the input data must be comma-separated. For example, a value in the INPUT:types column: text1, text2, text3, text4

Are TSV files sensitive to the order of the INPUT field and GOLDEN fields?

TSV files are insensitive to the order of fields. Use your preferred order of fields.

If there are no headers for some input columns in the TSV file, are they going to be skipped during import? Will they be skipped if they have headers without the "INPUT:.." prefix?

No. If you try to upload a file with missing headers to the pool, the system issues an upload error. All the INPUT fields required in the specification must be present in the TSV file with tasks. There must be no extra fields or columns.

If you don't want to show some data to performers, but you still need this data in the file, create the optional hidden input fields for such data in the project.

I have a task for photo classification. When there are more than 5 photos on the page, why does Toloka split them across 2 pages?

Toloka will split the links to images in the uploaded file into pages depending on the method you specified when uploading the TSV file. For more information about the three upload methods, see the Guide.

Input data
The system interprets commas inside my array elements as separators between the array elements. How do I avoid this?

Escape commas with a backslash (\).
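A minimal Python sketch of building a string-array value (the function name is hypothetical): commas inside elements get a backslash, and the elements are joined with commas. The separator is a parameter because the array example in this FAQ writes a space after each comma (text1, text2, ...), and whether that space is trimmed is not stated here. If the value also contains quotation marks, escape them as described in Escaping strings.

```python
def encode_string_array(items, separator=","):
    """Join array elements into one TSV value, escaping commas inside elements."""
    return separator.join(item.replace(",", "\\,") for item in items)


value = encode_string_array(["Just fat cats, sleeping", "Red cats"])
print(value)  # Just fat cats\, sleeping,Red cats
```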

How is the data from the "hint" column displayed?

The hint column should be filled out for your training tasks. When creating a main task, you only need to fill out the input fields. Omit the other fields or delete them along with their headers.

The file structure and how to fill it out is described here.

What do the lines "Add your text here" mean?

"Add your text here" is a hint for you. It means that you can replace the text in the field with your task data. The file structure and how to fill it out is described here.

Why do double quotes disappear from the output if I try to escape them using quotation marks?

If a word in your text is enclosed in quotes, format the field like this:

"How many letters are there in the word ""Liechtenstein"""

If you escape quotes inside the text, you must also enclose the entire text in quotes. For more information, see the Guide.

How do I insert a link in the GOLDEN field?

A response is counted as correct only if it exactly matches the text in the GOLDEN field.

Usually, if you copy site links from the browser, the copied links have the same format. But this is not the case when the link is trimmed or typed manually.

Check the links that you use. There are several ways to make links consistent:
  • Add requirements for the link format to your instructions and to the hints in your training pool.
  • Use a regular expression (RegExp) in your JS code to trim the received links, write the result to a new output field, and then compare that value with the control value.
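The text recommends doing the trimming with RegExp in the task's JS; the same normalization rule is sketched here in Python, where it could also be applied to the GOLDEN values before you upload them. Which parts of a link to trim is an assumption (here: surrounding whitespace, the http:// or https:// scheme, a leading www., and trailing slashes), and the function name is hypothetical; adjust the pattern to the link formats you expect.

```python
import re

# Assumed prefixes to drop: an optional http(s):// scheme and an optional "www.".
_PREFIX = re.compile(r"^(https?://)?(www\.)?", re.IGNORECASE)


def normalize_link(link):
    """Trim whitespace, the scheme, a leading www., and trailing slashes."""
    return _PREFIX.sub("", link.strip()).rstrip("/")
```

With this rule, " https://www.example.com/page/ " and "example.com/page" both normalize to example.com/page, so a copied link and a manually typed one compare as equal.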