Turn audio into text (audio transcription)

  1. Create a project
  2. Add a task pool
  3. Upload tasks
  4. Set up quality control
  5. Start the pool and get the results
  6. Let performers check the responses

Run the project in the Sandbox first. This helps you avoid making mistakes and spending money on a task that isn't working right.

You can publish tasks for transcribing short audio recordings. We recommend keeping all the recordings in a pool about the same length. It is best to launch transcription tasks in the web version of Yandex.Toloka so that performers can type on a keyboard.

You may need additional projects for your task, such as pre-checking the dataset or checking performers' responses. Learn more in the Designing the solution architecture section.

Let's say you need to transcribe poems recited by children. To do this, create a task that provides an audio recording in the built-in player. The performer has to type the text they hear on the recording.

Example of a prepared task

To run tasks and get responses, complete the following steps.

Create a project

The project defines what the task will look like for a performer.

  1. Click the + Create project button and choose the Audio transcription template.

  2. Enter a clear name and a short description for the project. Performers will see this in the task list.

  3. Write short and clear guidelines (see the recommendations).

  4. Define which objects you are going to pass to the performers and receive from them in response. To do this, add input and output fields in the Specifications block.
    Note. This tutorial shows how to create a task interface in Yandex.Toloka. You can also try creating a task interface in the Template Builder.
    What are input and output data?

    Input data means the objects that are passed to the performer to complete the task. For example, this could be text, an image, or geographic coordinates.

    Output data means the objects that you receive after the task is completed. For example, this could be one of several response options, typed text, or an uploaded file.

    Learn more about input and output data fields.

    The template includes the following fields:

    • Input data field: audio, a link to the audio file.

      Change the data type to string to upload audio files stored on Yandex.Disk.

    • Output data field: output, a string that stores the text entered by the performer.
  5. Create the task interface in the HTML block. It describes how the elements are arranged in the task.

    You can use standard HTML tags and special expressions in double curly brackets for input and output data fields.

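      <!-- Player: the proxy helper loads the file from the "audio" input field via your Yandex.Disk proxy -->
      <audio src="{{proxy audio}}" controls controlsList="nodownload">
        Unable to play the audio
      </audio>
      <!-- Caption and multi-line text box whose value is saved to the "output" field -->
      <div>Poem text</div>
      {{field type="textarea" name="output" width="300px" rows="6"}}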
    This markup describes the following task design:
    • The audio recording in the player.
    • A text input field for the transcript.

    Leave the JavaScript unchanged. It is configured to check that the recordings have been played in the player. The performer won't be able to submit the response without listening to all the audio recordings in the task.

  6. Click the Preview button to see the performer's view of the task.
    Note. The project preview shows one task with sample data. You can set the number of tasks per page later, when uploading tasks.
  7. Save the changes. To switch to the Projects page, click Finish editing.
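As a quick consistency check between the Specifications block and the HTML block, the following minimal Python sketch extracts the field names that the template refers to and compares them with a specification. The dictionary layout of `SPEC` and the helper names `referenced_fields` and `undeclared_fields` are hypothetical illustrations, not part of the Yandex.Toloka API. The template string is a copy of the interface from step 5 with English captions, and `audio` is declared as a string because the files come from Yandex.Disk.

```python
from __future__ import annotations

import re

# Hypothetical specification with the fields from the template:
# "audio" is the input (a string, so it can hold a Yandex.Disk proxy path),
# "output" is the output string that stores the performer's transcript.
SPEC = {
    "input": {"audio": "string"},
    "output": {"output": "string"},
}

# Copy of the task interface from step 5 (with English captions).
TEMPLATE = """
<audio src="{{proxy audio}}" controls controlsList="nodownload">
  Unable to play the audio
</audio>
<div>Poem text</div>
{{field type="textarea" name="output" width="300px" rows="6"}}
"""

# Input values are inserted as {{name}} or {{proxy name}}.
INPUT_RE = re.compile(r"\{\{\s*(?:proxy\s+)?(\w+)\s*\}\}")
# Output fields are declared as {{field ... name="..."}}.
OUTPUT_RE = re.compile(r'\{\{field\b[^}]*\bname="(\w+)"')


def referenced_fields(template: str) -> tuple[set[str], set[str]]:
    """Return the input and output field names used in the template."""
    return set(INPUT_RE.findall(template)), set(OUTPUT_RE.findall(template))


def undeclared_fields(template: str, spec: dict) -> list[str]:
    """List fields used in the template but missing from the specification."""
    inputs, outputs = referenced_fields(template)
    missing = [f"input:{name}" for name in sorted(inputs - set(spec["input"]))]
    missing += [f"output:{name}" for name in sorted(outputs - set(spec["output"]))]
    return missing


if __name__ == "__main__":
    print(undeclared_fields(TEMPLATE, SPEC) or "All template fields are declared.")
```

If you rename a field in one place but not the other (for example, `name="text"` in the template while the specification still says `output`), the function reports it as `output:text`.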

Add a task pool

A pool is a set of paid tasks sent out for completion at the same time.

  1. Open the project and click Add pool.
  2. Give the pool any convenient name and description. The pool info is only available to you. Performers can view only the project name and description.
  3. Set the price per task page (for instance, $0.05). The price depends on the length of the audio recordings.
    What is a task page?

    A page can contain one or several tasks. If the tasks are simple, you can add 10-20 tasks per page. Don't make pages too long, because this slows down page loading for performers.

    Performers get paid for completing the whole page.

    The number of tasks on the page is set when uploading tasks.

    What is the fair price for a task page?

    The general rule of pricing is: the more time a performer spends completing the task, the higher the price should be.

    You can register in Yandex.Toloka as a performer and find out how much other requesters pay for tasks, or see examples of cost for different types of tasks.

  4. Add Filters to choose performers.
  5. Turn on the Non-automatic acceptance option and enter the number of days for checking the task in the Deadline field (for example, 7).
    What is non-automatic acceptance?

    The non-automatic acceptance option allows you to review completed tasks before accepting them and paying for them. If the performer didn't follow instructions, you can reject the task. The maximum allowed period for the review is set in the Deadline field.

  6. Set the Overlap, which is the number of performers who complete the same task. For speech transcription, it is usually 1.
  7. Set the Time allowed for completing a task page. This time should be enough to read the instructions, load the task, listen to the audio recordings, and type the text (for example, 1200 seconds).
  8. Save the pool.
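To sanity-check the budget before starting the pool, you can compute the number of task pages and the total payout. This is a minimal sketch under stated assumptions: the price is paid per task page per performer, every page is completed `overlap` times, and platform fees and the re-completion of rejected assignments are not included. The function name `pool_budget` is hypothetical.

```python
from __future__ import annotations

from decimal import Decimal


def pool_budget(recordings: int, tasks_per_page: int, overlap: int,
                price_per_page: Decimal) -> tuple[int, Decimal]:
    """Return the number of task pages and the total payout to performers.

    Assumes the price is paid per page per performer and each page is
    completed `overlap` times. Platform fees and re-completion of rejected
    assignments are not included.
    """
    # Ceiling division: a partly filled last page still counts as a page.
    pages = (recordings + tasks_per_page - 1) // tasks_per_page
    total = price_per_page * pages * overlap
    return pages, total


if __name__ == "__main__":
    pages, total = pool_budget(100, tasks_per_page=4, overlap=1,
                               price_per_page=Decimal("0.05"))
    print(pages, total)  # prints: 25 1.25
```

With 100 recordings, 4 tasks per page, an overlap of 1, and $0.05 per page, the pool has 25 pages and costs $1.25 before fees. `Decimal` keeps money amounts free of binary floating-point rounding.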

Upload tasks

Prepare your own task file. For an example, see the demo TSV files on the pool page: at the top left of the page, there are links to TSV files with regular, control, and training tasks.

  1. Click Upload. In the window that opens, you can also download a sample TSV file by clicking Sample file for uploading tasks.
    What is TSV?
    A TSV file presents a table as a text file in which columns are separated by tabs.

    You can edit it in either a spreadsheet editor or a text editor, and then save it in TSV format. Learn more about working with TSV files. The CSV format is similar to TSV, but you should upload tasks as a TSV file.

  2. Add input data: links to files on Yandex.Disk. The header of the input data column contains the word INPUT.

    A link should look like this: <unique name>/audio1.mp3, where "unique name" is the name of your proxy. Learn more about using files from Yandex.Disk.

  3. Upload the tasks: choose Set manually and set the number of tasks per page (for example, 4). This means that each page will contain 4 audio recordings, each with a text field for the transcript.
  4. Click Add to upload your tasks to the pool.
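If you have many recordings, you can generate the task file instead of typing it by hand. The minimal Python sketch below uses the standard `csv` module with a tab delimiter. It assumes the input field is named `audio`, so the header is `INPUT:audio` (the `INPUT:<field name>` convention of the sample files), and `my-proxy` is a hypothetical proxy name that you should replace with your own.

```python
from __future__ import annotations

import csv
import io

# Hypothetical proxy name: replace it with the unique name of your proxy.
PROXY_NAME = "my-proxy"


def build_tasks_tsv(file_names: list[str], proxy: str = PROXY_NAME) -> str:
    """Build a TSV task file with a single INPUT:audio column."""
    buffer = io.StringIO()
    # Tab-separated values with "\n" line endings (csv defaults to "\r\n").
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["INPUT:audio"])  # "INPUT:" + name of the input field
    for name in file_names:
        # Each link has the form <unique name>/<file name>.
        writer.writerow([f"{proxy}/{name}"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_tasks_tsv(["audio1.mp3", "audio2.mp3"]), end="")
```

Running the script prints the header line followed by one line per recording, such as `my-proxy/audio1.mp3`. Save the string to a file with the `.tsv` extension before uploading it.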

Set up quality control

Quality control rules allow you to filter out inattentive performers. You can configure quality control either in the project or in the pool.

Quality control settings configured in the project are applied to all of its pools, so you can't change them for just one pool.

To configure quality control in the pool, go to pool editing (the Edit button in the upper-right corner of the page) and click Add Quality Control Rule.

You can copy quality control settings from another pool. To do this, click Copy settings from in the Users filter section.

  1. Add the Fast responses rule.

    The Minimum time per task page value depends on two things: the number of tasks on the page and the length of the audio recordings. In this example, there are four tasks per page and the length of the recordings is unknown, so we estimate a reasonable threshold for the rule.

    Make allowances for technical errors. For example, some recordings may fail to load or play. The performer will then submit responses to these tasks quickly, and this shouldn't be treated as a violation. So let's add two rules.

    • The first rule catches bots. Set the minimum time to 10-15 seconds per task page. Ban performers after two fast responses.

      This means that a user who completes two or more task pages in less than 10 seconds each will be blocked for 10 days and won't be able to complete your tasks.

    • The second rule excludes performers who don't take the task seriously, don't listen to recordings all the way to the end, and don't put any thought into their responses. Here, the Minimum time per task page value depends on the length of the recordings, the number of recordings on the page, and how difficult the text is to type (for example, the speech is hard to hear, contains jargon, or is otherwise hard to transcribe). Ban performers after three fast responses.

      This means that a user who submits three or more task pages in less than 30 seconds each will be blocked for 5 days and won't be able to complete your tasks.

  2. Add the Review results quality control rule.

    Configure it so that if 35% or more of a performer's responses are rejected, the performer is banned and can't access your tasks for 15 days. The rule takes effect after at least 3 of the performer's responses have been reviewed.

  3. Add the Processing rejected and accepted assignments rule. When the overlap is 1, you should return rejected assignments to the pool so that other performers can redo them.

    This means that if you reject assignments during the review, they're sent to another performer to complete again.

  4. Create a skill. To do this, go to the Skills page, click the + Add skill button, and enter the skill name, for example, "Transcriber".
    What is a skill?
    A skill is an assessment of some aspect of the performer's work (a number from 0 to 100). A skill can be awarded to the performer for correct responses in control tasks. You can also assign it manually.

    You can use the skill value when choosing performers.

  5. Add the Submitted answers quality control rule.

    Configure it so that the skill is assigned to the performer once they have completed at least one task and the result has been accepted.
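The ban rules above can be modeled in a few lines to see how your thresholds behave on sample data before you apply them. This minimal Python sketch is a hypothetical simulation, not how Yandex.Toloka implements the rules: it counts fast pages over the whole list of page times (the platform's rule may look only at recent responses), and it assumes that when several rules fire, the longest ban applies. The class and function names are made up for illustration. The thresholds are the ones given above: 10 seconds and 2 pages for bots, 30 seconds and 3 pages for careless responses, and 35% rejected after at least 3 reviewed responses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FastResponseRule:
    """Hypothetical model of a Fast responses rule."""
    min_seconds: float  # Minimum time per task page
    max_fast: int       # Number of fast pages that triggers the ban
    ban_days: int

    def ban_for(self, page_times: list[float]) -> int:
        """Return the ban length in days, or 0 if the rule doesn't fire."""
        fast = sum(1 for seconds in page_times if seconds < self.min_seconds)
        return self.ban_days if fast >= self.max_fast else 0


@dataclass
class ReviewResultsRule:
    """Hypothetical model of a Review results rule."""
    max_rejected_percent: int  # Ban when the rejected share reaches this
    min_reviewed: int          # Rule applies only after this many reviews
    ban_days: int

    def ban_for(self, accepted: int, rejected: int) -> int:
        reviewed = accepted + rejected
        if reviewed < self.min_reviewed:
            return 0  # Too few reviewed responses to judge yet
        # Integer comparison avoids floating-point rounding at the 35% boundary.
        if rejected * 100 >= self.max_rejected_percent * reviewed:
            return self.ban_days
        return 0


# Thresholds taken from the rules described above.
BOT_RULE = FastResponseRule(min_seconds=10, max_fast=2, ban_days=10)
CARELESS_RULE = FastResponseRule(min_seconds=30, max_fast=3, ban_days=5)
REVIEW_RULE = ReviewResultsRule(max_rejected_percent=35, min_reviewed=3, ban_days=15)


def ban_days(page_times: list[float], accepted: int, rejected: int) -> int:
    """Apply all three rules and return the longest resulting ban (0 = none)."""
    return max(
        BOT_RULE.ban_for(page_times),
        CARELESS_RULE.ban_for(page_times),
        REVIEW_RULE.ban_for(accepted, rejected),
    )


if __name__ == "__main__":
    # Two pages in under 10 seconds trigger the bot rule.
    print(ban_days([5, 8, 60], accepted=3, rejected=0))  # prints 10
```

Feeding in realistic page times for your recordings shows whether the 30-second threshold would also ban performers who simply hit a recording that failed to load.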

Start the pool and get the results

  1. Start the pool.
  2. Track the completion of tasks in the Pool statistics section.
  3. When the first results come in, you can start reviewing them. After the review period set in the Deadline field ends, all responses that haven't been reviewed are accepted automatically, regardless of their quality.

    To review assignments, go to the pool and click Review assignments.

Let performers check the responses

Send the results to other performers for review as a separate task. To make these tasks available only to performers who didn't transcribe the audio recordings, set up a filter.

  1. Go to the pool and click Download results.
  2. Create a classification project.
    Example of a prepared task
  3. Create a task interface that shows:
    • An audio recording in the audio player.
    • A transcript.
    • Response options:
      • The text fully matches the audio recording.
      • Minor mistakes were made in the text.
      • The recording is not fully transcribed.
      • The text doesn't match the audio recording.
  4. Add a pool and set Overlap to 3.
  5. Add a filter that selects only performers who don't have the skill you created for transcribers (for example, "Transcriber").
  6. Upload tasks to the pool and start it.
  7. When the pool is fully completed, start the aggregation of results.
  8. Accept transcription tasks without errors. Reject the rest, specifying the reason.
  9. Rejected tasks can be submitted for completion again.
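With an overlap of 3, each transcript gets three votes. The sketch below uses simple majority voting to turn the votes into a review decision; this is an illustrative choice, and the aggregation methods offered in Yandex.Toloka may work differently. The short label keys (`MATCHES`, `NO_MATCH`, and so on) and the function names are hypothetical, while the rejection reasons reuse the response options of the checking project. Transcripts with no majority (for example, a 1-1-1 split) are left for manual review.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical short keys for the response options of the checking project.
MATCHES = "matches"
MINOR_MISTAKES = "minor_mistakes"
INCOMPLETE = "incomplete"
NO_MATCH = "no_match"

# Rejection reasons reuse the wording of the response options.
REJECT_REASONS = {
    MINOR_MISTAKES: "Minor mistakes were made in the text.",
    INCOMPLETE: "The recording is not fully transcribed.",
    NO_MATCH: "The text doesn't match the audio recording.",
}


def majority_label(votes: list[str]) -> str | None:
    """Return the label chosen by more than half of the checkers, if any."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else None


def review_decision(votes: list[str]) -> tuple[str, str]:
    """Turn the checkers' votes into a decision and a rejection reason."""
    label = majority_label(votes)
    if label is None:
        return "review manually", "No majority among the checkers."
    if label == MATCHES:
        return "accept", ""  # Only error-free transcripts are accepted.
    return "reject", REJECT_REASONS[label]


if __name__ == "__main__":
    print(review_decision([MATCHES, MATCHES, NO_MATCH]))
```

Requiring a strict majority means a single careless checker can't flip the decision when the other two agree.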