
Getting started with AI audio translations and closed captions in live streams

Clevercast allows you to add (multilingual) closed captions to your live stream through real-time transcription, speech-to-text conversion with manual correction, and automatic translation. It also lets you add AI audio translations (also known as synthetic voices or live audio dubbing). For more info, read the introduction to closed captions in Clevercast.

This tutorial explains how to manage events with closed captions and AI audio translations in Clevercast. If you hire transcribers or speech-to-text correctors yourself, provide them with the transcription manual or speech-to-text correction manual.

If you manage transcribers and/or correctors, you may want them to test their internet connection in advance by doing a T@H connection test. The test lets you find out whether the transcriber’s internet connection is suitable.

This tutorial applies to enterprise and webinar plans. Skip step 1 for webinars, since an event is automatically created when you create a webinar.

1 Creating an event

Select the Live > Events menu. On the events page, press the Create Multilingual Event button in the sidebar. In the popup dialog, enter the following info and press Create:

  • an event name (for your own information only)
  • select one of the broadcast protocols that support closed captions: choose Single-language broadcast for T@H and captions if you are broadcasting a single-language stream, or SRT multilingual broadcast if you are broadcasting an SRT stream with multiple audio tracks.
  • select the origin of your broadcast
  • select the Default Language. This is the language being broadcast to Clevercast. When using speech-to-text, this will also be the first closed caption language. If the floor audio contains multiple languages, you can select 'Original' instead of a real language.
  • only select Additional Languages if your live stream contains multiple audio languages. The closed caption languages are added after the event is created.
  • select the Streaming Resolutions. Clevercast does cloud transcoding for adaptive bitrate streaming. Make sure the highest resolution doesn’t exceed the resolution at which you are broadcasting.

Note: if you are broadcasting from a continent other than Europe or North America, please use the SRT protocol or ask for a custom RTMP ingest hub.
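If you are unsure which resolution your encoder source actually has, you can check it before selecting the Streaming Resolutions. A minimal sketch using ffprobe (the file name is a placeholder for your own source):

```shell
# Print the width and height of the first video stream of a local source
# file (input.mp4 is a placeholder). The output has the form "width,height",
# e.g. 1920,1080 for a Full HD source.
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height -of csv=p=0 input.mp4
```

Pick your highest streaming resolution at or below the value this reports.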

2 Configuring the event

When you press the Create button, Clevercast creates the event and redirects you to its detail page. This detail page allows you to do the following:

  • Copy the event’s embed code from the Management tab to your website or third-party platform.
  • On the Appearance tab, you can upload a poster image and show a countdown timer in the embedded player (among other settings)
  • Depending on your plan, you may also be able to add simulcast languages on the Simulcasting tab

See the Enterprise manual for more details about the different tabs and their functionality.

Adding closed captions to the event

To add closed captions to the event, go to the Caption Languages tab.

The first caption language must be the result of either speech-to-text conversion or manual transcription. Currently, the two cannot be combined.

Next, you can add closed captions for additional languages, generated by automatic translation. If your first caption language uses manual transcription, you can also add additional caption languages that are transcribed manually.


  • When using speech-to-text without human correction, make sure that 'Allow captions to be corrected' is unchecked. Checking it enables correction, but also adds slightly more latency to the live stream.
  • Consider adding an AI captioning vocabulary containing terms (names, acronyms, industry jargon, technical phrases ...) that may appear in the live stream, so they will be correct in the captions. You can also add custom translations, to be used instead of the AI translation of a term. Updates to the vocabulary take effect even while an event is active.
  • Your account also contains options to filter profanity and disfluencies

If you are managing captioners or correctors, send them the links to the transcription or correction rooms.

Adding AI audio translations

On the Audio Languages tab, you can change the default language and add AI Interpreter languages (and/or T@H languages).

For each AI Interpreter language, a closed caption language is automatically created on the ‘Caption languages’ tab (if not already present). Clevercast uses the text of the closed captions to generate the synthetic voices. This means that the tools for improving the quality of the closed captions also improve the quality of the audio translations.

3 Test broadcast

Configure your encoder using the settings on the Broadcast tab and our broadcast guidelines.
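As an illustration, a test broadcast of a local file over SRT could look like this with ffmpeg. This is only a sketch: the ingest hostname, port and stream ID below are placeholders, so use the exact values shown on your event's Broadcast tab, and match the bitrate and resolution to your chosen settings.

```shell
# Send a local file as a live SRT broadcast (placeholder ingest URL and
# stream ID; copy the real values from the Broadcast tab).
ffmpeg -re -i input.mp4 \
  -c:v libx264 -preset veryfast -b:v 4000k -g 50 \
  -c:a aac -b:a 128k \
  -f mpegts "srt://ingest.example.com:9999?streamid=YOUR_STREAM_ID"
```

The `-re` flag makes ffmpeg read the file at its native frame rate, which simulates a real live source.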

When you are ready to test, go to the Management tab and set the event status to Preview by pressing the Set to preview button. This is required for your broadcast to be processed, and for your transcribers and correctors to connect to their rooms.

Start your broadcast.

If you are managing captioners or correctors, you can use the Realtime Management page to communicate with your transcribers or correctors through text chat. Unfortunately, it is not yet possible to see the closed captions in the realtime player on this page.

You can use the Preview player to see the closed captions and hear the audio translations (with a delay, see below).

Livestreams with manual transcription

If your first caption language is using manual transcription, the live stream should become visible after about 18 seconds (due to HLS latency) in the Preview Player on the Management tab of your event.

Your transcribers should be able to connect to their transcription rooms, see and hear the live stream, and start transcribing.

Livestreams with speech-to-text conversion

In this case, the live stream has a delay of about 2 minutes. As soon as you start broadcasting, the preview player will indicate the number of seconds before the live stream is available.

After the event is set to Preview, correctors are already able to connect to the Correction room. After about one and a half minutes, they can see the live stream and the first closed captions, and are able to correct them.


4 Starting the live stream

When the event is about to start, go to the Management tab and press the Start event button. We recommend starting the event at least two minutes (four minutes in case of speech-to-text) before the live stream begins. This allows the player to start buffering and ensures that nobody misses the start.

When you start the event, Clevercast also starts recording the live stream. So, starting the event in time also ensures that the cloud recording is complete.

The embedded player automatically detects the status change and makes the live stream visible to your participants. You can also see the live stream in the Public Player on the Management tab. If you want to display a message in the embedded player, use the Service Message on the Management tab.

Note: we strongly recommend always setting the event status to Preview before starting the event; otherwise the live stream may not be fully recorded and the recorded closed captions may not be in sync.

5 Stopping the live stream

After the live action has ended, wait at least 2 minutes (or 4 minutes for speech-to-text) before setting the event status to Ended by pressing the Change Event Status button and selecting End event. The embedded player automatically detects this and shows a poster image or message to your viewers (see the Appearance tab) instead of the live stream.

Clevercast automatically completes the cloud recording and converts it to an MP4 file. You can download it via the Events > Recording menu. If your plan includes VoD, you can publish the recording to Video on Demand.