AI captioning for live streams in multiple languages

When AI captioning is used for a live stream, you need to select the speech-to-text conversion language in advance. This is the language (predominantly) spoken in the live stream.

However, some live streams may have speakers in multiple languages. Clevercast currently has 2 solutions for this:

  • Manually select the active speech-to-text language (strongly recommended, see below)
  • Let Clevercast auto-detect and translate the languages in a mixed floor (lower accuracy)

Note: changing the speech-to-text language does not affect the viewers of your live stream. The same closed caption languages remain available in the embedded video player at all times.

Manually select the active speech-to-text language

How does it work?

For an event with a speech-to-text conversion language and multiple text-to-text translation languages, Clevercast allows you to set each of these languages as the active speech-to-text language during the event. When you update it, the language previously set as speech-to-text will automatically become one of the text-to-text translation languages.

For example, suppose you have an event with English as the (default) speech-to-text language and Spanish and Chinese as translations. The moment a Spanish speaker comes up in the live stream, all you have to do is select 'Spanish' as the speech-to-text language. That way, Clevercast knows it needs to convert the floor audio into Spanish and translate the result into English and Chinese.
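The swap rule can be sketched as a small Python model. This is a minimal, hypothetical illustration, not Clevercast's implementation or API: the `CaptionLanguages` class and its `set_speech_to_text` method are names invented for this example, and it assumes that only languages already configured for the event can be selected. It also shows why viewers are unaffected: the set of caption languages stays the same, and only the roles of the languages change.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionLanguages:
    """Hypothetical model of an event's caption languages (illustrative only)."""

    speech_to_text: str
    translations: list[str] = field(default_factory=list)

    def all_languages(self) -> set[str]:
        # The caption languages viewers can select; a switch never changes this set.
        return {self.speech_to_text, *self.translations}

    def set_speech_to_text(self, language: str) -> None:
        if language == self.speech_to_text:
            return
        # Assumption: only languages already configured for the event can become active.
        if language not in self.translations:
            raise ValueError(f"{language!r} is not a configured caption language")
        # The new language leaves the translation list...
        self.translations.remove(language)
        # ...and the previous speech-to-text language becomes a translation language.
        self.translations.append(self.speech_to_text)
        self.speech_to_text = language


event = CaptionLanguages("English", ["Spanish", "Chinese"])
event.set_speech_to_text("Spanish")
print(event.speech_to_text)  # Spanish
print(event.translations)    # ['Chinese', 'English']
```

After the switch, Spanish speech is converted directly, and English and Chinese are produced as translations of the Spanish captions.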

User interface and timing

Updating the speech-to-text language is done in the Realtime Management room. To determine when to switch, use the timing of the video player in the Realtime Management room. Do not use the preview player or the embedded player for your timing, as they stream with a delay.

Changing the speech-to-text language is only possible when the event is active (= set to Preview, Started or Paused). When an event becomes active, the default speech-to-text language will initially be used (even if another language was last selected during a previous session).
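The timing rules above can be modeled as follows. This is again a hypothetical sketch: `EventSession`, `switch_language` and the placeholder status "Inactive" are illustrative names, not Clevercast API or UI terms; only the statuses Preview, Started and Paused come from this article. It assumes that moving between active statuses (for example, from Started to Paused) keeps the current selection, since the article only describes a reset when the event becomes active.

```python
# Statuses in which the speech-to-text language can be changed (from this article).
ACTIVE_STATUSES = {"Preview", "Started", "Paused"}


class EventSession:
    """Hypothetical model of when the speech-to-text language can change."""

    def __init__(self, default_language: str) -> None:
        self.default_language = default_language
        self.status = "Inactive"  # hypothetical placeholder for any non-active status
        self.active_language = default_language

    def set_status(self, status: str) -> None:
        becoming_active = status in ACTIVE_STATUSES and self.status not in ACTIVE_STATUSES
        self.status = status
        if becoming_active:
            # Each new session starts with the default language, not the last selection.
            self.active_language = self.default_language

    def switch_language(self, language: str) -> None:
        if self.status not in ACTIVE_STATUSES:
            raise RuntimeError("the event must be set to Preview, Started or Paused")
        self.active_language = language


session = EventSession("English")
session.set_status("Started")
session.switch_language("Spanish")
session.set_status("Paused")     # still active: the selection is kept
print(session.active_language)   # Spanish
session.set_status("Inactive")   # the session ends
session.set_status("Preview")    # a new session starts
print(session.active_language)   # English
```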

Link to the vocabulary interface

If you use speech-to-text corrector(s) for your event, you can also let them change the active speech-to-text language. To allow this in your account, go to 'Account' > 'Settings' in the Clevercast main menu. On the Settings page, open the 'Speech to Text' panel and set 'Allow correctors to change the speech-to-text language' to 'Yes'.

However, the video player in the correction room has a delay of about 15 seconds compared to the player in the Realtime Management room. If there is sufficient time between speakers, this shouldn't matter. If not, we recommend changing the language in the Realtime Management room instead.

Consequence for the correction room

When you change the speech-to-text language, this will also change the language of the closed captions in the correction room. So in the above example, the closed captions in the correction room will switch from English to Spanish.

Advantages

This method ensures the highest accuracy for closed captions, both in terms of speech-to-text conversion and translation into other languages.

Additionally, Clevercast lets you use custom vocabulary for each of the speech-to-text languages, including specific translations for your terms. During the live stream, a corrector can expand the vocabulary of each speech-to-text language that is set (provided you use shortcuts).

When to use it

We recommend using this method whenever possible. It only requires someone to be available to change the speech-to-text language manually.

Auto-detection and translation of mixed floor languages

How does it work?

When you create the speech-to-text language on the event page, check the 'Auto-detect mixed floor (lower accuracy)' option. This instructs the AI captioning to automatically detect which language is spoken and translate it into the speech-to-text language, but it has a negative impact on accuracy.

Consequence for the correction room

In this case, the language in the correction room remains the same as the (initial) speech-to-text language. Thus, when a different language is spoken in the live stream, the corrector sees captions that are the result of a speech-to-text conversion and its translation.

Disadvantages

First of all, the accuracy of the speech-to-text conversion will be lower (depending on the language), as it requires a different AI engine. In some cases, the AI may also need a couple of sentences before it correctly detects which language is spoken. Given the lower accuracy, we recommend using a corrector with this approach.

Another disadvantage is that, if a language other than the default speech-to-text language is spoken, the captions for that language are the result of a double translation. For example, suppose English is set as the speech-to-text language and German as a text-to-text translation language. When someone speaks English, the AI converts this to English captions. But when someone starts speaking German, the AI first converts and translates the speech into English captions, and then translates the English captions into German captions. As a result, the German captions will differ more from the German speech than when German is selected manually as the speech-to-text language (= direct conversion of the German speech into German captions).
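To make the double translation visible, the sketch below lists the processing steps that produce captions in a given language. It is a simplified, hypothetical model (the `caption_steps` function and its step labels are illustrative, not Clevercast terms): it assumes auto-detection is enabled whenever the spoken language differs from the speech-to-text language, and it says nothing about the actual AI engines or their accuracy.

```python
def caption_steps(spoken: str, stt_language: str, target: str) -> list[str]:
    """Hypothetical sketch: steps that turn `spoken` audio into `target` captions.

    `stt_language` is the speech-to-text language currently set for the event.
    Assumes auto-detection is enabled when `spoken` differs from `stt_language`.
    """
    steps = []
    if spoken == stt_language:
        # Direct conversion: speech becomes text in the same language.
        steps.append(f"speech-to-text: {spoken} speech -> {stt_language} text")
    else:
        # Auto-detect mixed floor: speech is converted and translated into the STT language.
        steps.append(f"speech-to-text + translation: {spoken} speech -> {stt_language} text")
    if target != stt_language:
        # Every other caption language is translated from the speech-to-text captions.
        steps.append(f"text-to-text: {stt_language} text -> {target} text")
    return steps


# Auto-detect with English as the speech-to-text language: German is translated twice.
for step in caption_steps("German", "English", "German"):
    print(step)
# speech-to-text + translation: German speech -> English text
# text-to-text: English text -> German text

# German selected manually: a single direct conversion.
for step in caption_steps("German", "German", "German"):
    print(step)
# speech-to-text: German speech -> German text
```

With English as the speech-to-text language, German speech needs two steps (both involving translation) to become German captions; after German is selected manually, it needs one direct conversion.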

When to use it

We recommend only using this method if it is not possible to manually change the speech-to-text language. For example:

  • if people speak different languages interchangeably, making it difficult or impossible to manually update the speech-to-text language.
  • if you don't know in advance which languages will be spoken in the live stream.

Combination with manual selection

For an event with auto-detection and translation, it is still possible to manually select the speech-to-text language (on some or all of the occasions when the floor language changes).

This way, you combine both methods:

  • if you manually select the right speech-to-text language, the AI will immediately start using it
  • if the spoken language changes without a manual update of the speech-to-text language, the AI will automatically detect and translate it

But keep in mind that the lower accuracy that comes with this approach currently still applies, even if you manually select the correct speech-to-text language (in a future version, we'll try to avoid this).
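The combination of both methods can be sketched as a timeline of speaker segments. This hypothetical model (`combined_paths` and its output strings are illustrative names) only tracks which language path is taken for each segment. It does not model accuracy, so keep in mind that the lower accuracy described above still applies to every segment, including those converted directly.

```python
from typing import Optional


def combined_paths(segments: list[tuple[str, Optional[str]]], default: str) -> list[str]:
    """Hypothetical sketch of manual selection combined with auto-detection.

    Each segment is (spoken language, language selected manually when the
    segment starts, or None if nobody changed the selection).
    """
    selected = default
    paths = []
    for spoken, manual_choice in segments:
        if manual_choice is not None:
            selected = manual_choice  # a manual selection is used immediately
        if spoken == selected:
            paths.append(f"{spoken}: converted directly")
        else:
            # No matching manual update: auto-detection translates into `selected`.
            paths.append(f"{spoken}: detected and translated into {selected}")
    return paths


# The operator switches to Spanish but forgets to switch back to English.
timeline = [("English", None), ("Spanish", "Spanish"), ("English", None)]
for line in combined_paths(timeline, default="English"):
    print(line)
# English: converted directly
# Spanish: converted directly
# English: detected and translated into Spanish
```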