AI captioning for live streams in multiple languages

When AI captioning is used for a live stream, you need to select the speech-to-text conversion language in advance. This is the language (predominantly) spoken in the live stream.

However, some live streams may have speakers in multiple languages. Clevercast currently has 2 solutions for this:

  • Let Clevercast change the active speech-to-text language during the event. This currently requires a manual intervention, but will be automated soon. We strongly recommend using this option (see below).
  • Use a different language model that auto-detects alternating languages and translates them on the fly. Unless different languages are often spoken interchangeably in the live stream, we recommend not using this option. Not only is this language model less accurate, but for text-to-text languages and audio translations it results in a double translation (see below).

Note: changing the speech-to-text language does not affect the viewers of your live stream. The same closed caption languages remain available in the embedded video player at all times.

Let Clevercast change the active speech-to-text language

How does it work?

For an event with a speech-to-text conversion language and multiple text-to-text translation languages, Clevercast allows you to set each of these languages as the active speech-to-text language during the event. When you update it, the language previously set as speech-to-text will automatically become one of the text-to-text translation languages.

For example, suppose you have an event with English as the (default) speech-to-text language and Spanish and French as translations. The moment a Spanish speaker comes up in the live stream, all you have to do is select 'Spanish' as the speech-to-text language. That way, Clevercast knows it needs to convert the floor audio into Spanish and translate the result into English and French.

Note: while this currently still needs manual intervention, we will soon automate this.
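
The swap described above can be modelled in a few lines. The following Python sketch is for illustration only: `CaptionConfig` and `switch_active_language` are hypothetical names, not part of any Clevercast API, and the sketch assumes that the new language is already one of the event's translation languages.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CaptionConfig:
    """Hypothetical model of an event's caption languages."""
    speech_to_text: str            # language the floor audio is converted into
    translations: Tuple[str, ...]  # text-to-text translation languages


def switch_active_language(config: CaptionConfig, new_language: str) -> CaptionConfig:
    """Return a new config with `new_language` as the active speech-to-text language."""
    if new_language == config.speech_to_text:
        return config  # nothing to change
    if new_language not in config.translations:
        raise ValueError(f"{new_language} is not a language of this event")
    # The previous speech-to-text language takes the place of the new one
    # among the translations, so the overall set of caption languages is unchanged.
    translations = tuple(
        config.speech_to_text if lang == new_language else lang
        for lang in config.translations
    )
    return CaptionConfig(speech_to_text=new_language, translations=translations)


# The example from the text: a Spanish speaker takes the floor.
event = CaptionConfig("English", ("Spanish", "French"))
event = switch_active_language(event, "Spanish")
print(event.speech_to_text)  # Spanish
print(event.translations)    # ('English', 'French')
```

The combined set of languages is the same before and after the switch, which is why viewers keep seeing the same caption languages in the player.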

Requirements

Currently, this is supported for the following languages: Arabic, Bashkir, Basque, Belarusian, Bulgarian, Cantonese, Chinese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Italian, Indonesian, Japanese, Korean, Latvian, Lithuanian, Malay, Mandarin, Marathi, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovakian, Slovenian, Spanish, Swedish, Tamil, Thai, Turkish, Uyghur, Ukrainian, Vietnamese and Welsh.

Other text-to-text translation languages may be set, without using them as a speech-to-text language. Currently, it is not possible to combine this feature with AI interpreter languages.

To enable this for an event, go to the event's Captions tab and change the 'Allow changing the active speech-to-text language during the event' setting to 'Yes'.

User interface and timing

Updating the speech-to-text language is done in the Realtime Management room. To determine when to switch, use the timing of the video player in the Realtime Management room. Do not use the preview player or the embedded player for your timing, as they stream with a delay.

Changing the speech-to-text language is only possible when the event is active (= set to Preview, Started or Paused). When an event becomes active, the default speech-to-text language will initially be used (even if another language was last selected during a previous session).
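
The two rules above (switching is only possible while the event is active, and each new session starts with the default language) can be summarised in a small state model. This is a hedged sketch rather than Clevercast code: the class `EventCaptions`, its methods, and the status names `Inactive` and `Ended` are assumptions made for the example. Only `Preview`, `Started` and `Paused` come from the text.

```python
ACTIVE_STATUSES = {"Preview", "Started", "Paused"}


class EventCaptions:
    """Hypothetical model of the switching rules; not a Clevercast API."""

    def __init__(self, default_language: str):
        self.default_language = default_language
        self.active_language = default_language
        self.status = "Inactive"  # assumed name for a non-active status

    def set_status(self, status: str) -> None:
        becoming_active = (
            self.status not in ACTIVE_STATUSES and status in ACTIVE_STATUSES
        )
        self.status = status
        if becoming_active:
            # A new session always starts with the default language,
            # whatever was selected during a previous session.
            self.active_language = self.default_language

    def change_language(self, language: str) -> None:
        if self.status not in ACTIVE_STATUSES:
            raise RuntimeError("the event must be set to Preview, Started or Paused")
        self.active_language = language


captions = EventCaptions("English")
captions.set_status("Started")
captions.change_language("Spanish")
captions.set_status("Ended")    # assumed non-active status
captions.set_status("Preview")  # the event becomes active again
print(captions.active_language)  # English
```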

Link to the vocabulary interface

If you use speech-to-text corrector(s) for your event, you can also let them change the active speech-to-text language. To allow this in your account, go to 'Account' > 'Settings' in the Clevercast main menu: on the Settings page, open the 'Speech to Text' panel and set 'Allow correctors to change the speech-to-text language' to 'Yes'.

However, the video player in the correction room has a delay of about 15 seconds (compared to the player in the Realtime Management room). If there is sufficient time between speakers, this shouldn't matter. If not, it is preferable to use the Realtime Management room instead.

Consequence for the correction room

When you change the speech-to-text language, this will also change the language of the closed captions in the correction room. So in the above example, the closed captions in the correction room will switch from English to Spanish.

Advantages

This method ensures the highest accuracy for closed captions, both in terms of speech-to-text conversion and translation into other languages.

Additionally, Clevercast lets you use custom vocabulary for each of the speech-to-text languages, including specific translations for your terms. A corrector can expand the vocabulary during the live stream (provided you use shortcuts), for each speech-to-text language that is set.

When to use it

We recommend using this method whenever possible (see requirements above). It only requires someone to be available to change the speech-to-text language manually.

Use a language model for alternating languages

How does it work?

When you create the speech-to-text language on the event page, check the 'Use language model for alternating languages (lower accuracy)' option. This instructs Clevercast to use a different language model that automatically detects which language is spoken and translates it into the speech-to-text language. Note that this alternative language model is less accurate, and may result in less literal translations (see Disadvantages below).

Consequence for the correction room

In this case, the language in the correction room remains the same as the (initial) speech-to-text language. Thus, when a different language is spoken in the live stream, the corrector sees captions that are the result of a speech-to-text conversion and its translation.

Disadvantages

First of all, the accuracy of the speech-to-text conversion will be lower (depending on the language), as it requires a different AI engine. In some cases the AI may also need a couple of sentences before it correctly detects which language is spoken. Given the lower accuracy, we strongly recommend using a corrector for this approach.

Another disadvantage is that, if a language other than the default speech-to-text language is spoken, the captions for that language are the result of a double translation. For example, suppose English is set as the speech-to-text language and German as a text-to-text translation language. When someone speaks English, the AI converts the speech to English captions. But when someone starts to speak German, the AI first converts and translates the speech into English captions, and then translates the English captions into German captions. As a result, the German captions will differ more from the German speech than if the German speech had been converted to text directly (that is, if the speech-to-text language had been changed).
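
To make the difference concrete, the sketch below counts how many translation steps sit between the spoken language and a given caption language under each method. It is a simplified model written for this example: the function `translation_steps` and its parameters are hypothetical, and the combined "convert and translate" step of the alternating-languages model is counted as one translation.

```python
def translation_steps(spoken: str, caption: str, default_stt: str,
                      alternating_model: bool) -> int:
    """Count the translation steps between the speech and one caption language."""
    if alternating_model:
        # Speech always ends up as captions in the default speech-to-text
        # language first, which involves a translation whenever another
        # language is spoken.
        to_default = 0 if spoken == default_stt else 1
        # Every other caption language is then translated from those captions.
        from_default = 0 if caption == default_stt else 1
        return to_default + from_default
    # Manual switching: the spoken language is assumed to be selected as the
    # active speech-to-text language, so only the other languages are translated.
    return 0 if caption == spoken else 1


# German speech, with English as the default speech-to-text language.
print(translation_steps("German", "German", "English", alternating_model=True))   # 2
print(translation_steps("German", "German", "English", alternating_model=False))  # 0
print(translation_steps("German", "English", "English", alternating_model=True))  # 1
```

In this model, captions in the default language cost one translation under either method when another language is spoken. The extra step only affects the other caption languages, including the one that is actually being spoken.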

When to use it

We recommend using this method only if it is not possible to change the speech-to-text language. For example, if people are speaking in different languages interchangeably, making it difficult or impossible to manually update the speech-to-text language.