Managing AI live streams in which multiple languages are spoken

This section is part of the AI Live Streaming Manual. It explains how to deliver live streams with AI captions and speech when multiple languages are spoken.

When multiple languages are spoken in a live stream, the speech-to-text conversion language must change along with the speaker. Clevercast offers two solutions for this, which can also be alternated during the stream:

  • Let Clevercast automatically detect the speech-to-text language and change it
  • Manually change the speech-to-text language

Manual selection is slightly faster and more reliable. Automatic detection waits until a number of words have been spoken in a different language before switching (4-8 seconds). As a result, a limited number of closed captions and/or speech translations may be dropped after the language change. In some exceptional cases, automatic detection may also be difficult (e.g. a native French speaker speaking broken English with a heavy accent).
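The switching delay described above can be illustrated with a small sketch. This is a hypothetical model of automatic detection (a per-word language classifier feeding a sliding window), not Clevercast's actual implementation; the class, parameters and threshold values are assumptions for illustration only.

```python
from collections import deque

class AutoLanguageSwitcher:
    """Hypothetical sketch: switch the active speech-to-text language only
    after enough recent words are detected in a different language."""

    def __init__(self, languages, window=8, threshold=6):
        self.languages = set(languages)  # configured speech-to-text languages
        self.recent = deque(maxlen=window)  # most recent word-level detections
        self.threshold = threshold          # evidence needed before switching
        self.active = None

    def observe(self, word_language):
        """Feed the detected language of each transcribed word; returns the
        currently active speech-to-text language."""
        if word_language not in self.languages:
            return self.active  # languages not configured for the event are ignored
        if self.active is None:
            self.active = word_language
        self.recent.append(word_language)
        # Switching only after sustained evidence is why a few captions or
        # speech translations may be dropped around a language change.
        if word_language != self.active and \
           sum(1 for w in self.recent if w == word_language) >= self.threshold:
            self.active = word_language
            self.recent.clear()
        return self.active
```

In this sketch, words spoken in the new language before the threshold is reached are still attributed to the old language, which mirrors the 4-8 second gap mentioned above.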

You can switch from manual to automatic and vice versa during the live stream (e.g. set it to manual during the main keynotes and use automatic afterwards).

Important: changing the speech-to-text language does not affect the viewers of your live stream. The same closed captions and speech translation languages remain available in the embedded video player at all times.

Creating an event for a livestream in which multiple languages are spoken

Follow the normal workflow to create an event with AI captions and speech.

Default language

In the first step of the wizard, your selection for 'Select (main) audio language spoken in the live stream' sets the event's 'Default Language'. Choose the first language that will be spoken in the live stream, or the language that will be spoken most often.

Note: if AI speech translations are used, Clevercast will set the event's 'Default Language' to 'Original'. By selecting this in the player, a viewer can hear the broadcast audio stream. Below 'Original', the AI speech languages are selectable.

A live stream in which multiple languages are spoken, with AI speech translations in French and English

Allow changing the active speech-to-text language

Make sure to select 'Yes' (manually or automatically) in the dropdown for 'Allow changing the active speech-to-text language during the event'.

If necessary, you can change your selection afterwards in the 'Caption Languages' tab of the event. During the live stream it is also possible to change from manual to automatic (and vice versa), as long as you haven't selected 'No'.

Adding all languages that are spoken

In the second step of the wizard, add all languages that will be spoken in the live stream as closed captions and/or synthetic audio languages (see 'Requirements' below for a list of the supported languages). This is necessary for the speech-to-text conversion.

If desired, after creating the event, you can continue to add vocabulary terms in different languages.

Testing, starting and stopping the live stream

Follow the normal workflow.

Changing the speech-to-text language

If you selected automatic detection of the speech-to-text language, you don't need to do anything else. Clevercast will automatically switch to the language spoken in the live stream (as long as it is one of the speech-to-text or text-to-text caption languages).

If you opted for manual selection of the speech-to-text language, go to the real-time management room and watch the live stream in the video player. Whenever you hear another language being spoken, select that language.

At any time, the real-time management room lets you switch from manual to automatic detection and vice versa.

Real-time correction

Keep in mind that, when using real-time correction, the correctors must be proficient in the spoken languages. Every time the speech-to-text language changes, the language in the correction room will also change.

Requirements

Currently, this is supported for the following languages: Arabic, Bashkir, Basque, Belarusian, Bulgarian, Cantonese, Chinese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Italian, Indonesian, Japanese, Korean, Latvian, Lithuanian, Malay, Mandarin, Marathi, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovakian, Slovenian, Spanish, Swedish, Tamil, Thai, Turkish, Uyghur, Ukrainian, Vietnamese and Welsh.

Other AI speech or caption languages may also be added, but they cannot be set as a speech-to-text language.
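The requirement above can be expressed as a simple pre-flight check. This is a hypothetical helper (not part of Clevercast), using an abridged version of the supported list:

```python
# Abridged subset of the supported speech-to-text languages listed above.
SUPPORTED_STT = {
    "Arabic", "Dutch", "English", "French", "German", "Italian",
    "Japanese", "Portuguese", "Spanish",  # ...see the full list above
}

def check_spoken_languages(spoken, event_languages):
    """Return the spoken languages that cannot serve as a speech-to-text
    language: either unsupported, or not added to the event as a closed
    caption / synthetic audio language."""
    return [
        lang for lang in spoken
        if lang not in SUPPORTED_STT or lang not in event_languages
    ]
```

Any language returned by such a check would need to be added to the event (or handled via the custom solution below).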

Using custom speech-to-text languages and/or behaviour

In addition to the solutions described above, we offer a custom solution, which is mainly used if:

  • speech-to-text conversion is needed for a language not listed above
  • the live stream consists of a conversation in which different languages are spoken continuously
  • you want to apply real-time correction, but your correctors don't master all the spoken languages

This custom solution uses a different language model, which automatically detects the spoken language and translates it into the speech-to-text language on the fly.

Consequences

When using the custom language model, the speech-to-text conversion will be less accurate. In some cases, the AI may also need a couple of sentences before it correctly detects which language is spoken. Given the lower accuracy, we strongly recommend using a corrector with this approach.

If a language other than the selected speech-to-text language is spoken, the captions for that language are the result of a double translation. For example: if English is set as speech-to-text language and German is spoken, the language model will convert the German speech to English text. If German captions are included, the English text is translated back to German. This may result in the German captions being a less literal representation of what was said. This can be avoided by changing the speech-to-text language (manually or automatically).
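The double-translation path in this example can be sketched as a tiny pipeline. The `translate()` function below is a placeholder that merely tags the text so the translation chain stays visible; it stands in for an AI translation step and is not a real Clevercast API.

```python
def translate(text, source, target):
    """Placeholder translation: tag the text instead of translating,
    so the chain of translations remains visible."""
    if source == target:
        return text
    return f"[{source}->{target}] {text}"

def caption_pipeline(spoken_text, spoken_lang, stt_lang, caption_langs):
    """Speech in `spoken_lang` is first converted to text in the active
    speech-to-text language, then translated into each caption language."""
    stt_text = translate(spoken_text, spoken_lang, stt_lang)
    return {lang: translate(stt_text, stt_lang, lang) for lang in caption_langs}

# German speech while English is the active speech-to-text language:
captions = caption_pipeline("Guten Tag", "de", "en", ["en", "de"])
```

In this sketch, the German caption is produced from the English text rather than from the original German speech, which is exactly the double translation the paragraph above warns about.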

As long as you don't change the speech-to-text language (e.g. leave it as English when someone is speaking German), the language in the correction room will remain the same (it will remain English).

Combination with Translate@Home

Note: if an event uses both Translate@Home (= remote interpretation) and AI speech-to-text conversion, correctors can also use interpreter relay (= listen to the remote interpreters instead of the floor audio stream).