Managing AI live streams in which multiple languages are spoken
This section is part of the AI Live Streaming Manual. It explains how to deliver live streams with AI captions and speech when multiple languages are spoken in a live stream.
When multiple languages are spoken in a live stream, the speech-to-text language must change along with the speaker. Clevercast has two solutions for this, which can be used alternately:
- Let Clevercast automatically detect the speech-to-text language and change it
- Manually change the speech-to-text language
Manual selection is slightly faster and more reliable. Automatic detection waits until a number of words have been spoken in the new language before switching (4-8 seconds). As a result, a limited number of closed captions and/or speech translations may be dropped around the language change. In exceptional cases, automatic detection may also struggle (e.g. a native French speaker speaking broken English with a heavy accent).
You can switch from manual to automatic and vice versa during the live stream (e.g. set it to manual during the main keynotes and use automatic afterwards).
Important: changing the speech-to-text language does not affect the viewers of your live stream. The same closed captions and speech translation languages remain available in the embedded video player at all times.
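To make the automatic detection behaviour more concrete, here is a minimal Python sketch of a debounced language switch. The class, threshold and word-level detection are illustrative assumptions based on the 4-8 second delay described above; this is not Clevercast's actual implementation.

```python
# Purely illustrative sketch -- NOT Clevercast's implementation.
# It mimics the documented behaviour: the active language only switches
# after a sustained run of words in another language, which is why a few
# captions around the switch can be lost.

SWITCH_THRESHOLD = 8  # hypothetical word count (~4-8 seconds of speech)

class AutoLanguageSwitcher:
    def __init__(self, initial_language: str):
        self.active = initial_language   # current speech-to-text language
        self.candidate = None            # possible new language
        self.streak = 0                  # consecutive words in the candidate

    def observe_word(self, detected_language: str) -> str:
        """Feed the detected language of one word; return the active language."""
        if detected_language == self.active:
            self.candidate, self.streak = None, 0
        elif detected_language == self.candidate:
            self.streak += 1
            if self.streak >= SWITCH_THRESHOLD:
                # Words spoken while the streak was building were still
                # transcribed in the old language, so their captions may
                # be dropped or wrong.
                self.active = detected_language
                self.candidate, self.streak = None, 0
        else:
            self.candidate, self.streak = detected_language, 1
        return self.active
```

The streak counter is the design point of such an approach: switching only after a sustained run avoids flapping between languages on isolated foreign words, at the cost of the short delay.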
Creating an event for a live stream in which multiple languages are spoken
Follow the normal workflow to create an event with AI captions and speech.
Default language
In the first step of the wizard, your selection for 'Select (main) audio language spoken in the live stream' (which sets the event's 'Default Language') depends on whether AI speech is used, as summarised in the sketch after this list:
- If only AI closed captions are used, choose the first language that will be spoken in the live stream (or the one most spoken).
- If AI speech translations are used, choose 'Original' (except in some rare cases). When 'Original' is selected in the player, viewers hear the floor audio (with the voices of the speakers). Below 'Original', the AI speech languages are selectable.
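The hypothetical helper below (the function name and parameters are ours, not part of Clevercast) merely restates the same decision rule in code form:

```python
# Hypothetical helper that encodes the rule above -- for illustration only.
def default_language(uses_ai_speech: bool, spoken_languages: list[str]) -> str:
    """Return the value to pick for 'Select (main) audio language'."""
    if uses_ai_speech:
        # Viewers hear the floor audio; AI speech languages sit below it.
        return "Original"
    # Captions only: the first (or most) spoken language in the live stream.
    return spoken_languages[0]
```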
Allow changing the active speech-to-text language
Make sure to select 'Yes' (manually or automatically) in the dropdown for 'Allow changing the active speech-to-text language during the event'.
If necessary, you can change your selection afterwards in the 'Caption Languages' tab of the event. During the live stream, you can also switch between manual and automatic (and vice versa), as long as you haven't selected 'No'.
Adding all languages that are spoken
In the second step of the wizard, add an AI caption and/or speech language for each language spoken in the live stream (see 'Requirements' below for a list of the supported languages).
If desired, after creating the event, you can continue to add vocabulary terms in different languages.
Testing, starting and stopping the live stream
Follow the normal workflow.
Changing the speech-to-text language
If you selected automatic detection of the speech-to-text language, you don't need to do anything else. Clevercast will automatically switch to the language spoken in the live stream (as long as it is one of the speech-to-text or text-to-text caption languages).
If you selected manual switching of the speech-to-text language, go to the real-time management room and watch the live stream in the video player. Whenever you hear another language being spoken, select that language as the speech-to-text language.
At any time, the real-time management room lets you switch from manual to automatic detection and vice versa.
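If you script parts of your workflow, a manual switch could in principle look like the sketch below. The endpoint, URL and payload are entirely hypothetical placeholders; Clevercast exposes this functionality through the real-time management room UI.

```python
import requests

# Entirely hypothetical endpoint and payload -- this only illustrates the
# shape of a scripted language switch, not a documented Clevercast API.
API_BASE = "https://api.example.com/v1"   # placeholder, not a real URL
API_TOKEN = "YOUR_TOKEN"                  # placeholder credential

def set_speech_to_text_language(event_id: str, language: str) -> None:
    response = requests.post(
        f"{API_BASE}/events/{event_id}/speech-to-text-language",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"language": language, "mode": "manual"},
    )
    response.raise_for_status()
```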
Real-time correction
Keep in mind that, when using real-time correction, the correctors must be proficient in all spoken languages. Every time the speech-to-text language changes, the language in the correction room changes with it.
Requirements
Currently, the speech-to-text language can be set to one of the following languages: Arabic, Bashkir, Basque, Belarusian, Bulgarian, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Mandarin, Marathi, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Uyghur, Vietnamese and Welsh.
Other AI speech or caption languages may also be added, but they cannot be set as a speech-to-text language.
Using custom speech-to-text languages and/or behaviour
In addition to the solutions described above, we offer a custom solution, which is mainly used if:
- speech-to-text conversion is needed for a language not listed above
- the live stream consists of a conversation, in which different languages are spoken continuously
- you want to apply real-time correction, but your correctors don't master all the spoken languages
This custom solution uses a different language model, which automatically detects which language is spoken and translates it into the speech-to-text language on the fly.
Consequences
When using the custom language model, the speech-to-text conversion will be less accurate. In some cases the AI may also need a couple of sentences before it correctly detects which language is spoken. Given the lower accuracy, we strongly recommend using a corrector with this approach.
If a language other than the selected speech-to-text language is spoken, the captions for that language are the result of a double translation (see the sketch after this paragraph). For example: if English is set as the speech-to-text language and German is spoken, the language model converts the German speech to English text. If German captions are included, the English text is translated back to German. This may result in the German captions being a less literal representation of what was said. It can be avoided by changing the speech-to-text language (manually or automatically).
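The following runnable Python sketch makes the double translation visible. The translate function is a hypothetical stand-in for the AI model that only tags the text with the translation direction:

```python
# Illustration of the double-translation path described above.
def translate(text: str, source: str, target: str) -> str:
    # Hypothetical stand-in for the AI translation step; it only tags the
    # text so the double translation is visible when you run this.
    return f"[{source}->{target}] {text}"

def caption_pipeline(german_speech_as_text: str) -> dict[str, str]:
    # Step 1: the custom model converts the German speech directly into
    # English text (English being the selected speech-to-text language).
    english_text = translate(german_speech_as_text, source="de", target="en")
    # Step 2: caption languages are produced from that English text, so
    # German captions are a back-translation of the original German.
    return {
        "en": english_text,
        "de": translate(english_text, source="en", target="de"),
    }
```

Running caption_pipeline("Guten Morgen") yields "[de->en] Guten Morgen" for the English captions and "[en->de] [de->en] Guten Morgen" for the German ones, showing the two hops that can make the German captions less literal.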
As long as you don't change the speech-to-text language (e.g. leave it set to English while someone is speaking German), the language in the correction room remains the same (in this example, English).
Combination with Translate@Home
Note: if an event uses both Translate@Home (= remote interpretation) and AI speech-to-text conversion, correctors can also use interpreter relay (= listen to the remote interpreters instead of the floor audio stream).