Using real-time transcription and translation
This section is part of the AI Live Streaming Manual. It explains how event organizers can make the AI text transcription and speech translations available to on-site event attendees with minimal delay. This way, attendees can read the speech-to-text conversion, for example on a big screen or on their smartphone, or listen to translations in real time at a conference, using their smartphone and headphones.
Real-time delivery
Since on-site participants need to receive the stream with as little delay as possible, the real-time transcription and translation are delivered via WebRTC instead of HTTP Live Streaming (HLS). WebRTC is supported by all modern browsers, including mobile browsers.
Text transcripts
For a textual representation and translation of what is being said, Clevercast offers an automatically scrolling text transcript. Real-time transcription of the spoken language is delivered with a minimum delay of 1-2 seconds. Translated transcriptions have a minimum delay of 2-4 seconds.
The look and feel of the scrolling transcript can be customized by the user, via the cog wheel in the top-right corner of the transcript page.
Audio translation
Before the event starts, an event manager provides participants with a link to the (audio) translation in their language. For participants at the event location, use the links to the audio-only webplayer. When the event starts, they just use their smartphones to click on the link, press the start button and listen to the translation (e.g. using headphones with a mini jack).
Notes:
- The event in Clevercast must be set to Started so people can connect to the player.
- The player allows participants to hear the AI translations in their own language. Note that when a speaker at the event is speaking the participant's language, the AI translation will keep playing. To listen to the live speaker, participants have to take off their headphones.
- If used at the event venue, make sure that the local WiFi network can handle the number of connections and bandwidth. The audio-only player requires about 80 kbps per listener.
- A video player is also available for virtual event participants. Given the extra bandwidth that video requires, you shouldn't provide the link to the video player to on-site attendees.
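As a rough aid for sizing the venue network, the per-listener figure from the notes above can be turned into a quick estimate. This is a minimal sketch: the function name and the listener count are illustrative, and the result ignores protocol overhead and any other traffic on the WiFi network.

```python
def audio_bandwidth_mbps(listeners: int, kbps_per_listener: float = 80.0) -> float:
    """Estimate the total downstream bandwidth for the audio-only player, in Mbps.

    The default of 80 kbps per listener is the approximate figure quoted above.
    """
    if listeners < 0:
        raise ValueError("listeners must be non-negative")
    return listeners * kbps_per_listener / 1000


# Example: 500 on-site listeners at about 80 kbps each
print(f"{audio_bandwidth_mbps(500):.1f} Mbps")  # prints "40.0 Mbps"
```

Treat the outcome as a lower bound and leave headroom when discussing capacity with the venue's network provider.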
Event Configuration
Follow the normal workflow to create an event with AI closed captions and/or speech translations. In the AI wizard, make sure to choose 'SRT' rather than RTMP as the broadcast protocol, since SRT allows you to broadcast with a lower latency. In step 2 of the wizard, check 'Provide real-time transcription for event attendees' and/or 'Provide real-time audio translations for event attendees'.
We recommend creating an AI vocabulary, as this will (also) improve the quality of the real-time transcription and translation.
The further configuration of your event depends on whether it is only used for real-time purposes, or also for online video streaming.
1. Real-time only event
If the sole purpose of the event is to provide real-time transcription and/or translation, the video stream plays no part. You will want to minimise the delay of the transcription and translation as much as possible.
Management
After the AI wizard finishes, you are directed to the 'Management' tab of the event. On this tab, scroll down and adjust the following settings:
- Resolutions: remove all resolutions except for 240p. Since you are not streaming video, transcoding to multiple video resolutions is pointless.
- Latency: if you only need transcriptions, select 'Low Latency' to avoid additional delay when generating the transcript text(s). If you also need AI translations, 'Default Latency' is required.
Broadcast
Currently, Clevercast still expects an incoming video and audio stream. But since the video isn't watched by anyone, you can simply broadcast a black screen.
Go to the 'Broadcast' tab of the event to configure your encoder. The SRT protocol should be selected (after you chose it in the AI wizard). Please note that using RTMP will introduce extra second(s) of delay.
To minimise delay, set your encoder to the minimum latency of 200 milliseconds for a broadcast from Europe (if you use OBS Studio, you can copy the SRT caller URL on the 'Broadcast' tab of the event page).
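If your encoder takes the latency as a parameter of the SRT URL, the value can be set programmatically. The sketch below assumes an encoder that reads a `latency` query parameter in microseconds, as FFmpeg-based tools do; check your encoder's documentation before relying on this. The host and port are placeholders, not a real Clevercast address: always start from the URL shown on the 'Broadcast' tab.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_srt_latency(srt_url: str, latency_ms: int) -> str:
    """Return srt_url with its 'latency' query parameter set to latency_ms.

    Assumption: the encoder expects the value in microseconds.
    """
    parts = urlsplit(srt_url)
    # Keep all existing parameters except a previous latency value
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "latency"]
    params.append(("latency", str(latency_ms * 1000)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL for illustration only
print(with_srt_latency("srt://ingest.example.com:9000?mode=caller", 200))
```

Note that `urlencode` percent-encodes special characters in parameter values, so compare the result with the original URL if it contains more than simple key-value pairs.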
Note: we hope to add support for sending audio-only through the browser soon, eliminating the need for an encoder if you only need real-time transcription.
2. Combination with live streaming
Management
On the 'Management' tab of the event, you can choose between 'Default Latency' and 'Low Latency'. If you are also live streaming, we recommend selecting Default Latency, as this will improve the quality of the closed captions.
Notes:
- Combining live streaming with real-time transcription and translation may, to a limited extent, negatively affect the quality of the closed captions in the live stream. So if you don't have on-site users, we recommend turning these settings off.
- Since the transcripts are generated in real-time, the accuracy of translated transcripts will be lower than the accuracy of translated closed captions in the live stream, as the latter's delay allows more context to be provided to the AI models.
- Real-time transcripts are also supported for events with human correctors. However, corrections only apply to the closed captions and speech translations in the live stream, not to the real-time transcription and translation.
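The latency-mode guidance for both scenarios (real-time only and combined with live streaming) can be summarized as a small decision function. This is only an illustration of the rules described in this section: the function and parameter names are made up, and the actual setting is chosen by hand on the 'Management' tab.

```python
def recommended_latency_mode(needs_ai_translation: bool, live_streaming: bool) -> str:
    """Return the latency mode suggested by the guidance in this section."""
    # 'Default Latency' is required for AI translations and
    # recommended when the event is also live streamed.
    if needs_ai_translation or live_streaming:
        return "Default Latency"
    # Transcription only, without live streaming: avoid additional delay.
    return "Low Latency"


print(recommended_latency_mode(needs_ai_translation=False, live_streaming=False))
print(recommended_latency_mode(needs_ai_translation=True, live_streaming=False))
```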
Broadcast
If you combine real-time transcription with live streaming, the same broadcast settings will apply to both. We recommend testing beforehand to determine what SRT latency to set on your encoder (as low as possible for the transcript, but sufficiently high to guarantee a stable live stream). For a broadcast from Europe, this will probably be between 400 and 800 milliseconds. For a broadcast from the US, 1 second is more appropriate.
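The SRT latency figures mentioned in this section can be collected in a small lookup table to use as starting points for your own tests. This is a sketch: the scenario and region labels are invented for illustration, and the values are only the indicative figures quoted above, not guarantees.

```python
# (scenario, region) -> (minimum, maximum) SRT latency in milliseconds
SRT_LATENCY_STARTING_POINTS_MS = {
    ("real-time only", "Europe"): (200, 200),
    ("with live streaming", "Europe"): (400, 800),
    ("with live streaming", "US"): (1000, 1000),
}


def srt_latency_range_ms(scenario: str, region: str) -> tuple[int, int]:
    """Look up an indicative SRT latency range to start testing from."""
    try:
        return SRT_LATENCY_STARTING_POINTS_MS[(scenario, region)]
    except KeyError:
        # No figure is given for this combination: it has to be determined by testing.
        raise ValueError(f"no indicative latency for {scenario!r} from {region!r}") from None


low, high = srt_latency_range_ms("with live streaming", "Europe")
print(f"Start testing between {low} and {high} ms")
```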
Distribution of the real-time links
Before the event starts, the event manager needs to distribute the real-time transcription and/or translation links to the on-site production team and/or event participants.
Real-time transcription links
On the 'Caption Languages' tab of the event, the real-time transcript links for each caption language are available. If you don't see the 'Real-time Transcription' panel, check the 'Provide real-time transcription for event attendees' setting at the top of the page (it should be set to 'Yes').
There are 2 kinds of transcription links:
- On-site participant links: intended for distribution to event participants, so they can read the transcription on mobile devices. The scrolling text is only displayed when the event status is Started.
- On-site production links: intended for the event organizer to display the scrolling text on-site, for example on large screens. The scrolling text is also displayed when the event status is Preview or Paused.
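The difference between the two kinds of links comes down to the event statuses in which the scrolling text is shown. The sketch below restates that rule as a lookup: the names `participant` and `production` and the function itself are illustrative and not part of Clevercast.

```python
# Event statuses in which each kind of transcription link displays the scrolling text
VISIBLE_STATUSES = {
    "participant": {"Started"},
    "production": {"Started", "Preview", "Paused"},
}


def transcript_is_displayed(link_kind: str, event_status: str) -> bool:
    """Return True if the scrolling transcript is shown for this link kind and status."""
    return event_status in VISIBLE_STATUSES[link_kind]


print(transcript_is_displayed("participant", "Preview"))  # False
print(transcript_is_displayed("production", "Preview"))   # True
```

In practice this means the production links can be used to check the on-site screens before the event is started, while participants only see text once it is running.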
Real-time audio links
On the 'Audio Languages' tab of the event, the real-time links to each translation language are available in the 'Realtime Participant Links' panel. If you don't see it, check the 'Provide real-time AI translations for event attendees' setting at the top of the page (it should be set to 'Yes').
For on-site event attendees, copy the audio-only link for each language and pass it on to the attendees who speak that language.
Event management
Follow the normal workflow to test, start and stop your live stream, including the real-time transcription and translation.