523-RTT translation (#1280)

* 523-updates
* Updated title
* pricing and overview updates
* typo-fix
AgoraIO · Nov 9, 2024 · 79abb64
1 parent 8304b13

Showing 11 changed files with 233 additions and 72 deletions.
diff --git a/real-time-stt/develop/encrypt-captions.mdx b/real-time-stt/develop/encrypt-captions.mdx
@@ -1,6 +1,6 @@
 ---
 title: 'Encrypt captions'
-sidebar_position: 1
+sidebar_position: 2
 type: docs
 description: >
   Encrypt the captions transcribed with RTT

diff --git a/real-time-stt/develop/parse-data.mdx b/real-time-stt/develop/parse-data.mdx
@@ -1,6 +1,6 @@
 ---
 title: 'Parse transcription data'
-sidebar_position: 0.5
+sidebar_position: 1
 type: docs
 description: >
   Encrypt the captions transcribed with RTT

diff --git a/real-time-stt/develop/record-captions.mdx b/real-time-stt/develop/record-captions.mdx
@@ -1,6 +1,6 @@
 ---
 title: 'Record captions'
-sidebar_position: 2
+sidebar_position: 3
 type: docs
 description: >
   Record the captions transcribed with RTT in real time

diff --git a/real-time-stt/develop/supported-languages.mdx b/real-time-stt/develop/supported-languages.mdx
@@ -1,6 +1,6 @@
 ---
 title: 'Supported languages'
-sidebar_position: 5
+sidebar_position: 6
 type: docs
 description: >
   The list of languages supported for real-time speech-to-text

diff --git a/real-time-stt/develop/transcribe-individual-host.mdx b/real-time-stt/develop/transcribe-individual-host.mdx
@@ -1,6 +1,6 @@
 ---
 title: 'Transcribe specified hosts'
-sidebar_position: 3
+sidebar_position: 4
 type: docs
 description: >
   Transcribe the speech of specific channel hosts only

diff --git a/real-time-stt/develop/translation.mdx b/real-time-stt/develop/translation.mdx
@@ -0,0 +1,13 @@
+---
+title: 'Real-time translation'
+sidebar_position: 1.5
+type: docs
+description: >
+  Translate transcription text to multiple languages.
+---
+
+import EnableService from '@docs/shared/real-time-stt/develop/translation.mdx';
+
+export const toc = [{}];
+
+<EnableService />
diff --git a/real-time-stt/develop/update-service.mdx b/real-time-stt/develop/update-service.mdx
@@ -1,6 +1,6 @@
 ---
 title: 'Update service'
-sidebar_position: 4
+sidebar_position: 5
 type: docs
 description: >
   Update the Real-Time STT service

diff --git a/real-time-stt/overview/pricing.mdx b/real-time-stt/overview/pricing.mdx
@@ -9,26 +9,24 @@ description: >
 
 export const toc = [{}];
 
-This page introduces the billing policy for the <Vpd k="NAME"/> add-on provided by Agora.
-
-Your billing details may differ if you have signed a contract with Agora.
+Agora calculates the billing for all projects under your Agora account on a monthly basis. Billing begins once you
+enable <Vpd k="NAME"/>.
 
-## Overview
+This page explains <Vg k="COMPANY" />'s billing policy for the <Vpd k="NAME"/> add-on.
 
-Agora calculates the billing of all projects under your Agora account on a monthly basis. Billing begins once you
-enable <Vpd k="NAME"/>.
+<Admonition type="info">
+Your billing details may differ if you have signed a contract with Agora.
+</Admonition>
 
 ## Transcription fee
 
-When <Vpd k="NAME"/> is enabled for a channel, it transcribes the audio of its active hosts. When <Vpd k="NAME"/> is enabled for specific hosts, it only transcribes the audio of the specified hosts and ignores the others. The <Vpd k="NAME"/> service employs algorithms that remove the periods of silence and improve WER (Word Error Rate) of transcription. The processed audio is transcribed by the <Vpd k="NAME"/> engine and referred to as transcription duration. Agora charges for the transcription duration of all or specified hosts in the channel.
-
-The unit price is as follows:
+When <Vpd k="NAME"/> is enabled for a channel, it transcribes the audio of the active hosts. When <Vpd k="NAME"/> is enabled for specific hosts, it only transcribes the audio of the specified hosts and ignores the others. The <Vpd k="NAME"/> service employs algorithms that remove periods of silence and improve the Word Error Rate (WER) of the transcription. The processed audio is transcribed by the <Vpd k="NAME"/> engine, and its duration is referred to as the transcription duration. Agora charges for the transcription duration of all or specified hosts in the channel. The unit price is as follows:
 
 |Billing item |Usage, minutes per month |Pricing, US$/1,000 minutes|
 |-------------|--------------------|--------------------------|
 |Transcription duration | Above 0         | 16.99             |
 
-**Example**
+#### Example
 
 After you enable <Vpd k="NAME"/>: 
 - Host A speaks for 2 minutes and remains silent for 8 minutes.
@@ -46,23 +44,49 @@ In this case, the total transcription minutes are calculated as 2 (Host A) + 3 (
 
 ## Language identification fee
 
-<Vpd k="NAME"/> supports dynamic language detection when two or more languages are enabled for a channel or specific hosts. The LID (language identification) duration is the same as the transcription duration.
+<Vpd k="NAME"/> supports dynamic language detection when two or more languages are enabled for a channel or specific hosts. The Language Identification (LID) duration is the same as the transcription duration.
 
 |Billing item|Usage, minutes per month |Pricing, US$/1,000 minutes|
 |--------------------|--------------------|--------------------------|
-|Language identification duration|Above 0         | 5.00                     |
+|Language identification duration | Above 0        | 5.00            |
+
+#### Example
 
-Examples:
-- Let's say there is a channel existing for 10 minutes. There are 3 active hosts - A, B, and C - all in the unmuted state.
-- #3: If Spanish and Chinese LID is enabled for this channel at the start, the algorithm will remove 8 minutes of silent audio for host A, 7 minutes for host B and 7 minutes for host C. Therefore, the transcription duration is 2 + 3 + 3 = 8 minutes. the LID duration is 8 minutes, too, being the sum of 2 minutes for host A, 3 minutes for host B, and 3 minutes for host C.
+- Suppose a channel exists for 10 minutes. There are three active, unmuted hosts, A, B, and C.
+- If Spanish and Chinese LID is enabled for this channel at the start, the algorithm removes 8 minutes of silent audio for host A, 7 minutes for host B and 7 minutes for host C. Therefore, the transcription duration is 2 + 3 + 3 = 8 minutes. The LID duration is also 8 minutes, being the sum of 2 minutes for host A, 3 minutes for host B, and 3 minutes for host C.
 - If Spanish and Chinese LID is enabled for host A, then the transcription duration and LID duration are both 2 minutes.
 
 Notes:
-- The <Vpd k="NAME"/> transcription duration does not change if you enable more than 1 language.
-- If only 1 language is set for a channel or a specified host, the language detection will not start.
+- The <Vpd k="NAME"/> transcription duration does not change if you enable more than one language.
+- If only one language is set for a channel or a specified host, language detection does not start.
+
+## Translation fee
+
+When you enable real-time translation for a channel or a user, transcription is activated first. The transcription text is then translated into the target languages. The translation usage minutes are the same as the transcription usage minutes. The real-time transcription and translation usage and costs are shown in your monthly invoice. The unit price is as follows:
+
+|Billing item | Pricing, US$/1,000 minutes|
+|-------------|------------------|
+|Translation  | 8.99             |
+
+#### Example
+
+After you enable <Vpd k="NAME"/>: 
+- Host A speaks Russian for 2 minutes and remains silent for 8 minutes.
+- Host B speaks French for 3 minutes and remains silent for 7 minutes.
+- Host C speaks Russian for 3 minutes and remains silent for 7 minutes.
+- All hosts are silent for the first 2 minutes of the call.
+- Russian and French are translated to English.
+
+In this case, the total transcription minutes are calculated as 2 (Host A) + 3 (Host B) + 3 (Host C) = 8 minutes. The translation minutes are the same as the transcription minutes, so Agora charges for 8 minutes of transcription and 8 minutes of translation.
+
+Total cost = 8/1000 × $16.99 + 8/1000 × $8.99 = $0.136 + $0.072 = $0.208.
+
+If you translate Russian and French to English and German, the translation cost is multiplied by 2:
+
+Total cost = 8/1000 × $16.99 + 8/1000 × $8.99 × 2 = $0.136 + $0.144 = $0.280.
 
 ## Free-of-charge duration
 
-<Vpd k="NAME"/> provides 300 minutes of free-of-charge duration for integration and testing purposes.
+<Vpd k="NAME"/> provides 300 free-of-charge minutes for integration and testing purposes.
 
 Contact [email protected] or your AE to get a discount.
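Review note: the fee arithmetic in the examples above can be checked with a short script. This is a minimal sketch, not Agora's billing implementation. The function name `estimate_cost` and its parameters are hypothetical; the unit prices come from the tables in this diff. It ignores the 300 free minutes and any contract pricing, and it assumes that translation is billed once per target language, as the "multiplied by 2" example suggests.

```python
from decimal import Decimal

# Unit prices in US$ per 1,000 minutes, copied from the pricing tables in this diff.
TRANSCRIPTION_RATE = Decimal("16.99")
LID_RATE = Decimal("5.00")
TRANSLATION_RATE = Decimal("8.99")


def estimate_cost(spoken_minutes, target_languages=0, lid_enabled=False):
    """Estimate fees in US$ (hypothetical helper, not an Agora API).

    spoken_minutes: minutes of non-silent audio per transcribed host.
    target_languages: number of translation target languages (0 = no translation).
    lid_enabled: True when two or more source languages are configured.
    """
    # Silence is removed before billing, so only spoken minutes count.
    minutes = Decimal(str(sum(spoken_minutes)))
    units = minutes / Decimal(1000)

    transcription = units * TRANSCRIPTION_RATE
    # LID duration equals the transcription duration when LID is active.
    lid = units * LID_RATE if lid_enabled else Decimal(0)
    # Translation minutes equal transcription minutes, per target language.
    translation = units * TRANSLATION_RATE * target_languages

    return {
        "transcription": transcription,
        "lid": lid,
        "translation": translation,
        "total": transcription + lid + translation,
    }


if __name__ == "__main__":
    # Hosts A, B, and C speak for 2, 3, and 3 minutes; one target language.
    fees = estimate_cost([2, 3, 3], target_languages=1)
    print(round(fees["total"], 3))
```

The sketch keeps unrounded `Decimal` values and rounds only for display, whereas the page rounds each fee to three decimals before adding. For the two translation examples both approaches give the same totals at three decimal places.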
diff --git a/real-time-stt/overview/product-overview.mdx b/real-time-stt/overview/product-overview.mdx
@@ -6,22 +6,26 @@ description: >
   Create a better user experience with the most accurate live transcription and subtitling.
 ---
 
-
 <ProductOverview
     title="Real-Time Speech-To-Text"
     img="/images/real-time-stt/real-time-stt.png"
-    apiQuickStartLink="/real-time-stt/get-started/quickstart"
+    quickStartLink="/real-time-stt/get-started/quickstart"
     apiReferenceLink="/api-reference"
     samplesLink="https://github.com/AgoraIO/agora-rtt-server"
     productFeatures={[
         {
             title: "Live transcription for RTC",
             content: "Integrated with Agora’s voice and video service, live transcription and captions improve accessibility for your audience. Perfect for meetings, live streaming, lectures, interviews, live shopping, and more.",
-            link: ""
+            link: "../get-started/quickstart"
         },
+        {
+            title: "Real-time translation",
+            content: "Break down language barriers with live speech-to-text translation into multiple languages during real-time communication or live streaming. The high-accuracy translation text, delivered with ultra-low latency, can be integrated with LLMs for enhanced capabilities.",
+            link: "../develop/translation"
+        },
         {
             title: "Cloud-based STT",
-            content: "Cloud-based service converts voice to text based on the active or specific hosts, then distributes the text to all participants in the channel for further processing. Does not depend on the performance of the client device and network.",
+            content: "Cloud-based service converts voice to text for active or specific hosts and then distributes the text to all participants in the channel for further processing. The service does not depend on the client's device performance or network conditions.",
             link: ""
         },
         {
@@ -31,23 +35,18 @@ description: >
         },
         {
             title: "Caption recording",
-            content: "Upload the transcriptions as .vtt files to cloud storage, then play back audio or video recordings with closed captions (CC). The timestamps in the .vtt file ensure that the text is perfectly synchronized with the audio or video, so it appears exactly where it was generated. ",
-            link: ""
+            content: "Upload the transcriptions as .vtt files to cloud storage, then play back audio or video recordings with closed captions (CC). The timestamps in the .vtt file ensure that the text is perfectly synchronized with the audio or video, so that it appears exactly where it was generated.",
+            link: "../develop/record-captions"
         },
         {
             title: "Multi-language support",
-            content: "Real-time transcription supports all major languages and dialects, and each channel can support audio to text transcription for up to two languages simultaneously.",
-            link: ""
-        },
-        {
-            title: "Enterprise-grade security and compliance",
-            content: "Agora is ISO and SOC 2 certified and meets compliance standards for regional privacy laws and industry regulations, including GDPR, CCPA, and HIPAA. Live captions and transcription can be encrypted the same way as the RTC audio or video.  ",
-            link: ""
+            content: "Real-time transcription supports all major languages and dialects, and each channel can support audio-to-text transcription for up to two languages simultaneously. Real-time translation supports translation of up to two source languages into five target languages, with support for 30+ languages.",
+            link: "../develop/supported-languages"
         },
     ]}
 
 >
 
-Agora Real-Time Speech-To-Text (STT) enables you to transcribe the voice stream of each host to provide live closed captions (CC) and transcription for improved accessibility. Using its advanced features, you can also remove silent audio segments to optimize transcription performance and reduce costs. The output text can be further processed as input for large language models, such as GPT. Real-Time STT serves as a gateway for real-time engagement to enter the AI arena.
+Agora Real-Time Speech-To-Text (STT) enables you to transcribe the voice stream of each host to provide live closed captions (CC) and transcription for improved accessibility. Its advanced features remove silent audio segments to optimize transcription performance and reduce costs. The output text can be translated into multiple languages and further processed as input for large language models, such as GPT. Real-Time STT serves as a gateway for real-time engagement to enter the AI arena.
 
 </ProductOverview>
diff --git a/shared/real-time-stt/develop/supported-languages.mdx b/shared/real-time-stt/develop/supported-languages.mdx
@@ -1,36 +1,36 @@
 <Vpd k="NAME"/> supports the following languages:
 
-| GUI Language | STT language parameters |
-|---|---|
-| Arabic (EG)  | ar-EG |
-| Arabic (JO)  | ar-JO |
-| Arabic (SA)  | ar-SA |
-| Arabic (UAE)  | ar-AE |
-| Bengali (IN)  | bn-IN |
-| Chinese | zh-CN |
-| Chinese (HK)  | zh-HK |
-| Chinese (TW)  | zh-TW |
-| Dutch | nl-NL |
-| English (IN) | en-IN |
-| English (US)  | en-US |
-| Filipino | fil-PH |
-| French | fr-FR |
-| German | de-DE |
-| Gujarati   | gu-IN |
-| Hebrew  | he-IL |
-| Hindi | hi-IN |
-| Indonesian | id-ID |
-| Italian | it-IT |
-| Japanese | ja-JP |
-| Kannada   | kn-IN |
-| Korean | ko-KR |
-| Malay | ms-MY |
-| Persian | fa-IR |
-| Portuguese | pt-PT |
-| Russian | ru-RU |
-| Spanish | es-ES |
-| Tamil  | ta-IN |
-| Telugu   | te-IN |
-| Thai | th-TH |
-| Turkish | tr-TR |
-| Vietnamese   | vi-VN |
+| Language | Parameter |
+|:---|:---|
+| Arabic (EG)  | `ar-EG` |
+| Arabic (JO)  | `ar-JO` |
+| Arabic (SA)  | `ar-SA` |
+| Arabic (UAE)  | `ar-AE` |
+| Bengali (IN)  | `bn-IN` |
+| Chinese | `zh-CN` |
+| Chinese (HK)  | `zh-HK` |
+| Chinese (TW)  | `zh-TW` |
+| Dutch | `nl-NL` |
+| English (IN) | `en-IN` |
+| English (US)  | `en-US` |
+| Filipino | `fil-PH` |
+| French | `fr-FR` |
+| German | `de-DE` |
+| Gujarati   | `gu-IN` |
+| Hebrew  | `he-IL` |
+| Hindi | `hi-IN` |
+| Indonesian | `id-ID` |
+| Italian | `it-IT` |
+| Japanese | `ja-JP` |
+| Kannada   | `kn-IN` |
+| Korean | `ko-KR` |
+| Malay | `ms-MY` |
+| Persian | `fa-IR` |
+| Portuguese | `pt-PT` |
+| Russian | `ru-RU` |
+| Spanish | `es-ES` |
+| Tamil  | `ta-IN` |
+| Telugu   | `te-IN` |
+| Thai | `th-TH` |
+| Turkish | `tr-TR` |
+| Vietnamese   | `vi-VN` |
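
Review note: the language parameters in the new table, together with the two-language limit from the product overview, lend themselves to a client-side check before a request is sent. The following is a minimal sketch under stated assumptions. The helper `check_transcription_languages` is hypothetical and not part of any Agora SDK, and the code set simply mirrors the table above, so it may lag behind the service. Translation target languages are not validated because this diff does not list them.

```python
# Language parameters copied from the supported-languages table in this diff.
SUPPORTED_STT_LANGUAGES = frozenset({
    "ar-EG", "ar-JO", "ar-SA", "ar-AE", "bn-IN", "zh-CN", "zh-HK", "zh-TW",
    "nl-NL", "en-IN", "en-US", "fil-PH", "fr-FR", "de-DE", "gu-IN", "he-IL",
    "hi-IN", "id-ID", "it-IT", "ja-JP", "kn-IN", "ko-KR", "ms-MY", "fa-IR",
    "pt-PT", "ru-RU", "es-ES", "ta-IN", "te-IN", "th-TH", "tr-TR", "vi-VN",
})

# The overview states that a channel supports up to two languages at once.
MAX_TRANSCRIPTION_LANGUAGES = 2


def check_transcription_languages(codes):
    """Validate language codes (hypothetical helper, not an Agora API).

    Returns True when language identification (LID) would run, which the
    pricing page describes as requiring two or more languages.
    Raises ValueError when the list is empty, too long, contains duplicates,
    or contains a code that is not in the table.
    """
    if not 1 <= len(codes) <= MAX_TRANSCRIPTION_LANGUAGES:
        raise ValueError(
            f"expected 1 to {MAX_TRANSCRIPTION_LANGUAGES} languages, got {len(codes)}"
        )
    if len(set(codes)) != len(codes):
        raise ValueError("duplicate language codes")
    unknown = sorted(set(codes) - SUPPORTED_STT_LANGUAGES)
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    # With a single language, language detection does not start.
    return len(codes) >= 2


if __name__ == "__main__":
    print(check_transcription_languages(["es-ES", "zh-CN"]))
```

The match is case-sensitive and exact, which is an assumption on my part; whether the service accepts other casings is not stated in this diff.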