The Cocina model has a languageTag field where users can specify the language of a file. Currently we only use that field to drive the caption display interface.
For generating new captions, we want users to be able to specify a language for Whisper to use for transcription. Since a media item could have multiple files in different languages, we need to be able to specify this language on a per-file basis. We propose to use the languageTag on each media file to-be-captioned for this purpose.
Example:
In this QA item, the language has been set to English on the audio file that would be sent to Whisper. The idea is for Whisper to use that language when generating captions.
Users are already able to edit the language field using the file_manifest.csv in Preassembly or the Argo structural metadata editing that uses the same CSV format. So in the near term we do not need to add a UI for language specification.
Logic
Using the existing logic to determine which media files to caption, look for the languageTag on those items. Any other files can be ignored.
Have Whisper try to transcribe in that language
Apply the same language value to the VTT/TXT files that come back from Whisper so that it shows up in the caption display UI
If no language is specified, auto-detect (what we currently do)
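The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual workflow code: the `to_caption` flag stands in for the existing logic that decides which media files get captioned, the dictionary shape of a file is hypothetical, and reducing a tag such as `en-US` to its primary subtag `en` before handing it to Whisper is an assumption about what Whisper expects.

```python
from typing import Optional


def whisper_language(language_tag: Optional[str]) -> Optional[str]:
    """Return the language to give Whisper, or None to request auto-detection."""
    if not language_tag or not language_tag.strip():
        return None
    # Assumption: languageTag looks like "en" or "en-US"; keep only the primary subtag.
    return language_tag.strip().split("-")[0].lower()


def plan_captioning(files: list[dict]) -> list[dict]:
    """Build one captioning plan per file that is selected for captioning."""
    plans = []
    for f in files:
        # Hypothetical flag standing in for the existing "which files to caption" logic.
        if not f.get("to_caption"):
            continue  # any other files are ignored
        tag = f.get("languageTag") or None
        lang = whisper_language(tag)
        plans.append({
            "filename": f["filename"],
            # An empty options dict means Whisper auto-detects, as it does today.
            "whisper_options": {"language": lang} if lang else {},
            # The same tag is carried over to the VTT/TXT files that come back.
            "caption_language_tag": tag,
        })
    return plans
```

With this shape, a file tagged `en-US` would be transcribed with `language` set to `en` and its caption files would keep the `en-US` tag, while an untagged file falls through to auto-detection.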
Additional information
This differs from the OCR approach because in OCR we:
are able to select multiple languages for detection
rely on ABBYY (and presumably other OCR tools) to detect language in smaller chunks within an item, not on a per-file basis
set the language at the whole-item level, not per-file
Whisper can only be given a single language.