Get the language right
pyIIIFpres checks the language subtags of labels, summaries and other text content against the language subtag registry.
NOTE: only language subtags are checked, not variants or composite strings.
The registry contains more than 190 two-letter subtags and 8022 three-letter subtags. A random two-letter string therefore has about a 28% chance of being a valid subtag, and a random three-letter string about a 45% chance, so a typo can easily pass the check.
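The percentages above follow from dividing the number of registered subtags by the number of possible lowercase strings of that length. A minimal sketch of the arithmetic, assuming the counts quoted in the text (the registry changes over time, so the real figures may differ slightly):

```python
import string

# Counts quoted in the text; treat them as approximate.
TWO_LETTER_SUBTAGS = 190
THREE_LETTER_SUBTAGS = 8022

letters = len(string.ascii_lowercase)  # 26 letters

# Number of possible lowercase strings of length 2 and 3.
two_letter_space = letters ** 2
three_letter_space = letters ** 3

# Chance that a random string happens to be a registered subtag.
p_two = TWO_LETTER_SUBTAGS / two_letter_space
p_three = THREE_LETTER_SUBTAGS / three_letter_space

print(f"two-letter strings:   {two_letter_space} possible, {p_two:.1%} valid")
print(f"three-letter strings: {three_letter_space} possible, {p_three:.1%} valid")
```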
You might want to limit the check to a subset of languages you know are in your document to avoid these errors. This can be achieved by reassigning the LANGUAGES global variable:
from IIIFpres import iiifpapi3,BCP47lang
iiifpapi3.LANGUAGES = [BCP47lang.english,BCP47lang.spanish]
# all the rest of your script
pyIIIFpres allows only language subtags. If a single subtag is not enough to describe the language of the document, you can add a custom language string in this way:
from IIIFpres import iiifpapi3,BCP47lang
iiifpapi3.LANGUAGES.append("de-DE-u-co-phonebk")
# all the rest of your script
But keep in mind the W3C golden rule:
Always bear in mind that the golden rule is to keep your language tag as short as possible. Only add further subtags to your language tag if they are needed to distinguish the language from something else in the context where your content is used.
(Before adding a custom tag, you can check it with a validator such as https://schneegans.de/lv/.)
Remember that add_metadata and set_requiredStatement, if called without arguments, return a languagemap object that helps with building multilanguage support.
reqst = manifest.set_requiredStatement()
reqst.add_label('Provided by','en')
reqst.add_value('University of Verona','en')
reqst.add_label('Contenuto fornito da','it')
reqst.add_value('Università di Verona','it')
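For orientation, the IIIF Presentation API 3.0 represents multilingual text as a language map: a JSON object whose keys are language tags and whose values are lists of strings. The sketch below builds the structure that the calls above are meant to end up as, using plain dictionaries only. The helper `add_to_language_map` is hypothetical and not part of pyIIIFpres, and the library's actual serialisation may differ in detail.

```python
import json


def add_to_language_map(language_map, text, language):
    """Append text under its language tag, creating the list if needed.

    Hypothetical helper for illustration; not a pyIIIFpres function.
    """
    language_map.setdefault(language, []).append(text)
    return language_map


# A requiredStatement holds two language maps: a label and a value.
required_statement = {"label": {}, "value": {}}
add_to_language_map(required_statement["label"], "Provided by", "en")
add_to_language_map(required_statement["value"], "University of Verona", "en")
add_to_language_map(required_statement["label"], "Contenuto fornito da", "it")
add_to_language_map(required_statement["value"], "Università di Verona", "it")

# ensure_ascii=False keeps accented characters readable in the output.
print(json.dumps({"requiredStatement": required_statement},
                 indent=2, ensure_ascii=False))
```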
Another possible approach is to use a language detector. There are many alternatives for this task; this StackOverflow answer gives a good overview of them.
This example shows a basic implementation using langdetect.
Running the language detector on the iiifpapi3 manifest object of the 0065-opera-multiple-canvases recipe gives the following output:
In [3]: check_languages(manifest)
❌ L'Elisir D'Amore seems not to be: it
✅ The Elixir of Love is : en
✅ Date Issued is : en
⚠️ Could not detect language for 2019 but is set to: en
✅ Publisher is : en
✅ Indiana University Jacobs School of Music is : en
❌ Atto Primo seems not to be: en
❌ Atto Secondo seems not to be: en
✅ Gaetano Donizetti, L'Elisir D'Amore is : it
❌ Atto Primo seems not to be: en
✅ Preludio e Coro d'introduzione – Bel conforto al mietitore is : it
✅ Remainder of Atto Primo is : en
❌ Atto Secondo seems not to be: en
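The report above comes from the project's own example. As a rough, independent sketch of the same idea, the code below walks a manifest that has already been exported to JSON (a plain Python dict) and compares the declared language of each language map entry with what a detector reports. The names `iter_language_map_entries`, `langdetect_detector` and `check_language_maps` are hypothetical and not part of pyIIIFpres. It assumes that language maps appear under the `label`, `summary` and `value` keys, and it uses the third-party langdetect package only if the default detector is actually called.

```python
LANGUAGE_MAP_KEYS = {"label", "summary", "value"}


def iter_language_map_entries(node):
    """Yield (declared_language, text) pairs found in a manifest JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in LANGUAGE_MAP_KEYS and isinstance(value, dict):
                for language, texts in value.items():
                    # "none" marks text with no language, so there is nothing to check.
                    if language == "none" or not isinstance(texts, list):
                        continue
                    for text in texts:
                        yield language, text
            else:
                yield from iter_language_map_entries(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_language_map_entries(item)


def langdetect_detector(text):
    """Detect the language with langdetect; return None if detection fails."""
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # make repeated runs give the same answer
    try:
        return detect(text)
    except LangDetectException:
        return None


def check_language_maps(manifest_json, detector=langdetect_detector):
    """Return (status, text, declared, detected) tuples for every entry."""
    results = []
    for declared, text in iter_language_map_entries(manifest_json):
        detected = detector(text)
        if detected is None:
            status = "undetected"
        elif detected.split("-")[0].lower() == declared.split("-")[0].lower():
            # Compare primary subtags only, so "zh-cn" matches "zh".
            status = "ok"
        else:
            status = "mismatch"
        results.append((status, text, declared, detected))
    return results


# Usage, assuming `manifest_json` is the manifest loaded with json.load():
# for status, text, declared, detected in check_language_maps(manifest_json):
#     print(status, repr(text), "declared:", declared, "detected:", detected)
```

The detector is passed in as a parameter so that another library can be swapped in without touching the traversal. Statistical detectors tend to be unreliable on very short strings, so a mismatch on a two-word label is a prompt to look at the entry, not proof of an error.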