Mika Hämäläinen edited this page Nov 27, 2024 · 9 revisions

What models are there?

Model Catalogue

UralicNLP can currently use three kinds of models: HFST morphological generators, HFST morphological analysers and constraint grammar (CG) disambiguators. The HFST models are available for all supported languages, while CGs are available for only a few.

The models originate mostly from the Giellatekno repository and Apertium. Their copyrights belong to the respective authors; however, everything provided by Giellatekno and Apertium is open source.

Downloading models

from uralicNLP import uralicApi
uralicApi.download("fin")

The above snippet downloads all the models for Finnish. Run it with sudo privileges for a system-wide installation.

Where are models located?

from uralicNLP import uralicApi
print(uralicApi.__model_base_folders())

This gives you the list of possible locations for the models. If you want to use your own models, create a subdirectory in any of these locations, named with the three-letter language code of your language. Name your models generator, analyser and cg, without file extensions.
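The directory layout described above can be sketched with the standard library alone. This is a minimal illustration and not part of uralicNLP: the helper name install_custom_models, its parameters and the language code "xyz" are hypothetical, and the only assumption about uralicNLP is the naming rule stated above (generator, analyser and cg, without file extensions).

```python
import shutil
from pathlib import Path

# File names expected inside a language folder, as described above.
MODEL_NAMES = ("generator", "analyser", "cg")


def install_custom_models(base_folder, lang, sources):
    """Copy your own model files into <base_folder>/<lang>/.

    `sources` maps an expected model name ("generator", "analyser" or "cg")
    to the path of your own file. Hypothetical helper, not a uralicNLP API.
    Returns the language folder as a Path.
    """
    unknown = set(sources) - set(MODEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected model names: {sorted(unknown)}")

    lang_dir = Path(base_folder) / lang
    lang_dir.mkdir(parents=True, exist_ok=True)

    for name, src in sources.items():
        # The copy is stored without a file extension, whatever the source is called.
        shutil.copyfile(src, lang_dir / name)
    return lang_dir
```

Here base_folder would be one of the locations printed by the snippet above, for example install_custom_models(folder, "xyz", {"analyser": "/path_to_your/transducer.hfstol"}). Writing to a system-wide location may require elevated privileges.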

Uninstalling models

If you want to free up some space, or you are unsure which models will be loaded when uralicNLP is used, you can uninstall models easily:

from uralicNLP import uralicApi
uralicApi.uninstall("fin")

Using your own transducers

You can use your own transducer file with uralicNLP by passing the filename parameter:

from uralicNLP import uralicApi
uralicApi.generate("kissa+N+Pl+Nom", "fin", filename="/path_to_your/transducer.hfstol")
uralicApi.analyze("kissat", "fin", filename="/path_to_your/transducer.hfstol")
uralicApi.lemmatize("kissat", "fin", filename="/path_to_your/transducer.hfstol")

Model info

Use uralicApi.model_info(language) to see information about the FSTs and CGs such as license and authors. If you know how to make this information more accurate, please don't hesitate to open an issue on GitHub.

from uralicNLP import uralicApi
uralicApi.model_info("fin")

Access the HFST transducer

If you need lower-level access to the HFST transducer object, you can use the following code:

from uralicNLP import uralicApi
sms_generator = uralicApi.get_transducer("sms", analyzer=False)  # generator
sms_analyzer = uralicApi.get_transducer("sms", analyzer=True)  # analyzer

get_transducer() accepts the same parameters as generate() and analyze(), for example to specify whether you want the normative or the descriptive analyzer. The defaults are get_transducer(language, cache=True, analyzer=True, descriptive=True, dictionary_forms=True).