Models
UralicNLP can currently use three kinds of models: HFST morphological generators, HFST morphological analysers and constraint grammar (CG) disambiguators. The HFST models are available for all the supported languages, while the CGs are available for only a few.
The models originate mostly from the Giellatekno repository and Apertium. Their copyrights belong to the respective authors; however, everything provided by Giellatekno and Apertium is open source.
from uralicNLP import uralicApi
uralicApi.download("fin")
The above snippet downloads all the models for Finnish. Run it with sudo privileges for a system-wide installation.
from uralicNLP import uralicApi
print(uralicApi.__model_base_folders())
This prints the list of possible locations for the models. If you want to use your own models, create a subdirectory in any of these locations, named with the three-letter language code of your language. Name the model files generator, analyser and cg, without file extensions.
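The folder layout described above can be sketched with the standard library alone. The helper name install_custom_models and its parameters are hypothetical and not part of uralicNLP; the sketch only assumes that a model folder is a subdirectory named after the language code and that it contains files called generator, analyser and cg.

```python
import shutil
from pathlib import Path

# File names uralicNLP is described as expecting (no file extensions).
ALLOWED_MODEL_NAMES = {"generator", "analyser", "cg"}


def install_custom_models(base_folder, lang_code, sources):
    """Copy model files into <base_folder>/<lang_code>/ under the expected names.

    `sources` maps a model name ("generator", "analyser" or "cg") to the path
    of the file to copy. Hypothetical helper, not part of the uralicNLP API.
    """
    unknown = set(sources) - ALLOWED_MODEL_NAMES
    if unknown:
        raise ValueError(f"unknown model names: {sorted(unknown)}")

    target_dir = Path(base_folder) / lang_code
    target_dir.mkdir(parents=True, exist_ok=True)

    installed = []
    for name, src in sources.items():
        # The destination carries no extension, e.g. <base_folder>/fin/analyser.
        destination = target_dir / name
        shutil.copyfile(src, destination)
        installed.append(destination)
    return installed
```

For example, install_custom_models(folder, "fin", {"analyser": "/path_to_your/transducer.hfstol"}) would place the transducer at folder/fin/analyser, where folder is one of the locations listed by uralicApi.__model_base_folders().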
If you want to free up some space, or are unsure which models will be loaded when uralicNLP is used, you can also uninstall models easily:
from uralicNLP import uralicApi
uralicApi.uninstall("fin")
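Before uninstalling, it can help to see which folders actually contain models for a language. The following is a minimal sketch that assumes the layout described earlier on this page (a subdirectory per language code holding files named generator, analyser and cg). The function find_installed_models is hypothetical and not part of uralicNLP, and it does not tell you which folder uralicNLP will prefer when several contain models.

```python
from pathlib import Path

# Model file names as described on this page; real installations may hold more files.
MODEL_NAMES = ("generator", "analyser", "cg")


def find_installed_models(base_folders, lang_code):
    """Map each language folder that exists to the model files found in it.

    Hypothetical helper: `base_folders` is any iterable of candidate
    locations, `lang_code` the three-letter language code.
    """
    found = {}
    for folder in base_folders:
        lang_dir = Path(folder) / lang_code
        present = [name for name in MODEL_NAMES if (lang_dir / name).is_file()]
        if present:
            found[str(lang_dir)] = present
    return found
```

Passing the list returned by uralicApi.__model_base_folders() as base_folders, together with a code such as "fin", shows every location where Finnish models are currently present.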
You can use your own transducer file with uralicNLP by passing a filename parameter:
from uralicNLP import uralicApi
uralicApi.generate("kissa+N+Pl+Nom", "fin", filename="/path_to_your/transducer.hfstol")
uralicApi.analyze("kissat", "fin", filename="/path_to_your/transducer.hfstol")
uralicApi.lemmatize("kissat", "fin", filename="/path_to_your/transducer.hfstol")
Use uralicApi.model_info(language) to see information about the FSTs and CGs such as license and authors. If you know how to make this information more accurate, please don't hesitate to open an issue on GitHub.
from uralicNLP import uralicApi
uralicApi.model_info("fin")
If you need lower-level access to the HFST transducer object, you can use the following code:
from uralicNLP import uralicApi
sms_generator = uralicApi.get_transducer("sms", analyzer=False)  # generator
sms_analyzer = uralicApi.get_transducer("sms", analyzer=True)  # analyzer
The same parameters can be used here as for generate() and analyze() to specify whether you want to use the normative or descriptive analyzers and so on. The defaults are get_transducer(language, cache=True, analyzer=True, descriptive=True, dictionary_forms=True).
UralicNLP is an open-source Python library by Mika Hämäläinen