You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Note: the model is currently being fine-tuned in the context of my PhD. I'll fill this part when it will be done.
The training set is roughly 1.5M tokens, dev test roughly 10k and test 169822. This is not counting punctuation, as LASLA data are lacking punctuation.
Enclitics are kept in a single token
Enclitic lemma are separated as such token[Caesarque] == lemma[Caesar界que]
Morphology is the morphology of the first token
Only numbers 1, 2 and 3 are known. Roman numbers are unknown.
All punctuation signs are unknown, including the one used in abbr. token[C] == lemma[Gaius]
Lemma and tokens now accept lower and uppercasing. Noise was introduced in the dataset for better results.
Definition to come. POS has no NOMcom vs. NOMpro (needs to be fixed in a later training.)
Task
Accuracy
Accuracy on V != _
lemma
94.73
94.73
Deg
97.64
93.46
Numb
96.84
96.28
Person
99.85
99.75
Mood_Tense_Voice
98.18
93.77
Case
93.05
88.96
Gend
96.29
89.93
pos
65.11
65.11
Model LASLA Plus
The model LASLA+ is trained on additionnal data, creating some noise in the original dataset and resulting in apparently worse results on classical data (approxim. -0.3%). It's results are detailed in LASLA-plus.md.
D. Longrée, C. Philippart de Foy & G. Purnelle. « Structures phrastiques et analyse automatique des données morphosyntaxiques : le projet LatSynt », in S. Bolasco, I. Chiari & L. Giuliano (eds), Statistical Analysis of Textual Data, Proceedings of 10th International Conference Journées d'Analyse statistique des Données Textuelles, 9-11 June 2010, Sapienza University of Rome, Rome, LED, pp. 433-442.
D. Longrée & C. Poudat, « New Ways of Lemmatizing and Tagging Classical and post-Classical Latin: the LATLEM project of the LASLA », in P. Anreiter & M. Kienpointner (éd.), Proceedings of the 15th International Colloquium on Latin Linguistics, (Innsbrucker Beiträge zur Sprachwissenschaft), Innsbruck, 2010, pp. 683-694.
D. Longrée & C. Philippart de Foy & G. Purnelle, « Subordinate clause boundaries and word order in Latin: the contribution of the L.A.S.L.A. syntactic parser project LatSynt », in P. Anreiter & M. Kienpointner, éd.), Proceedings of the 15th International Colloquium on Latin Linguistics, (Innsbrucker Beiträge zur Sprachwissenschaft), Innsbruck, 2010, pp. 673-681.
D. Longrée & Poudat C., « Variations langagières et annotation morphosyntaxique du latin classique », TAL, 50 – n° 2/2009, Special issue on "Natural Language Processing and Ancient Languages", pp. 129-148.
E. Manjavacas & Á. Kádár & M. Kestemont, « Improving Lemmatization of Non-Standard Languages with Joint Learning », Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Special issue on "Natural Language Processing and Ancient Languages", 2019, pp. 493--1503.