Lemmatizer #31

Open
wadid opened this issue Nov 2, 2023 · 3 comments

Comments

@wadid

wadid commented Nov 2, 2023

Hi, is there something like a lemmatizer? I have a couple of Tagalog sentences with translations, and I am trying to lemmatize them, sort the lemmas by frequency, and then use the result for my own language learning ;)
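The workflow described here (lemmatize every word, then rank the lemmas by frequency) can be sketched in Python as below. This is a minimal sketch, not part of calamanCy: `lemma_frequencies` and the `LOOKUP` table are hypothetical, tokenization is naive whitespace splitting, and `lemmatize` stands in for whatever lemmatizer you end up using.

```python
from collections import Counter
from typing import Callable, Iterable

PUNCTUATION = ".,!?;:\"'()"


def lemma_frequencies(
    sentences: Iterable[str],
    lemmatize: Callable[[str], str],
) -> list[tuple[str, int]]:
    """Lemmatize every word and return (lemma, count) pairs, most frequent first."""
    counts: Counter[str] = Counter()
    for sentence in sentences:
        # Naive tokenization: split on whitespace, strip edge punctuation, lowercase.
        for token in sentence.split():
            word = token.strip(PUNCTUATION).lower()
            if word:
                counts[lemmatize(word)] += 1
    # most_common() sorts by count; ties keep first-seen order.
    return counts.most_common()


# Hypothetical lookup table standing in for a real lemmatizer.
LOOKUP = {"kumain": "kain", "kinain": "kain", "kakain": "kain"}
sentences = ["Kumain ako.", "Kinain niya ang isda.", "Kakain tayo mamaya!"]
print(lemma_frequencies(sentences, lambda w: LOOKUP.get(w, w)))
```

Because `lemmatize` is just a function argument, the same counting code works unchanged whether the lemmas come from a lookup table, a rules-based stemmer, or a trained model.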

@ljvmiranda921
Owner

Hi @wadid, this is still in the works. For context, I will be using spaCy's neural edit-tree lemmatizer for this. I'm not sure about my timeline yet, perhaps late December. If you're in a rush, I suggest training your own lemmatizer for now.

Another option is a rules-based approach. However, that would require more research into the exact lemmatization rules for Tagalog.
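To give a feel for what a rules-based approach involves, here is a toy affix stripper. The affix lists, the `MIN_ROOT` threshold, and all function names are hypothetical assumptions for illustration; they are not the rules of any existing Tagalog stemmer. It handles the -um-/-in- infixes, a few prefixes, CV reduplication, and -an/-in/-han/-hin suffixes, but it misses things like the vowel change in *lutuin* (root *luto*) and over-strips roots such as *kanin* (rice) to *kan*, which is exactly why the rules need careful research.

```python
VOWELS = set("aeiou")

# Toy, deliberately incomplete affix inventories (hypothetical).
PREFIXES = ("nag", "mag", "pag")
INFIXES = ("um", "in")
SUFFIXES = ("han", "hin", "an", "in")  # longer suffixes are checked first
MIN_ROOT = 3  # never strip an affix if fewer than 3 letters would remain


def strip_prefix(word: str) -> str:
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_ROOT:
            return word[len(p):]
    return word


def strip_infix(word: str) -> str:
    # -um- / -in- after the first consonant and before a vowel: k-um-ain, s-in-ulat.
    if (
        len(word) - 2 >= MIN_ROOT
        and word[0] not in VOWELS
        and word[1:3] in INFIXES
        and word[3] in VOWELS
    ):
        return word[0] + word[3:]
    return word


def strip_reduplication(word: str) -> str:
    # Drop a repeated first CV syllable: ka-kain -> kain, lu-luto -> luto.
    if (
        len(word) - 2 >= MIN_ROOT
        and word[0] not in VOWELS
        and word[1] in VOWELS
        and word[:2] == word[2:4]
    ):
        return word[2:]
    return word


def strip_suffix(word: str) -> str:
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_ROOT:
            return word[: -len(s)]
    return word


def toy_stem(word: str) -> str:
    """Apply each toy rule once, in a fixed order."""
    word = word.lower()
    for step in (strip_prefix, strip_infix, strip_reduplication, strip_suffix):
        word = step(word)
    return word


for w in ["kumain", "kumakain", "nagbabasa", "basahin", "kanin"]:
    print(w, "->", toy_stem(w))
```

The order of the steps matters: stripping the infix before the reduplication is what lets *kumakain* reduce to *kain* via *kakain*.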

@wadid
Author

wadid commented Nov 3, 2023

Do you know this project? https://github.com/crlwingen/TagalogStemmerPython
It has an accuracy rate of 94.12%.
How good is that?
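Assuming a figure like this is exact-match accuracy over a list of (word, gold lemma) pairs, how good it is depends heavily on the size and makeup of the test set. The sketch below, with a hypothetical `evaluate` helper and made-up example pairs, computes that score and also lists the errors, which are often more informative than the single number.

```python
from typing import Callable, Sequence


def evaluate(
    pairs: Sequence[tuple[str, str]],
    lemmatize: Callable[[str], str],
) -> tuple[float, list[tuple[str, str, str]]]:
    """Return (accuracy, errors), where each error is (form, gold, predicted)."""
    if not pairs:
        raise ValueError("need at least one (form, gold_lemma) pair")
    errors = []
    for form, gold in pairs:
        predicted = lemmatize(form)
        if predicted != gold:
            errors.append((form, gold, predicted))
    accuracy = (len(pairs) - len(errors)) / len(pairs)
    return accuracy, errors


# Made-up gold pairs and a hypothetical lookup-based lemmatizer.
pairs = [("kumain", "kain"), ("kinain", "kain"), ("kanin", "kanin"), ("bahay", "bahay")]
lookup = {"kumain": "kain", "kinain": "kain", "kanin": "kan"}
acc, errors = evaluate(pairs, lambda w: lookup.get(w, w))
print(f"accuracy: {acc:.2%}")
print("errors:", errors)
```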

@ljvmiranda921
Owner

Hi, thanks for this! I think 94.12% accuracy should be decent, since Tagalog lemmatization rules can be complicated due to the agglutinative nature of the language. Right now, I'm trying to port both approaches into calamanCy: a rules-based lemmatizer using that stemmer and a neural one using spaCy's edit-tree lemmatizer.
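For readers curious how edit-tree lemmatization works, here is a simplified pure-Python sketch of the core idea as I understand it: find the longest common substring of a word and its lemma, keep it, and recursively record how the prefix and suffix around it change. The names (`Subst`, `Match`, `build_tree`, `apply_tree`) are hypothetical and this is not spaCy's actual code; in spaCy, a neural model additionally learns which tree to apply to each token.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Subst:
    """Leaf node: the input must equal `orig` exactly and becomes `repl`."""
    orig: str
    repl: str


@dataclass(frozen=True)
class Match:
    """Inner node: keep a shared middle, edit the prefix and suffix around it."""
    prefix_len: int
    suffix_len: int
    prefix_tree: "Tree"
    suffix_tree: "Tree"


Tree = Union[Subst, Match]


def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start in a, start in b, length) of a longest common substring."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[2]:
                best = (i, j, k)
    return best


def build_tree(form: str, lemma: str) -> Tree:
    """Derive an edit tree that rewrites `form` into `lemma`."""
    i, j, n = longest_common_substring(form, lemma)
    if n == 0:
        # Nothing shared: fall back to a literal substitution.
        return Subst(form, lemma)
    return Match(
        prefix_len=i,
        suffix_len=len(form) - i - n,
        prefix_tree=build_tree(form[:i], lemma[:j]),
        suffix_tree=build_tree(form[i + n:], lemma[j + n:]),
    )


def apply_tree(tree: Tree, form: str) -> Optional[str]:
    """Apply a tree to a (possibly new) word; return None if it does not fit."""
    if isinstance(tree, Subst):
        return tree.repl if form == tree.orig else None
    # This sketch requires a non-empty middle to be kept.
    if tree.prefix_len + tree.suffix_len >= len(form):
        return None
    end = len(form) - tree.suffix_len
    prefix = apply_tree(tree.prefix_tree, form[: tree.prefix_len])
    suffix = apply_tree(tree.suffix_tree, form[end:])
    if prefix is None or suffix is None:
        return None
    return prefix + form[tree.prefix_len:end] + suffix


tree = build_tree("kumain", "kain")  # learned from a single (form, lemma) pair
for word in ["kumain", "sumulat", "bumasa", "kinain"]:
    print(word, "->", apply_tree(tree, word))
```

The tree built from *kumain*/*kain* only encodes "delete the *um* after the first letter", so it also maps *sumulat* to *sulat* and *bumasa* to *basa*, while it does not fit *kinain* and returns `None`; choosing the right tree for such words is the part the neural model learns.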
