What is this?

It's a collection of small Python modules made to improve the interoperability of the awesome NLTK and CLTK with CTS-compatible texts that follow the Capitains guidelines, especially those of the Perseus DL and the First 1K Years of Greek.

At the moment, I have put together:

a corpus reader (see here for an introduction to NLTK corpus readers) for Capitains-compliant XML files. It works with all the First1K texts that you can download with the CLTK downloader. It lets you load and tokenize your corpus and store a citation for every token.
a Greek tokenizer (in progress) that should work well with the Perseus treebank (I am still testing...)
a concordance indexer to create (enhanced!) concordances from CTS-compatible texts
a class for full morphological tagging of Greek and for lemmatizing tagged texts (see here)
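
To give a rough idea of what "storing a citation for every token" and a citation-aware concordance can look like, here is a minimal sketch that uses only the Python standard library. It is not the toolkit's API: the names `iter_cited_tokens` and `build_concordance`, the regex tokenizer, and the tiny sample document are all assumptions made for illustration. The sample assumes a Capitains-style layout in which `div type="textpart"` elements and `l` elements carry the citation in their `n` attributes.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample, loosely modelled on a Capitains-style TEI edition.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="edition">
      <div type="textpart" subtype="book" n="1">
        <l n="1">μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος</l>
        <l n="2">οὐλομένην, ἣ μυρί᾽ Ἀχαιοῖς ἄλγε᾽ ἔθηκε,</l>
      </div>
    </div>
  </body></text>
</TEI>"""

# Naive tokenizer (an assumption, not the toolkit's Greek tokenizer):
# a run of word characters with an optional elision mark, or one punctuation sign.
TOKEN_RE = re.compile("\\w+[\u1fbd\u2019']?|[^\\w\\s]")


def iter_cited_tokens(root):
    """Yield (form, citation) pairs, e.g. citation "1.2" for book 1, line 2."""

    def walk(node, ref):
        for child in node:
            if child.tag == TEI_NS + "div" and child.get("type") == "textpart":
                # A textpart adds one level to the citation.
                yield from walk(child, ref + [child.get("n")])
            elif child.tag == TEI_NS + "l":
                citation = ".".join(ref + [child.get("n")])
                text = unicodedata.normalize("NFC", "".join(child.itertext()))
                for form in TOKEN_RE.findall(text):
                    yield form, citation
            else:
                # Any other element is traversed without changing the citation.
                yield from walk(child, ref)

    yield from walk(root, [])


def build_concordance(cited_tokens):
    """Map each word form (punctuation skipped) to the citations where it occurs."""
    index = defaultdict(list)
    for form, citation in cited_tokens:
        if re.match(r"\w", form):
            index[form].append(citation)
    return dict(index)


tokens = list(iter_cited_tokens(ET.fromstring(SAMPLE)))
index = build_concordance(tokens)

for form, citation in tokens[:5]:
    print(citation, form)
```

A real corpus reader would also have to deal with nested markup, notes, and prose texts cited by section rather than by line, none of which this sketch attempts.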

francescomambrini/PerseusNLPToolkit