A bunch of modules that use/extend CLTK in order to work with Greek and Latin corpora maintained by the Perseus DL

francescomambrini/PerseusNLPToolkit

What is this?

It's a collection of small Python modules made to enhance the interoperability of the awesome NLTK and CLTK with CTS-compatible texts that follow the Capitains guidelines, especially those of the Perseus DL and First 1K Years of Greek.

At the moment, I have put together:

  • a corpus reader (see here for an introduction to NLTK corpus readers) for Capitains-compliant XML files. It works with all the First1K texts that you can download using the CLTK downloader. It lets you load and tokenize your corpus and store a citation for every token.
  • a Greek tokenizer (in progress) that should work well with the Perseus treebank (I am still testing...)
  • a concordance indexer to create (enhanced!) concordances from CTS-compatible texts
  • a class for full morphological tagging of Greek and for lemmatizing tagged texts (see here)
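
To give a rough idea of what "storing citations for all your tokens" and building a concordance from them can look like, here is a minimal sketch that uses only the Python standard library. It is not the API of the modules in this repository: `SAMPLE`, `cited_tokens` and `build_index` are hypothetical names made up for illustration, the XML is a tiny hand-written fragment loosely modelled on a Capitains-style TEI edition, and the tokenizer is a plain regular expression rather than the Greek tokenizer mentioned above.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample: a hand-written fragment, not a file from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="edition" n="urn:cts:greekLit:tlg0012.tlg001.perseus-grc2">
      <div type="textpart" subtype="book" n="1">
        <l n="1">μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος</l>
        <l n="2">οὐλομένην, ἣ μυρί᾽ Ἀχαιοῖς ἄλγε᾽ ἔθηκε,</l>
      </div>
    </div>
  </body></text>
</TEI>"""


def cited_tokens(xml_string):
    """Return (token, citation) pairs, e.g. a token from book 1, line 2 gets '1.2'."""
    root = ET.fromstring(xml_string)
    tokens = []

    def walk(elem, path):
        tag = elem.tag.replace(TEI_NS, "")
        n = elem.get("n")
        # Assumption: only textpart divs and verse lines contribute to the citation.
        is_unit = (tag == "div" and elem.get("type") == "textpart") or tag == "l"
        if is_unit and n:
            path = path + [n]
        if tag == "l":
            # NFC keeps accented Greek letters as single characters so \w matches them.
            text = unicodedata.normalize("NFC", "".join(elem.itertext()))
            for word in re.findall(r"\w+", text):
                tokens.append((word, ".".join(path)))
            return
        for child in elem:
            walk(child, path)

    walk(root, [])
    return tokens


def build_index(tokens):
    """Map each word form to the list of citations where it occurs."""
    index = defaultdict(list)
    for word, citation in tokens:
        index[word].append(citation)
    return dict(index)


if __name__ == "__main__":
    for word, citation in cited_tokens(SAMPLE):
        print(citation, word)
```

Because every token carries its own citation, a concordance is just a dictionary from word forms to citation lists, which is what `build_index` returns. Real Capitains files are more varied than this fragment (prose texts, deeper citation schemes, notes and apparatus inside the text), so take this as an illustration of the idea only.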
