HMM-POS-Tagger

The corpus has been adapted from the Catalan portion of WikiCorpus v. 1.0, as follows:

  • The corpus contains only a selection (fewer than 1.2 million words) of the original set.
  • The corpus contains only tokens and parts of speech, not lemmas and word senses.
  • The part-of-speech tags have been simplified from the original, resulting in 29 tags.
  • The format has been changed to the word/TAG format, with each sentence on a separate line.
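
As a rough illustration of the word/TAG format, the sketch below reads such lines into lists of (word, tag) pairs. It assumes tokens are separated by whitespace and that the tag follows the last slash in each token; neither detail is stated above. The function names `parse_tagged_line` and `read_corpus` and the tags in the sample line are hypothetical and are not taken from the corpus or from this repository.

```python
from typing import Iterable, List, Tuple

# One sentence is a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def parse_tagged_line(line: str) -> TaggedSentence:
    """Split one corpus line into (word, tag) pairs.

    Assumption: tokens are whitespace-separated and look like word/TAG.
    """
    pairs: TaggedSentence = []
    for token in line.split():
        # Split on the LAST slash so a word that itself contains "/" stays whole.
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((word, tag))
    return pairs


def read_corpus(lines: Iterable[str]) -> List[TaggedSentence]:
    """Parse every non-empty line; each line holds one sentence."""
    return [parse_tagged_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # Made-up sample line; the tag names are placeholders, not the corpus tagset.
    sample = ["El/DA gat/NC dorm/VM ./FP\n"]
    for sentence in read_corpus(sample):
        print(sentence)
```

Passing an open file object to `read_corpus` works as well, since it only iterates over lines. Raising on malformed tokens rather than skipping them is a design choice here: it surfaces format problems early instead of silently dropping training data.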

The corpus is licensed under the same terms as the original, that is, the GNU Free Documentation License (FDL; http://www.fsf.org/licensing/licenses/fdl.html). This means you may use and redistribute the texts, provided that derived works keep the same license.