POS Tagger with Stanza pipeline #2

Open
dxv2k opened this issue Dec 5, 2020 · 0 comments
Labels
documentation Improvements or additions to documentation

dxv2k (Collaborator) commented Dec 5, 2020

DATA ANNOTATION: Penn Treebank 2 format
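
For reference, POS-tagged Penn Treebank text is commonly stored as whitespace-separated `word/TAG` tokens. A minimal Python sketch for reading one line of that layout, assuming one sentence per line, that the tag follows the last slash in each token, and that `[`/`]` chunk brackets can be ignored (`parse_tagged_line` is a hypothetical helper, not part of Stanza):

```python
def parse_tagged_line(line):
    """Split a line of 'word/TAG' tokens into (word, tag) pairs.

    Assumptions (hypothetical layout, not an official PTB reader):
    - tokens are separated by whitespace;
    - the tag follows the LAST '/', so escaped slashes such as '1\\/2/CD'
      stay inside the word and are unescaped to '1/2';
    - '[' and ']' are chunk brackets and carry no tag, so they are skipped.
    """
    pairs = []
    for token in line.split():
        if token in ("[", "]"):
            continue
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"token has no word/TAG form: {token!r}")
        pairs.append((word.replace("\\/", "/"), tag))
    return pairs


print(parse_tagged_line("The/DT cat/NN sat/VBD on/IN the/DT mat/NN ./."))
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN'), ('.', '.')]
```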

WORKFLOW with Stanza:
Document (raw text, or CoNLL-U after conversion) -> Tokenization and sentence segmentation (done jointly by Stanza's tokenizer) -> Multi-word token (MWT) expansion -> POS tagging
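
If the data is not already in CoNLL-U, the conversion step could look roughly like the following sketch. It assumes sentences are already tokenized into (word, PTB tag) pairs, puts the tag in the XPOS column (column 5) and fills every other unannotated column with `_`. `to_conllu` is a hypothetical helper, and the output may still need checking against what Stanza's training scripts expect.

```python
def to_conllu(sentences):
    """Render tagged sentences as CoNLL-U text.

    `sentences` is a list of sentences, each a list of (word, ptb_tag) pairs.
    The PTB tag goes in the XPOS column; every column we do not annotate
    is filled with the underscore placeholder '_'.
    """
    blocks = []
    for sent_id, sentence in enumerate(sentences, start=1):
        lines = [
            f"# sent_id = {sent_id}",
            # Joining with spaces only approximates the original text.
            "# text = " + " ".join(word for word, _ in sentence),
        ]
        for token_id, (word, tag) in enumerate(sentence, start=1):
            # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            columns = [str(token_id), word, "_", "_", tag, "_", "_", "_", "_", "_"]
            lines.append("\t".join(columns))
        blocks.append("\n".join(lines))
    # Each sentence block is terminated by a blank line.
    return "".join(block + "\n\n" for block in blocks)


print(to_conllu([[("The", "DT"), ("cat", "NN"), ("sat", "VBD"), (".", ".")]]))
```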

NOTICE:

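  • We work with English, so multi-word token (MWT) expansion is not needed
  • Stanza can annotate raw text directly, but training and evaluating its models requires data in CoNLL-U format
  • Use Stanza's pre-trained tokenizer and POS tagger as a baseline to compare against our Viterbi-based tagger, using the XPOS field (which holds the Penn Treebank tags for English)
  • Stanza's neural pipeline uses a BiLSTM-based POS tagger; the maximum entropy cyclic dependency network is the tagger used in Stanford CoreNLP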
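
For the comparison with the Viterbi tagger, token-level accuracy on the XPOS column is the simplest metric. The sketch below (hypothetical helpers `read_xpos` and `xpos_accuracy`) reads gold tags from CoNLL-U text and scores predicted tag sequences against them. It assumes both taggers see exactly the same tokens; one way to get that from Stanza is to pass pre-tokenized input with `tokenize_pretokenized=True` and read each word's `xpos` attribute.

```python
def read_xpos(conllu_text):
    """Collect (FORM, XPOS) pairs per sentence from CoNLL-U text."""
    sentences, current = [], []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as '# text = ...'
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multi-word token ranges ('1-2') and empty nodes ('5.1')
        current.append((cols[1], cols[4]))
    if current:
        sentences.append(current)
    return sentences


def xpos_accuracy(gold, predicted):
    """Token-level accuracy of predicted tag lists against gold XPOS tags."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted have different sentence counts")
    correct = total = 0
    for gold_sentence, predicted_tags in zip(gold, predicted):
        if len(gold_sentence) != len(predicted_tags):
            raise ValueError("token counts differ; tokenization must match")
        for (_, gold_tag), predicted_tag in zip(gold_sentence, predicted_tags):
            correct += gold_tag == predicted_tag
            total += 1
    return correct / total if total else 0.0


gold = read_xpos(
    "# sent_id = 1\n"
    "1\tThe\t_\t_\tDT\t_\t_\t_\t_\t_\n"
    "2\tcat\t_\t_\tNN\t_\t_\t_\t_\t_\n"
    "\n"
)
viterbi_tags = [["DT", "VB"]]  # hypothetical output of our Viterbi tagger
print(xpos_accuracy(gold, viterbi_tags))  # 0.5
```

Scoring on identical tokens keeps tokenizer errors out of the POS comparison; measuring Stanza's tokenizer end to end would need an extra alignment step that this sketch omits.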

Essential libraries and other components will be listed later.
