Vietnamese Part-of-Speech Tagging

- Vietnamese word segmentation with the Longest Matching algorithm
- Part-of-speech tagging with a Hidden Markov Model combined with the Viterbi algorithm
- Comparison of the results against the VnCoreNLP library
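
One simple way to carry out the comparison in the last item is to measure how often two taggers agree token by token. The sketch below is only an illustration: the function name `tag_agreement` is hypothetical, and it assumes both outputs are lists of (word, pos) pairs over the same segmentation, which will not hold when the two tokenizers split a sentence differently.

```python
def tag_agreement(tagged_a, tagged_b):
    """Fraction of positions where two (word, pos) sequences carry the same tag.

    Assumption: both sequences come from the same segmentation, so that
    position i in one corresponds to position i in the other.
    """
    if len(tagged_a) != len(tagged_b):
        raise ValueError("sequences must share the same segmentation")
    if not tagged_a:
        return 0.0
    same = sum(1 for (_, a), (_, b) in zip(tagged_a, tagged_b) if a == b)
    return same / len(tagged_a)


# Made-up outputs from two taggers for the same four tokens.
ours = [("tôi", "P"), ("học", "V"), ("bài", "N"), ("mới", "A")]
theirs = [("tôi", "P"), ("học", "N"), ("bài", "N"), ("mới", "A")]
print(tag_agreement(ours, theirs))
```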

Download the file CoreNLP.zip and extract it into a folder, for example: D:\VnCoreNLP
Run VnCoreNLPServer:
- Open cmd
- Change to the folder D:\VnCoreNLP
- Run the program: java -Xmx2g -jar VnCoreNLPServer.jar VnCoreNLP-1.1.jar -p 9001 -a "wseg,pos,parse"
Install the VnCoreNLP library for Python: pip install vncorenlp

Create a client object connected to VnCoreNLPServer:

```
from vncorenlp import VnCoreNLP
client = VnCoreNLP(address="http://127.0.0.1", port=9001)
```

Segment the words of a text `text`; the result is a list of words:
```
wordlist = client.tokenize(text)
```
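
For comparison with the server output, the Longest Matching idea named at the top can be sketched in a few lines. This is a minimal illustration, not the code in this repository: the function name, the `max_len` limit, and the three-entry lexicon are assumptions made for the example. Multi-syllable words are joined with underscores, following the convention of VnCoreNLP's output.

```python
def longest_matching(syllables, lexicon, max_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest run of syllables (up to max_len)
    that appears in the lexicon; fall back to a single syllable otherwise.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first and shrink until something matches.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate.lower() in lexicon:
                words.append(candidate.replace(" ", "_"))
                i += n
                break
    return words


# Toy lexicon, made up for illustration only.
lexicon = {"học sinh", "sinh học", "học"}
segmented = longest_matching("học sinh học sinh học".split(), lexicon)
print(segmented)
```

Because the match is greedy, the example is segmented as `học_sinh học_sinh học` even though `học_sinh học sinh_học` would be the intended reading. This kind of ambiguity is the known weakness of Longest Matching.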
Tag the parts of speech of a text `text`; the result is a list of (word, pos) tuples, where word is a segmented word and pos is its part-of-speech tag:
```
tagresult = client.pos_tag(text)
```
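
The Hidden Markov Model tagger mentioned at the top produces output of the same (word, pos) shape. The following is a minimal sketch of Viterbi decoding for a bigram HMM, not the implementation in this repository. The function signature, the `floor` value substituted for unseen events, and all probabilities are assumptions chosen for illustration; in practice the tables would be estimated from a tagged corpus. The input is assumed to be a non-empty list of words.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable (word, tag) sequence under a bigram HMM."""
    def lp(p):
        # Log-probability, with a small floor standing in for unseen events.
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for w in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + lp(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + lp(emit_p[t].get(w, 0)), prev)
        best.append(column)

    # Backtrack from the best final tag through the stored predecessors.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(zip(words, reversed(path)))


# Toy model with made-up probabilities (P = pronoun, V = verb, N = noun).
tags = ["P", "V", "N"]
start_p = {"P": 0.6, "N": 0.3, "V": 0.1}
trans_p = {
    "P": {"V": 0.8, "N": 0.1, "P": 0.1},
    "V": {"N": 0.7, "V": 0.2, "P": 0.1},
    "N": {"V": 0.5, "N": 0.4, "P": 0.1},
}
emit_p = {
    "P": {"tôi": 0.9},
    "V": {"học": 0.6, "bài": 0.05},
    "N": {"bài": 0.7, "học": 0.1, "tôi": 0.05},
}
decoded = viterbi(["tôi", "học", "bài"], tags, start_p, trans_p, emit_p)
print(decoded)
```

Working in log space avoids numerical underflow on long sentences, which is why the sketch sums log-probabilities instead of multiplying raw probabilities.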
