A novel approach to generating word embeddings for the Turkish language #Acıkhack2021
Data:
- "42 bin haber" (42,000 news articles) from http://www.kemik.yildiz.edu.tr/veri_kumelerimiz.html
- "69 yazar" (69 authors) from http://www.kemik.yildiz.edu.tr/veri_kumelerimiz.html
- "270 köşeyazısı" (270 newspaper columns) from http://www.kemik.yildiz.edu.tr/veri_kumelerimiz.html
- "630 köşeyazısı" (630 newspaper columns) from http://www.kemik.yildiz.edu.tr/veri_kumelerimiz.html
- "1150 haber" (1,150 news articles) from http://www.kemik.yildiz.edu.tr/veri_kumelerimiz.html
- "Old Newspapers" from https://www.kaggle.com/alvations/old-newspapers (Turkish portion of the dataset only)
- "Turkish wiki dump" from https://www.kaggle.com/mustfkeskin/turkish-wikipedia-dump
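Only the Turkish portion of the multilingual Old Newspapers dataset is used. The following is a minimal sketch of one way to extract it with pandas. It assumes that the downloaded file is a tab-separated file with a header row containing a `Language` column (where Turkish rows hold the value `Turkish`) and a `Text` column. These column names and values are assumptions and should be checked against the actual download. The function name `load_turkish_texts` is hypothetical. The file is read in chunks because the full dataset is large.

```python
import csv

import pandas as pd


def load_turkish_texts(path_or_buffer, chunksize=100_000):
    """Hypothetical helper: collect the Turkish article texts from the
    Old Newspapers TSV.

    Assumes a tab-separated file with 'Language' and 'Text' columns.
    """
    texts = []
    # Read in chunks so the whole multilingual file never sits in memory.
    reader = pd.read_csv(
        path_or_buffer,
        sep="\t",
        quoting=csv.QUOTE_NONE,  # raw news text may contain unbalanced quotes
        dtype=str,
        chunksize=chunksize,
    )
    for chunk in reader:
        # Keep only rows labelled as Turkish and drop empty texts.
        turkish = chunk.loc[chunk["Language"] == "Turkish", "Text"]
        texts.extend(turkish.dropna().tolist())
    return texts
```

The resulting list of raw strings can then be passed to whatever tokenization and normalization step precedes embedding training.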
External libraries used:
- pandas
- Corpus-py from https://github.com/StarlangSoftware/Corpus-Py