We are building an open source spellchecker for the Tamil language using the Open-Tamil Python library.
We found that we need a list of all Tamil nouns for quick checking and validation.
It seems our world is full of nouns.
In this repo, we are collecting as many nouns as possible.
Read about our explorations in building a Tamil spellchecker here - https://goinggnu.wordpress.com/category/spellchecker/
nouns - 97875
peryan.in_names/boy - 20391
peyar.in_names/girl - 24030
random_collections - 1115
tamilsurangam.in - 1249
wiktionary - 85256
total - 2,29,916 (all_nouns.txt)
only unique_nouns - 1,92,122 (unique_all_nouns.txt)
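The step from the combined list to the unique list can be sketched in Python as below. This is only a minimal illustration, not necessarily how the files in this repo were produced. The function name `merge_unique` is hypothetical, and the sketch assumes each source file is UTF-8 text with one noun per line.

```python
from typing import Iterable, List


def merge_unique(paths: Iterable[str]) -> List[str]:
    """Merge several word-list files into one sorted list without duplicates.

    Assumption: every file is UTF-8 encoded and holds one noun per line.
    """
    seen = set()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                word = line.strip()
                if word:  # skip blank lines
                    seen.add(word)
    # Sorting here is by Unicode code point, not by Tamil dictionary order.
    return sorted(seen)
```

Calling `merge_unique` with the paths of the individual source lists would return the deduplicated nouns, which could then be written out one per line.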
===
We further removed the unique sub names and saved the result as unique_sorted_noun_master.txt.
We will use this file as the master list of nouns.
wc -l unique_sorted_noun_master.txt
1,53,548 unique_sorted_noun_master.txt
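For the quick checking mentioned above, one possible approach is to load the master list into a Python set, so each lookup takes roughly constant time. The sketch below is an assumption about how the file might be used, not the spellchecker's actual code, and it does not use Open-Tamil. The helper names `build_noun_set`, `load_noun_set` and `is_known_noun` are hypothetical, and the file is assumed to be UTF-8 with one noun per line.

```python
from typing import Iterable, Set


def build_noun_set(lines: Iterable[str]) -> Set[str]:
    """Build a lookup set from lines of text, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def load_noun_set(path: str = "unique_sorted_noun_master.txt") -> Set[str]:
    """Read the master list (assumed: UTF-8, one noun per line) into a set."""
    with open(path, encoding="utf-8") as handle:
        return build_noun_set(handle)


def is_known_noun(word: str, nouns: Set[str]) -> bool:
    """Return True if the word, exactly as written, is in the noun set."""
    return word.strip() in nouns
```

A spellchecker could call `load_noun_set()` once at startup and then use `is_known_noun(word, nouns)` for each word. This checks exact matches only; inflected forms of a noun would need separate handling.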
===
Read more about this repo here - https://goinggnu.wordpress.com/2020/05/24/building-tamil-spellchecker-day-3-collecting-all-tamil-nouns/
- Collect more nouns and add them to this repo.
- Check these files for errors and fix them.
- Collect all verbs and other word forms in Tamil too.