
Classify Hangul Unicode characters into 9 classes according to the positions of their vowels and consonants.


twiiks/unicode-classifier


UnicodeClassifier

Synopsis

Traditionally, Hangul syllables are divided into six types according to how their components are combined. An example is shown below. We go one step further and add one more feature: a last consonant written with two characters is treated differently from a last consonant written with one character. Hangul is therefore divided into nine types, and the new types are 'FC/W/DLC', 'FC/HV/DLC' and 'FC/HV/W/DLC', where DLC stands for double last consonant.
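The nine-way split can be sketched with plain Unicode arithmetic. This is an illustrative sketch, not the code in this repository: the names `decompose` and `hangul_class` are hypothetical, the class numbering follows the representative syllables printed in the example output (가, 강, 갊, 구, 궁, 굶, 과, 광, 괆), and the grouping of vowels into vertical, horizontal and mixed is an assumption inferred from those syllables. Treating ㄲ and ㅆ as double last consonants is also an assumption.

```python
# Illustrative sketch only; names and groupings are assumptions, not the repo's API.
HANGUL_BASE = 0xAC00   # code point of '가', the first precomposed syllable
HANGUL_LAST = 0xD7A3   # code point of '힣', the last precomposed syllable

# Vowel (medial) indices 0..20 in Unicode order.
VERTICAL = {0, 1, 2, 3, 4, 5, 6, 7, 20}   # ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅣ
HORIZONTAL = {8, 12, 13, 17, 18}          # ㅗ ㅛ ㅜ ㅠ ㅡ
# Every other medial (ㅘ ㅙ ㅚ ㅝ ㅞ ㅟ ㅢ) combines both and is treated as mixed.

# Last-consonant (final) indices written with two letters.
# Assumption: the doubled letters ㄲ (2) and ㅆ (20) count as double as well.
DOUBLE_FINALS = {2, 3, 5, 6, 9, 10, 11, 12, 13, 14, 15, 18, 20}


def decompose(ch):
    """Return (initial, medial, final) indices of a precomposed Hangul syllable."""
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    offset = code - HANGUL_BASE
    # 588 = 21 medials * 28 finals per initial consonant.
    return offset // 588, (offset % 588) // 28, offset % 28


def hangul_class(ch):
    """Map a syllable to a class number 1..9."""
    _, medial, final = decompose(ch)
    if medial in VERTICAL:
        group = 0
    elif medial in HORIZONTAL:
        group = 1
    else:
        group = 2
    if final == 0:
        kind = 0          # no last consonant
    elif final in DOUBLE_FINALS:
        kind = 2          # double last consonant
    else:
        kind = 1          # single last consonant
    return 3 * group + kind + 1
```

Under these assumptions, `hangul_class` maps each of the nine representative syllables to its own class number, for example `hangul_class("굶")` gives 6.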

Usage

UnicodeClassifier can be used as a library for classifying any Hangul character into one of 9 classes, using the module UnicodeToKoreanClass in the file UnicodeToKoreanClass.py. This module takes a Unicode character and returns its class number. It can also be used as a class-counting program through the Python file stringToKoreanClass.py. This program takes a Hangul string as an argument and displays how many characters of each of the nine classes are used in that string.

Example

You can clone this repo and run it as shown below.

$ python stringToKoreanClass.py --str="원하는 한글을 입력하세요."

You will then get the following result:

Total(kor) : 11
class1(가) : 3
class2(강) : 3
class3(갊) : 0
class4(구) : 1
class5(궁) : 3
class6(굶) : 0
class7(과) : 0
class8(광) : 1
class9(괆) : 0
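The counting step behind this report can be sketched as follows. This is a hedged illustration rather than the contents of stringToKoreanClass.py: `hangul_class`, `count_classes` and `report` are hypothetical names, the vowel grouping and the set of double last consonants are assumptions inferred from the nine representative syllables, and non-Hangul characters such as spaces and punctuation are assumed to be skipped. A compact classifier is included so the block runs on its own.

```python
from collections import Counter

# Assumed groupings (indices in Unicode order); not taken from the repository.
VERTICAL = {0, 1, 2, 3, 4, 5, 6, 7, 20}   # ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅣ
HORIZONTAL = {8, 12, 13, 17, 18}          # ㅗ ㅛ ㅜ ㅠ ㅡ
DOUBLE_FINALS = {2, 3, 5, 6, 9, 10, 11, 12, 13, 14, 15, 18, 20}
REPRESENTATIVES = "가강갊구궁굶과광괆"      # one sample syllable per class


def hangul_class(ch):
    """Hypothetical helper: class number 1..9 of a precomposed syllable."""
    offset = ord(ch) - 0xAC00
    medial, final = (offset % 588) // 28, offset % 28
    group = 0 if medial in VERTICAL else 1 if medial in HORIZONTAL else 2
    kind = 0 if final == 0 else 2 if final in DOUBLE_FINALS else 1
    return 3 * group + kind + 1


def count_classes(text):
    """Count syllables per class, skipping anything outside 가..힣."""
    return Counter(hangul_class(c) for c in text if "가" <= c <= "힣")


def report(text):
    """Format the counts in the same layout as the example output."""
    counts = count_classes(text)
    lines = [f"Total(kor) : {sum(counts.values())}"]
    for number, sample in enumerate(REPRESENTATIVES, start=1):
        lines.append(f"class{number}({sample}) : {counts[number]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report("원하는 한글을 입력하세요."))
```

With the sample sentence, this sketch yields the same totals as the listing above (11 syllables, with classes 1, 2 and 5 each appearing three times).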

Requirement

  • Python 3.6
