LanguageNet Grapheme-to-Phoneme Transducers

Usage

Download and install Phonetisaurus.
Get your own copy of the G2Ps:
    git clone https://github.com/uiuc-sst/g2ps
Test the installation:
    phonetisaurus-g2pfst --model=g2ps/models/akan.fst --word=ahyiakwa
You should see the answer "ahyiakwa 21.7336 a ç i a ɥ a˥".
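To call the transducer from a script rather than by hand, the test command above can be wrapped as in the following minimal sketch. It assumes that `phonetisaurus-g2pfst` is on your PATH and that each output line has the shape shown above (word, score, then the phones, separated by whitespace). The helper names `build_command`, `parse_g2p_line`, and `run_g2p` are hypothetical and are not part of Phonetisaurus or this repository.

```python
import subprocess


def build_command(model_path, word):
    """Build the same command line used in the installation test."""
    return ["phonetisaurus-g2pfst", f"--model={model_path}", f"--word={word}"]


def parse_g2p_line(line):
    """Split one output line into (word, score, phones).

    Assumption: the line is whitespace-separated, with the word first,
    the score second, and one phone per remaining field.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected G2P output: {line!r}")
    return fields[0], float(fields[1]), fields[2:]


def run_g2p(model_path, word):
    """Run the transducer on one word and parse the first output line."""
    result = subprocess.run(
        build_command(model_path, word),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits with an error
    )
    lines = result.stdout.strip().splitlines()
    if not lines:
        raise ValueError("the transducer produced no output")
    return parse_g2p_line(lines[0])


if __name__ == "__main__":
    # Parse the sample answer quoted in the Usage section.
    word, score, phones = parse_g2p_line("ahyiakwa 21.7336 a ç i a ɥ a˥")
    print(word, score, phones)
```

With Phonetisaurus installed, `run_g2p("g2ps/models/akan.fst", "ahyiakwa")` would run the same test as above and return the parsed result.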

Description

The column "FSTs" links to a trained grapheme-to-phoneme transducer for use with Phonetisaurus. If the available lexicons were large enough to measure the phone error rate (PER), the PER is listed in parentheses. As of this writing, PERs range from 7% to 45%. Note: some of the trained models exceed GitHub's file size limit, so they are not available on the GitHub page; instead, you can find them at http://speechtechnology.web.illinois.edu/data/g2ps/. Currently those are: american-english, arabic, dutch, french, german, portuguese, russian, spanish, turkish.
The column "Pronlexes" lists pronunciation lexicons distributed on this site; most are just short symbol tables, but a few are longer.
Other columns are just pointers to sources.
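The phone error rate mentioned in the description can be illustrated with a short sketch. The source does not say exactly how the PERs were computed, so this assumes the common definition: the total edit distance (substitutions, insertions, deletions) between reference and predicted phone sequences, divided by the total number of reference phones. The function names and the sample phone sequences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of strings)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference phones
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(pairs):
    """PER over (reference, hypothesis) pairs of phone lists."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("no reference phones to score against")
    return errors / total


if __name__ == "__main__":
    # Made-up test pairs: reference pronunciation vs. G2P prediction.
    pairs = [
        (["a", "b", "c", "d"], ["a", "x", "c", "d"]),  # one substitution
        (["a", "ç", "i"], ["a", "ç", "i"]),            # exact match
    ]
    print(f"PER = {phone_error_rate(pairs):.1%}")
```

Under this definition a PER can exceed 100% when the predictions contain many insertions, which is one reason to report the reference length alongside it.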

Acknowledgments

This project was funded from 2016 to 2019 as part of the LanguageNet. Phonetisaurus G2Ps were trained using the lexicons listed here and the lexicons in the LanguageNet. Some languages had other sources: Appen BABEL lexicons (amharic, assamese, bengali, cebuano, georgian, guarani, haitian, igbo, javanese, kurdish, lao, lithuanian, luo, mongolian, pushto, swahili, tagalog, tamil, tok-pisin, turkish, vietnamese, yue, zulu); CELEX (dutch, english, german); and CALLHOME (egyptian-arabic, mandarin, spanish).

Language folders in this repository:
Achinese
Ainu
Akan
Albanian
Amharic
Armenian
Assamese
Assyrian_Neo-Aramaic
Aymara
Azerbaijani
Balinese
Bambara
Belarusian
Bengali
Bosnian
Buginese
Bulgarian
Burmese
Cebuano
Central_Atlas_Tamazight
Central_Khmer
Croatian
Czech
Danish
Dari
Dinka
Dutch
Egyptian_Arabic
English-US
Esperanto
Estonian
Faroese
Fijian
Filipino
Finnish
French
Fulah
Georgian
German
Guarani
Gujarati
Gulf_Arabic
Haitian
Hausa
Hebrew
Hindi
Hmong-Dô
Hmong
Hungarian
Igbo
Indonesian
Iranian_Persian
Italian
Japanese
Javanese
Kannada
Kazakh
Kikuyu
Kinyarwanda
Kirghiz
Kongo
Korean
Kurdish
Lao
Latin
Latvian
Lithuanian
Luba-Lulua
Luo
Luxembourgish
Macedonian
Malagasy
Malayalam
Maltese
Mandarin_Chinese
Mandinka
Maori
Marathi
Min-Nan
Minangkabau
Modern_Greek
Mongolian
Moroccan_Arabic
Ndebele
Nepali
North_Levantine_Arabic
Norwegian_Bokmål
Nyanja
Occitan
Oromo
Ossetian
Pampanga
Panjabi
Papiamento
Polish
Portuguese
Pushto
Quechua
Rarotongan
Rohingya
