Rutgers ECE capstone (2019): Multilingual ASR data collection

ciuji/RU_capstone

RU_capstone

Rutgers ECE capstone (2019): Multilingual ASR data collection

Introduction

Crawl multilingual audio and text resources from the web, then perform forced alignment on that data.

The project has two parts: the Crawler and the Aligner.

Crawler

In this part, we implemented web crawlers for two websites and collected multilingual audio together with the corresponding text.
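One step any such crawler needs is finding the audio files a page refers to. The following is a minimal sketch of that step using only the Python standard library; it is not the project's actual crawler. The sample HTML, the `example.org` URL, the list of audio extensions, and the names `AudioLinkParser` and `extract_audio_urls` are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed audio formats; adjust to whatever the target site serves.
AUDIO_EXTENSIONS = (".mp3", ".wav", ".ogg")


class AudioLinkParser(HTMLParser):
    """Collect absolute URLs of audio files referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.audio_urls = []

    def handle_starttag(self, tag, attrs):
        # Audio may be referenced via <a href>, <audio src>, or <source src>.
        for name, value in attrs:
            if name in ("href", "src") and value:
                if value.lower().endswith(AUDIO_EXTENSIONS):
                    # Resolve relative links against the page URL.
                    self.audio_urls.append(urljoin(self.base_url, value))


def extract_audio_urls(html, base_url):
    """Return all audio URLs found in an HTML string, in document order."""
    parser = AudioLinkParser(base_url)
    parser.feed(html)
    return parser.audio_urls


# Hypothetical page content; a real crawler would download the page first.
sample_html = """
<html><body>
  <p>Chapter 1</p>
  <a href="/audio/chapter1.mp3">Listen</a>
  <a href="/about.html">About</a>
  <audio><source src="media/chapter2.ogg"></audio>
</body></html>
"""
print(extract_audio_urls(sample_html, "https://example.org/bible/"))
```

The text of the page can be collected in the same pass by overriding `handle_data`, so that each audio URL is stored next to the text it belongs to.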

WordProject

WordProject is a website that provides multilingual versions of the Bible; it supports 37 languages. The audio and text resources from this website have a perfect match rate.

SBS News

SBS News is a news website that provides news in over 60 languages.

Aligner

In this part, we performed forced alignment on the crawled data using Montreal-Forced-Aligner and Kaldi.
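Before alignment, each audio file has to be matched with its transcript. To our understanding, Montreal-Forced-Aligner looks for a transcript file with the same base name as each audio file. The sketch below checks a corpus directory for that pairing; it assumes the audio has already been converted to `.wav` and the transcripts are stored as `.txt`, and the function name `pair_audio_with_text` is hypothetical, not part of this repository or of MFA.

```python
from pathlib import Path


def pair_audio_with_text(corpus_dir, audio_ext=".wav", text_ext=".txt"):
    """Pair audio files with transcripts that share the same base name.

    Returns (pairs, missing): `pairs` is a list of (audio, transcript)
    paths, and `missing` lists audio files that have no transcript.
    """
    corpus = Path(corpus_dir)
    pairs, missing = [], []
    # Walk the corpus recursively so per-language subfolders are included.
    for audio in sorted(corpus.rglob(f"*{audio_ext}")):
        transcript = audio.with_suffix(text_ext)
        if transcript.is_file():
            pairs.append((audio, transcript))
        else:
            missing.append(audio)
    return pairs, missing
```

Calling `pair_audio_with_text("corpus/")` on a hypothetical `corpus/` folder returns the usable pairs and the unmatched audio files, which makes it easy to spot crawled items that should be dropped or re-crawled before running the aligner.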

The output is a set of files in TextGrid format.

TextGrid demo:

TextGrid photo
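For readers unfamiliar with the format, a TextGrid is a plain-text Praat file that stores labeled time intervals in tiers. The sketch below renders a list of word intervals as a single-tier, long-format TextGrid. It only illustrates the file layout and is not the aligner's own writer; the function name `to_textgrid`, the tier name, and the timings in the example are made up for illustration.

```python
def to_textgrid(intervals, tier_name="words"):
    """Render (start, end, label) tuples as a long-format Praat TextGrid
    containing one interval tier. Intervals must be sorted by time."""
    if not intervals:
        raise ValueError("at least one interval is required")
    xmin, xmax = intervals[0][0], intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, label) in enumerate(intervals, start=1):
        escaped = label.replace('"', '""')  # Praat doubles embedded quotes
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


# Hypothetical word timings in seconds, for illustration only.
words = [(0.0, 0.42, "in"), (0.42, 0.61, "the"), (0.61, 1.2, "beginning")]
print(to_textgrid(words))
```

A file written this way can be opened in Praat alongside the audio to inspect the word boundaries.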

Video Demo

demo

Team Members

Mo Shi, Chaoji Zuo, Ziqi Wang, Zekun Zhang, Duc Le
