This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Fix CommonVoice dataset for Speech Recognition problem #1852

Open · wants to merge 4 commits into base: master

Conversation

RegaliaXYZ

Fixed the Common Voice data generator by adding a language-code flag to datagen.py (--language="en"; defaults to English if not specified) and dynamically downloading the dataset for the selected language.
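For illustration, the flag-plus-download idea could look roughly like the sketch below. It is not the code from this PR: `argparse` is swapped in to keep the example self-contained (the project's own flag mechanism is not shown in this conversation), and `_URL_TEMPLATE`, `build_parser`, and `dataset_url` are hypothetical names with a placeholder host.

```python
import argparse

# Hypothetical URL template: the real Common Voice download location is not
# given in this conversation, so a placeholder host is used.
_URL_TEMPLATE = "https://example.com/common-voice/{lang}.tar.gz"


def build_parser():
    """Build a parser exposing a --language flag that defaults to English."""
    parser = argparse.ArgumentParser(description="Common Voice datagen sketch")
    parser.add_argument(
        "--language",
        default="en",  # default stated in the PR description
        help="Language code of the Common Voice dataset to download.",
    )
    return parser


def dataset_url(language):
    """Return the (placeholder) archive URL for the given language code."""
    return _URL_TEMPLATE.format(lang=language)
```

Calling `build_parser().parse_args(["--language=fr"])` and passing the resulting `language` value to `dataset_url` would select the French archive under these assumptions; omitting the flag falls back to `en`.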

Also reworked the data unpacking, since Mozilla changed the dataset's folder structure.

Also removed the Common Voice sub-problems (Noisy, Clean, FullTestClean), since all the previous .tsv files were merged (there are no more other-train, other-test, etc.).
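As a rough sketch of what reading a merged .tsv file might involve, the snippet below uses `csv.DictReader`, which a later commit message in this PR also mentions. The function name `read_clips` and the column names `path` and `sentence` are assumptions made for the example, not details confirmed by this conversation.

```python
import csv


def read_clips(tsv_file):
    """Yield (audio path, transcript) pairs from a tab-separated metadata file.

    tsv_file is any open text file object whose first line is a header row.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        # Assumed column names; adjust to the header of the actual file.
        yield row["path"], row["sentence"]
```

Reading rows by column name rather than by position keeps the parser working if columns are added or reordered in the metadata file.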

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



ℹ️ Googlers: Go here for more info.

@googlebot added the "cla: no" label (PR author has not signed CLA) on Sep 17, 2020
@RegaliaXYZ
Author

@googlebot I signed it!

@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot added the "cla: yes" label (PR author has signed CLA) and removed the "cla: no" label (PR author has not signed CLA) on Sep 17, 2020
…by using DictReader

Fixed the relative check not working during extraction
Removed the unnecessary collect_data function, since it is no longer needed
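One plausible reading of the "relative check" above is a guard that skips archive members whose paths would escape the extraction directory. The sketch below shows that idea under this assumption; `is_within_directory` and `safe_members` are hypothetical helpers, not the functions changed in this PR.

```python
import os


def is_within_directory(base_dir, member_name):
    """Return True if member_name resolves to a path inside base_dir."""
    base = os.path.abspath(base_dir)
    target = os.path.abspath(os.path.join(base, member_name))
    # commonpath avoids the prefix pitfall of a plain startswith test,
    # where "/tmp/data2" would wrongly count as inside "/tmp/data".
    return os.path.commonpath([base, target]) == base


def safe_members(tar, dest_dir):
    """Return the members of an open tarfile.TarFile that stay in dest_dir."""
    return [m for m in tar.getmembers() if is_within_directory(dest_dir, m.name)]
```

The filtered list could then be passed as the `members` argument of `TarFile.extractall`, so that entries such as `../evil.txt` or absolute paths are never written.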