libri light dataset #698

loganhart02 · 2021-08-03T00:43:57Z

Hey guys! I was looking for ASR datasets for a Conformer model and stumbled upon the Libri-Light dataset. I think it could help train a speaker encoder, and I wanted to get your thoughts, or hear whether you've already seen it. The main dataset is 60k hours of unlabeled audio that comes with speaker IDs. I'm not sure whether we need the audio transcripts for speaker encoder training, but my intuition is that you only need the audio and the speaker ID. The main dataset has 7439 speakers.
Here is the link to the dataset: https://github.com/facebookresearch/libri-light
The link to the paper: https://arxiv.org/pdf/1912.07875.pdf
I'll drop a screenshot below of the different dataset sizes.
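
To make the "audio plus speaker ID only" idea concrete, here is a rough sketch of how the files could be indexed per speaker without touching any transcripts. It assumes a directory layout of `<root>/<speaker_id>/<book>/<file>.flac`; that layout and the helper name `index_by_speaker` are assumptions for illustration, so check them against the actual download before relying on this.

```python
from collections import defaultdict
from pathlib import Path


def index_by_speaker(root):
    """Map speaker ID -> list of .flac paths found under root.

    Assumption: files live at root/<speaker_id>/<book>/<file>.flac,
    so the first path component below root is the speaker ID.
    No transcripts are read.
    """
    root = Path(root)
    utterances = defaultdict(list)
    for flac in sorted(root.rglob("*.flac")):
        parts = flac.relative_to(root).parts
        if len(parts) < 2:
            # File sits directly in root, so there is no speaker folder to read.
            continue
        utterances[parts[0]].append(flac)
    return dict(utterances)
```

The resulting dictionary is the kind of input a speaker encoder data loader typically needs: for each speaker, a pool of utterances to sample from.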

erogol · 2021-08-03T08:48:02Z

Maintainer

Thanks for the link. I think @Edresson trained a speaker encoder on that (not yet released).

AbubakarAliyuBadawi · 2024-06-18T10:02:20Z

@loganhart02 Do you know how I can get the transcripts for the large Libri-Light set (60k hours)? I want to see whether synthesizing the dataset will improve the semantic performance of my SLM.

loganhart02 · 2024-06-18T14:25:56Z

Author

Look for the Libriheavy dataset. It's Libri-Light but with ground-truth (GT) text from the actual books.
