PyTorch implementation of Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

wngh1187/Diff-SV

Diff-SV

PyTorch code for the following paper:

  • Title: Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models (accepted for ICASSP 2024; preprint: arXiv:2309.08320)
  • Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, and Ha-Jin Yu

Abstract

Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages DPM. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerable speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. The obtained experimental results demonstrate that Diff-SV achieves state-of-the-art performance, outperforming recently proposed noise-robust SV systems.

Prerequisites

Environment Setting

  • We used the 'nvcr.io/nvidia/pytorch:21.04-py3' image from NVIDIA GPU Cloud to conduct our experiments.
  • Run the 'build.sh' script to build the Docker image:
./docker/build.sh
  • Run the 'interactive.sh' script to start the Docker container.
  • Note that you must modify the mapping path before running 'interactive.sh':
./docker/interactive.sh

Datasets

  • We used the VoxCeleb1 dataset for training and testing.
  • For the noisy tests, we used the MUSAN, Nonspeech100, and VOiCES datasets.
  • Each downloaded dataset should be mapped to the 'data' folder in the Docker environment.
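
Before launching training, it may help to confirm that every dataset is actually visible under the 'data' folder inside the container. The following is a minimal sketch, not part of the repository: the sub-folder names in EXPECTED_DATASETS and the helper find_missing_datasets are hypothetical, and the layout the code really expects should be checked against its own configuration.

```python
from pathlib import Path

# Hypothetical sub-folder names (an assumption, not taken from the repository);
# adjust them to match the layout your mapping actually produces.
EXPECTED_DATASETS = ("VoxCeleb1", "musan", "Nonspeech100", "VOiCES")


def find_missing_datasets(data_root, expected=EXPECTED_DATASETS):
    """Return the names in `expected` that are not directories under `data_root`."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # 'data' is the folder the datasets are mapped to in the Docker environment.
    missing = find_missing_datasets("data")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders were found.")
```

Running the script from the directory that contains 'data' lists any folder that was not mapped, which is usually a sign that the mapping path needs another look.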

Train and test

python3 code/diff_sv/main.py 

Citation

Please cite this paper if you make use of the code.

@article{kim2023diff,
  title={Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models},
  author={Kim, Ju-ho and Heo, Jungwoo and Shin, Hyun-seo and Lim, Chan-yeong and Yu, Ha-Jin},
  journal={arXiv preprint arXiv:2309.08320},
  year={2023}
}
