
Trained deep neural-net models for estimating articulatory keypoints from midsagittal ultrasound tongue videos and front-view lip camera videos using DeepLabCut. This research is by Wrench, A. and Balch-Tomes, J. (2022) (https://www.mdpi.com/1424-8220/22/3/1133) (https://doi.org/10.3390/s22031133).

Markerless pose estimation of speech articulators from ultrasound tongue images and lip video

Speaker 20fs in the test set, included with this project

These videos show the performance of the model on speakers who were not included in the training set. The video below also shows performance on an ultrasound system, probe geometry and frame rate that were not represented in the training set.

The ultrasound model estimates the positions of 11 keypoints along the tongue surface, plus a further 3 keypoints on the hyoid, the base of the mandible and the mental spine, where the short tendon attaches to the mandible.
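
As a rough illustration of what these keypoint estimates look like once a video has been analysed, the sketch below reads a DeepLabCut output file in Python. It is not part of this project: the output file name and the confidence threshold are placeholders, and the body-part labels are whatever the model's config defines.

    import pandas as pd

    # DeepLabCut saves per-frame keypoint estimates with a
    # (scorer, bodyparts, coords) column MultiIndex.
    # "example_video_DLC.h5" is a placeholder for a real analysis output file.
    df = pd.read_hdf("example_video_DLC.h5")
    scorer = df.columns.get_level_values("scorer")[0]
    bodyparts = df.columns.get_level_values("bodyparts").unique()

    # Each keypoint has an (x, y) position and a likelihood per frame;
    # here we print the first frame, skipping low-confidence points.
    frame0 = df.loc[0, scorer]
    for bp in bodyparts:
        x, y, p = frame0[bp][["x", "y", "likelihood"]]
        if p > 0.6:  # arbitrary threshold chosen for illustration
            print(f"{bp}: ({x:.1f}, {y:.1f}), likelihood {p:.2f}")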

(The video above was made in AAA, speech articulatory analysis and recording software by Articulate Instruments, using the pose-estimation models in this project, which were trained with DeepLabCut (Mathis, A., Mamidanna, P., Cury, K.M. et al.). The video below was created using DeepLabCut's built-in video export.)

Speaker DF in the test set, included with this project

How to use this project

  1. To download all the files needed to run this project, you can clone this repository:

    git clone https://github.com/articulateinstruments/DeepLabCut-for-Speech-Production.git

    or click this link to download the project as a .zip file. (737 MB download / 1.48 GB on disk)

  2. Click here for instructions on how to install DeepLabCut and run this project. (DeepLabCut will be 2.97 GB on disk)

  3. Click here for instructions on how to use this project to analyse data. Note: the Shuffle2 Lip and Ultrasound models were trained using revised labelling and significantly more images from new recordings; these models give the best results.

Both guides contain detailed walk-throughs for people who are new to using DeepLabCut.

You do not need a GPU in your computer to use these models: you should be able to run this project on most PCs. If you have a powerful GPU then you can use it with this project to analyse data significantly faster.
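
For readers who prefer to see the analysis step as code, the sketch below shows the core DeepLabCut calls that the analysis guide walks through, assuming placeholder paths for the project config and your recording; shuffle=2 selects the revised Shuffle2 models mentioned above.

    import deeplabcut

    # Placeholder paths: a config.yaml from one of the model folders in this
    # repository, and a video you want to analyse.
    config_path = "path/to/config.yaml"
    videos = ["path/to/ultrasound_recording.avi"]

    # Estimate keypoints in every frame. Runs on CPU by default;
    # pass gputouse=0 to use the first GPU if you have one.
    deeplabcut.analyze_videos(config_path, videos, shuffle=2, save_as_csv=True)

    # Optionally export a copy of the video with the estimated keypoints
    # drawn on each frame.
    deeplabcut.create_labeled_video(config_path, videos, shuffle=2)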

What this project contains

This repository contains:

Authors

This research using DeepLabCut for speech production is by Wrench, A. and Balch-Tomes, J. (2022) (https://doi.org/10.3390/s22031133).

DeepLabCut software was developed by Mathis, A., Mamidanna, P., Cury, K.M. et al. (2018) (https://doi.org/10.1038/s41593-018-0209-y), with additional software by Nath, T., Mathis, A. et al. (2019) (https://doi.org/10.1038/s41596-019-0176-0) and Mathis, A., Biasi, T. et al. (2021).
