These videos show the model's performance on speakers who were not included in the training set. The video below also shows its performance on an ultrasound system, probe geometry, and framerate that were not represented in the training set.
The ultrasound model estimates the position of 11 keypoints along the tongue surface, plus a further 3 keypoints on the hyoid, the base of the mandible, and the mental spine, where the short tendon attaches to the mandible.
(The video above was made in AAA, software for speech articulatory analysis and recording by Articulate Instruments, using the pose-estimation models in this project, which were trained with DeepLabCut (Mathis, A., Mamidanna, P., Cury, K.M. et al.). The video below was created using DeepLabCut's built-in video export.)
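For each video it analyses, DeepLabCut saves one (x, y, likelihood) triple per keypoint per frame, as an HDF5 file and optionally as a CSV with a three-row column header (scorer / bodyparts / coords). As a rough illustration, here is a minimal pandas sketch of reading such a CSV; the filename and the "Hyoid" keypoint label are placeholders, so check the project's config.yaml for the actual bodypart names:

```python
import pandas as pd

# DeepLabCut CSVs have a three-row header: scorer / bodyparts / coords,
# with one (x, y, likelihood) triple per keypoint per frame.
df = pd.read_csv("my_video_predictions.csv", header=[0, 1, 2], index_col=0)

scorer = df.columns.get_level_values(0)[0]  # the network's scorer name

# "Hyoid" is a hypothetical keypoint label; see the project's config.yaml
# for the names actually used by these models.
hyoid = df[scorer]["Hyoid"]
print(hyoid[["x", "y", "likelihood"]].head())
```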
---
To download all the files needed to run this project, you can clone this repository:
`git clone https://github.com/articulateinstruments/DeepLabCut-for-Speech-Production.git`
or click this link to download the project as a .zip file (737 MB download, 1.48 GB on disk).
---
Click here for instructions on how to install DeepLabCut and run this project. (DeepLabCut requires 2.97 GB on disk.)
---
Click here for instructions on how to use this project to analyse data. Note: the Shuffle2 Lip and Ultrasound models were trained with revised labelling and significantly more images from new recordings, and they give the best results.
Both guides contain detailed walk-throughs for people who are new to using DeepLabCut.
You do not need a GPU in your computer to use these models: this project should run on most PCs. If you have a powerful GPU, you can use it to analyse data significantly faster.
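If you prefer to script the analysis rather than work through the guides, the same steps can be driven from Python with DeepLabCut's standard functions. A minimal sketch, assuming DeepLabCut is installed; the config path and video filename below are placeholders (the actual locations are given in the guide above):

```python
import deeplabcut

# Placeholder paths: point config_path at the config.yaml of the model
# you want to use, and list the video(s) you want to analyse.
config_path = "DeepLabCut-for-Speech-Production/path/to/config.yaml"
videos = ["my_ultrasound_recording.mp4"]

# shuffle=2 selects the Shuffle2 models noted above, which were trained
# on revised labelling and more images; save_as_csv also writes a CSV
# alongside the default HDF5 output.
deeplabcut.analyze_videos(config_path, videos, shuffle=2, save_as_csv=True)

# Optionally render the predicted keypoints onto the video for inspection.
deeplabcut.create_labeled_video(config_path, videos, shuffle=2)
```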
This repository contains:
- 6 pre-trained models that are ready to use, specifically:
  - 3 Ultrasound tongue-surface, mandible, hyoid, and short-tendon tracking models, for use on midsagittal ultrasound videos where the tongue tip is to the right.
  - 3 Lip tracking models, for use on front-facing videos of human lips.
- 1 set of hand-labeled Ultrasound training data.
- 1 set of hand-labeled Lip training data.
- 1 set of hand-labeled Ultrasound test data.
- 1 set of hand-labeled Lip test data.
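Because the hand-labeled test data is bundled with the project, you can reproduce an accuracy check using DeepLabCut's built-in evaluation. A sketch, assuming the placeholder config path below and that the test images are registered in the project's labeled dataset:

```python
import deeplabcut

# Placeholder path: point this at the config.yaml of the model to check.
config_path = "DeepLabCut-for-Speech-Production/path/to/config.yaml"

# Reports train/test pixel errors for the chosen shuffle(s);
# plotting=True also writes images comparing predicted and
# hand-labeled keypoints.
deeplabcut.evaluate_network(config_path, Shuffles=[2], plotting=True)
```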
This research using DeepLabCut for speech production is by Wrench, A. and Balch-Tomes, J. (2022) (doi: 10.3390/s22031133).
DeepLabCut software was developed by Mathis, A., Mamidanna, P., Cury, K.M. et al. (2018) (doi: 10.1038/s41593-018-0209-y), with additional software by Nath, T., Mathis, A. et al. (2019) (doi: 10.1038/s41596-019-0176-0) and Mathis, A., Biasi, T. et al. (2021).