These videos show the model's performance on speakers who were not included in the training set. The video below also shows its performance on an ultrasound system, probe geometry, and framerate that were not represented in the training set.
The ultrasound model estimates the position of 11 keypoints along the tongue surface, plus a further 3 keypoints on the hyoid, the base of the mandible, and the mental spine, where the short tendon attaches to the mandible.
(The video above was made in AAA, software for speech articulatory analysis and recording by Articulate Instruments, using the pose-estimation models in this project, which were trained with DeepLabCut (Mathis, A., Mamidanna, P., Cury, K.M. et al.). The video below was created using DeepLabCut's built-in video export.)
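For each video it analyses, DeepLabCut saves one (x, y, likelihood) triple per keypoint per frame, as an HDF5 file and optionally as a CSV with a three-row column header (scorer / bodyparts / coords). As a rough illustration, here is a minimal pandas sketch of reading such a CSV; the filename and the "Hyoid" keypoint label are placeholders, so check the project's config.yaml for the actual bodypart names:

```python
import pandas as pd

# DeepLabCut CSVs have a three-row header: scorer / bodyparts / coords,
# with one (x, y, likelihood) triple per keypoint per frame.
df = pd.read_csv("my_video_predictions.csv", header=[0, 1, 2], index_col=0)

scorer = df.columns.get_level_values(0)[0]  # the network's scorer name

# "Hyoid" is a hypothetical keypoint label; see the project's config.yaml
# for the names actually used by these models.
hyoid = df[scorer]["Hyoid"]
print(hyoid[["x", "y", "likelihood"]].head())
```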
---
To download all the files needed to run this project, you can clone this repository:
`git clone https://github.com/articulateinstruments/DeepLabCut-for-Speech-Production.git`
or click this link to download the project as a .zip file (737 MB download, 1.48 GB on disk).
---
Click here for instructions on how to install DeepLabCut and run this project. (DeepLabCut requires 2.97 GB on disk.)
---
Click here for instructions on how to use this project to analyse data. Note: the Shuffle2 Lip and Ultrasound models were trained with revised labelling and significantly more images from new recordings, and they give the best results.
Both guides contain detailed walk-throughs for people who are new to using DeepLabCut.
You do not need a GPU in your computer to use these models: this project should run on most PCs. If you have a powerful GPU, you can use it to analyse data significantly faster.
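If you prefer to script the analysis rather than work through the guides, the same steps can be driven from Python with DeepLabCut's standard functions. A minimal sketch, assuming DeepLabCut is installed; the config path and video filename below are placeholders (the actual locations are given in the guide above):

```python
import deeplabcut

# Placeholder paths: point config_path at the config.yaml of the model
# you want to use, and list the video(s) you want to analyse.
config_path = "DeepLabCut-for-Speech-Production/path/to/config.yaml"
videos = ["my_ultrasound_recording.mp4"]

# shuffle=2 selects the Shuffle2 models noted above, which were trained
# on revised labelling and more images; save_as_csv also writes a CSV
# alongside the default HDF5 output.
deeplabcut.analyze_videos(config_path, videos, shuffle=2, save_as_csv=True)

# Optionally render the predicted keypoints onto the video for inspection.
deeplabcut.create_labeled_video(config_path, videos, shuffle=2)
```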
This repository contains:
- 6 pre-trained models that are ready to use, specifically:
  - 3 Ultrasound tongue-surface, mandible, hyoid, and short-tendon tracking models, for use on midsagittal ultrasound videos where the tongue tip is to the right.
  - 3 Lip tracking models, for use on front-facing videos of human lips.
- 1 set of hand-labeled Ultrasound training data.
- 1 set of hand-labeled Lip training data.
- 1 set of hand-labeled Ultrasound test data.
- 1 set of hand-labeled Lip test data.
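Because the hand-labeled test data is bundled with the project, you can reproduce an accuracy check using DeepLabCut's built-in evaluation. A sketch, assuming the placeholder config path below and that the test images are registered in the project's labeled dataset:

```python
import deeplabcut

# Placeholder path: point this at the config.yaml of the model to check.
config_path = "DeepLabCut-for-Speech-Production/path/to/config.yaml"

# Reports train/test pixel errors for the chosen shuffle(s);
# plotting=True also writes images comparing predicted and
# hand-labeled keypoints.
deeplabcut.evaluate_network(config_path, Shuffles=[2], plotting=True)
```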
This research using DeepLabCut for speech production is by Wrench, A. and Balch-Tomes, J. (2022) (doi: 10.3390/s22031133).
DeepLabCut software was developed by Mathis, A., Mamidanna, P., Cury, K.M. et al. (2018) (doi: 10.1038/s41593-018-0209-y), with additional software by Nath, T., Mathis, A. et al. (2019) (doi: 10.1038/s41596-019-0176-0) and Mathis, A., Biasi, T. et al. (2021).