This repo contains the official implementations of the following papers on (signer-independent) continuous sign language recognition (CSLR).
- [Interspeech 2022] Local Context-aware Self-attention for Continuous Sign Language Recognition. [Paper]
- [CVPR 2022] C2SLR: Consistency-enhanced Continuous Sign Language Recognition. [Paper]
- [TOMM 2024] Improving Continuous Sign Language Recognition with Consistency Constraints and Signer Removal. [Paper]
An improved Transformer for temporal modeling in CSLR models. We propose to enhance self-attention at two levels: query computation and attention score modulation. For the latter, we propose a novel dynamic Gaussian bias, whose window size can be adjusted automatically.
Dataset | WER (Dev/Test) | Ckpt&Cfg |
---|---|---|
Phoenix-2014 | 21.4/21.9 | link |
CSL | --/1.4 | link |
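As a rough illustration of the dynamic Gaussian bias idea, here is a minimal sketch (under our own assumptions, not the paper's exact formulation): each attention score between positions i and j is penalized by a squared-distance term whose width `sigma` is a learnable scalar, so the effective attention window adjusts automatically during training.

```python
import torch
import torch.nn.functional as F

def gaussian_biased_attention(q, k, v, log_sigma):
    """Self-attention with a Gaussian locality bias on the scores (sketch).

    `log_sigma` is a learnable scalar; sigma = exp(log_sigma) acts as the
    window size. Names and details here are illustrative assumptions.
    """
    T, d = q.shape
    scores = q @ k.T / d ** 0.5                   # (T, T) raw attention scores
    pos = torch.arange(T, dtype=torch.float32)
    dist2 = (pos[:, None] - pos[None, :]) ** 2    # squared frame distances |i - j|^2
    sigma = log_sigma.exp()                       # keep the window size positive
    scores = scores - dist2 / (2 * sigma ** 2)    # Gaussian bias favoring nearby frames
    return F.softmax(scores, dim=-1) @ v

# toy usage on a 6-frame sequence with 8-dim features
T, d = 6, 8
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
out = gaussian_biased_attention(q, k, v, log_sigma=torch.tensor(0.0))
```

Because `log_sigma` receives gradients through the bias term, a small sigma sharpens the attention around the diagonal while a large sigma recovers nearly global attention.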
Two consistency constraints to boost CSLR model performance. We first leverage pre-extracted keypoint heatmaps to guide an inner attention module within the visual module. We then align visual and temporal features at the sentence level as a regularization. Both constraints improve CSLR model performance at negligible cost.
Dataset | WER (Dev/Test) | Ckpt&Cfg |
---|---|---|
Phoenix-2014 | 20.5/20.4 | link |
Phoenix-2014T | 20.2/20.4 | link |
Phoenix-2014-SI | 34.3/34.4 | link |
CSL | --/0.9 | link |
CSL-Daily | 31.9/31.0 | link |
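The sentence-level alignment between visual and temporal features could be sketched as follows. This is our own minimal interpretation (the pooling and loss choices are assumptions, not the paper's exact design): pool each stream's frame-level features into one sentence embedding and penalize their cosine dissimilarity as a regularizer.

```python
import torch
import torch.nn.functional as F

def sentence_alignment_loss(visual_feats, temporal_feats):
    """Sentence-level consistency regularizer (illustrative sketch).

    visual_feats, temporal_feats: (T, D) frame-level features from the
    visual and temporal modules. Mean-pooling and cosine distance are
    our assumptions for illustration.
    """
    v = visual_feats.mean(dim=0)    # (D,) sentence embedding, visual stream
    t = temporal_feats.mean(dim=0)  # (D,) sentence embedding, temporal stream
    return 1 - F.cosine_similarity(v, t, dim=0)

vis = torch.randn(50, 512)
loss = sentence_alignment_loss(vis, vis.clone())  # identical streams -> near-zero loss
```

The loss adds only one pooled dot product per sample, which is consistent with the "negligible cost" claim above.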
Existing CSLR works mostly focus on the signer-dependent setting, in which all test signers are seen during training. In the real world, however, it is infeasible to build a dataset that covers all signers. In this paper, we propose a signer removal module based on the idea of feature disentanglement. The module is pluggable and makes CSLR models more robust to signer variations.
Dataset | WER (Dev/Test) | Ckpt&Cfg |
---|---|---|
Phoenix-2014-SI | 33.1/32.7 | link |
CSL | --/0.68 | link |
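One common building block for this kind of feature disentanglement is a gradient reversal layer; the sketch below shows the mechanism (our illustrative choice — the actual signer removal module in the paper may be implemented differently). The layer is the identity on the forward pass but negates gradients on the backward pass, so a signer classifier trained on top pushes the backbone to *discard* signer-specific information.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer (illustrative sketch of disentanglement):
    identity forward, negated (scaled) gradient backward."""

    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        # Reverse the gradient flowing into the backbone features.
        return -ctx.lambd * grad_out, None

x = torch.randn(4, 16, requires_grad=True)
y = GradReverse.apply(x, 1.0)
y.sum().backward()
# every gradient entry is -1: d(sum)/dy = 1, reversed by the layer
```

In a full model, `y` would feed a signer classifier whose loss is minimized normally, while the reversed gradients adversarially remove signer cues from `x`.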
Create an environment and install the dependencies.

```bash
pip install -r requirements.txt
```
Download the datasets from their websites and place them under the corresponding directories in data/.
- Phoenix-2014 (note that Phoenix-2014-SI is simply a subset of Phoenix-2014)
- Phoenix-2014T
- CSL
- CSL-Daily
Then unzip all the data and put it into ../../data/.
Heatmaps serve as the labels for spatial attention consistency, which is used in C2SLR and SRM. The general process is: (1) run gen_heatmaps.py to obtain finer_coords; (2) in each training iteration, the dataloader automatically generates Gaussian heatmaps centered at those finer coordinates.
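Step (2) above amounts to rendering a 2D Gaussian around each keypoint coordinate. A minimal sketch of what the dataloader produces (the heatmap resolution and sigma here are our assumptions; gen_heatmaps.py and the actual dataloader may use different values):

```python
import numpy as np

def gaussian_heatmap(center, size=(64, 64), sigma=4.0):
    """Render a 2D Gaussian heatmap centered at a keypoint (sketch).

    center: (x, y) keypoint coordinate, e.g. one of the finer_coords.
    size, sigma: illustrative assumptions, not the repo's actual settings.
    """
    cx, cy = center
    ys, xs = np.mgrid[0:size[0], 0:size[1]]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

hm = gaussian_heatmap((20.0, 31.0))
# the peak (value 1.0) sits at pixel (row=31, col=20), i.e. the keypoint
```

These per-keypoint heatmaps are then used as soft supervision targets for the visual module's inner attention maps.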
There are two pretrained models: (1) VGG11 pretrained on ImageNet and (2) HRNet pretrained on MPII. Here is the link.
The model checkpoints and configs are put in the same folder.
```bash
python main.py --config=config --mode=train
python main.py --config=config --mode=test
```
Please cite our works if you find this repo helpful.
```bibtex
@article{zuo2022local,
  title={Local Context-aware Self-attention for Continuous Sign Language Recognition},
  author={Zuo, Ronglai and Mak, Brian},
  journal={Proc. Interspeech},
  pages={4810--4814},
  year={2022}
}

@inproceedings{zuo2022c2slr,
  title={{C2SLR}: Consistency-enhanced Continuous Sign Language Recognition},
  author={Zuo, Ronglai and Mak, Brian},
  booktitle={CVPR},
  pages={5131--5140},
  year={2022}
}

@article{zuo2024improving,
  title={Improving Continuous Sign Language Recognition with Consistency Constraints and Signer Removal},
  author={Zuo, Ronglai and Mak, Brian},
  journal={ACM Transactions on Multimedia Computing, Communications and Applications},
  volume={20},
  number={6},
  pages={1--25},
  year={2024}
}
```