
LCSA_C2SLR_SRM

This repo contains the official implementations of the following papers on (signer-independent) continuous sign language recognition (CSLR).

  • [Interspeech 2022] Local Context-aware Self-attention for Continuous Sign Language Recognition. [Paper]

  • [CVPR 2022] C2SLR: Consistency-enhanced Continuous Sign Language Recognition. [Paper]

  • [TOMM 2024] Improving Continuous Sign Language Recognition with Consistency Constraints and Signer Removal. [Paper]

Local Context-Aware Self-Attention (LCSA)

Introduction

An improved Transformer for temporal modeling in CSLR models. We propose to enhance self-attention at two levels: query computation and attention score modulation. For the latter, we propose a novel dynamic Gaussian bias, whose window size can be adjusted automatically.
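The score-modulation idea can be sketched as follows. This is a minimal single-head illustration under our own assumptions (the function name and the per-query `window` tensor are hypothetical, not this repo's API): attention logits are shifted by a Gaussian bias centered on each query position, so a small window concentrates attention locally while a large one recovers near-global attention.

```python
import torch
import torch.nn.functional as F

def gaussian_biased_attention(q, k, v, window):
    """Illustrative sketch: modulate attention scores with a Gaussian bias
    centered at each query position; `window[i]` controls the width for query i."""
    T, d = q.shape
    scores = q @ k.t() / d ** 0.5                       # (T, T) raw attention logits
    pos = torch.arange(T, dtype=torch.float32)
    dist = (pos.unsqueeze(1) - pos.unsqueeze(0)) ** 2   # squared distance |i - j|^2
    bias = -dist / (2 * window.unsqueeze(1) ** 2)       # Gaussian log-bias per query
    attn = F.softmax(scores + bias, dim=-1)             # rows still sum to 1
    return attn @ v

T, d = 6, 8
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
window = torch.full((T,), 2.0)  # assumed: a predicted per-query window size
out = gaussian_biased_attention(q, k, v, window)
```

In the paper the window size is learned dynamically rather than fixed; the fixed tensor above only stands in for that prediction.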

Performance

| Dataset      | WER (Dev/Test) | Ckpt&Cfg |
|--------------|----------------|----------|
| Phoenix-2014 | 21.4/21.9      | link     |
| CSL          | --/1.4         | link     |

Consistency-Enhanced CSLR (C2SLR)

Introduction

Two consistency constraints to boost CSLR model performance. We first leverage pre-extracted keypoint heatmaps to guide an inner attention module in the visual module. Then we align visual and temporal features at the sentence level as a regularization. Both constraints improve CSLR model performance at negligible cost.
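The two constraints can be summarized with a hedged sketch (the function, loss choices, and tensor shapes below are our assumptions, not the repo's actual code): a spatial term pulling the inner attention maps toward the keypoint heatmaps, and a sentence-level term aligning pooled visual and temporal features.

```python
import torch
import torch.nn.functional as F

def consistency_losses(attn_maps, keypoint_heatmaps, visual_feats, temporal_feats):
    """Illustrative sketch of the two constraints.
    attn_maps / keypoint_heatmaps: (T, H, W); *_feats: (T, D)."""
    # Spatial attention consistency: supervise the visual module's inner
    # attention maps with the pre-extracted keypoint heatmaps.
    l_spatial = F.mse_loss(attn_maps, keypoint_heatmaps)
    # Sentence embedding consistency: align mean-pooled visual and temporal
    # features at the sentence level (cosine distance here is an assumption).
    v = F.normalize(visual_feats.mean(dim=0), dim=0)
    t = F.normalize(temporal_feats.mean(dim=0), dim=0)
    l_sent = 1.0 - torch.dot(v, t)
    return l_spatial, l_sent

T, H, W, D = 4, 7, 7, 16
l_sp, l_se = consistency_losses(torch.rand(T, H, W), torch.rand(T, H, W),
                                torch.randn(T, D), torch.randn(T, D))
```

Both terms would be added to the CTC objective with small weights, which is why the extra cost is negligible at inference time.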

Performance

| Dataset         | WER (Dev/Test) | Ckpt&Cfg |
|-----------------|----------------|----------|
| Phoenix-2014    | 20.5/20.4      | link     |
| Phoenix-2014T   | 20.2/20.4      | link     |
| Phoenix-2014-SI | 34.3/34.4      | link     |
| CSL             | --/0.9         | link     |
| CSL-Daily       | 31.9/31.0      | link     |

Signer Removal Module for Signer-Independent CSLR (SRM)

Introduction

Existing CSLR works mostly focus on the signer-dependent setting, in which testing signers are all seen during training. However, in the real world, it is infeasible to build a dataset encompassing all signers. In this paper, we propose a signer removal module based on the idea of feature disentanglement. The module is pluggable and can make CSLR models more robust to signer variations.
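A common way to implement feature disentanglement of this kind is adversarial training with a gradient-reversal layer: a signer classifier is trained on the features, while the reversed gradient pushes the backbone to discard signer identity. The sketch below shows that general technique only; it is not the SRM from the paper, and the class names are our own.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient backward."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        return -g

class SignerAdversary(nn.Module):
    """Illustrative: a signer classifier trained through gradient reversal,
    so the backbone is encouraged to learn signer-invariant features."""
    def __init__(self, dim, num_signers):
        super().__init__()
        self.head = nn.Linear(dim, num_signers, bias=False)
    def forward(self, feats):
        return self.head(GradReverse.apply(feats))

adv = SignerAdversary(dim=16, num_signers=4)
x = torch.randn(5, 16, requires_grad=True)
logits = adv(x)  # cross-entropy on signer labels would be added to the loss
```

Because the module only adds a small classification head during training, it is pluggable and adds no cost at inference, matching the motivation above.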

Performance on Signer-Independent Datasets

| Dataset         | WER (Dev/Test) | Ckpt&Cfg |
|-----------------|----------------|----------|
| Phoenix-2014-SI | 33.1/32.7      | link     |
| CSL             | --/0.68        | link     |

Usage

Prerequisites

Create an environment and install dependencies.

pip install -r requirements.txt

Datasets

Download the datasets from their official websites and place them under the corresponding directories in data/.

Then unzip all the data and put it into ../../data/.

Heatmaps

Heatmaps serve as the labels for spatial attention consistency, which is used in the C2SLR and SRM papers. The general process is: (1) run gen_heatmaps.py to obtain finer_coords; (2) in each training iteration, the dataloader automatically generates Gaussian heatmaps centered at those finer coordinates.
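Step (2) amounts to rendering a 2-D Gaussian around each keypoint coordinate. A minimal sketch, assuming a fixed standard deviation and a peak value of 1 (the function name and defaults are ours, not the dataloader's):

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Render an (h, w) heatmap with a Gaussian peak at pixel (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# One heatmap per keypoint per frame; e.g. a 56x56 map peaked at (20, 30).
hm = gaussian_heatmap(56, 56, cx=20, cy=30)
```

Generating the maps on the fly in the dataloader, rather than storing them, keeps the disk footprint down to the coordinates alone.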

Pretrained Models

There are two pretrained models: (1) VGG11 pretrained on ImageNet and (2) HRNet pretrained on MPII. Here is the link.

Training and Testing

Each model checkpoint and its config are placed in the same folder.

python main.py --config=config --mode=train
python main.py --config=config --mode=test

Citation

Please cite our works if you find this repo helpful.

@inproceedings{zuo2022local,
  title={Local Context-aware Self-attention for Continuous Sign Language Recognition},
  author={Zuo, Ronglai and Mak, Brian},
  booktitle={Proc. Interspeech},
  pages={4810--4814},
  year={2022}
}
@inproceedings{zuo2022c2slr,
  title={{C2SLR}: Consistency-enhanced continuous sign language recognition},
  author={Zuo, Ronglai and Mak, Brian},
  booktitle={CVPR},
  pages={5131--5140},
  year={2022}
}
@article{zuo2024improving,
  title={Improving continuous sign language recognition with consistency constraints and signer removal},
  author={Zuo, Ronglai and Mak, Brian},
  journal={ACM Transactions on Multimedia Computing, Communications and Applications},
  volume={20},
  number={6},
  pages={1--25},
  year={2024}
}
