# Performance Record

This is a Chinese speech recognition recipe that trains on a combination of open-source Chinese corpora, including:

| Dataset    | Duration (Hours) |
|------------|------------------|
| Aidatatang | 140              |
| Aishell    | 151              |
| MagicData  | 712              |
| Primewords | 99               |
| ST-CMDS    | 110              |
| THCHS-30   | 26               |
| TAL-ASR    | 587              |
| AISHELL2   | 1000             |

## Unified Transformer Result

Data info:

* Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
* Feature info: using fbank feature, with cmvn, no speed perturb.
* Training info: lr 0.004, batch size 18, 3 machines, 3*8 = 24 GPUs, acc_grad 1, 220 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 30 (see the checkpoint-averaging sketch below)
* Git hash: 013794572a55c7d0dbea23a66106ccf3e5d3b8d4
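
The `average_num 30` in the decoding settings refers to checkpoint averaging: the parameters of a number of saved checkpoints (typically the last or best 30) are averaged element-wise before decoding. The sketch below only illustrates the general idea in PyTorch; the paths and helper name are made up, not the recipe's actual script.

```python
import torch

def average_checkpoints(ckpt_paths):
    """Element-wise average of several saved state dicts (paths are illustrative)."""
    avg = None
    for path in ckpt_paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    for k in avg:
        avg[k] /= len(ckpt_paths)  # mean over all checkpoints
    return avg

# Illustrative usage for average_num = 30 (file names are made up):
# avg = average_checkpoints([f"exp/transformer/{e}.pt" for e in range(190, 220)])
# torch.save(avg, "exp/transformer/avg_30.pt")
```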

WER (%)

| Dataset    | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|------------|------------|-------------------|-------------------|------------------------|---------------------|
| Aidatatang | full       | 4.23              | 5.82              | 5.82                   | 4.71                |
| Aidatatang | 16         | 4.59              | 6.99              | 6.99                   | 5.29                |
| Aishell    | full       | 4.69              | 5.80              | 5.80                   | 4.64                |
| Aishell    | 16         | 4.97              | 6.75              | 6.75                   | 5.37                |
| MagicData  | full       | 2.86              | 4.01              | 4.00                   | 3.07                |
| MagicData  | 16         | 3.10              | 5.02              | 5.02                   | 3.68                |
| THCHS-30   | full       | 16.68             | 15.46             | 15.46                  | 14.38               |
| THCHS-30   | 16         | 17.47             | 16.81             | 16.82                  | 15.63               |
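
In the WER tables, `full` means the unified model decodes with full (unlimited) context, while `16` means it decodes chunk by chunk with a chunk size of 16, i.e. its streaming configuration. The sketch below only illustrates the chunk-by-chunk idea; the encoder interface and cache handling are assumptions, not the recipe's actual implementation.

```python
import torch

def chunk_forward(feats: torch.Tensor, encoder, chunk_size: int = 16):
    """Run a (hypothetical) streaming encoder over one utterance chunk by chunk.

    feats:   (num_frames, feat_dim) features for one utterance.
    encoder: assumed callable taking a chunk plus a cache of left context.
    """
    outputs, cache = [], None
    for start in range(0, feats.size(0), chunk_size):
        chunk = feats[start:start + chunk_size]   # at most chunk_size frames
        out, cache = encoder(chunk, cache)        # cache carries history across chunks
        outputs.append(out)
    return torch.cat(outputs, dim=0)
```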

## Unified Conformer Result

Data info:

* Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
* Feature info: using fbank feature, with cmvn, speed perturb.
* Training info: lr 0.001, batch size 8, 1 machine, 1*8 = 8 GPUs, acc_grad 12, 60 epochs
* Decoding info: ctc_weight 0.5, average_num 10 (see the attention-rescoring sketch below)
* Git hash: 5bdf436e671ef4c696d1b039f29cc33109e072fa
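
`ctc_weight 0.5` controls how CTC and attention-decoder scores are mixed in the attention rescoring pass: CTC prefix beam search proposes n-best hypotheses, and each one is rescored by the attention decoder before the combined score picks the winner. The sketch below shows one common way to combine the two scores; the scoring functions are hypothetical stand-ins, not the recipe's code.

```python
def attention_rescore(nbest, attention_score, ctc_weight=0.5):
    """Rescore CTC n-best hypotheses with an attention decoder (illustrative).

    nbest:           list of (hypothesis, ctc_score) pairs from prefix beam search.
    attention_score: assumed function returning the decoder log-probability
                     of a hypothesis given the encoder output.
    """
    best_hyp, best_score = None, float("-inf")
    for hyp, ctc_score in nbest:
        score = attention_score(hyp) + ctc_weight * ctc_score
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp
```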

WER (%)

| Dataset    | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|------------|------------|-------------------|-------------------|------------------------|---------------------|
| Aidatatang | full       | 4.12              | 4.97              | 4.97                   | 4.22                |
| Aidatatang | 16         | 4.45              | 5.73              | 5.73                   | 4.75                |
| Aishell    | full       | 4.49              | 5.07              | 5.05                   | 4.43                |
| Aishell    | 16         | 4.77              | 5.77              | 5.77                   | 4.85                |
| MagicData  | full       | 2.55              | 3.07              | 3.05                   | 2.59                |
| MagicData  | 16         | 2.81              | 3.88              | 3.86                   | 3.08                |
| THCHS-30   | full       | 13.55             | 13.75             | 13.76                  | 12.72               |
| THCHS-30   | 16         | 13.78             | 15.10             | 15.08                  | 13.90               |

## Unified Conformer Result (All Corpora)

Data info:

* Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, THCHS-30, TAL-ASR, and AISHELL2.
* Feature info: using fbank feature, dither=0, cmvn, speed perturb
* Training info: lr 0.001, batch size 22, 4 GPUs, acc_grad 4, 120 epochs, dither 0.1 (gradient accumulation is sketched below)
* Decoding info: ctc_weight 0.5, average_num 10
* Git hash: 66f30c197d00c59fdeda3bc8ada801f867b73f78
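
`acc_grad 4` means gradients are accumulated over 4 mini-batches before each optimizer update, which emulates a larger effective batch size on a limited number of GPUs. A generic PyTorch sketch of gradient accumulation is shown below; it is not the recipe's actual training loop.

```python
def train_one_epoch(model, loader, optimizer, acc_grad=4):
    """Generic gradient-accumulation loop (illustrative only)."""
    optimizer.zero_grad()
    for step, (feats, targets) in enumerate(loader, start=1):
        loss = model(feats, targets)      # assumed to return a scalar loss
        (loss / acc_grad).backward()      # scale so the accumulated gradient matches one large batch
        if step % acc_grad == 0:
            optimizer.step()              # update once every acc_grad mini-batches
            optimizer.zero_grad()
```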

WER (%)

| Dataset    | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|------------|------------|-------------------|-------------------|------------------------|---------------------|
| Aidatatang | full       | 3.22              | 4.00              | 4.01                   | 3.35                |
| Aidatatang | 16         | 3.50              | 4.63              | 4.63                   | 3.79                |
| Aishell    | full       | 1.23              | 2.12              | 2.13                   | 1.42                |
| Aishell    | 16         | 1.33              | 2.72              | 2.72                   | 1.72                |
| MagicData  | full       | 2.38              | 3.07              | 3.05                   | 2.52                |
| MagicData  | 16         | 2.66              | 3.80              | 3.78                   | 2.94                |
| THCHS-30   | full       | 9.93              | 11.07             | 11.06                  | 10.16               |
| THCHS-30   | 16         | 10.28             | 11.85             | 11.85                  | 10.81               |
| AISHELL2   | full       | 5.25              | 5.81              | 5.79                   | 5.22                |
| AISHELL2   | 16         | 5.48              | 6.48              | 6.50                   | 5.61                |
| TAL-ASR    | full       | 9.54              | 10.35             | 10.28                  | 9.66                |
| TAL-ASR    | 16         | 10.04             | 11.43             | 11.39                  | 10.55               |