[BUG] KaldiFatalError for inputs of particular sizes #24

scottbreyfogle · 2024-11-21T21:00:58Z

Debugging checklist

[ ] Have you updated to latest MFA version?
I may be two minor versions behind, but I'm 90% sure this bug should be in the current version.
[ ] Have you tried rerunning the command with the --clean flag?
I'm using MFA library functions directly

Describe the issue
Technically, this is a bug in Kaldi. I plan to file a bug with them as well, but I think there is an easy mitigation for Kalpy. When an input audio file has a particular length [1], the MFCC and pitch features produced have different lengths. When Kalpy then uses paste_feats to combine the two, the length tolerance of 0 causes an empty array to be generated. Using those features afterwards raises a KaldiFatalError because the input is empty.

[1] The affected lengths appear to be those where (length - 400) % 160 > 156, with length in samples. This is related to the frame size (400 samples) and stride (160 samples) of the feature calculation at 16 kHz.
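
As a quick way to check whether a given file length falls into the affected range, the condition above can be written as a small helper. This is only a sketch of the arithmetic stated in this report: `is_affected_length` is a hypothetical name, and treating inputs shorter than one frame as unaffected is an assumption, since the report does not cover that case.

```python
def is_affected_length(num_samples, frame_length=400, frame_shift=160):
    """Return True if num_samples matches the condition from this report.

    Defaults are the frame size and stride (in samples) at 16 kHz.
    """
    # Assumption: inputs shorter than one frame are outside the reported case.
    if num_samples < frame_length:
        return False
    # Reported condition: (length - 400) % 160 > 156
    return (num_samples - frame_length) % frame_shift > frame_shift - 4


# Lengths near the 1038-sample repro that satisfy the condition.
affected = [n for n in range(1030, 1045) if is_affected_length(n)]
print(affected)
```

With the defaults, the 1038-sample input from the repro gives (1038 - 400) % 160 = 158, which satisfies the condition.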

Mitigation
I think that setting the second argument of this command to 1, to allow for a one-frame mismatch in length, would avoid the error. I'm sorry that I don't currently have the time to fully test and validate this fix.
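
To illustrate what a length tolerance changes, here is a NumPy sketch. It is not Kalpy's or Kaldi's actual code: `paste_with_tolerance` is a hypothetical helper, and the frame counts and feature dimensions are made up. It assumes that when the frame counts differ by at most the tolerance, both matrices are truncated to the shorter one before being joined, and that otherwise an empty array comes back, as described in this report.

```python
import numpy as np


def paste_with_tolerance(mfcc, pitch, tolerance=0):
    """Join two (frames x dims) feature matrices column-wise.

    Hypothetical illustration only, not the Kalpy/Kaldi implementation.
    """
    min_len = min(mfcc.shape[0], pitch.shape[0])
    max_len = max(mfcc.shape[0], pitch.shape[0])
    if max_len - min_len > tolerance:
        # Mismatch exceeds the tolerance: return an empty feature matrix.
        return np.empty((0, mfcc.shape[1] + pitch.shape[1]))
    # Truncate both inputs to the shorter length, then concatenate columns.
    return np.hstack([mfcc[:min_len], pitch[:min_len]])


# Made-up shapes that differ by exactly one frame.
mfcc = np.zeros((4, 13))
pitch = np.zeros((5, 3))

strict = paste_with_tolerance(mfcc, pitch, tolerance=0)
lenient = paste_with_tolerance(mfcc, pitch, tolerance=1)
print(strict.shape, lenient.shape)
```

In this sketch a tolerance of 0 yields zero frames, while a tolerance of 1 keeps the four frames both inputs share.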

For Reproducing your issue
Here's a minimal repro of the issue:

import tempfile

import kalpy.data
import kalpy.utterance
import montreal_forced_aligner.models
import numpy
import soundfile

# Local copy of the Polish acoustic model.
acoustic_model = montreal_forced_aligner.models.AcousticModel('local_pl_model.zip')
with tempfile.NamedTemporaryFile(suffix='.wav') as audio_file:
    # 1038 samples of silence at 16 kHz: (1038 - 400) % 160 == 158, which is > 156.
    soundfile.write(audio_file, numpy.array([0.0] * 1038), samplerate=16_000)
    duration = soundfile.info(audio_file.name).duration
    # One segment spanning the whole file, with an empty transcript.
    segment = kalpy.data.Segment(audio_file.name, begin=0.0, end=duration)
    utterance = kalpy.utterance.Utterance(segment, '')
    # MFCC and pitch features are computed and combined here.
    feats = utterance.generate_features(
        acoustic_model.mfcc_computer,
        acoustic_model.pitch_computer,
        lda_mat=acoustic_model.lda_mat,
    )

Please fill out the following:

Corpus structure
- What language is the corpus in?
  This happens for Polish, but not English. I have not triangulated exactly what options trigger this code path.
- How many files/speakers?
  It's reproducible with a single file.
- Are you using lab files or TextGrid files for input?
  Wave and text files
Dictionary
- Are you using a dictionary from MFA? If so, which one?
  Yes. I believe this occurs in acoustic feature computation before the dictionary is relevant, but we're using https://github.com/MontrealCorpusTools/mfa-models/releases/download/dictionary-polish_mfa-v2.0.0a/polish_mfa.dict
Acoustic model
- If you're using an acoustic model, is it one downloaded through MFA? If so, which one?
  Yes, we're using https://github.com/MontrealCorpusTools/mfa-models/releases/download/acoustic-polish_mfa-v2.0.0a/polish_mfa.zip

Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Desktop (please complete the following information):

OS: Linux
Version: Ubuntu 22.04.1
Any other details about the setup (Cloud, Docker, etc)

Additional context
Add any other context about the problem here.


scottbreyfogle added the bug Something isn't working label Nov 21, 2024

scottbreyfogle assigned mmcauliffe Nov 21, 2024

scottbreyfogle mentioned this issue Nov 21, 2024

Pitch and MFCC output lengths differ for same input audio kaldi-asr/kaldi#4960

Open
