[BUG] KaldiFatalError for inputs of particular sizes #24

Open
scottbreyfogle opened this issue Nov 21, 2024 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@scottbreyfogle
Debugging checklist

[ ] Have you updated to latest MFA version?
I may be two minor versions behind, but I'm 90% sure this bug should be in the current version.
[ ] Have you tried rerunning the command with the --clean flag?
I'm using MFA library functions directly

Describe the issue
Technically, this is a bug in Kaldi. I plan to file a bug with them as well, but I think there is an easy mitigation for Kalpy. When an input audio file has a particular length[1], the MFCC and pitch features produced have different numbers of frames. When Kalpy then uses paste_feats to combine the two, the tolerance of 0 causes an empty array to be generated. When those features are then used, a KaldiFatalError is raised because the input is empty.

  1. The affected lengths appear to be those where (length - 400) % 160 > 156, with length in samples. This is related to the frame size (400 samples) and stride (160 samples) of the feature calculation at 16 kHz.
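
To make the condition in the footnote concrete, here is a small sketch that checks whether a given sample count falls in the affected range. The helper name `is_affected_length` and the guard for inputs shorter than one frame are my own assumptions for illustration; the condition itself is only the empirical one stated above, not something taken from Kaldi's source.

```python
FRAME_LENGTH = 400  # samples per frame (25 ms at 16 kHz)
FRAME_SHIFT = 160   # samples between frame starts (10 ms at 16 kHz)


def is_affected_length(num_samples: int) -> bool:
    """Return True if num_samples matches the condition reported in this issue."""
    if num_samples < FRAME_LENGTH:
        # Assumption: inputs shorter than one frame are outside the reported condition.
        return False
    return (num_samples - FRAME_LENGTH) % FRAME_SHIFT > 156


# List the affected lengths in a small window around the 1038-sample repro.
affected = [n for n in range(1000, 1100) if is_affected_length(n)]
print(affected)
```

If the condition holds in general, three out of every 160 consecutive lengths would be affected, which would explain why the error only shows up occasionally on real corpora.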

Mitigation
I think that setting the second argument of this command (presumably the length tolerance) to 1, so that a frame-count mismatch of 1 is tolerated, would avoid the error. I'm sorry that I don't currently have the time to fully test and validate this fix.
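
To illustrate why a tolerance of 1 should help, here is a NumPy sketch of how I understand pasting with a length tolerance to behave: if the frame counts differ by more than the tolerance the result is empty, otherwise every input is truncated to the shortest length and the columns are concatenated. This is not Kalpy's or Kaldi's actual code; `paste_with_tolerance` and the feature shapes (4 MFCC frames, 5 pitch frames) are hypothetical.

```python
import numpy as np


def paste_with_tolerance(feature_mats, length_tolerance=0):
    """Concatenate feature matrices column-wise, allowing a small frame-count mismatch."""
    lengths = [mat.shape[0] for mat in feature_mats]
    min_len, max_len = min(lengths), max(lengths)
    if min_len == 0 or max_len - min_len > length_tolerance:
        # Mismatch exceeds the tolerance: return an empty matrix.
        return np.zeros((0, 0))
    # Truncate every matrix to the shortest length, then join the columns.
    return np.hstack([mat[:min_len] for mat in feature_mats])


# Hypothetical shapes: the two feature streams differ by one frame.
mfcc = np.zeros((4, 13))
pitch = np.zeros((5, 3))

print(paste_with_tolerance([mfcc, pitch], length_tolerance=0).shape)
print(paste_with_tolerance([mfcc, pitch], length_tolerance=1).shape)
```

With a tolerance of 0 the one-frame mismatch yields an empty result, which matches the empty array described above; with a tolerance of 1 the extra frame is dropped and the paste succeeds.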

For Reproducing your issue
Here's a minimal repro of the issue:

import tempfile

import kalpy.data
import kalpy.utterance
import numpy
import soundfile
import montreal_forced_aligner.models


acoustic_model = montreal_forced_aligner.models.AcousticModel('local_pl_model.zip')
with tempfile.NamedTemporaryFile(suffix='.wav') as audio_file:
    # 1038 samples of silence at 16 kHz: (1038 - 400) % 160 == 158, an affected length.
    soundfile.write(audio_file, numpy.array([0.0] * 1038), samplerate=16_000)
    # Flush so the audio is on disk before the file is reopened by name.
    audio_file.flush()
    duration = soundfile.info(audio_file.name).duration
    segment = kalpy.data.Segment(audio_file.name, begin=0.0, end=duration)
    utterance = kalpy.utterance.Utterance(segment, '')
    # MFCC and pitch features are computed and pasted together here.
    feats = utterance.generate_features(
      acoustic_model.mfcc_computer,
      acoustic_model.pitch_computer,
      lda_mat=acoustic_model.lda_mat)

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in?
      This happens for Polish, but not English. I have not triangulated exactly what options trigger this code path.
    • How many files/speakers?
      It's reproducible with a single file.
    • Are you using lab files or TextGrid files for input?
      Wave and text files
  2. Dictionary
  3. Acoustic model

Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Desktop (please complete the following information):

  • OS: Linux
  • Version: Ubuntu 22.04.1
  • Any other details about the setup (Cloud, Docker, etc)

Additional context
Add any other context about the problem here.

2 participants