Hello, I'm the maintainer of the Montreal Forced Aligner (MFA) and am currently working on a new Japanese model for speech-to-text alignment. My current prototype uses sudachipy to generate morphemes, post-processes these to create phonological words (e.g., "し ちゃっ て" -> "しちゃって"), and then runs the rest of the forced alignment pipeline as if this generated transcript were ground-truth accurate (i.e., it generates utterance FSTs for phone sequences from pronunciation dictionary lookup).
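For concreteness, here is a minimal sketch of that first step (assuming SudachiPy 0.6+ with sudachidict_core installed; the merge heuristic below is a simplified placeholder, not my prototype's actual post-processing rules):

```python
from sudachipy import Dictionary, SplitMode  # assumes sudachidict_core is installed

tokenizer = Dictionary().create()

def phonological_words(text):
    """Merge SudachiPy morphemes into rough phonological words.

    The merge rule (glue auxiliary verbs and particles onto the preceding
    morpheme) is a simplified placeholder; the real post-processing is more
    involved.
    """
    words = []
    for m in tokenizer.tokenize(text, SplitMode.A):
        pos = m.part_of_speech()[0]
        if words and pos in ("助動詞", "助詞"):  # auxiliary verb, particle
            words[-1] += m.surface()
        else:
            words.append(m.surface())
    return words

# Exact grouping depends on the dictionary's POS scheme, but the idea is that
# morpheme sequences like "し ちゃっ て" come back merged into "しちゃって".
print(phonological_words("連絡しちゃってごめん"))
```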
Given that a morphological parser has its own lattice from which the best path is extracted, it'd be nice to use that lattice as the starting point, compose it with an FST that does the post-processing into phonological words, and then compose that with the pronunciation dictionary. The latest versions of sudachipy don't return lattices or expose any internal methods to Python, so I'm still looking for a permanent solution.
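To illustrate what I mean, here's a toy pynini sketch of that composition chain. The symbol tables, the single-path "lattice", and the one-entry lexicon are all made up for illustration; a real lattice would be a weighted acceptor with alternative segmentations.

```python
import pynini

# Toy word-level and phone-level symbol tables.
words = pynini.SymbolTable()
for tok in ("<eps>", "し", "ちゃっ", "て", "しちゃって"):
    words.add_symbol(tok)

phones = pynini.SymbolTable()
for p in ("<eps>", "sh", "i", "ch", "a", "Q", "t", "e"):
    phones.add_symbol(p)

# "Lattice": here just the single best path from the parser.
lattice = pynini.accep("し ちゃっ て", token_type=words)

# Post-processing FST that rewrites the morpheme sequence as one
# phonological word.
merge = pynini.cross(
    pynini.accep("し ちゃっ て", token_type=words),
    pynini.accep("しちゃって", token_type=words),
)

# Toy pronunciation dictionary entry: phonological word -> phone sequence.
lexicon = pynini.cross(
    pynini.accep("しちゃって", token_type=words),
    pynini.accep("sh i ch a Q t e", token_type=phones),
)

# lattice ∘ merge ∘ lexicon gives the utterance FST over phones.
utterance = lattice @ merge @ lexicon
for phone_string in utterance.paths(
    input_token_type=words, output_token_type=phones
).ostrings():
    print(phone_string)  # "sh i ch a Q t e"
```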
For all of its FSTs, MFA uses pynini, the Python bindings for OpenFst (like here). I saw that janome has a pure Python implementation of FSTs, and I was curious whether there's interest in adding or migrating to a pynini implementation, which should simplify it considerably and allow MFA to directly use any lattices.
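As a rough, hypothetical illustration of what dictionary matching looks like when expressed as pynini composition (toy entries and a made-up surface/POS output format, not janome's real dictionary format or API): every path through the composition is one candidate segmentation, i.e., exactly the kind of lattice MFA would want to consume.

```python
import pynini

# Hypothetical toy entries, just to illustrate the shape of the problem.
entries = {"すもも": "NOUN", "もも": "NOUN", "も": "PARTICLE"}

# One transducer per entry (surface -> "surface/POS "), unioned and closed
# so that any concatenation of entries is accepted.
lexicon = pynini.union(
    *(pynini.cross(surface, f"{surface}/{tag} ") for surface, tag in entries.items())
).closure()

sentence = pynini.accep("すもももももも")  # byte-level acceptor over the input

# Each path through the composition is one candidate segmentation; a
# tokenizer would keep the whole set (the lattice) and score the paths.
for segmentation in pynini.compose(sentence, lexicon).paths().ostrings():
    print(segmentation)
```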
If there is interest, I'm happy to put together an initial PR for it!
Hi @mmcauliffe,
Sorry for the late reply. I've been too busy in recent days to be involved in this issue.
I'm not very familiar with the speech-to-text domain, but it sounds exciting!
Janome has a "no-dependencies" policy for flexibility and ease of future maintenance. I'm just curious - is it possible to re-implement Pynini in Janome? Or do you think it'd be better to have a fork of Janome for MFA (a variant that integrates Pynini as the string-matching engine)?