Use DTW-RA in alignment #770

an-lee · 2024-07-03T05:53:25Z

The DTW-RA method seems to be more accurate for audio with music or other background noise.

Build a timeline from the transcription result
Align using that timeline and the dtw-ra option


an-lee · 2024-07-03T05:54:26Z

ref: https://github.com/echogarden-project/echogarden/blob/main/src/api/Alignment.ts#L180

an-lee · 2024-07-04T03:07:30Z

echogarden-project/echogarden#60

czyrichard · 2024-07-04T17:42:44Z

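I saw the suggestion you posted about dtw-ra; thank you very much for your effort. I can't write code, so I can only contribute in other ways. I'd also appreciate any advice you might have: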

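TL;DR:
1) Even near-perfect noise and music removal does not seem to solve the alignment problem in the current latest version of Enjoy.
2) After removing noise and music, cutting out the silent segments so that the speech is more continuous improves the alignment rate, but only to a limited extent.
3) Removing line breaks when importing subtitles seems to help the alignment rate somewhat.
4) Manually adding silence after each sentence in the audio makes the alignment rate go down rather than up.
5) My two ideas for improving the alignment rate.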

Details:

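1) Today I experimented with various AI models for removing noise and music, then imported the version that best preserved the vocals (pure_vocal_untruncated.mp3) together with the srt file into Enjoy. The alignment rate was still very low.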

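2) I then used Audacity to cut all the silent segments out of pure_vocal_untruncated.mp3 and applied "Amplify", producing pure_vocal_truncated.mp3, which I imported into Enjoy together with the srt file. The first few lines aligned at essentially 100%, down to each individual word, but things broke from line 8 onward: Enjoy aligned the content of line 8 to the audio of line 7, even though line 7 had already aligned perfectly. In other words, Enjoy aligned the same few seconds of audio to both line 7 and line 8, which threw off all subsequent alignment.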

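3) I used Subtitle Edit to strip the timestamps and line breaks from the srt file, producing pure_vocal_truncated.txt, and imported pure_vocal_truncated.mp3 and pure_vocal_truncated.txt into Enjoy. This time Enjoy aligned correctly up to line 21; at line 22 the same problem as above occurred.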

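4) I also tried another approach: I had ChatGPT and Claude write a script that keeps the parts of the audio that have subtitles, removes the parts without subtitles, and inserts two seconds of silence between each pair of kept parts, to see whether the silence would help Enjoy align better. It was no use at all; the alignment rate dropped rather than rose, so I won't go into detail here.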

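5) Finally, I ran pure_vocal_untruncated.mp3 and its srt file through whisperX, which produced a JSON file with timestamps for every sentence and every word. I compared this file with the data in Enjoy's SQLite database and found that the only extra thing in Enjoy's database is a timestamp for each phonetic symbol, which whisperX does not seem able to produce. For users who care less about phonetic data and more about timeline accuracy, it would be great to have an option to import whisperX data directly.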

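6) There is one more possibility: could the audio be sliced according to the srt subtitle timeline, with each line's slice aligned separately and the results merged at the end? That way the alignment could never drift.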
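The per-line idea in point 6 could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not how Enjoy or echogarden is implemented: `parse_srt`, `align_per_cue`, and the `align_clip` callback are hypothetical names, and `even_split_aligner` is only a stand-in that spreads words evenly over a cue so the example runs without any audio library. A real forced aligner would be plugged in as `align_clip`.

```python
import re
from typing import Callable, List, Tuple

Cue = Tuple[float, float, str]    # (start_sec, end_sec, text)
Word = Tuple[str, float, float]   # (word, start_sec, end_sec)

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,000".
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(srt: str) -> List[Cue]:
    """Parse SRT text into cues, joining multi-line cue text with spaces."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                g = match.groups()
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def even_split_aligner(duration: float, text: str) -> List[Word]:
    """Placeholder aligner (assumption): divides the clip evenly among words."""
    words = text.split()
    if not words:
        return []
    step = duration / len(words)
    return [(w, i * step, (i + 1) * step) for i, w in enumerate(words)]


def align_per_cue(
    cues: List[Cue],
    align_clip: Callable[[float, str], List[Word]],
) -> List[Word]:
    """Align each cue on its own, then shift the clip-relative word times
    by the cue's start so the merged timeline uses absolute times."""
    timeline: List[Word] = []
    for start, end, text in cues:
        for word, w_start, w_end in align_clip(end - start, text):
            # Clamp to the cue boundary so an error cannot leak into the next cue.
            timeline.append((word, start + w_start, min(start + w_end, end)))
    return timeline
```

Because every word is confined to its own cue's time range, a mistake in one line cannot push later lines out of place. The trade-off is that the result can only be as accurate as the srt timestamps themselves.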

data.zip

imxw · 2024-07-16T16:11:26Z

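Could you support uploading our own timestamped subtitle files? For example, the material I already have comes with lrc subtitle files, but the app doesn't support them, and the subtitles generated by the built-in whisper have alignment problems.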
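For illustration only, timestamped lrc lines of the kind described here could be turned into start/end cues along these lines. This is a hedged Python sketch, not Enjoy's import code: `parse_lrc` and `lrc_to_cues` are hypothetical helpers, and the parser assumes the common `[mm:ss.xx]text` form, skipping metadata tags such as `[ar:...]`.

```python
import re
from typing import List, Optional, Tuple

# Matches time tags like [00:12.50] or [01:05]; metadata tags such as
# [ar:Artist] contain no leading digits and are therefore ignored.
TAG_RE = re.compile(r"\[(\d+):(\d{1,2}(?:\.\d+)?)\]")


def parse_lrc(lrc: str) -> List[Tuple[float, str]]:
    """Return (start_seconds, text) pairs sorted by time.
    A line carrying several time tags yields one entry per tag."""
    entries: List[Tuple[float, str]] = []
    for line in lrc.splitlines():
        tags = list(TAG_RE.finditer(line))
        if not tags:
            continue  # metadata or blank line
        text = TAG_RE.sub("", line).strip()
        for tag in tags:
            seconds = int(tag.group(1)) * 60 + float(tag.group(2))
            entries.append((seconds, text))
    entries.sort(key=lambda entry: entry[0])
    return entries


def lrc_to_cues(
    entries: List[Tuple[float, str]],
    total_duration: Optional[float] = None,
) -> List[Tuple[float, float, str]]:
    """lrc stores only start times, so each cue is assumed to end where the
    next one starts. The last cue ends at total_duration when it is given."""
    cues: List[Tuple[float, float, str]] = []
    for i, (start, text) in enumerate(entries):
        if i + 1 < len(entries):
            end = entries[i + 1][0]
        elif total_duration is not None:
            end = total_duration
        else:
            end = start
        if text:  # empty lines only mark where the previous cue ends
            cues.append((start, end, text))
    return cues
```

The resulting `(start, end, text)` cues have the same shape as srt cues, so they could serve as a sentence-level timeline that an aligner then refines to word level.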

an-lee · 2024-07-16T22:37:37Z

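> Could you support uploading our own timestamped subtitle files? For example, the material I already have comes with lrc subtitle files, but the app doesn't support them, and the subtitles generated by the built-in whisper have alignment problems.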

This issue is meant to solve exactly that problem. Please wait a little longer.

an-lee added the enhancement (New feature or request) label Jul 3, 2024

an-lee mentioned this issue Jul 4, 2024

After audio parsing, the text and the audio don't match up #547

Closed

an-lee linked a pull request Jul 23, 2024 that will close this issue

Feat: Improve alignment for the audio with background noise #870

Merged

an-lee closed this as completed in #870 Jul 23, 2024
