Investigate whether to switch to pyopenjtalk-plus #1486

Hiroshiba · 2024-11-19T17:35:03Z

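Description

Someone has created pyopenjtalk-plus, a library that makes various changes to pyopenjtalk.
I'd like to consider replacing pyopenjtalk with it, but I don't know what differs or by how much, so I want to check that first and then make the switch with confidence.

So I'm looking for someone to investigate this!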

Pros (what improves)

VOICEVOX's default accents may become more accurate

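Approach

I think the steps would be roughly:

In VOICEVOX ENGINE, uninstall pyopenjtalk and install pyopenjtalk-plus
Start VOICEVOX ENGINE in that state and, via the API, generate AudioQuery.json files and audio for a large amount of text (the character 波音リツ (Namine Ritsu) is recommended)
Generate JSON and audio for the same texts with the original ENGINE as well
For texts whose JSON differs (i.e. differences caused by pyopenjtalk), listen to and compare the corresponding audio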
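The steps above could be scripted roughly as follows. This is a minimal sketch, not part of VOICEVOX ENGINE: it assumes the engine runs locally on its default port (50021) and uses its HTTP endpoints POST /audio_query and POST /synthesis. SPEAKER_ID is a hypothetical placeholder (look up the style ID for 波音リツ via GET /speakers), and the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path
from typing import List, Optional

# Assumption: VOICEVOX ENGINE is running locally on its default port.
ENGINE_URL = "http://127.0.0.1:50021"
# Hypothetical placeholder: look up the style ID for 波音リツ via GET /speakers.
SPEAKER_ID = 0


def post(path: str, params: dict, body: Optional[bytes] = None) -> bytes:
    """Send a POST request to the engine and return the raw response body."""
    url = f"{ENGINE_URL}{path}?{urllib.parse.urlencode(params)}"
    headers = {"Content-Type": "application/json"} if body is not None else {}
    request = urllib.request.Request(url, data=body or b"", headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read()


def dump_queries_and_audio(texts: List[str], out_dir: Path) -> None:
    """Save AudioQuery JSON and synthesized WAV for each text (run once per engine setup)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, text in enumerate(texts):
        query = post("/audio_query", {"text": text, "speaker": SPEAKER_ID})
        (out_dir / f"{i:05d}.json").write_bytes(query)
        wav = post("/synthesis", {"speaker": SPEAKER_ID}, body=query)
        (out_dir / f"{i:05d}.wav").write_bytes(wav)


def differing_indices(dir_a: Path, dir_b: Path) -> List[int]:
    """Return indices whose AudioQuery JSON differs (or is missing) between two runs."""
    diffs = []
    for path_a in sorted(dir_a.glob("*.json")):
        path_b = dir_b / path_a.name
        if not path_b.exists():
            diffs.append(int(path_a.stem))
            continue
        query_a = json.loads(path_a.read_text(encoding="utf-8"))
        query_b = json.loads(path_b.read_text(encoding="utf-8"))
        if query_a != query_b:
            diffs.append(int(path_a.stem))
    return diffs
```

Usage: call dump_queries_and_audio(texts, Path("out_orig")) with the original engine and dump_queries_and_audio(texts, Path("out_plus")) after swapping in pyopenjtalk-plus; differing_indices(Path("out_orig"), Path("out_plus")) then lists the WAV files worth listening to. Comparing parsed JSON rather than raw bytes keeps key order from producing false differences.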

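For the text, web novels and the like would probably work well, since unknown words appear in them at a moderate rate.
If you're interested, feel free to leave a comment 🙏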

Other

Please share the results by uploading them to GitHub or Google Drive, or as a zip file or similar.
I suspect the accents will change quite a bit.


Patchethium · 2024-11-19T18:22:22Z

I ran a rough benchmark on ROHAN4600, using the BLEU score as the evaluation metric. Here are the results:

OJT-Plus (pyopenjtalk-plus):

BLEU-1: 0.9545325962446665
BLEU-2: 0.9335017222871858
BLEU-3: 0.9143195170592943
BLEU-4: 0.8943820654874246

OJT (pyopenjtalk):

BLEU-1: 0.9526140253671013
BLEU-2: 0.9307223445463859
BLEU-3: 0.9106932148048786
BLEU-4: 0.8898589642236239

Delta (OJT-Plus minus OJT):

BLEU-1: 0.0019185708775651955
BLEU-2: 0.0027793777407998377
BLEU-3: 0.0036263022544156254
BLEU-4: 0.004523101263800733

Each score is computed with NLTK's corpus_bleu, using cumulative weights from 1-gram up to 4-gram:

from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

# ref: a list of reference token lists per sentence; hyp: one token list per sentence
# (both are built in the full script below)
smooth = SmoothingFunction()

# Cumulative BLEU-1 to BLEU-4; method1 adds a small epsilon to n-gram orders with zero matches
corpus_score_1 = corpus_bleu(ref, hyp, weights=(1, 0, 0, 0), smoothing_function=smooth.method1)
corpus_score_2 = corpus_bleu(ref, hyp, weights=(0.5, 0.5, 0, 0), smoothing_function=smooth.method1)
corpus_score_3 = corpus_bleu(ref, hyp, weights=(0.33, 0.33, 0.33, 0), smoothing_function=smooth.method1)
corpus_score_4 = corpus_bleu(ref, hyp, smoothing_function=smooth.method1)  # default weights: 0.25 each
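For reference, the weights map onto the standard corpus-level BLEU definition, which NLTK's corpus_bleu follows: p_n is the clipped n-gram precision summed over the whole corpus (here the tokens are single kana characters), c is the total hypothesis length, and r is the total closest-reference length. The special cases below are worked out from that definition.

```latex
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\!\Big(\sum_{n=1}^{4} w_n \log p_n\Big),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r\\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}

w = (1,0,0,0) \;\Rightarrow\; \mathrm{BLEU} = \mathrm{BP}\cdot p_1
\qquad
w = (\tfrac12,\tfrac12,0,0) \;\Rightarrow\; \mathrm{BLEU} = \mathrm{BP}\cdot\sqrt{p_1 p_2}
\qquad
w = (\tfrac14,\tfrac14,\tfrac14,\tfrac14) \;\Rightarrow\; \mathrm{BLEU} = \mathrm{BP}\cdot(p_1 p_2 p_3 p_4)^{1/4}
```

Because (0.33, 0.33, 0.33, 0) sums to 0.99 rather than 1, corpus_score_3 is BP·(p_1 p_2 p_3)^{0.33}, a hair above the exact cumulative 3-gram score; the difference is negligible for comparing the two libraries, since both runs use the same weights.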

Full script

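# %%
import csv
import re

data = []   # input sentences (kanji-kana mixed text)
label = []  # reference readings (katakana) from the transcript
with open("Rohan4600_transcript_utf8.txt", encoding="utf-8") as file:
    for row in csv.reader(file):
        # row[0] is assumed to be "<id>:<sentence>"; keep only the sentence part
        text = row[0].split(":", 1)[1]
        # drop any parenthesized text
        data.append(re.sub(r"\(.*?\)", "", text))
        label.append(row[1])

# %%
# Run this once with pyopenjtalk installed and once with pyopenjtalk-plus
# (assumed to be importable under the same module name, pyopenjtalk)
import pyopenjtalk
from tqdm.autonotebook import tqdm

ref = []
hyp = []
for sentence, reading in zip(tqdm(data), label):
    # g2p(kana=True) returns a katakana string; list() splits it into characters
    hyp.append(list(pyopenjtalk.g2p(sentence, kana=True)))
    # corpus_bleu expects a list of references for each hypothesis
    ref.append([list(reading)])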

# %%
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

smooth = SmoothingFunction()

corpus_score_1 = corpus_bleu(ref, hyp, weights=(1, 0, 0, 0), smoothing_function=smooth.method1)
corpus_score_2 = corpus_bleu(ref, hyp, weights=(0.5, 0.5, 0, 0), smoothing_function=smooth.method1)
corpus_score_3 = corpus_bleu(ref, hyp, weights=(0.33, 0.33, 0.33, 0), smoothing_function=smooth.method1)
corpus_score_4 = corpus_bleu(ref, hyp, smoothing_function=smooth.method1)

# %%
print(corpus_score_1)
print(corpus_score_2)
print(corpus_score_3)
print(corpus_score_4)

Inference speed

Note that although the BLEU score is slightly better, OJT-Plus runs at 72.06 it/s while OJT runs at 12895.64 it/s (tqdm throughput of the g2p loop, i.e. sentences per second), so OJT-Plus is roughly 179 times slower.
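To reproduce the speed comparison without relying on the progress bar, a minimal timing helper could look like this. It is a hypothetical sketch: the function name and the warm-up parameter are illustrative, and the warm-up call is there so that one-time costs such as lazy initialization on the first call are not counted in the throughput.

```python
import time
from typing import Callable, Sequence


def sentences_per_second(g2p: Callable[[str], str], sentences: Sequence[str], warmup: int = 1) -> float:
    """Measure g2p throughput in sentences per second, excluding warm-up calls."""
    # Warm-up: run a few sentences first so one-time setup is not timed.
    for sentence in sentences[:warmup]:
        g2p(sentence)
    start = time.perf_counter()
    for sentence in sentences:
        g2p(sentence)
    elapsed = time.perf_counter() - start
    return len(sentences) / elapsed if elapsed > 0 else float("inf")
```

Usage, in each environment: sentences_per_second(lambda s: pyopenjtalk.g2p(s, kana=True), data), where data is the sentence list from the script above.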

Hiroshiba · 2024-11-19T19:09:46Z

@Patchethium

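Excellent investigation!
So you ran the ROHAN4600 sentences through pyopenjtalk and compared the resulting readings (phonemes) with the ground-truth readings using BLEU scores from 1-gram to 4-gram, right?

At the phoneme level the improvement is only a fraction of a percent, while the speed drops considerably.
Even so, it seems fast enough for some tasks!

I'm curious which sentences actually differ.
Probably things like 「方」 (hou / kata) or 「何」 (nan / nani)...?

Besides phonemes, I'm also curious how much the accent phrase boundaries and the accents themselves differ...!!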
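One way to break the AudioQuery differences down into reading, accent-phrase boundaries, and accents is a helper like the one below. It is a hypothetical sketch that assumes the AudioQuery layout VOICEVOX ENGINE returns: a list of accent_phrases, each with a list of moras (each carrying text) and an accent position. The function name is illustrative.

```python
from typing import Dict


def compare_accents(query_a: dict, query_b: dict) -> Dict[str, bool]:
    """Compare reading, accent-phrase boundaries, and accents of two AudioQuery dicts."""
    phrases_a = query_a["accent_phrases"]
    phrases_b = query_b["accent_phrases"]

    def reading(phrases):
        # Concatenated mora text over all accent phrases.
        return "".join(mora["text"] for phrase in phrases for mora in phrase["moras"])

    def boundaries(phrases):
        # Number of moras in each accent phrase.
        return [len(phrase["moras"]) for phrase in phrases]

    def accents(phrases):
        # Accent position of each accent phrase.
        return [phrase["accent"] for phrase in phrases]

    return {
        "reading": reading(phrases_a) == reading(phrases_b),
        "boundaries": boundaries(phrases_a) == boundaries(phrases_b),
        "accents": accents(phrases_a) == accents(phrases_b),
    }
```

Counting how often each of the three flags is False over the whole corpus would separate pure accent changes from changes in reading or phrase segmentation.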

Patchethium · 2024-11-19T20:00:58Z

Indeed, looking at the actual outputs:

ジンセイヤマアリタニアリダガ、キャビアヲツマミブルゴーニュワインヲノメルノワ、コウフクダロウ。 => ジンセーヤマアリタニアリダガ、キャビアヲツマミブルゴーニュワインヲノメルノワ、コーフクダロー。
Delete "イ" from position 3
Add "ー" to position 4
Delete "ウ" from position 41
Add "ー" to position 42
Delete "ウ" from position 47
Add "ー" to position 48

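There are many trivial mismatches like セイ vs セー. It seems the ROHAN transcripts usually spell these vowels out in kana (イ, ウ), whereas OJT tends to convert them to the long-vowel mark ー.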
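To list which sentences differ and how, a stand-in sketch using Python's difflib could look like this. It is not the tool that produced the Delete/Add listing above: difflib reports a substitution as a single "replace" and uses 0-based slice positions, so its numbers will not match that listing exactly. The function names are illustrative.

```python
import difflib
from typing import Iterator, List, Sequence, Tuple


def kana_diffs(ref: str, hyp: str) -> List[str]:
    """Describe the character-level edits that turn ref into hyp (0-based slices)."""
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        f"{tag} ref[{i1}:{i2}]={ref[i1:i2]!r} -> hyp[{j1}:{j2}]={hyp[j1:j2]!r}"
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def differing_sentences(refs: Sequence[str], hyps: Sequence[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (index, edits) for every sentence whose hypothesis differs from its reference."""
    for index, (ref, hyp) in enumerate(zip(refs, hyps)):
        if ref != hyp:
            yield index, kana_diffs(ref, hyp)
```

For example, kana_diffs("ジンセイ", "ジンセー") returns ["replace ref[3:4]='イ' -> hyp[3:4]='ー'"]. Feeding the transcript readings and the g2p outputs of each library into differing_sentences gives a per-sentence view that is easier to skim than an aggregate score.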

The long-vowel conversion doesn't look easy to resolve, so I'll take some time to investigate further.

Hiroshiba · 2024-11-20T19:39:17Z

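So it comes down to whether the vowel is pronounced as a long vowel, I see!!
My feeling is that, unless the reading is extreme, either セイ or セー is fine, since the meaning of the word doesn't change.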
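If セイ and セー should count as the same reading, one option is to normalize long vowels on both the reference and the hypothesis before scoring. The sketch below is a heuristic of my own, not something either library provides: it rewrites イ after an e-row kana, and ウ after an o-row or u-row kana, to ー. It also merges some genuinely distinct readings (e.g. 思う オモウ becomes オモー), so it should only be applied symmetrically, for evaluation.

```python
# Kana whose vowel is e / o / u (including small kana); a following イ or ウ
# is treated as a long vowel and rewritten to ー. Heuristic, for scoring only.
E_ROW = set("エケセテネヘメレゲゼデベペェ")
O_ROW = set("オコソトノホモヨロヲゴゾドボポォョ")
U_ROW = set("ウクスツヌフムユルグズヅブプゥュ")


def normalize_long_vowels(kana: str) -> str:
    """Rewrite e+イ, o+ウ and u+ウ to ー so that e.g. セイ and セー compare equal."""
    out = []
    for ch in kana:
        prev = out[-1] if out else ""
        if (ch == "イ" and prev in E_ROW) or (ch == "ウ" and (prev in O_ROW or prev in U_ROW)):
            out.append("ー")
        else:
            out.append(ch)
    return "".join(out)
```

In the BLEU script, this would mean wrapping both sides, e.g. list(normalize_long_vowels(reading)) for the reference and list(normalize_long_vowels(pyopenjtalk.g2p(sentence, kana=True))) for the hypothesis, so that the remaining differences reflect actual reading changes.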

Still looking for people to investigate this!!
I'd also like to know about differences in accents and readings...!

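Hiroshiba added the labels 機能向上 (feature improvement), 初心者歓迎タスク (beginner-friendly task: a fairly easy task suitable for newcomers), and 状態：実装者募集 (status: looking for an implementer) on Nov 19, 2024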
