Subtitle Edit 4.0.9 and Purfview's Faster Whisper XXL Still Broken Merging Subtitle Lines #339

Closed
TranslateFuture opened this issue Nov 26, 2024 · 5 comments

Comments

@TranslateFuture

Hello, thank you for the newest updates.

With the newest Subtitle Edit 4.0.9 beta update (and every release after the final version of Subtitle Edit 4.0.3, or after December 23, 2023), the issues I described earlier are still present: broken or merged lines, random capitalization, missing commas and periods, etc. See SubtitleEdit/subtitleedit#8634 and SubtitleEdit/subtitleedit#8209

github translate new korean chinese spanish shuhua workdol broken subtitles comparison 1080p

I've also posted it again on the Subtitle Edit page, since the default --standard option in Subtitle Edit (which can't be toggled on or off right now, as if the settings are not saved) is what's creating the issues: SubtitleEdit/subtitleedit#9035

Once more, thank you so much Purfview, really appreciate all the work and help you've done. I apologize again if I was unclear or rambling, the specific examples (with videos and images) are in the previous comments.

@mjamil85

Try running faster-whisper-xxl.exe directly from the command line. Purfview has already released a new version with a .bat file included (drop your files onto the .bat), so check whether the result is the same as with Subtitle Edit.

@Purfview
Owner

Purfview commented Nov 26, 2024

Those SE screenshots don't tell me anything. I'm not interested in those videos or whatever else you posted in the SE issues either.

  1. Make sure you use the latest version at the moment: r194.4.
  2. Share the json file that shows the problem; you can get it with the -f srt json option.
  3. Then describe exactly what problem you have.
  4. Post the full command used.
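
For anyone assembling such a report, here is a minimal Python sketch that builds a command line producing both the .srt and the .json output. It uses only options that appear in this thread (-m, --task, --standard, -f srt json); the helper name `build_command` and the input file name `episode58.mp4` are hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def build_command(media_path, model="large-v2", task="translate"):
    """Return an argument list that writes both SRT and JSON output.

    Only flags quoted in this thread are used; media_path is a placeholder.
    """
    return [
        "faster-whisper-xxl.exe",
        media_path,
        "-m", model,
        "--task", task,
        "--standard",
        "-f", "srt", "json",
    ]


# Print the full command so it can be pasted into the issue report.
cmd = build_command("episode58.mp4")
print(shlex.join(cmd))
```

Printing the exact command before running it makes it easy to copy the same string into the report, which covers step 4.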

@Purfview
Owner

Purfview commented Nov 26, 2024

Open a new issue when you can provide everything described in the post above.

@TranslateFuture
Author

TranslateFuture commented Nov 27, 2024

Thank you as always for the quick reply, Purfview. I have one new example here, made with the standalone Faster-Whisper-XXL r194.2 (before today's update to r194.4), not using Subtitle Edit at all. The .json file is included now.

The command was the default one from the .bat file (I only changed the model to large-v2 and the task to translate to English): faster-whisper-xxl.exe %file_list% -pp -o source --batch_recursive --check_files --standard -f json srt -m large-v2 --task translate

This is a 1-hour video from the DdeunDdeun channel on YouTube; I've also included the official English and Korean subtitles for comparison. Pinggyego/Just An Excuse Episode 58: https://www.youtube.com/watch?v=B4bOGORy58E

github purfview faster whisper xxl broken merged capitalizations translations for korean chinese spanish etc.zip

The problem (with Korean, Spanish, and probably many other languages as well, especially when the video or audio is over 1 hour long) seems to be that the lines are merged because of the --standard option or something along those lines, so the final English .srt output has randomly capitalized letters, missing periods and commas, etc.

I apologize again if it's not reproducible. I'm using the latest version of Faster-Whisper-XXL (not the r194.4 that was uploaded a few hours ago), and even with regular Faster-Whisper back then, it definitely looks like the situation is due to --standard becoming the default or something like that.

I can't really describe it further in detail as I'm still pretty new to all of this, but it seems to be related to the updates around the time period between Subtitle Edit 4.0.3 and Subtitle Edit 4.0.4.

The previous default (for Subtitle Edit) didn't have as many breaks or "<br />" per dialogue line, but from that time until today it breaks the lines much more often, and it looks like it's breaking after every other word or so.

Thank you again for the fast response. The previous examples from several months ago also exhibit the same behaviors (they were done through Subtitle Edit instead of standalone Faster-Whisper(-XXL) though).

github translate broken korean drama

github subtitle edit broken translation for korean chinese etc.zip

github translate new korean chinese spanish shuhua workdol broken subtitles comparison 1080p

github subtitle edit new broken translations for korean chinese spanish etc.zip

@Purfview
Owner

Purfview commented Nov 27, 2024

...seems to be that the lines are merged because of the --standard command or something...

The "--standard" preset activates "--sentence", and sentence mode breaks/joins lines; if punctuation is missing, there will be wrong joins.
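
To illustrate why missing punctuation leads to wrong joins, here is a simplified Python sketch. It is not the actual Faster-Whisper-XXL code: the function `join_into_sentences` and the sample segments are made up for illustration, and it assumes a sentence ends only at ".", "?" or "!".

```python
# Illustrative only: assumes a sentence is closed by one of these characters.
TERMINATORS = (".", "?", "!")


def join_into_sentences(segments):
    """Merge consecutive text segments until one ends with sentence punctuation."""
    lines, buffer = [], []
    for text in segments:
        buffer.append(text.strip())
        if buffer[-1].endswith(TERMINATORS):
            lines.append(" ".join(buffer))
            buffer = []
    if buffer:  # keep any trailing text that never reached a terminator
        lines.append(" ".join(buffer))
    return lines


punctuated = ["I went home.", "Then I slept."]
unpunctuated = ["I went home", "Then I slept", "It was late."]

print(join_into_sentences(punctuated))    # ['I went home.', 'Then I slept.']
print(join_into_sentences(unpunctuated))  # ['I went home Then I slept It was late.']
```

In the second case the joiner never sees an end of sentence, so three segments collapse into one line, and the capital letters that started each original segment end up in the middle of that line.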

...in English has the random capitalized letters, missing periods and commas, etc...

Subtitle writers don't change capitalization, nor do they add or remove anything in the transcription; they can only break and join lines according to the settings.

In r194.5 I enabled custom "auto"/"default" prompt presets for the translation task, so it should be better for you now.
Btw, when the task is transcribe, the language should be set in order to activate the custom prompt presets.

Read the help to learn what the settings do.
Let me know if there are any wrong line breaks.
