forked from videomi/shotcut_web
Showing 8 changed files with 112 additions and 16 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
--- | ||
layout: post | ||
title: "New Version 24.10: Whisper to a Scream" | ||
author: Dan Dennedy | ||
category: blog | ||
--- | ||
|
||
Version 24.10.29 is now available for [**DOWNLOAD**]({{ "/download/" | prepend: site.baseurl | prepend: site.url }})! | ||
|
||
### Speech to Text | ||
|
||
Shotcut gets its first AI based on [OpenAI's Whisper](https://openai.com/index/whisper/), | ||
courtesy of the [whisper.cpp](https://github.com/ggerganov/whisper.cpp) project. | ||
This is available through the **Subtitles > Speech to Text** menu item or button: <kbd>![Speech to Text icon](https://d2t917e3b1b2xy.cloudfront.net/original/3X/8/2/8280d8783625e9b235728767f702e2e80abe3714.png)</kbd>. | ||
|
||
- Our builds include a basic model that has decent speed and accuracy but not a big size. (You can think of the model as the brain.) | ||
- You can [download](https://huggingface.co/ggerganov/whisper.cpp/tree/main) a bigger and better brain (model) in `ggml` format and configure it in the **Speech to Text** dialog, but it will be slower. | ||
- The dialog creates two jobs that appear in the **Jobs** panel: one to export audio and another to convert to text. | ||
- The results are added to the **Subtitles** panel as a new top-level Subtitle Track. | ||
- Currently, the only GPU our build supports is Apple Silicon. Otherwise, it is heavily multi-threaded on the CPU. | ||
- Known quirk: subtitle items sometimes start earlier than expected. Timing is provided by the model and tool, and we lack the skills and resources to improve this. | ||
- Expect occasional errors. Like humans in non-ideal conditions, it is not perfect. We will not take action on bug reports about some piece of audio not converting to the expected text. | ||
- OpenAI has made some [warnings](https://huggingface.co/openai/whisper-large#evaluated-use) about the usage of their Whisper models: | ||
> In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent.... We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. | ||
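
The two jobs described above (export audio, then convert to text) can be sketched as a pair of command lines. This is a minimal illustration, not Shotcut's actual invocation: the helper name `build_speech_to_text_jobs`, the file names, and the `whisper-cli` binary name are assumptions. The `ffmpeg` options produce the 16 kHz mono 16-bit WAV that whisper.cpp expects, and `-osrt` asks whisper.cpp for SRT output.

```python
from pathlib import Path


def build_speech_to_text_jobs(source, model, whisper_bin="whisper-cli"):
    """Return two command lines: audio export, then transcription.

    Hypothetical helper for illustration; Shotcut's real jobs may differ.
    """
    source = Path(source)
    wav = source.with_suffix(".wav")

    # Job 1: export the audio as 16 kHz mono 16-bit PCM WAV.
    export_audio = [
        "ffmpeg", "-y", "-i", str(source),
        "-vn", "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le",
        str(wav),
    ]

    # Job 2: transcribe the WAV with a ggml model and write an .srt file.
    # "-of" takes the output path without its extension.
    transcribe = [
        whisper_bin, "-m", str(model), "-f", str(wav),
        "-osrt", "-of", str(wav.with_suffix("")),
    ]
    return export_audio, transcribe
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order; the second job depends on the WAV file written by the first.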
### Transition Improvements | ||
|
||
- **Ripple Delete** a transition restores the entirety of the clips included in the transition. | ||
- **Lift** (non-ripple delete) a transition no longer leaves a gap; the gap is filled with the adjacent clips. | ||
- Moving an adjacent clip away increases the transition duration instead of detaching and leaving a gap. | ||
|
||
### Other Improvements | ||
|
||
- Removed the **Export > Video > Resample** button. Now, there are simply ignorable inline warnings when making certain changes. | ||
- Added **File > Show Project in Folder** to menu. | ||
- Added a `decimals <number>` option to numeric keywords in the **GPS Text** video filter. | ||
- Changed **Recent Projects** to **Projects**: items in this view no longer disappear as **Recent** reaches its maximum length and old items are removed. | ||
- Added a **Remove** action to the context menu in **Projects**. | ||
- Hide the **Reframe** video filter and button if **GPU Effects** is on. | ||
- Upgraded FFmpeg to version 7.1. | ||
|
||
### Fixes | ||
|
||
- Fixed a crash when doing more than one **Playlist > menu > Add Selected to Slideshow**. In theory, this could fix other random crashes in **Timeline**. | ||
- Fixed a crash opening a project containing a subtitle track with no items. | ||
- Fixed odd value for computed width in **Reframe** output video filter causes export to fail. | ||
- Fixed **Reframe** visual control can create odd-valued dimensions. | ||
- Fixed AVCHD video frame rate is double (could fix other formats). | ||
- Fixed making a proxy video for an iPhone 16 Pro video containing spatial audio. | ||
- Fixed GPU filters paste below non-GPU filters. | ||
- Fixed **Slideshow Generator** dialog is too tall with vertical video mode. | ||
- Fixed **GPS Offset** would reset in **GPS Text** video filter. | ||
- Fixed the maximum allowed **Time** in the **Time Remap** filter to prevent white frames. |
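
The two **Reframe** fixes above concern odd-valued dimensions. Many encoders using 4:2:0 chroma subsampling require even width and height, so a computed size is usually snapped to an even number before export. The sketch below shows one way to do that; the function names are hypothetical and this is not necessarily how Shotcut implements the fix.

```python
def to_even(value: int) -> int:
    """Round a pixel dimension down to the nearest even number (minimum 2)."""
    return max(2, value - (value % 2))


def reframe_size(height: int, aspect_w: int, aspect_h: int) -> tuple[int, int]:
    """Compute an output size for a target aspect ratio with even dimensions."""
    # Integer arithmetic can yield an odd width, e.g. 1080 * 9 // 16 == 607.
    width = height * aspect_w // aspect_h
    return to_even(width), to_even(height)


print(reframe_size(1080, 9, 16))  # (606, 1080)
```

Rounding down rather than up keeps the result inside the source frame, which seems the safer choice for a crop.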
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
{
- "version_number": 240913,
- "version_string": "24.09.13",
- "url": "https://shotcut.org/blog/new-release-240913/"
+ "version_number": 241029,
+ "version_string": "24.10.29",
+ "url": "https://shotcut.org/blog/new-release-241029/"
} |
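
In this JSON file, `version_number` is the dotted `version_string` with the dots removed, which makes versions comparable as plain integers. The sketch below shows how a client might use such a manifest to detect an update. How Shotcut actually consumes the file is an assumption here, and both function names are hypothetical.

```python
import json


def version_number(version_string: str) -> int:
    """Convert a 'YY.MM.DD' version string to its integer form."""
    year, month, day = (int(part) for part in version_string.split("."))
    return year * 10000 + month * 100 + day


def update_available(installed: str, manifest_text: str):
    """Return the release URL if the manifest is newer than `installed`, else None."""
    manifest = json.loads(manifest_text)
    if manifest["version_number"] > version_number(installed):
        return manifest["url"]
    return None
```

With the values from this commit, `version_number("24.10.29")` gives `241029`, so an installation at `24.09.13` would be pointed to the new release URL.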