In #355, I've changed the format setting passed to yt-dlp to select proper video streams.
The "historical" value before this change was best[ext={vidext}]/bestvideo[ext={vidext}]+bestaudio[ext={audext}]/best
where {vidext} and {audext} are computed from the format chosen on the youtube2zim CLI:
if --format mp4 is used, {vidext} is mp4 and {audext} is m4a
if --format webm is used, {vidext} is webm and {audext} is webm
no other --format setting is supported
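As a minimal sketch of the mapping described above (the helper and constant names are hypothetical and not taken from the youtube2zim code base; only the template and the extension pairs come from this issue), the historical selector could be built like this:

```python
# Hypothetical names; only the template string and the mp4/m4a, webm/webm
# pairs are taken from the issue text.
AUDIO_EXT_FOR = {"mp4": "m4a", "webm": "webm"}

HISTORICAL_TEMPLATE = (
    "best[ext={vidext}]/bestvideo[ext={vidext}]+bestaudio[ext={audext}]/best"
)


def historical_selector(video_format: str) -> str:
    """Fill the historical template for a given --format value."""
    if video_format not in AUDIO_EXT_FOR:
        # No other --format setting is supported.
        raise ValueError(f"unsupported --format value: {video_format!r}")
    return HISTORICAL_TEMPLATE.format(
        vidext=video_format, audext=AUDIO_EXT_FOR[video_format]
    )


print(historical_selector("webm"))
# best[ext=webm]/bestvideo[ext=webm]+bestaudio[ext=webm]/best
```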
This format selector does not work because in some cases yt-dlp does not discover any webm format (we basically always use --format webm), so neither best[ext={vidext}] nor bestvideo[ext={vidext}]+bestaudio[ext={audext}] matches. The fallback to best does not work either, because there is no stream containing both audio and video: platforms (and especially Youtube) now tend to offer separate audio-only and video-only streams, since players are widely capable of "combining" the two streams on-the-fly.
This format was hence "buggy" in the sense that it failed to download the video while it was in fact quite possible to find a good one.
I changed the format in the mentioned PR to bestvideo*[ext={vidext}]+bestaudio[ext={audext}]/bestvideo*+bestaudio/best. See #351 for some discussion around all this.
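To make the fallback order of this selector explicit, here is a small sketch that splits it into its alternatives. It relies on "/" separating alternatives that yt-dlp tries from left to right; the naive split is only valid because this selector has no parenthesized groups, and the function name is hypothetical.

```python
PR_TEMPLATE = (
    "bestvideo*[ext={vidext}]+bestaudio[ext={audext}]/bestvideo*+bestaudio/best"
)


def alternatives(selector: str) -> list[str]:
    """Split a simple selector into the alternatives tried in order."""
    # Naive split: assumes no parentheses grouping in the selector.
    return selector.split("/")


selector = PR_TEMPLATE.format(vidext="webm", audext="webm")
for rank, alternative in enumerate(alternatives(selector), start=1):
    print(rank, alternative)
# 1 bestvideo*[ext=webm]+bestaudio[ext=webm]
# 2 bestvideo*+bestaudio
# 3 best
```

The first alternative still wins whenever any webm pair exists, even a low-quality one, which is the concern raised in the rest of this issue.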
I feel like this setting is still not the most appropriate one because in many cases Youtube (at least) seems to offer few webm streams and to favor mp4 (see #351 (comment)).
Let's take Youtube video 7_N0yozUnWY as an example:
yt_dlp yt-dlp --list-formats https://www.youtube.com/watch\?v\=7_N0yozUnWY
[youtube] Extracting URL: https://www.youtube.com/watch?v=7_N0yozUnWY
[youtube] 7_N0yozUnWY: Downloading webpage
[youtube] 7_N0yozUnWY: Downloading ios player API JSON
[youtube] 7_N0yozUnWY: Downloading web creator player API JSON
[youtube] 7_N0yozUnWY: Downloading m3u8 information
[info] Available formats for 7_N0yozUnWY:
ID  EXT   RESOLUTION FPS CH │   FILESIZE   TBR PROTO │ VCODEC          VBR ACODEC      ABR ASR MORE INFO
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
sb2 mhtml 48x27        0    │                  mhtml │ images                                  storyboard
sb1 mhtml 80x45        0    │                  mhtml │ images                                  storyboard
sb0 mhtml 160x90       0    │                  mhtml │ images                                  storyboard
233 mp4   audio only        │                  m3u8  │ audio only          unknown             [fr] Default
234 mp4   audio only        │                  m3u8  │ audio only          unknown             [fr] Default
139 m4a   audio only      2 │    4.63MiB   49k https │ audio only          mp4a.40.5   49k 22k [fr] low, m4a_dash
140 m4a   audio only      2 │   12.28MiB  129k https │ audio only          mp4a.40.2  129k 44k [fr] medium, m4a_dash
251 webm  audio only      2 │    9.84MiB  104k https │ audio only          opus       104k 48k [fr] medium, webm_dash
269 mp4   256x144     25    │ ~ 11.27MiB  119k m3u8  │ avc1.4D400C    119k video only
160 mp4   256x144     25    │    3.56MiB   38k https │ avc1.4D400C     38k video only          144p, mp4_dash
230 mp4   640x360     25    │ ~ 38.99MiB  411k m3u8  │ avc1.4D401E    411k video only
134 mp4   640x360     25    │   14.17MiB  150k https │ avc1.4D401E    150k video only          360p, mp4_dash
18  mp4   640x360     25  2 │ ≈ 26.38MiB  278k https │ avc1.42001E         mp4a.40.2       44k [fr] 360p
605 mp4   640x360     25    │ ~ 28.83MiB  304k m3u8  │ vp09.00.21.08  304k video only
243 webm  640x360     25    │    8.90MiB   94k https │ vp9             94k video only          360p, webm_dash
232 mp4   1280x720    25    │ ~109.26MiB 1153k m3u8  │ avc1.64001F   1153k video only
136 mp4   1280x720    25    │   39.45MiB  416k https │ avc1.64001F    416k video only          720p, mp4_dash
270 mp4   1920x1080   25    │ ~172.02MiB 1815k m3u8  │ avc1.640028   1815k video only
137 mp4   1920x1080   25    │   69.28MiB  731k https │ avc1.640028    731k video only          1080p, mp4_dash
The only webm streams available are:
stream 243 for video at 640x360, video bitrate 94k
stream 251 for audio, audio bitrate 104k
While we have much better mp4 streams on this video:
stream 137 for video at 1920x1080, video bitrate 731k (I don't think we can select the m3u8 stream 270, which is even better; if I'm not mistaken we only use the http protocol. Either way, the point stays the same)
stream 140 for audio, audio bitrate 129k
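The comparison above can be reproduced with a rough sketch. The list below is a hand transcription of the https video-only and audio-only streams from the listing (the combined stream 18 and the m3u8 streams are left out), and the ranking by resolution then bitrate is a deliberate simplification, not yt-dlp's actual sorting logic. The field and function names are assumptions made for this illustration.

```python
# Hand-transcribed subset of the listing (https, single-track streams only).
FORMATS = [
    {"id": "139", "ext": "m4a", "kind": "audio", "height": 0, "kbps": 49},
    {"id": "140", "ext": "m4a", "kind": "audio", "height": 0, "kbps": 129},
    {"id": "251", "ext": "webm", "kind": "audio", "height": 0, "kbps": 104},
    {"id": "160", "ext": "mp4", "kind": "video", "height": 144, "kbps": 38},
    {"id": "134", "ext": "mp4", "kind": "video", "height": 360, "kbps": 150},
    {"id": "243", "ext": "webm", "kind": "video", "height": 360, "kbps": 94},
    {"id": "136", "ext": "mp4", "kind": "video", "height": 720, "kbps": 416},
    {"id": "137", "ext": "mp4", "kind": "video", "height": 1080, "kbps": 731},
]


def pick(kind, ext=None):
    """Return the id of the 'best' stream of a kind, optionally limited to ext.

    Simplified ranking: highest resolution first, then highest bitrate.
    """
    candidates = [
        f for f in FORMATS
        if f["kind"] == kind and (ext is None or f["ext"] == ext)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: (f["height"], f["kbps"]))["id"]


print(pick("video", "webm"), pick("audio", "webm"))  # 243 251
print(pick("video"), pick("audio"))                  # 137 140
```

With the webm constraint the video is capped at 640x360, while the unconstrained pick reaches 1920x1080.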
As far as I've understood, the original idea was to select the best stream possible and then reencode it to our preset, so that we have the best chance of not losing quality by reencoding a limited-quality video into another limited-quality video. While favoring our "preferred" output format might have helped a bit in the past, I feel like this is now causing more harm than good.
I propose changing this format setting to bestvideo*+bestaudio/best (mostly the yt-dlp default, according to the documentation).
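For illustration only, this is roughly how the proposed value could be handed to yt-dlp's Python API. The function names and the output template are assumptions for this sketch and do not reflect how youtube2zim actually wires its options; "format" and "outtmpl" are existing yt-dlp option keys, and merging separate video and audio streams typically requires ffmpeg to be available.

```python
PROPOSED_FORMAT = "bestvideo*+bestaudio/best"


def build_ydl_options(output_template: str = "%(id)s.%(ext)s") -> dict:
    """Build a yt-dlp options dict using the proposed format selector."""
    return {
        "format": PROPOSED_FORMAT,   # no extension constraint anymore
        "outtmpl": output_template,  # hypothetical output naming
    }


def download(url: str) -> None:
    """Download one video with the proposed selector (needs yt-dlp installed)."""
    import yt_dlp  # third-party; imported lazily so the sketch loads without it

    with yt_dlp.YoutubeDL(build_ydl_options()) as ydl:
        ydl.download([url])
```

Calling download("https://www.youtube.com/watch?v=7_N0yozUnWY") would then let yt-dlp choose the streams without favoring webm, leaving the conversion to our preset to the reencoding step.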
I agree; the previous settings were meant to avoid reencoding for those not using --low-quality, and that was working fine.
Given we always reencode, this path becomes the exception and choosing best then reencoding seems more appropriate.
We were indeed not always reencoding. I find this very confusing, especially since we have the high quality setting in zimscraperlib. I propose to really always reencode, to avoid "strange unexpected behaviors", even if it means a bit more processing in some cases. I will propose a PR for this.