Owez / yark

YouTube archiving made simple.

Home Page: https://pypi.org/project/yark/

Use a format fallback or allow to manually set the video format

alexislours opened this issue · comments

The current format used to download videos is notoriously error prone (see yt-dlp/yt-dlp#3372).

One solution would be for yark to fall back to webm in such cases, to ask the user to manually pick a format from the yt-dlp -F output, or to offer an option for passing the following string as a CLI argument: https://github.com/Owez/yark/blob/c48e37ae405052bb443b04098d088ccc8e071b4e/yark/channel.py#LL209C10-L209C10
The latter would also allow channels to be saved at a higher resolution, since YouTube only serves mp4 with audio up to 720p.

The main drawback is that the video file will generally be larger in such cases.
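A minimal sketch of the fallback idea, assuming a hypothetical helper named `download_with_fallback` (it is not part of yark). As far as I know, the `/` operator in a yt-dlp format string only falls through when a format is unavailable, not when the download itself fails as it does here, so the sketch retries with the next selector after an error. The `download` callable is injected so the retry logic does not depend on yt-dlp; `ytdlp_download` shows one way it could be wired up with `yt_dlp.YoutubeDL`.

```python
from typing import Callable, Sequence


def download_with_fallback(
    url: str,
    selectors: Sequence[str],
    download: Callable[[str, str], None],
) -> str:
    """Try each format selector in order and return the first one that works."""
    failures = []
    for selector in selectors:
        try:
            download(url, selector)
            return selector
        except Exception as exc:  # yt-dlp raises yt_dlp.utils.DownloadError
            failures.append(f"{selector}: {exc}")
    raise RuntimeError("all formats failed: " + "; ".join(failures))


def ytdlp_download(url: str, selector: str) -> None:
    """Download one URL with yt-dlp using the given format selector."""
    import yt_dlp  # imported lazily so the helper above has no hard dependency

    with yt_dlp.YoutubeDL({"format": selector}) as ydl:
        ydl.download([url])
```

A call could look like `download_with_fallback(url, ["best[ext=mp4]", "best[ext=webm]", "best"], ytdlp_download)`. The selector order is only an illustration.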

Example of a video affected by it: https://www.youtube.com/watch?v=YbYpbXMUsYM

yt-dlp error:

yt-dlp -f "best/[ext=mp4]/hasvid" "https://www.youtube.com/watch?v=YbYpbXMUsYM" -o YbYpbXMUsYM.mp4
[youtube] Extracting URL: https://www.youtube.com/watch?v=YbYpbXMUsYM
[youtube] YbYpbXMUsYM: Downloading webpage
[youtube] YbYpbXMUsYM: Downloading android player API JSON
[info] YbYpbXMUsYM: Downloading 1 format(s): 22
[download] Resuming download at byte 1713408


ERROR: Did not get any data blocks

yark error:

yark refresh munecat
Loading munecat channel..
Downloading metadata..
Parsing video metadata..
Parsing livestream metadata..
Parsing shorts metadata..
Cleaning out previous temporary files..
Downloading 19 new videos..
  • Downloading YbYpbXMUsYM, at 0.2%..
  • Unknown error whilst downloading videos, details below:
[download] Got error: Downloaded 1713408 bytes, expected 780994664 bytes, retrying in a few seconds..
  • Fault with YouTube's servers, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks
  • Sorry, failed to download {name}
Please file a bug report if you think this is a problem with Yark!

Good to know, the custom format argument is a good idea. I'd like to prioritise the best format or a high-quality one by default (if possible) and if not, any format that works. The size of the video file is an alright drawback as long as users have that argument option to use lower-quality videos if they need to.
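The custom format argument could be sketched as below. This assumes an `argparse`-based command line and a hypothetical `--format` flag. Neither is taken from yark's actual CLI code, so the names are illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --format option on `refresh`."""
    parser = argparse.ArgumentParser(prog="yark")
    sub = parser.add_subparsers(dest="command", required=True)

    refresh = sub.add_parser("refresh", help="download new videos and metadata")
    refresh.add_argument("name", help="name of the archive to refresh")
    refresh.add_argument(
        "--format",
        default=None,  # None means "use the archiver's default selector"
        help="yt-dlp format selector, e.g. 'best[height<=480]'",
    )
    return parser
```

With this, `build_parser().parse_args(["refresh", "munecat", "--format", "best[height<=480]"])` would give users a way to request smaller files, and leaving the flag out keeps the default behaviour.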

Prioritizing the best format would just be a matter of not setting the format when starting the download with yt-dlp. But given the project uses yt-dlp without FFMPEG, it will grab the best format that has audio and video as a single file since it can't merge them without FFMPEG.

For example, out of the available formats for the video I linked:

ID  EXT   RESOLUTION FPS CH │   FILESIZE   TBR PROTO │ VCODEC          VBR ACODEC      ABR ASR MORE INFO
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
sb2 mhtml 48x27        0    │                  mhtml │ images                                  storyboard
sb1 mhtml 80x45        0    │                  mhtml │ images                                  storyboard
sb0 mhtml 160x90       0    │                  mhtml │ images                                  storyboard
599 m4a   audio only      2 │   14.65MiB   31k https │ audio only          mp4a.40.5   31k 22k ultralow, m4a_dash
600 weba  audio only      2 │   16.93MiB   36k https │ audio only          opus        36k 48k ultralow, weba_dash
139 m4a   audio only      2 │   23.22MiB   49k https │ audio only          mp4a.40.5   49k 22k low, m4a_dash
249 weba  audio only      2 │   24.80MiB   52k https │ audio only          opus        52k 48k low, weba_dash
250 weba  audio only      2 │   32.58MiB   68k https │ audio only          opus        68k 48k low, weba_dash
140 m4a   audio only      2 │   61.62MiB  129k https │ audio only          mp4a.40.2  129k 44k medium, m4a_dash
251 weba  audio only      2 │   61.33MiB  129k https │ audio only          opus       129k 48k medium, weba_dash
17  3gp   176x144      6  1 │   36.14MiB   76k https │ mp4v.20.3       76k mp4a.40.2    0k 22k 144p
597 mp4   256x144     13    │    8.21MiB   17k https │ avc1.4d400b     17k video only          144p, mp4_dash
598 webm  256x144     13    │    8.85MiB   19k https │ vp9             19k video only          144p, webm_dash
394 mp4   256x144     25    │   29.50MiB   62k https │ av01.0.00M.08   62k video only          144p, mp4_dash
160 mp4   256x144     25    │   19.16MiB   40k https │ avc1.4d400c     40k video only          144p, mp4_dash
278 webm  256x144     25    │   31.33MiB   66k https │ vp9             66k video only          144p, webm_dash
395 mp4   426x240     25    │   37.51MiB   79k https │ av01.0.00M.08   79k video only          240p, mp4_dash
133 mp4   426x240     25    │   40.47MiB   85k https │ avc1.4d4015     85k video only          240p, mp4_dash
242 webm  426x240     25    │   40.88MiB   86k https │ vp9             86k video only          240p, webm_dash
396 mp4   640x360     25    │   72.51MiB  152k https │ av01.0.01M.08  152k video only          360p, mp4_dash
134 mp4   640x360     25    │   79.34MiB  167k https │ avc1.4d401e    167k video only          360p, mp4_dash
18  mp4   640x360     25  2 │  215.72MiB  453k https │ avc1.42001E    453k mp4a.40.2    0k 44k 360p
243 webm  640x360     25    │   91.86MiB  193k https │ vp9            193k video only          360p, webm_dash
397 mp4   854x480     25    │  128.15MiB  269k https │ av01.0.04M.08  269k video only          480p, mp4_dash
135 mp4   854x480     25    │  125.47MiB  264k https │ avc1.4d401e    264k video only          480p, mp4_dash
244 webm  854x480     25    │  145.65MiB  306k https │ vp9            306k video only          480p, webm_dash
22  mp4   1280x720    25  2 │ ~762.64MiB 1565k https │ avc1.64001F   1565k mp4a.40.2    0k 44k 720p
398 mp4   1280x720    25    │  259.04MiB  544k https │ av01.0.05M.08  544k video only          720p, mp4_dash
136 mp4   1280x720    25    │  194.56MiB  409k https │ avc1.4d401f    409k video only          720p, mp4_dash
247 webm  1280x720    25    │  269.18MiB  566k https │ vp9            566k video only          720p, webm_dash
399 mp4   1920x1080   25    │  477.27MiB 1003k https │ av01.0.08M.08 1003k video only          1080p, mp4_dash
137 mp4   1920x1080   25    │  652.13MiB 1370k https │ avc1.640028   1370k video only          1080p, mp4_dash
248 webm  1920x1080   25    │  489.04MiB 1028k https │ vp9           1028k video only          1080p, webm_dash
400 mp4   2560x1440   25    │    1.51GiB 3246k https │ av01.0.12M.08 3246k video only          1440p, mp4_dash
271 webm  2560x1440   25    │    1.42GiB 3051k https │ vp9           3051k video only          1440p, webm_dash
401 mp4   3840x2160   25    │    3.26GiB 7008k https │ av01.0.12M.08 7008k video only          2160p, mp4_dash
313 webm  3840x2160   25    │    4.02GiB 8645k https │ vp9           8645k video only          2160p, webm_dash

In this case, yt-dlp will grab the format id 22 since it's the best with audio and video as a single file, but it is only 720p. If FFMPEG is installed, it would instead grab format 313 for video and format 251 for audio and merge them as a single webm file.

I'm not sure whether the Python package can be made aware of a system install of FFMPEG for this to work in yark. I also think this would require some changes to the web server and to the checks for whether a video has already been downloaded, since the MP4 format is assumed.
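One way the "already downloaded" check could stop assuming MP4 is to look for any known video extension. The sketch below assumes videos are stored as `<video id>.<extension>` in a single directory. The function name, the directory layout, and the extension list are all assumptions, not yark's real implementation.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical list of containers the archiver might accept
VIDEO_EXTENSIONS = ("mp4", "webm", "mkv")


def find_downloaded(
    videos_dir: Path,
    video_id: str,
    extensions: Sequence[str] = VIDEO_EXTENSIONS,
) -> Optional[Path]:
    """Return the existing file for a video id, whatever its container."""
    for ext in extensions:
        candidate = videos_dir / f"{video_id}.{ext}"
        if candidate.is_file():
            return candidate
    return None
```

The web server could use the same lookup to decide which file to serve and which MIME type to send.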

Yeah, ffmpeg might be annoying to download. I'll patch now and figure out implementing FFMPEG in 1.3, because videos being capped at 720p isn't great.

I'm fine with the archiver using any popular format, probably whatever the native html <video> tag supports as a general benchmark.

commented

You could just opportunistically use ffmpeg if it's already installed on the PATH, otherwise keep doing it the current way.
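The opportunistic approach can be sketched with the standard library's `shutil.which`. The function name `pick_format` and the exact selector strings are assumptions chosen for illustration: `bestvideo+bestaudio/best` needs ffmpeg to merge separate streams, while `best` only picks single-file formats.

```python
import shutil
from typing import Optional


def pick_format(ffmpeg_path: Optional[str]) -> str:
    """Choose a yt-dlp format selector depending on whether ffmpeg was found."""
    if ffmpeg_path:
        # Separate video and audio streams can be merged, so higher
        # resolutions than 720p become available.
        return "bestvideo+bestaudio/best"
    # Without ffmpeg, stick to formats that already contain video and audio.
    return "best"


# shutil.which returns the executable's path, or None if it is not on the PATH
selector = pick_format(shutil.which("ffmpeg"))
```

Passing the detected path in as an argument keeps the decision easy to test on machines with or without ffmpeg installed.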