[Help needed] Merging pending changes from youtube-dl

pukkandan opened this issue 3 years ago · comments

pukkandan commented 3 years ago

Upstream commits that have not been merged

[jsinterp] Improve parsing
- The changes appear to be pointless?
[ITV] Overhaul ITV extractor
- Need help!
- related #5570
[feat] Add support to external downloader aria2p
- We partially have a different implementation, but it is broken. Need to fix it
- No need to use aria2p; we can achieve the same with websockets directly
[fd/dash] range for mpd fragments
- Related: #8711
[ie] range for mpd
- Related: #8711
[ie] Correctly resolve base-url in manifest
- Related: #8269
[utils] base for int_or_none
[fd/external] Rework ffmpeg detection
[test:download] Support playlist_maxcount expected value

Lesmiscore · Answer 1 · Fri Jan 22 2021 09:23:49 GMT+0800 (China Standard Time)

Why not copy as-is from youtube-dl?

pukkandan · Answer 2 · Fri Jan 22 2021 09:39:40 GMT+0800 (China Standard Time)

There were some modifications made to these extractors in youtube-dlc. I don't want to blindly remove those changes. That is why someone who understands the inner workings of these extractors needs to resolve the conflicts correctly.

For example, let's consider the youtube extractor. This fork supports search URLs, can download multiple feed pages, automatically redirects channel pages to /video, has better age gate bypass etc. If we were to just copy the code from youtube-dl when they make any change, we could lose these.

I understand the youtube extractor enough that I am able to pick the best of both worlds when I do the merger. But for me, the same isn't true for the above mentioned extractors.

nixxo · Answer 3 · Wed Mar 10 2021 00:32:53 GMT+0800 (China Standard Time)

@pukkandan the archive.org extractor is completely different from the youtube-dl or dlc version because it comes from here ytdl-org/youtube-dl#27156

so the other commits are useless. The PR I linked is an almost complete rewrite of the extractor that adds support for more media types, playlists, etc.

pukkandan · Answer 4 · Wed Mar 10 2021 00:40:22 GMT+0800 (China Standard Time)

the archive.org extractor is completely different from the youtube-dl or dlc version because it comes from here ytdl-org/youtube-dl#27156

Yes, I'm aware. I am the one who merged it after all :D

But since youtube-dl devs didn't merge it and instead made their own changes to the extractor, I assumed that the changes they made must have some merit. But if our version can do everything youtube-dl's version can, then great

nixxo · Answer 5 · Wed Mar 10 2021 01:06:14 GMT+0800 (China Standard Time)

All the tests in the extractor worked fine, and I tested a couple of other random pages and they worked as well.

throwawayay · Answer 6 · Wed Mar 10 2021 13:08:44 GMT+0800 (China Standard Time)

There's also this extensive list of new extractors written but not merged into youtube-dl (which may make more sense as a new issue, but this seemed related)

ytdl-org/youtube-dl#28054

nixxo · Answer 7 · Fri Mar 12 2021 00:22:48 GMT+0800 (China Standard Time)

@pukkandan The TMZ extractor is in a similar situation to the archive.org extractor.

They are different, but all the tests pass for the one in here, and it also works with the URLs added in the tests of the other version.

This one has a single class handling everything. The other has two classes managing different types of pages on the website. Both versions of the extractor work.

pukkandan · Answer 8 · Fri Mar 12 2021 00:59:54 GMT+0800 (China Standard Time)

Considering yt-dlp's tmz extractor is tiny, that is impressive.

@nixxo I really appreciate you taking time to go through and test these extractors :D

nixxo · Answer 9 · Tue Mar 16 2021 17:04:40 GMT+0800 (China Standard Time)

@pukkandan the stitcher extractor merge failed because you missed a previous commit that added support for shows, not only single episodes.

I made a PR with the two commits from youtube-dl #175

"Two Sheds" Jackson · Answer 10 · Mon Mar 22 2021 21:32:17 GMT+0800 (China Standard Time)

Something strange about archive.org: The archived copies of videos' webpages sometimes have usable links to manifests, thumbnails, subtitles, etc. The links need to have the prefix http://web.archive.org/web/[0-9]+ removed from them. Even if a webpage has been deleted or updated, the actual video sometimes can still be accessed on the original server. Note that this is different than videos hosted by archive.org.
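The prefix removal described above could be sketched roughly like this. It is a minimal illustration, not code from any extractor: the function name `strip_wayback_prefix` is made up, and it assumes that the scheme may be http or https and that a single "/" separates the timestamp from the original URL.

```python
import re

# Pattern taken from the comment above, with two assumptions added:
# the scheme may be http or https, and a "/" follows the timestamp.
_WAYBACK_PREFIX = re.compile(r'^https?://web\.archive\.org/web/[0-9]+/')


def strip_wayback_prefix(url):
    """Return the original URL embedded in an archived link.

    URLs that do not start with the archive prefix are returned unchanged.
    (Hypothetical helper, for illustration only.)
    """
    return _WAYBACK_PREFIX.sub('', url, count=1)


archived = ('http://web.archive.org/web/20210322000000/'
            'https://example.com/video/manifest.mpd')
print(strip_wayback_prefix(archived))
```

Whether the recovered link still works depends on the original server, as noted above.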

nixxo · Answer 11 · Sun Mar 28 2021 19:15:09 GMT+0800 (China Standard Time)

@2ShedsJackson can you provide a url that gives this behaviour?

"Two Sheds" Jackson · Answer 12 · Sun Mar 28 2021 22:30:26 GMT+0800 (China Standard Time)

@2ShedsJackson can you provide a url that gives this behaviour?

The nightly opera streams from https://www.metopera.org/ sometimes still work for a couple of days after they've been replaced.

pukkandan · Answer 13 · Sat Mar 05 2022 04:27:39 GMT+0800 (China Standard Time)

@Ashish0804 Could you have a look at the differences between your NDR implementation 23dd2d9 and youtube-dl's ytdl-org/youtube-dl#30531?

Related: #2337

dirkf · Answer 14 · Tue Jun 14 2022 22:46:23 GMT+0800 (China Standard Time)

The major difference is in the NDRIE._extract_embed() method, whose signature was also changed. I don't recall but I'm not sure that the yt-dlp target still exists, or I would have included it. The pages that the PR was fixing use the sophoraID scheme and that may now be all of them (eg #2337 (comment)). Any thumbnail is found by _search_json_ld().

NJoyIE._extract_embed() is also somewhat different and again I believe that the yt-dl version is more up-to-date.

In the yt-dlp extractor NDRIE._VALID_URL has a pattern for daserste.de but no test (yt-dl handles that domain in ard.py).

The tests should show which versions work.

Everything else is conventional or fixing tests.

gamer191 · Answer 15 · Fri Jun 17 2022 08:06:54 GMT+0800 (China Standard Time)

yt-dl handles that domain in ard.py

yt-dlp mentions Daserste in the regexes for both ard.py and ndr.py. I don't know how to read regexes, but an online regex tester I used suggests that the ndr.py one is for daserste.ndr.de.
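For anyone who wants to check this locally instead of with an online tester, something like the following would do. Note that the pattern here is a simplified, hypothetical stand-in written for this example; it is NOT the actual `_VALID_URL` from ndr.py, and the URLs are made-up samples.

```python
import re

# Hypothetical, simplified pattern for illustration only.
# It accepts any subdomain chain that ends in ndr.de.
PATTERN = re.compile(r'https?://(?:\w+\.)*ndr\.de/')


def matches(url):
    """Return True if the URL matches PATTERN from its start."""
    return PATTERN.match(url) is not None


for url in (
    'https://daserste.ndr.de/panorama/index.html',  # subdomain of ndr.de
    'https://www.daserste.de/index.html',           # a different domain
):
    print(url, matches(url))
```

To test the real extractor pattern, the same check can be run with the regex string copied from the source file.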

It was added in dc9d8f4, which was part of blackjack4494/youtube-dlc#95 (yt-dlp is a fork of youtube-dlc), a PR which fixed ytdl-org/youtube-dl#26563

dirkf · Answer 16 · Fri Feb 17 2023 21:49:30 GMT+0800 (China Standard Time)

ITV extractor was based on the last yt-dlp PR and should be OK with these considerations:

- remove _sort_formats()
- tweak any removed compat_xxx
- optionally, delete _search_nextjs() shim in favour of yt-dlp native method
- possibly investigate if subtitles are available in the m3u8 manifest and use _extract_m3u8_formats_and_subtitles() if so.

If there is a test port or PR I could exercise it in the UK.

pukkandan · Answer 17 · Sun Feb 19 2023 07:50:47 GMT+0800 (China Standard Time)

[jsinterp] Improve parsing

@dirkf Could you explain the point of the regex rework in jsinterp?

We don't have . implemented except for some specific cases. So this does nothing. Even if it were to be implemented, the method names are generally different between JS and Python. Some fields like split/search/flags etc will work, but the test seems to imply you are expecting the others as well?

In fact, none of the YT players actually use the regexes. We only needed to parse them out. The .compile can even be removed completely (a possible speed improvement?).

The newly added replace also doesn't work. When passed a string, as in "a".replace(".", "b"), JS interprets . literally, while the code treats it as a regex. On the other hand, "a".replace(/./, "b") also doesn't work and raises an error, since JS_Regex doesn't inherit from re.Pattern.
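To make the expected behaviour concrete, here is a rough Python sketch of the two cases. It is not the jsinterp code; `js_replace` is a hypothetical helper, and it deliberately ignores the g flag, function replacements, and special replacement patterns such as $& or $1.

```python
import re


def js_replace(s, pattern, repl):
    """Approximate String.prototype.replace for two simple cases.

    - str pattern: matched literally, first occurrence only
    - compiled regex (assumed to have no g flag): first match only
    """
    if isinstance(pattern, str):
        # Literal match: "." means a dot, not "any character"
        return s.replace(pattern, repl, 1)
    # Use a function so that repl is inserted as-is, without Python's
    # own backslash/group-reference processing
    return pattern.sub(lambda m: repl, s, count=1)


print(js_replace('a.b.c', '.', '-'))              # literal dot
print(js_replace('a.b.c', re.compile('.'), '-'))  # regex: any character
```

With a string pattern only the first literal dot is replaced, while the regex version replaces the first character of the input.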

dirkf · Answer 18 · Mon Mar 06 2023 07:22:31 GMT+0800 (China Standard Time)

That was a bit of WIP, apparently, even though the specification of String.prototype.replace[All]() has all the logic and consistency that one has come to expect from Brendan's busy week.

The linked test was really checking that the SO hack behaved sensibly.

So far there hasn't been any use of replace() but if it happens there's now this: dirkf/youtube-dl@3be072e.

The .compile could indeed be lazy, like this: dirkf/youtube-dl@5f0eea7. It doesn't seem to affect the time for test/test_youtube_signature.py much: maybe 5% quicker. Py2.7/Py3.9 runtime ratio still ~1.6, though.
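For readers who have not opened the commit, the general idea of lazy compilation looks roughly like this. This is only a sketch of the technique, not the code from the linked commit; the class name `LazyRegex` and its attributes are invented for the example.

```python
import re


class LazyRegex:
    """Hypothetical wrapper: keeps the pattern text and flags, and
    only calls re.compile() the first time the regex is used."""

    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        self.flags = flags
        self._compiled = None  # nothing is compiled at parse time

    def _get(self):
        if self._compiled is None:
            self._compiled = re.compile(self.pattern, self.flags)
        return self._compiled

    def search(self, string):
        return self._get().search(string)


rx = LazyRegex('b+')
print(rx._compiled is None)        # still uncompiled after construction
print(rx.search('abbc').group(0))  # compiled on first use
```

A regex that is parsed but never used costs no compile time this way, and a pattern that Python cannot compile would only raise when it is first used rather than at parse time.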