ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites

Home Page: http://ytdl-org.github.io/youtube-dl/

--cookies option, get cookies dynamically

Iridium-Lo opened this issue

Checklist

  • I'm reporting a site feature request
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've searched the bugtracker for similar site feature requests including closed ones

Description

When having to use the --cookies option for sites with Cloudflare protection:

  • rather than having users install a third-party extension (as stated in the guide) and then manually run it each time they need to download

Suggestion

  • get the cookies dynamically; use curl, as it saves cookies in Netscape format (otherwise you have to convert them to Netscape format yourself, which takes some work)
  • capture the cookies in a variable or something instead of writing them to a file (writing a file is more work)

I have done this for a script I use, using curl; here is a module from it:

IFS=$'\n'

# fetch fresh cookies for the site and store them in Netscape format
getCookies() {
    curl "$1" \
      --silent \
      --output /dev/null \
      --user-agent "$userAgent" \
      --cookie-jar ~/cookies.txt
}

# wrapper around youtube-dl that reuses the stored cookies and user agent
ytdl() {
    youtube-dl \
      --no-part \
      --no-check-certificate \
      --cookies ~/cookies.txt \
      --user-agent "$userAgent" \
      --download-archive arc.txt \
      "$@"
}

downloadSimultaneously() {
    local site urlArray
    IFS=$'\n'
    # exported so the shells spawned by GNU parallel can see it
    export userAgent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0'
    site=$1
    urlArray=("$@")

    getCookies "$site"

    parallel -j 0 \
      ytdl ::: "${urlArray[@]}"
}

export -f ytdl

For Python, something like this:

import requests

headers = {
    # same user agent as the one passed to youtube-dl
    "User-Agent": "Mozilla/5.0 ...",
}

# visiting the site sets the cookies on the response
response = requests.get("https://example.com/", headers=headers)  # site from download URL
cookies = response.cookies
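
If the cookies have to end up in Netscape format for --cookies anyway, the standard library can write that file directly. A rough sketch (the example.com URL and the user agent string are placeholders):

import http.cookiejar

import requests

# MozillaCookieJar reads/writes the Netscape cookies.txt format that --cookies expects
jar = http.cookiejar.MozillaCookieJar("cookies.txt")

session = requests.Session()
session.cookies = jar  # requests stores any Set-Cookie values in this jar
session.get("https://example.com/", headers={"User-Agent": "Mozilla/5.0 ..."})

# ignore_discard keeps session cookies that would otherwise be dropped on save
jar.save(ignore_discard=True)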

If you can just replay cookies set by visiting some URL with no user interaction, we can easily do that in the extractor itself.

I'm not sure what you mean by that?

getCookies gets the new/updated cookies each time downloadSimultaneously is executed

When yt-dl receives a Set-Cookie header from the site the cookie is stashed in a Python cookielib/http.cookiejar CookieJar accessible to extractor code as the cookiejar attribute of the extractor. The file specified by --cookies ... is updated with the received cookie, and could be passed to a subsequent yt-dl invocation. So yt-dl already includes the same functionality that curl offers (it doesn't have a --download-archive ... option, does it?).
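
For illustration only, a minimal sketch via the embedded API (the URL is a placeholder; the cookiefile option is what --cookies maps to):

import youtube_dl

# cookiefile is read at startup and rewritten with whatever cookies the site
# sets during the run, so the same file can be reused by the next invocation,
# much like curl's --cookie-jar
ydl_opts = {'cookiefile': 'cookies.txt'}

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://example.com/watch/12345'])  # placeholder URL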

Sometimes a site that rejects requests or redirects requests to a captcha page will send authorisation cookies to bypass this blockage if a specific site page or API URL is visited. An extractor for the site can do that as part of its _real_initialize() method.
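
A rough sketch of that pattern (the site, URL and regex are invented; _real_initialize() and the _request_webpage()/_download_webpage() helpers are the existing InfoExtractor machinery):

from .common import InfoExtractor


class ExampleSiteIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?example\.com/watch/(?P<id>\d+)'

    def _real_initialize(self):
        # Hitting this page once makes the site send its authorisation
        # cookies; they end up in the shared cookiejar and are replayed
        # on the requests made during extraction.
        self._request_webpage(
            'https://example.com/', None,
            note='Retrieving authorisation cookies')

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        return {
            'id': video_id,
            'title': self._og_search_title(webpage),
            'url': self._og_search_video_url(webpage),
        }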

Oops, obviously curl doesn't have that option; I'll edit that.

So the --cookies option doesn't just read the txt file, it actually writes to it?

Meaning I could just specify --cookies cookies.txt and it would write the cookies to the file? If so, the docs need updating. Or does cookies.txt need to be set once, after which yt-dl will set it fresh on every subsequent run?

The only other hurdle is that even with the cookies set, you have to go to the site and get past the Cloudflare protection, or it will still give a 404.

Does yt-dl have a parallel download option like my script?

cookies.txt starts as you wrote it and is updated by yt-dl as new cookies and values are set by the site.

yt-dl doesn't support the syntax/library modules needed for parallel execution, but there is some support for it in yt-dlp.

So you set cookies.txt once and then don't need to keep setting it.

You say there is some support for parallel downloads?

Can I add a PR for my script?

If you read the original comment, it's a better way of doing things (less work) than creating playlists, aside from the parallel part.

Or add a parallel download option?

Running multiple instances of yt-dl in parallel using the same output directory or download archive (etc) is not really supported. See #350 and the yt-dlp thread linked there.

Alright, so essentially parallel downloads just aren't going to integrate well with the existing code?

That thread is mostly a discussion between users; I don't see any dev input (I might have missed it). Anyway, for me it's not an issue.

From what you have said, it will be easy to carry out this feature request. Could I help?

Caveat

Even when you have the latest cookies, you still have to visit the site and get through the Cloudflare protection manually, or it will give a 404, so we (if you point me in the right direction) or the yt-dl maintainers will need to look into that.

Just a note: I install yt-dl from the repo, not via a package manager, so I can get access to branches with fixes before they are merged to master (it can take a while sometimes).

The caveat is the real problem that needs to be solved. yt-dlp hopes to do it with curl_cffi. An implementation of that solution here would require so much shimming and/or imposition of limiting dependencies that just using yt-dlp instead would be more sensible.

As you may observe, anyone can be a dev, but almost all the knowledgeable and active contributors are working on yt-dlp. Features and relevant fixes here generally get pulled downstream, and downstream improvements, especially extractor modules, may also be pulled here. I'm not likely to merge a PR that adds a feature already implemented downstream unless it behaves in the same way (API, CLI) for cases that are covered by both implementations.

Oh right, I see... That's why I thought yt-dlp is better; I had things the wrong way around and thought yt-dl was the one with more support.

Although you said yt-dlp has some parallel support. GNU parallel is also limited, as it only goes to 60 instances max.

Could you let me know the next best thing (the command) for generating an archive from URLs with yt-dlp (with minimal downloading), like you did for yt-dl, please?

Ignore that, the archive generation command is the same. By the way, yt-dlp is much faster.