Handle youtube CAPTCHAs (402 errors) gracefully

phihag opened this issue 13 years ago · comments

Philipp Hagemeister commented 13 years ago

Upon encountering a 402 error, youtube-dl should:

By default, abort with an error ("Due to high usage, youtube requires a CAPTCHA to be solved. Specify the --captcha-solver or --captcha-interactive option to solve the CAPTCHA.").
If the --captcha-solver option has been specified, execute the specified program. Write the CAPTCHA image to its stdin and read its stdout. Enter the trimmed stdout as the CAPTCHA solution and continue downloading.
If the --captcha-interactive option has been specified, download the CAPTCHA image to a temporary location and call ['xdg-open', 'tmpfile']. If this fails, abort with an error message. Otherwise, call raw_input() and ask for a solution for the CAPTCHA.
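
The three behaviours above could be sketched roughly as follows. This is only an illustration, not existing youtube-dl code: solve_captcha, CaptchaRequired and the ".jpg" suffix are made-up names and assumptions, and the solver is assumed to be given as an argument list.

```python
import subprocess
import tempfile


class CaptchaRequired(Exception):
    """Raised when a CAPTCHA is needed but cannot be solved."""


def solve_captcha(image_bytes, solver_cmd=None, interactive=False):
    """Return the CAPTCHA solution as a string (hypothetical helper)."""
    if solver_cmd:
        # --captcha-solver: image goes to stdin, trimmed stdout is the answer.
        result = subprocess.run(
            solver_cmd, input=image_bytes, stdout=subprocess.PIPE, check=True
        )
        return result.stdout.decode("utf-8", "replace").strip()
    if interactive:
        # --captcha-interactive: save the image, open a viewer, ask the user.
        # The ".jpg" suffix is an assumption about the image format.
        with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
            tmp.write(image_bytes)
        try:
            subprocess.check_call(["xdg-open", tmp.name])
        except (OSError, subprocess.CalledProcessError) as exc:
            raise CaptchaRequired("Could not display the CAPTCHA: %s" % exc)
        return input("CAPTCHA solution: ").strip()  # raw_input() on Python 2
    # Default: no option given, so abort with the proposed message.
    raise CaptchaRequired(
        "Due to high usage, youtube requires a CAPTCHA to be solved. "
        "Specify the --captcha-solver or --captcha-interactive option "
        "to solve the CAPTCHA."
    )
```

A caller would catch CaptchaRequired at the top level and turn it into the error output.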

Implementation notes:

This requires a captcha-solver option. For the moment, I'm not sure whether it should be a Python callback (fits better in a library) or a string (allows serialization of options).
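
One way to sidestep the callback-or-string question would be to accept both and normalise early. A minimal sketch, assuming a hypothetical make_solver helper and that a string option is a shell-style command line:

```python
import shlex
import subprocess


def make_solver(option):
    """Normalise a captcha-solver option into a callable (hypothetical)."""
    if callable(option):
        # Library use: the caller supplies a function(image_bytes) -> str.
        return option

    # Command-line use: the option is a string, so it stays serialisable.
    argv = shlex.split(option)

    def run_external(image_bytes):
        out = subprocess.run(
            argv, input=image_bytes, stdout=subprocess.PIPE, check=True
        ).stdout
        return out.decode("utf-8", "replace").strip()

    return run_external
```

The rest of the code then only ever sees a callable, while the option itself can still be stored as a plain string.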
It will be hard to test. I can have a process downloading 24/7 from a high-bandwidth server until it encounters this problem, but there should be a nicer way to get YouTube to present CAPTCHAs.
Afterwards, update the FAQ / documentation / README.md

Philipp Hagemeister · Answer 1 · Tue Oct 11 2011 22:21:51 GMT+0800 (China Standard Time)

It would be great if someone could provide a pcap dump that contains

youtube-dl encountering the error
A browser (like Firefox) encountering the error
Solving the CAPTCHA in Firefox

Kristaps Karlsons · Answer 2 · Sun Mar 04 2012 05:48:42 GMT+0800 (China Standard Time)

Hi phihag, here is the dump - http://www.grab.lv/_external/capture-libcapng

Steps made:

youtube-dl -g http://www.youtube.com/watch?v=VKRB9oGjsfQ (got 402)
lynx http://www.youtube.com/watch?v=VKRB9oGjsfQ
copied captcha iframe location (www.google.com/recaptcha/api/noscript?k=_hash_)
opened in chrome, filled and submitted
copied the returned hash into the textbox that was visible in lynx

So basically you'd have to forcibly save the cookie jar, keeping the cookies that were passed (use_hitbox, goojf; I'm not sure about VISITOR_INFO1_LIVE) before and after authentication (only the captcha filling should be done in a real browser), then parse the returned HTML page (which arrives via a 302 redirect) and get the iframe location. This URL should be printed to the console, so the user can click it or copy-paste it into a browser. The user fills in the captcha in their browser, copies the returned hash and pastes it into youtube-dl. Then, same as in step one, save the cookies and afterwards retry downloading.
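
The cookie and iframe part of this could look roughly like the sketch below, using only the Python standard library. The helper names are made up for illustration, and matching on "recaptcha" in the iframe URL is an assumption based on the URL quoted in the steps above.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_captcha_iframe(html):
    """Return the first iframe URL mentioning recaptcha, or None."""
    finder = IframeFinder()
    finder.feed(html)
    for src in finder.sources:
        if "recaptcha" in src:
            return src
    return None


def build_opener_with_cookies(cookie_file):
    """Return (opener, jar); the jar can be saved back to cookie_file."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    try:
        jar.load(ignore_discard=True)
    except FileNotFoundError:
        pass  # first run: nothing saved yet
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

After each request, jar.save(ignore_discard=True) would write the session cookies back to disk, so they survive the round trip through the browser.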

Maybe you should take a look at Mechanize - http://wwwsearch.sourceforge.net/mechanize/ - unfortunately I'm not familiar with Python, but WWW::Mechanize works really well for Perl.

Strolls · Answer 3 · Wed May 16 2012 18:20:35 GMT+0800 (China Standard Time)

The most important thing IMO is to give a different exit code if this 402 error is returned. I'm pretty sure this is what's requested in closed issue 144.

For the benefit of Google, this is presently the best (??) way to catch 402 errors:

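# Capture both stdout and stderr so the error text can be inspected.
if error=$(youtube-dl -q "$url" 2>&1) ; then
  echo "video downloaded ok!"
else
  # Match on the HTTP status code that appears in the error message.
  case "$error" in
    (*404*) echo "Error - video deleted" ;;
    (*402*) echo "Error - retrying too fast" ;;
    (*)     echo "Error - $error" ;;
  esac
fi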

Obviously, you can put some other stuff - to handle what to actually do in the case of a successful download or error - next to the appropriate echo statement. This will probably involve sleeping before retrying the download.
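
For anyone wrapping youtube-dl from Python instead, the same idea can be sketched with distinct exit codes, which is what is being asked for above. The exit code values here are arbitrary assumptions; youtube-dl does not define them. Like the shell version, this matches on substrings, so an unrelated "402" in the output would be misclassified.

```python
import subprocess

# Hypothetical exit codes; youtube-dl itself does not define these.
EXIT_OK = 0
EXIT_OTHER = 1
EXIT_RATE_LIMITED = 42
EXIT_DELETED = 44


def classify(returncode, output):
    """Map a youtube-dl run to one of the exit codes above."""
    if returncode == 0:
        return EXIT_OK
    if "404" in output:
        return EXIT_DELETED
    if "402" in output:
        return EXIT_RATE_LIMITED
    return EXIT_OTHER


def download(url):
    """Run youtube-dl quietly and return the classified exit code."""
    proc = subprocess.run(
        ["youtube-dl", "-q", url],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr, like 2>&1 in the shell
    )
    return classify(proc.returncode, proc.stdout.decode("utf-8", "replace"))
```

A script could then call sys.exit(download(url)) and let the caller branch on the exit status.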

For the record, it doesn't require a fast box connected 24/7 to produce this error - I can reproduce using youtube-dl --write-info-json --skip-download -q "$url" (i.e. only getting the metadata about the videos) with a slow box on a home connection and only a hundred or two videos. This does appear to vary - some days this week YouTube appears to have let me get away with more than on others.

vanilla38 · Answer 4 · Sun Jul 01 2012 20:11:59 GMT+0800 (China Standard Time)

Hey, I'm actually working on a project that depends mainly on youtube-dl and I'm sure I'll encounter this error. Has someone found a way to bypass or solve this problem?
I have an idea, but I'm not sure if it's possible:

1) send a shell command from PHP to launch youtube-dl
2) on error 402, send the captcha image and input to PHP
3) enter the captcha on the website and send it to Linux
4) continue the download

Strolls · Answer 5 · Mon Jul 02 2012 03:57:23 GMT+0800 (China Standard Time)

I don't really think this is the place to expand much on my comments above.

Whilst I hoped to help people who are encountering this problem, this is a bug report for youtube-dl, not a homework assistance forum.

Once the user fills in a captcha, the IP address is once again allowed to download.

However, filling in a captcha sets a cookie in the browser - this session is given freer access to YouTube than other sessions from the same IP.

Even with another session allowed access, it is easy for youtube-dl to once again exceed permitted downloads (and once again hit the 402 error).

youtube-dl does not have any way to tell you what captcha it wants you to solve, thus also no way for you to return the result of a captcha.

If you can have your PHP program flush the cookies from a browser and open a new browser on the same machine, then that will work.

The easiest way I have found to solve this is to build a dynamic delay - when you get a 402 error, just wait for some time before trying again. When you get multiple successful downloads, decrease the delay. This means the downloads are slower, but no human interaction is required. If you fill in a captcha, then you'll need to fill in another one within a couple of downloads; if you just wait some time (or take it easy for some time) then you can eventually resume again at a reasonable speed.
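
A dynamic delay of this kind might be sketched as below. All numbers (initial delay, growth and shrink factors, how many successes count as "multiple") are placeholder assumptions, not YouTube's real limits, and attempt is a caller-supplied function that returns False when a 402 was seen.

```python
import time


class DynamicDelay:
    """Delay that grows on a 402 and shrinks after a run of successes."""

    def __init__(self, initial=1.0, minimum=1.0, maximum=3600.0,
                 grow=2.0, shrink=0.5, successes_needed=3):
        self.delay = initial
        self.minimum = minimum
        self.maximum = maximum
        self.grow = grow
        self.shrink = shrink
        self.successes_needed = successes_needed
        self.streak = 0

    def on_rate_limited(self):
        # A 402 resets the success streak and lengthens the wait.
        self.streak = 0
        self.delay = min(self.maximum, self.delay * self.grow)

    def on_success(self):
        # Only shorten the wait after several successes in a row.
        self.streak += 1
        if self.streak >= self.successes_needed:
            self.streak = 0
            self.delay = max(self.minimum, self.delay * self.shrink)


def download_all(urls, attempt, delay, sleep=time.sleep):
    """Retry attempt(url) until it succeeds, pausing between requests."""
    for url in urls:
        while not attempt(url):
            delay.on_rate_limited()
            sleep(delay.delay)
        delay.on_success()
        sleep(delay.delay)
```

Note that this retries forever if attempt never succeeds, so a real script would want a retry cap as well.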

I am not going to tell you any more of what I know of the parameters of YouTube's rate limiting. As soon as they become publicly disseminated, YouTube will tighten their restrictions further.

I repeat: THIS IS A BUG REPORT, NOT A HELP FORUM.
If you need further assistance, post a question in an appropriate forum and message or email me, so that I can answer it there in public.

twqqis · Answer 6 · Tue Oct 09 2012 14:17:54 GMT+0800 (China Standard Time)

I was going to suggest the dynamic delay, but then noticed Strolls got it down quite nicely. Think it'll be a simple solution to try and test; at least taking the human interaction out of the equation - say when leaving the computer unattended for some time...

Daniel Brooks · Answer 7 · Sat Oct 20 2012 14:49:20 GMT+0800 (China Standard Time)

Would it be possible to implement something like this: http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/

Or this: http://www.wausita.com/captcha/

They are both written in python and may be able to be included easily?

Deleted user · Answer 8 · Tue Feb 19 2013 22:41:39 GMT+0800 (China Standard Time)

What about adding OAuth 2 support so we can authorize our apps using this library against YouTube? This would solve the issue as far as I can tell.

Brams · Answer 9 · Fri Feb 22 2013 17:20:08 GMT+0800 (China Standard Time)

@jasonrwalters what would your solution consist of? Can you explain please?

Deleted user · Answer 10 · Sun Mar 17 2013 05:15:43 GMT+0800 (China Standard Time)

Actually, are we able to place our developer API key in our calls? If that's not a feature it should be. But if the captcha occurs because you're improperly flagged then you need to contact Google, and just write something to handle the event yourself.

See Google docs:
https://developers.google.com/youtube/faq#quota

Ivan Kozik · Answer 11 · Wed Sep 04 2013 05:45:11 GMT+0800 (China Standard Time)

I had CAPTCHA issues frequently when re-grabbing entire channels. I implemented some hacks to (1) avoid hitting YouTube for videos I've already downloaded, and (2) wait 1 sec between HTTP requests: https://github.com/ludios/youtube-dl/commits/prime - and I haven't had problems since.
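
The two ideas (skip what is already downloaded, keep a minimum gap between requests) can be sketched independently of youtube-dl as follows. This is not the code from the linked branch; Throttle, pending_videos and the one-ID-per-line archive file are assumptions made for illustration.

```python
import time


class Throttle:
    """Ensure at least `interval` seconds between consecutive calls."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def pending_videos(video_ids, archive_path):
    """Yield IDs not yet listed in the archive file (one ID per line)."""
    try:
        with open(archive_path) as f:
            done = {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        done = set()  # no archive yet: everything is pending
    for vid in video_ids:
        if vid not in done:
            yield vid
```

Calling throttle.wait() before every HTTP request, and only looping over pending_videos(...), gives the combined behaviour described above.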

Manuel Ignacio Rodriguez · Answer 12 · Wed Jun 03 2015 02:02:06 GMT+0800 (China Standard Time)

In the docs there's an option for that now:

--sleep-interval SECONDS

cheers everyone

aki263 · Answer 13 · Thu Aug 17 2017 06:36:05 GMT+0800 (China Standard Time)

I would suggest adding an automatic captcha solver from https://de-captcher.com/ or some other website. If you are using this to download videos for personal use, you won't encounter captchas, but if you are doing something like building a website that downloads YouTube videos, then you need a service. So an automatic captcha solver service is the best option.

J0WI · Answer 14 · Thu Jan 09 2020 01:52:07 GMT+0800 (China Standard Time)

The featured pages of channels are currently not affected by CAPTCHAs. Have you thought of a way to extract the video URL from there, bypassing the CAPTCHAs?

schnusch · Answer 15 · Tue Jan 26 2021 11:32:15 GMT+0800 (China Standard Time)

I'm not sure if this is the right issue for a discussion about CAPTCHA handling in general. This would allow youtube-dl to also handle hosts that require solving a CAPTCHA before a video link can be obtained.

I recently tried to handle reCAPTCHAs by displaying them to the user embedded in a custom website to extract the response. Because reCAPTCHA verifies the host of the page it is embedded in, and I assume hCaptcha does so too, the location needs to be faked. This resulted in a simple tool utilizing WebKitGTK (see https://github.com/schnusch/decaptcha) which uses WebKit's webkit_web_view_load_html to fake the location. I just now discovered webRequest.filterResponseData(), which could be used in a web extension, but seems to be available only in Firefox. This would also make it possible to display a CAPTCHA in a custom page.

Is there any interest in handling CAPTCHAs, perhaps in an external tool that is invoked by youtube-dl when available on the system?

ad90xa0-aa · Answer 16 · Mon Feb 08 2021 20:01:26 GMT+0800 (China Standard Time)

Just use this service https://anti-captcha.com/

Then let people submit their own login to it if they want to bypass captchas