packt-cli / Packt-Publishing-Free-Learning

Scripts that automatically claim and download free daily eBooks from https://www.packtpub.com/packt/offers/free-learning

Allow for self-solved anti-captcha

eharris opened this issue

With the changes they appear to have made recently, which now require solving a captcha even to log in (see info in #169), the script doesn't even allow downloading books you already own without setting up anti-captcha.

I have no interest in setting up or subscribing to an anti-captcha service, but I'm happy to solve the captcha myself manually. I just need a way to do so and pass the result to the script so I can download my owned content (packt-cli -da).

If so, why not just use their website? And what do you expect from the script: just a CLI command that lets you pass a ReCaptcha solution to it and run the download of all books, or also a way to handle the ReCaptcha solving itself?

The website doesn't offer an easy way to download only what I haven't already downloaded, or to do so with consistent naming. This script does. I would have to work out manually, at great cost in time and on an ongoing basis, what I'm missing, then download each missing item individually and make sure the names are correct. And I'd have to repeat all that work every single time I want to make sure my downloaded library is complete.

Yes, I would expect some way to have the script let me solve the captcha and then continue to do its work with the results. For example, one way that comes to mind would be to invoke a browser window and pass the captcha to it, and take the results of the submit back. The anti-captcha service it's already using obviously has a way to do this, so it seems like there should be some reasonable way to do it locally.

Maybe another way would be to have the user log in to the site using a normal browser, then use a little JavaScript or some other means to expose and grab the JWT token from that login, and pass it to the script on the command line?

I'll think about that, but it may not be very obvious (and is almost surely beyond the scope of the CLI script). I'd go for the second solution for the sake of keeping the script simple, but we'd need to know how to obtain a ReCAPTCHA solution in the first place.

I second that request.
I know it must be possible; this was done for Pokemon Go mapping at some point in the past.

I searched for this. It was done in RocketMap and seems to be a non-trivial feature (they went through a bookmarklet, which in the end sent information back to the server script).

The Recaptcha on login is only enabled sometimes.

There are other endpoints that accept the access token and refresh token granted by a successful login and return new valid tokens, following the OAuth2 model. That is, once you have successfully authenticated, you can reauthenticate using your tokens, as long as you do so at least every 30 days, without using the username and password.

This would mean the Recaptcha does not affect your ability to download titles.
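
For reference, here is a minimal sketch of what a refresh request looks like in the generic OAuth 2.0 model (RFC 6749, section 6). It only builds the request and does not send it. The endpoint URL is a placeholder and `build_refresh_request` is a hypothetical helper; nothing in this thread confirms the real endpoint or payload format used by Packt, which may well differ (for example by expecting JSON).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: the real token endpoint is not given in this thread.
TOKEN_ENDPOINT = "https://auth.example.invalid/token"


def build_refresh_request(refresh_token: str,
                          endpoint: str = TOKEN_ENDPOINT) -> Request:
    """Build (but do not send) a generic OAuth 2.0 refresh-token request."""
    # RFC 6749 section 6: form-encoded body with grant_type=refresh_token.
    body = urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request (for example with `urllib.request.urlopen`) would, in the generic model, return a new access token and possibly a new refresh token.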


Please note this is not an endorsement of any activity.

@supachris28 Where did you get this information from? I'm especially interested in how you know that "you can reauthenticate using your tokens at least every 30 days without using the username and password".

I can see access_token_live and refresh_token_live in my cookies, and they expire after 24 hours. How do you know that this refresh token will be valid for 30 days? Is it part of OAuth 2.0?

The cookies should be valid for longer: they last 30 days if you log in to the subscription site rather than the store.
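
One way to check a token's own lifetime, as opposed to the cookie's expiry date, is to read the `exp` claim from its payload. The sketch below assumes the access token is a standard three-segment JWT, which this thread suggests but does not confirm; the refresh token may be an opaque string that cannot be decoded this way. `jwt_expiry` is a hypothetical helper and it does not verify the signature.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token: str):
    """Return the `exp` claim of a JWT as an aware UTC datetime, or None.

    The signature is NOT verified; this only inspects the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload_b64 = parts[1]
    # base64url segments omit '=' padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    if exp is None:
        return None
    # `exp` is a NumericDate: seconds since the Unix epoch (RFC 7519).
    return datetime.fromtimestamp(exp, tz=timezone.utc)
```

Comparing the returned value with the cookie's expiry in the browser shows whether the 24-hour or 30-day figure applies to the token itself or only to the cookie that carries it.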


Please note this is not an endorsement of any activity.

Can we assume that the user can find the JWT token in the browser's cookies after logging in? If so, it would be easy to provide an additional CLI parameter (the JWT token) that overrides user authentication and makes the script use the provided token.
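
A minimal sketch of that idea, using `argparse` from the standard library. The `--jwt` option, the `packt-cli-sketch` program name, and both helper functions are hypothetical; they are not part of the actual packt-cli interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real packt-cli option set is not reproduced here.
    parser = argparse.ArgumentParser(prog="packt-cli-sketch")
    parser.add_argument(
        "--jwt",
        default=None,
        help="access token copied from the browser; skips the login step",
    )
    return parser


def choose_auth(args: argparse.Namespace) -> str:
    """Return 'token' when a JWT was supplied, otherwise 'login'."""
    if args.jwt:
        return "token"
    return "login"
```

With this shape, the existing username/password (and anti-captcha) path stays the default, and the token path is taken only when the user explicitly supplies one.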

If this is the solution used (having the user find and supply the JWT token from a browser login), then the script should probably also take on the additional functionality @supachris28 noted and automatically "refresh" (extend the expiration of) the token(s). That way the user only has to supply the token once, and the script keeps it "active" and stores the updated token values, as long as it is run often enough to keep the token from expiring.

I can confirm that when I log in to the subscription site, the two tokens do have a 30 day expiration. In my session, the access_token_live is 671 chars long, and the refresh_token_live is 82 chars long. Given the length, and allowing for the possibility that more than one cookie/token is needed for this to work, it would seem that storing the credential information in the config file would be better than passing it on the command line.
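
As a rough sketch of the config-file approach, the two values could be kept in their own section and rewritten whenever they change. The `TOKENS` section name and the `save_tokens`/`load_tokens` helpers are assumptions for illustration; only the two cookie names come from this thread.

```python
import configparser

SECTION = "TOKENS"  # hypothetical section name


def save_tokens(path, access_token, refresh_token):
    """Write both tokens to the config file, keeping other sections intact."""
    # interpolation=None so a '%' inside a token is stored verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently ignored
    if not config.has_section(SECTION):
        config.add_section(SECTION)
    config.set(SECTION, "access_token_live", access_token)
    config.set(SECTION, "refresh_token_live", refresh_token)
    with open(path, "w") as fh:
        config.write(fh)


def load_tokens(path):
    """Return (access_token, refresh_token), or None if none are stored."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)
    if not config.has_section(SECTION):
        return None
    return (
        config.get(SECTION, "access_token_live"),
        config.get(SECTION, "refresh_token_live"),
    )
```

Since the file would then hold live credentials, it should probably be readable only by its owner.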

For the first proof-of-concept pass, I think it would be ok to have the user be responsible for knowing how to view and copy the access token(s), as long as there is documentation as to how to find the correct cookie values (what domain, what cookie names, etc).

Any plans to implement cookie-based logins?
I'm trying to download my entire library, but I can't log in due to the anti-captcha token requirement.

It'd be possible to change the code so that a JWT can be passed in (not very hard, I'd say it's very easy). Then you could log in with the browser, extract the JWT token from the cookies (do you know how to do it?) and pass it to the script (although we'd be unable to fetch another token after the one you passed expires).

We (I?) could also look one day at how to properly refresh the token as specified in the JWT specification; then I guess the issue above would no longer be an issue.

Would that be suitable for you? I'm not sure we should merge it before we make it properly, but we could maintain a branch with such functionality.

Oh, I see it's not a new issue, but everything I've written still holds.

Can't we just ask the user to export the browser's cookies and let the program search for the token automatically, the way it's done in aria2c?
I'm comfortable with any option, though. I just want the token to last long enough for 300+ books to be downloaded at <1 MBps.

I see it may be quite easy using browser-cookie3. I'll try to do it in my spare time (I can't promise any particular deadline, though; I just promise I'll keep it in mind).
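
A sketch of how the lookup could work once a cookie jar is available. browser-cookie3 returns a standard `http.cookiejar.CookieJar`, so the filtering itself needs only the standard library. The cookie names are the ones mentioned earlier in this thread; the `extract_tokens` helper and the `packtpub.com` domain filter are assumptions.

```python
from http.cookiejar import CookieJar

# Cookie names as reported earlier in this thread.
TOKEN_COOKIE_NAMES = ("access_token_live", "refresh_token_live")


def extract_tokens(jar: CookieJar, domain: str = "packtpub.com") -> dict:
    """Return {cookie_name: value} for the token cookies set on `domain`."""
    tokens = {}
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        # Accept the domain itself and any of its subdomains.
        on_domain = host == domain or host.endswith("." + domain)
        if on_domain and cookie.name in TOKEN_COOKIE_NAMES:
            tokens[cookie.name] = cookie.value
    return tokens
```

With browser-cookie3 installed, the jar would come from something like `browser_cookie3.load(domain_name="packtpub.com")`, and a result missing either key would mean the user is not logged in with that browser.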