aliparlakci / bulk-downloader-for-reddit

Downloads and archives content from reddit

Home Page: https://pypi.org/project/bdfr

[BUG] Imgur - Response code 429

gemini0x2 opened this issue

  • I am reporting a bug.
  • I am running the latest version of BDFR.
  • I have read the Opening an issue guidelines.

Description

Imgur links are returning response code 429.
Note: I'm able to browse Imgur normally in my browser and even access the direct links of files that return 429 in bdfr. This error continues even after waiting 24 hours. Never had this issue before.

Command

python bdfr --user reddituser --submitted

Environment

  • OS: macOS
  • Python version: 3.10.6

Logs

[2023-05-23 22:53:15,330 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2023-05-23 22:53:15,330 - bdfr.connector - Level 9] - Created download filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Created time filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Created sort filter
[2023-05-23 22:53:15,331 - bdfr.connector - Level 9] - Create file name formatter
[2023-05-23 22:53:15,331 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Created site authenticator
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-05-23 22:53:15,332 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-05-23 22:53:15,744 - bdfr.connector - Level 9] - Retrieved user data
[2023-05-23 22:53:15,744 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-05-23 22:53:27,185 - bdfr.downloader - DEBUG] - Attempting to download submission 13nbj2e
[2023-05-23 22:56:28,016 - bdfr.downloader - DEBUG] - Attempting to download submission 13jz8vx
[2023-05-23 22:56:28,016 - bdfr.downloader - DEBUG] - Using Imgur with url https://i.imgur.com/XpO4ZNm.gifv
[2023-05-23 22:56:28,293 - bdfr.resource - WARNING] - Error occured downloading from https://i.imgur.com/XpO4ZNm.mp4, waiting 60 seconds: Response code 429
[2023-05-23 22:57:28,485 - bdfr.resource - WARNING] - Error occured downloading from https://i.imgur.com/XpO4ZNm.mp4, waiting 120 seconds: Response code 429
[2023-05-23 22:59:28,586 - bdfr.resource - ERROR] - Max wait time exceeded for resource at url https://i.imgur.com/XpO4ZNm.mp4
[2023-05-23 22:59:28,586 - bdfr.downloader - ERROR] - Failed to download resource https://i.imgur.com/XpO4ZNm.mp4 in submission 13jz8vx with downloader Imgur: Could not download resource: Response code 429

Having the same issue

Imgur is nuking all NSFW content from Reddit. Not sure if this can be fixed, but that's most likely the cause; it must be screwing with the API.

@GarethFreeman I know about that, but if that's the cause of the 429 response code, then why can we still access any content in the browser without a problem? That makes no sense, unless they somehow detect something peculiar about how bdfr is making the download requests.

HTTP code 429 is a rate-limiting error. It means that Imgur has received too many requests from the browser or application. There's not really any way for us to deal with this or get around it; it just means you have to make requests more slowly or download fewer Imgur posts.
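
For context, that is exactly the behaviour already visible in the log above (wait 60 seconds, then 120, then give up). A generic version of the same idea, as a minimal sketch with requests rather than bdfr's actual code (the wait limits here are made up):

    import time

    import requests

    def fetch_with_backoff(url: str, max_wait: int = 240) -> bytes:
        """Retry a download with doubling waits while the server answers 429."""
        wait = 60
        while True:
            response = requests.get(url, timeout=30)
            if response.status_code != 429:
                response.raise_for_status()
                return response.content
            if wait > max_wait:
                raise RuntimeError(f"Giving up on {url}: still rate limited (429)")
            time.sleep(wait)  # back off before retrying
            wait *= 2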

I was having the same issue but noticed that curl was able to download the same urls with no problem.

Adding curl's default headers to the resource method below fixed the issue for me.

    @staticmethod
    def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
        # headers = download_parameters.get("headers")  # original lookup, replaced below
        # Use curl's default headers instead; Imgur stops answering 429 with these
        headers = {
            "user-agent": "curl/7.84.0",
            "accept": "*/*",
        }
        ...

I expect it's the accept header more than the user-agent, but I haven't tried without both.
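
If someone wants to check which header matters, a rough diagnostic along these lines would isolate them, using one of the failing links from the log above (not part of bdfr, just a standalone test script):

    import requests

    # Diagnostic sketch: try each header combination against one of the links
    # that returns 429 in the log above, to see which header Imgur cares about.
    # Note: requests merges these with its own defaults, so "accept only" is
    # close to the default behaviour (requests already sends "Accept: */*").
    URL = "https://i.imgur.com/XpO4ZNm.mp4"

    candidates = {
        "requests defaults": {},
        "accept only": {"accept": "*/*"},
        "user-agent only": {"user-agent": "curl/7.84.0"},
        "both": {"user-agent": "curl/7.84.0", "accept": "*/*"},
    }

    for name, headers in candidates.items():
        status = requests.get(URL, headers=headers, timeout=30).status_code
        print(f"{name:18} -> HTTP {status}")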

Adding the curl headers fixed the issue. Too bad I didn't figure this out sooner. Thanks, @eawooten, for the solution!

Thank you! Adding the curl headers fixes the issue with Imgur, but breaks Redgifs.

@GGaroufalis Right, I hadn't noticed that! A conditional statement will help.

@Gavriik I think this one fixes it

    @staticmethod
    def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
        domain = urlparse(url).hostname
        if domain and fnmatch.fnmatch(domain, "*.redgifs.com"):
            # Redgifs breaks with the curl headers, so keep the original ones
            headers = download_parameters.get("headers")
        else:
            # Everything else (notably Imgur) gets curl's default headers
            headers = {
                "user-agent": "curl/8.1.1",
                "accept": "*/*",
            }
        ...

You also need to add these imports at the top of the file:

    from urllib.parse import urlparse
    import fnmatch
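
One quirk of the fnmatch pattern that's easy to verify in a REPL: it only matches hostnames that have a subdomain, so a bare redgifs.com link would still get the curl headers:

    import fnmatch

    # "*.redgifs.com" only matches hostnames with a subdomain label, so a bare
    # redgifs.com link would still fall through to the curl headers.
    print(fnmatch.fnmatch("thumbs2.redgifs.com", "*.redgifs.com"))  # True
    print(fnmatch.fnmatch("redgifs.com", "*.redgifs.com"))          # False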

Not sure why you wouldn't put it here rather than make the download function super janky...

@GGaroufalis Thanks! I can confirm that the conditional statement works as expected, but wouldn't it be better to switch the condition? That way the modified headers are only used for Imgur, not for every other site.

    @staticmethod
    def http_download(url: str, download_parameters: dict) -> Optional[bytes]:
        headers = download_parameters.get("headers")
        domain = urlparse(url).hostname
        if domain and fnmatch.fnmatch(domain, "*.imgur.com"):
            # Only Imgur gets the curl headers; every other site keeps the originals
            headers = {
                "user-agent": "curl/8.1.1",
                "accept": "*/*",
            }
        ...

@Soulsuck24 That does not work. If I'm not wrong, that header is only used to retrieve the direct links of an Imgur post; it is not used for the actual download.

You're right, I was thinking of this one instead, my bad.

It's weird, though, that your connection to the API and through a browser works while the script gets 429s on the direct-link download. The changes here just switch the downloader from the default requests user-agent to the curl one. This would be the first time I've seen them limit on something other than IP, but then it can't be solely the user-agent, since the requests one is used to access the API and that isn't getting 429s. Odd.
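
For reference, the headers requests sends by default can be inspected directly, which makes the difference from curl easy to see (the exact version string depends on your install):

    import requests

    # The headers requests sends when none are supplied; compare with curl's
    # plain "user-agent: curl/x.y.z" plus "accept: */*".
    print(requests.utils.default_headers())
    # e.g. {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate',
    #       'Accept': '*/*', 'Connection': 'keep-alive'}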

How do you implement this? Apologies for not being an experienced coder.

Rename the attached file to resource.py and drop it in the bdfr folder:
resource.txt

Do you still use the bdfr command as before, or does curl work differently? Could you provide an example of a Reddit user download?

@GarethFreeman Same command as before; just make sure the modified resource.py is in the right location.

@Gavriik C:\Users\AppData\Local\BDFR\bdfr right? It's still giving me the 429 response code.

@GarethFreeman mine is in C:\Users\Administrator\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr

Yours might differ a bit depending on your Python version.

@Gavriik I literally don't have that folder at all. I'm on 3.10, and I just don't understand the problem. There are no other folders where I can place that file.

The following command should give you the correct location:
python3 -m pip show bdfr
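
If pip's output is confusing, the same folder can also be found from Python itself (assuming bdfr is importable with the interpreter you actually run it with):

    # Assumes bdfr is importable with the interpreter you actually run it with;
    # resource.py lives in the same folder as the printed __init__.py.
    import bdfr
    print(bdfr.__file__)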

@Gavriik Finally got it working, thanks for all the help mate.