403 forbidden error when hosting script
Nevrai opened this issue · comments
Describe the bug
I love duckduckgo-search, but I’ve been having issues with fetching images when hosting my script on Cybrancee. My script uses Python 3.10.12.
Whilst using the duckduckgo-search library to fetch images from DuckDuckGo, I encounter an HTTPError 403 Client Error: Forbidden for url error. This issue does not occur when running the bot locally – only when hosted on Cybrancee, which uses a Pterodactyl panel. Scraping web pages or search engines works fine, and fetching search results with duckduckgo-search works fine, too. Fetching images is the only thing that does not work.
I also tried proxies, headers, and a user agent. However, I still have the same problem.
For some odd reason, I’m able to scrape DuckDuckGo search results with duckduckgo-search just fine on my host:
ddg_link = DDGS(headers=new_headers, proxies=proxies, timeout=15).text(q)
However, when scraping image results instead, it does not work. Code:
import itertools
import logging
import random
import re

from duckduckgo_search import DDGS

# get_ua() and proxies are defined elsewhere in the script
rand_ua = get_ua()
logging.debug(f'[ddg_img.py] User agent: {rand_ua}')
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Dnt": "1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": rand_ua,
}
ddgs = DDGS(headers=headers, proxies=proxies, timeout=15)

async def get_ddg_image(query, first=False):
    # Remove all punctuation marks except hyphens from query
    query = re.sub(r'[^\w\s-]', '', query)
    logging.debug(f'[get_ddg_image()] query: {query}')
    keywords = query
    ddgs_images_gen = ddgs.images(
        keywords,
        region='wt-wt',
        safesearch='On'
    )
    # Take at most the first 10 results from the generator
    images = list(itertools.islice(ddgs_images_gen, 10))
    if first:
        # Get the first image
        image = images[0] if images else None
    else:
        # Get random image
        image = random.choice(images) if images else None
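As a side note on the snippet above: get_ddg_image is declared async, but the image search it calls is an ordinary blocking call, so the event loop waits on it. Below is a minimal sketch of one way to separate the "pick first or random result" step from the fetch and run the blocking part in a worker thread with asyncio.to_thread (available from Python 3.9). The names choose_image, get_image, and fake_search are hypothetical and not part of duckduckgo-search; fake_search only stands in for the real search so the example runs without network access. This does not address the 403 itself.

```python
import asyncio
import itertools
import random
from typing import Callable, Iterable, Optional


def choose_image(results: Iterable[dict], first: bool = False, limit: int = 10) -> Optional[dict]:
    """Take at most `limit` results, then return the first or a random one."""
    images = list(itertools.islice(results, limit))
    if not images:
        return None
    return images[0] if first else random.choice(images)


async def get_image(
    search: Callable[[str], Iterable[dict]], query: str, first: bool = False
) -> Optional[dict]:
    """Run a blocking search callable in a worker thread, then pick a result."""
    # Fetching and picking both happen in the thread, because a lazy generator
    # only does its network I/O when it is consumed.
    return await asyncio.to_thread(lambda: choose_image(search(query), first))


def fake_search(query: str):
    """Hypothetical stand-in for the real image search; yields dummy results."""
    for i in range(25):
        yield {"title": f"{query} {i}", "image": f"https://example.com/{i}.jpg"}


if __name__ == "__main__":
    print(asyncio.run(get_image(fake_search, "potato", first=True)))
```

In the real bot, the search argument would presumably be a small wrapper around the existing ddgs.images(...) call with the same region and safesearch arguments.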
May be related to #100; however, unlike that issue, it does not happen periodically for me. It happens with every attempt – but only when hosting, not when running the script locally.
I was using version 3.2.0 of duckduckgo-search, then updated to the latest version, 3.8.3. However, the issue still occurs in the same way it did before.
I have seen #84 and #98. However, you (@deedy5) said that updating might fix it, but it did not. You also said that it’s not a library problem and that a proxy or increasing the time between requests might fix the issue, but in my case, it occurs every time, even if I haven’t made any recent requests, and I have tried both with and without proxies.
The strange anomaly is that it functions perfectly locally but not when hosting on Cybrancee (I have not tried other hosts) – and that using the same library to scrape DDG search results works perfectly with the same headers, UA, and proxies, but when trying to get images, it does not work. I’m not sure what is causing this, but if you could offer some assistance in fixing this issue, it would be much appreciated, as I am quite lost!
Errors
WARNING:duckduckgo_search.duckduckgo_search:_get_url() https://duckduckgo.com/i.js HTTPError 403 Client Error: Forbidden for url: https://duckduckgo.com/i.js?l=wt-wt&o=json&s=0&q=Potato+picture&vqd=4-7287769708002951745556569444305599608&f=%2C%2C%2C%2C%2C&p=1
ERROR:__main__:Unhandled error in on_message
Traceback (most recent call last):
  File "/home/container/.local/lib/python3.10/site-packages/discord/client.py", line 441, in _run_event
    await coro(*args, **kwargs)
  File "/home/container/script.py", line 6134, in on_message
    image_query, image_url, image_title = await fetch_image(msg, ai_response, server_id, channel_id, should_fetch, fetch_image_type)
  File "/home/container/script.py", line 2799, in fetch_image
    image_url, image_title = await get_ddg_image(image_query)
  File "/home/container/ddg_img.py", line 73, in get_ddg_image
    images = list(itertools.islice(ddgs_images_gen, 10))
  File "/home/container/.local/lib/python3.10/site-packages/duckduckgo_search/duckduckgo_search.py", line 230, in images
    resp = self._get_url("GET", "https://duckduckgo.com/i.js", params=payload)
  File "/home/container/.local/lib/python3.10/site-packages/duckduckgo_search/duckduckgo_search.py", line 69, in _get_url
    raise ex
  File "/home/container/.local/lib/python3.10/site-packages/duckduckgo_search/duckduckgo_search.py", line 64, in _get_url
    resp.raise_for_status()
  File "/home/container/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://duckduckgo.com/i.js?l=wt-wt&o=json&s=0&q=Potato+picture&vqd=4-7287769708002951745556569444305599608&f=%2C%2C%2C%2C%2C&p=1
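The traceback shows the HTTPError propagating all the way up to on_message as an unhandled error. A minimal sketch of containing that follows: the failing call is wrapped so that a listed exception is logged, retried a few times, and finally turned into None instead of crashing the handler. The helper call_with_retries and the FakeForbidden exception are hypothetical names used only for illustration; in the real script one would presumably pass requests.exceptions.HTTPError (the type shown in the traceback) in errors. This only keeps the bot alive and does not make the 403 go away.

```python
import logging
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def call_with_retries(
    func: Callable[[], T],
    errors: Tuple[Type[Exception], ...] = (Exception,),
    attempts: int = 3,
    delay: float = 1.0,
) -> Optional[T]:
    """Call func(); on a listed error, log, wait, and retry. None if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except errors as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(delay * attempt)  # wait a little longer after each failure
    return None


class FakeForbidden(Exception):
    """Hypothetical stand-in for the HTTPError seen in the traceback."""


def always_forbidden():
    raise FakeForbidden("403 Client Error: Forbidden")


if __name__ == "__main__":
    # Every attempt fails, so the caller gets None rather than an exception.
    print(call_with_retries(always_forbidden, errors=(FakeForbidden,), delay=0))
```

The caller can then treat None as "no image available" and carry on with the rest of the message handling.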
Information
- Environment: Cybrancee (Pterodactyl panel)
- duckduckgo-search version: 3.8.3 (latest)
I solved the issue myself by making sure to use the latest version of duckduckgo-search, 3.8.3, and making sure I was using version 23.1.0 of aiofiles. I also made sure I was using the latest versions of click, httpx, and lxml. Thankfully, that solved it!
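Since the fix came down to package versions differing between the local machine and the host, it can help to print what is actually installed in each environment and compare the output. A minimal sketch using the standard library's importlib.metadata follows; the helper name installed_versions is hypothetical, and the package list is simply the one mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Packages named in the fix above
PACKAGES = ("duckduckgo-search", "aiofiles", "click", "httpx", "lxml")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this both locally and on the host and comparing the two listings should show whether the environments really match.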
Hi,
I got the same error with version 3.8.5. Does this still work, or are the restrictions too strict now?
I'm trying with only one image so no issues of query frequency here.
Use the latest version
I'm on Kaggle, and 3.8.5 seems to be the newest version that can be installed there. Too bad. Thanks for the feedback.
I'm writing this in case someone has a workaround.