httpx.HTTPError on 3.9.5
GianfrancoCorrea opened this issue · comments
Yesterday someone reported this bug, but he deleted the issue, so I don't know if it has a solution or not...
code:
import asyncio

from duckduckgo_search import AsyncDDGS


async def async_search(query):
    try:
        async with AsyncDDGS() as ddgs:
            # Collect up to 5 results for this query
            results = [r async for r in ddgs.text(query, max_results=5)]
            return results
    except Exception as e:
        print(e)
        return []


async def search_queries(queries):
    # One task per query, so all searches run concurrently
    tasks = []
    for query in queries:
        tasks.append(asyncio.create_task(async_search(query)))
    results = await asyncio.gather(*tasks)
    return results
Debug log
2023-11-15 09:52:35.202 Uncaught app exception
Traceback (most recent call last):
File "/Users/gianjsx/Documents/fuentes/.venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "/Users/gianjsx/Documents/fuentes/app.py", line 49, in <module>
asyncio.run(main())
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/gianjsx/Documents/fuentes/app.py", line 46, in main
for result in results:
File "/Users/gianjsx/Documents/fuentes/.venv/lib/python3.11/site-packages/duckduckgo_search/duckduckgo_search.py", line 96, in text
for i, result in enumerate(results, start=1):
File "/Users/gianjsx/Documents/fuentes/.venv/lib/python3.11/site-packages/duckduckgo_search/duckduckgo_search.py", line 148, in _text_api
resp = self._get_url("GET", "https://links.duckduckgo.com/d.js", params=payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/gianjsx/Documents/fuentes/.venv/lib/python3.11/site-packages/duckduckgo_search/duckduckgo_search.py", line 55, in _get_url
raise ex
File "/Users/gianjsx/Documents/fuentes/.venv/lib/python3.11/site-packages/duckduckgo_search/duckduckgo_search.py", line 48, in _get_url
raise httpx._exceptions.HTTPError("")
httpx.HTTPError
Specify this information
- ddgs version 3.9.5
- venv
- macbook pro m2
Try using a proxy. The package works, all tests pass. Maybe your IP is blocked by the site.
Same issue here except I'm not using the async variant - and it seems very intermittent.
Admittedly not a large sample size but it mainly occurs when I'm handling multiple inbound http requests - some resource sharing issue maybe but what do I know :)
@deedy5 if it is actually to do with being blocked, could we have a nice way to handle this? I might be a bit of a noob but not exactly sure how to catch these exceptions...
Awesome work btw :)
If you send multiple requests in parallel, the site will block your IP for a while.
The solution is simple: either send requests sequentially in one stream, or use a proxy so that your IP is different for each request. https://github.com/deedy5/duckduckgo_search#using-proxy
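For the "sequentially in one stream" option, here is a minimal sketch of how the search_queries function from the original report could await one search at a time instead of launching them all with asyncio.gather. The names search_queries_sequentially, search_one and pause are hypothetical; search_one stands in for any coroutine function that performs a single search (for example async_search above), and the optional pause is an assumption, not a documented limit.

```python
import asyncio
from typing import Awaitable, Callable


async def search_queries_sequentially(
    queries: list[str],
    search_one: Callable[[str], Awaitable[list]],
    pause: float = 0.0,
) -> list[list]:
    """Run one search at a time instead of launching them all at once."""
    results = []
    for index, query in enumerate(queries):
        if index > 0 and pause > 0:
            # Optional gap between consecutive requests.
            await asyncio.sleep(pause)
        # Awaiting here means the next request starts only after this one ends.
        results.append(await search_one(query))
    return results
```

Results come back in the same order as the queries, as with gather, but only one request is in flight at any moment.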
Okay, is there a way to catch and handle these exceptions? At the moment it's happening behind the scenes.
- Any resources on how the DuckDuckGo throttling mechanics work?
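On catching the exception: the traceback in the original report shows the error surfacing while the results are being iterated, so the iteration itself has to sit inside the try block. Below is a minimal sketch under that assumption. safe_search, search_fn and handled are hypothetical names; search_fn stands in for whatever function returns the result iterable.

```python
from typing import Callable, Iterable, Optional, Tuple, Type


def safe_search(
    search_fn: Callable[[str], Iterable[dict]],
    query: str,
    handled: Tuple[Type[BaseException], ...] = (Exception,),
) -> Optional[list]:
    """Return the results as a list, or None if a handled error was raised."""
    try:
        # list() consumes the iterable inside the try block; a lazy generator
        # raises while it is being iterated, not when it is created.
        return list(search_fn(query))
    except handled as exc:
        print(f"search for {query!r} failed: {type(exc).__name__}: {exc}")
        return None
```

Passing handled=(httpx.HTTPError,) would match the exception type shown in the traceback above; anything not listed in handled still propagates to the caller.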
Today I started to see the same errors.
Sequential queries with DDGS() (not async). After 2-3 requests with an interval of <10s, the API starts to respond with 202, and this causes HTTPError.
There's no mention in the logs of why _is_500_in_url(str(resp.url)) or resp.status_code == 202 was added in the first place, and I can't find what 202 means at DDG. Is there a more graceful way to handle it than just raising an error after 2 quick retries?
I have the same issue, which happens occasionally even without frequent requests (1 request per 10-20 min).
Edit: I was wrong, it sends 4-5 requests in a row once every 10-20 min.
I also tested over the CLI: DDG responds with 202 after 3-4 requests in a row.
I've built a workaround for the limit.
It looks like the limit is ~2 requests per 10 s.
import asyncio
from typing import Awaitable, Callable

from duckduckgo_search import AsyncDDGS


class AsyncRateLimitedActionWrapper:
    def __init__(self, rate_limit: int, time_period: float):
        """
        :param rate_limit: The maximum number of requests allowed per time period.
        :param time_period: Time period in seconds over which the rate limit applies.
        """
        self.rate_limit = rate_limit
        self.time_period = time_period
        self.slots = asyncio.Queue(maxsize=rate_limit)
        self.history = asyncio.Queue()
        self.generator = None  # background task that frees slots

    async def _generate_slots(self):
        """
        Generate slots for the rate limit.
        """
        while True:
            if self.slots.empty() or self.history.empty():
                # no calls in the time period window
                await asyncio.sleep(self.time_period)
            else:
                # first call in the time period window
                first_time_call = await self.history.get()
                self.history.task_done()
                current_time = asyncio.get_running_loop().time()
                await asyncio.sleep(self.time_period - current_time + first_time_call)
                await self.slots.get()  # put back the slot once call is out of the framed time period
                self.slots.task_done()

    async def _consume_slot(self):
        """
        Consume a slot.
        """
        if self.generator is None:
            # keep a reference so the background task is not garbage collected
            self.generator = asyncio.create_task(self._generate_slots())
        await self.slots.put(1)  # blocks while all slots are taken
        await self.history.put(asyncio.get_running_loop().time())

    async def perform(self, action: Callable[..., Awaitable], *args, **kwargs):
        """
        Asynchronously make an action, respecting the rate limit.
        """
        await self._consume_slot()
        return await action(*args, **kwargs)


limit_wrapper = AsyncRateLimitedActionWrapper(2, 10)


async def _search(search_query):
    async with AsyncDDGS() as ddgs:
        results = [r async for r in ddgs.text(search_query)]
        return results


async def search(search_query) -> list[dict]:
    return await limit_wrapper.perform(_search, search_query)
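For those using the non-async DDGS(), the same idea can be sketched as a sliding window over call timestamps. This is only an illustration: SlidingWindowLimiter is a hypothetical name, the 2-per-10-seconds figure is the estimate from this thread rather than a documented limit, and the class is not thread-safe as written. The clock and sleep parameters exist only so the behaviour can be checked without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(
        self,
        max_calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # start times of recent calls

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
```

Usage would be to create one limiter, e.g. SlidingWindowLimiter(2, 10), and call its wait() method right before each search request.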
Thank you all for finding the error.
The site sometimes makes changes.
Fixed in version v3.9.6.
Thanks, it mostly works. It started to fail after ~10 min (with a rate of 3 useful reqs per 10s):
11:11:00: HTTP Request: POST https://duckduckgo.com "HTTP/2 200 OK"
11:11:00: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=0&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=50&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=100&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=150&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=200&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=250&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=300&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=350&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=400&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=450&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=500&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
What does that s mean? Perhaps we need to apply smarter backoff instead of just 10 immediate repeats?
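To make the "smarter backoff" idea concrete, here is a rough sketch of exponential backoff with jitter around an arbitrary call. It is not what the library does internally; retry_with_backoff and all of its parameters are hypothetical, and the delay values are illustrative since nothing in this thread says how long a block lasts.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # 1x, 2x, 4x, ... the base delay, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter so parallel clients do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")
```

The action would be a zero-argument callable that performs one complete search and raises on a 202 response, with retry_on set to the exception type to retry, for example (httpx.HTTPError,).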
Try using a proxy. The 202 response code is their way of blocking an IP at the moment.
s is the pagination parameter.
I tested v3.9.6, text() function with 100 random keywords.
There are no 202 responses at all.
Working in a single thread does not cause errors.
But if you work in multithreaded mode, errors will occur.
In this case you should use a proxy.