deedy5 / duckduckgo_search

Search for words, documents, images, videos, news, maps, and text translation using the DuckDuckGo.com search engine. Download files and images to a local hard drive.

What is the exact rate limit of DDG?

heesuju opened this issue

Hello,

I'm using duckduckgo_search version 5.1.0.
In my code, I'm using "AsyncDDGS.text(keyword)" in a for loop.
I'm iterating through the loop with an interval of 10 seconds using "await asyncio.sleep(10)".

However, after 5-6 search requests, I'm getting a rate limit error.
Any subsequent requests trigger a rate limit error as well for about 10-15 minutes.
Until now I thought that the rate limit was 1-2 requests per 10 seconds, but this doesn't seem to be the case.

Is there a total number of requests that I'm allowed to send?
Any help would be appreciated.
Thank you in advance.

Hi, show me the code.

After upgrading to 5.1.0, it returns an error:

_aget_url() https://duckduckgo.com RequestsError: Impersonating BrowserType.chrome120 is not supported

@phamxtien
Some kind of problem with curl-cffi.
Try to reinstall duckduckgo_search:
pip install -I duckduckgo_search

I followed your guide, but it still returns the error.
However, using the CLI, it runs smoothly.
[screenshot]

My code

from duckduckgo_search import DDGS
from bs4 import BeautifulSoup

def ddgSearch(keywords, region='vn-vi', count=5):
    documents = []
    urls = []
    ddgs = DDGS()
    i = 1
    for keyword in keywords:
        icount = 1
        try: 
            for r in ddgs.text(keyword, region=region, safesearch='off', timelimit='y', max_results=count):
                print(r)
                try:
                    response = requests.get(r['href'])
                    soup = BeautifulSoup(response.text, 'html.parser')
                    body = soup.find('body').text
                    body = ' '.join(body.split())
                    documents.append(body)
                    urls.append(r['href'])
                    i = i + 1
                    icount = icount + 1
                    if icount > count: break
                except Exception as e:
                    print(str(e))
                    continue
        except Exception as e:
            print(str(e))
            continue
        time.sleep(6)
    return {'urls': urls, 'documents': documents}

and I get the error: _aget_url() https://duckduckgo.com/ RequestsError: Impersonating BrowserType.chrome120 is not supported

Environment

OS: Ubuntu 23.10
Python: 3.11

are you importing requests?

Yes, I import requests already.
I missed it when creating the comment above.
[screenshot]

I don't see the above error when I run your code.
Reinstall duckduckgo_search in the virtual environment from which you are running the code.
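
A quick way to confirm which installation the script actually imports (a sketch assuming the package exposes __version__, as recent releases do):

import sys
import duckduckgo_search

print(sys.executable)                 # interpreter actually running the script
print(duckduckgo_search.__version__)  # installed package version
print(duckduckgo_search.__file__)     # location the package was imported from

If the interpreter path differs from the virtual environment you reinstalled into, the script is picking up a stale copy.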

I think this is what causes the error:
[screenshot]
and I'm still stuck :(

Hi, show me the code.

Hello again, sorry for the late reply.
Here's my sample code.

I'm using the code from the AutoGPT repository to get search results from duckduckgo_search.
This code worked fine until about a week ago.
I think my IP might be blocked after making too many requests? (I used to use multi-threading to run about 20 requests at once.)
Now I get a rate limit error every 5-6 requests.

import asyncio
from itertools import islice
from duckduckgo_search import AsyncDDGS

async def web_search(query: str, num_results: int = 8) -> list[dict]:
    """Retry the search up to 3 times if it comes back empty."""
    search_results = []
    attempts = 0

    while attempts < 3:
        if not query:
            return search_results  # nothing to search for

        async with AsyncDDGS() as ddgs:
            results = await ddgs.text(query, safesearch='on', max_results=num_results, backend="html")
            search_results = list(islice(results, num_results))

        if search_results:
            break

        await asyncio.sleep(1)
        attempts += 1

    return search_results

async def main():
    keywords = ["keyword1", "keyword2", "keyword3", "keyword4", "keyword5"]
    for keyword in keywords:
        results = await web_search(keyword, 10)
        await asyncio.sleep(10)

1. Try to use backend='api', it's less likely to be blocked.
2. "I used to use multi-threading to run like 20 requests at once" -> use a proxy.
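
A minimal sketch combining both suggestions, assuming the proxies parameter that DDGS/AsyncDDGS accept in 5.x; the SOCKS address below is a placeholder, substitute your own proxy:

import asyncio
from duckduckgo_search import AsyncDDGS

async def search_with_proxy(query: str, num_results: int = 8) -> list[dict]:
    # backend="api" is less likely to be blocked than "html";
    # the proxy URL is a placeholder, not a working endpoint
    async with AsyncDDGS(proxies="socks5://127.0.0.1:9150") as ddgs:
        return await ddgs.text(query, safesearch="on", max_results=num_results, backend="api")

# results = asyncio.run(search_with_proxy("test query"))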

Completely forgot to mention that I switched over to 'html' from 'api' after my IP started getting blocked.
I guess my only option is using proxies.
Thank you for the help!

Hi - did using proxies resolve the issue? I'm having the same problem: it used to work fine, but now I keep getting the rate limit exception after 5-6 search runs, and I have to wait a while before it runs properly again. I was trying to see if there's a way to actually pay for duckduckgo-search so that I can guarantee it'll work for what I need, but I can't find that either.
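
For anyone hitting the same wall, a rough sketch of backing off when the library raises its rate-limit error, assuming the RatelimitException that recent duckduckgo_search releases define in their exceptions module (check your installed version):

import asyncio
from duckduckgo_search import AsyncDDGS
from duckduckgo_search.exceptions import RatelimitException

async def search_with_backoff(query: str, num_results: int = 8, max_attempts: int = 4) -> list[dict]:
    delay = 10  # seconds; doubled after every rate-limit hit
    for attempt in range(max_attempts):
        try:
            async with AsyncDDGS() as ddgs:
                return await ddgs.text(query, max_results=num_results)
        except RatelimitException:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            await asyncio.sleep(delay)
            delay *= 2
    return []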