opsdisk / yagooglesearch

Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches.

Is there a way to turn off cool_off_time and make a search request fail instead of retrying?

Cyber-Cowboy opened this issue

Thanks for the great library, but as far as I understand, it does not provide a way to handle a 429 response yourself (or at least I didn't find one); it just makes another request after a certain cool_off time. It would be great if there were a parameter, something like "retry=False", that could be passed to the client to make it raise an error when a 429 response is received.

Hi @Cyber-Cowboy - thanks for submitting an issue! So if an HTTP 429 is detected, instead of cooling off before trying again, you'd want it to check whether, for example, 429_retry=False is set and, if so, bail on the rest of the search and report that a 429 was detected? So basically, yagooglesearch would return to your calling script and say "HTTP 429 detected, I'm done, it's up to you to determine the next step"? If I'm not fully understanding the ask, please let me know and provide more details about your use case.

Hi @opsdisk, yeah, pretty much. I need it because I'm able to switch proxies, and if a 429 is returned I would prefer to make another request through a different proxy instead of waiting for n minutes.

Gotcha...might be a few days until I can get to it. Just FYI, here is how you can use more than 1 proxy to spread the search (https://github.com/opsdisk/yagooglesearch#multiple-proxies). If you have enough, you likely won't run into HTTP 429s (not guaranteed though 😄 )

@Cyber-Cowboy Check out #8 and take it for a spin.

When instantiating the yagooglesearch object, pass yagooglesearch_manages_http_429s=False. If a 429 is detected, it will return to your calling script with the string "HTTP_429_DETECTED". At that point, it's up to your script to adjust.

@opsdisk thanks, it looks exactly like what I need!

Great! I didn't get a chance to test it yet. I'll check back in a few days to see if it satisfies your ask.

Had to push an update for it to work properly. My testing pastables:

import yagooglesearch

query = "site:twitter.com"

client = yagooglesearch.SearchClient(
    query,
    tbs="li:1",
    verbosity=4,
    num=10,
    max_search_result_urls_to_return=200,  # Trigger HTTP 429
    minimum_delay_between_paged_results_in_seconds=1,  # Trigger HTTP 429
    yagooglesearch_manages_http_429s=False,  # Return to the caller on HTTP 429 instead of cooling off
)
client.assign_random_user_agent()

urls = client.search()

if "HTTP_429_DETECTED" in urls:
    print("HTTP 429 detected...it's up to you to modify your search.")

    # Remove HTTP_429_DETECTED from list.
    urls.remove("HTTP_429_DETECTED")

    print("URLs found before HTTP 429 detected...")

    for url in urls:
        print(url)

@Cyber-Cowboy Merged #8 into master. Let me know if you run into any bugs or issues.
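
For the proxy-switching use case described earlier in the thread, here is a minimal sketch of how a calling script might react to the "HTTP_429_DETECTED" marker by moving on to the next proxy. The helper names (search_with_proxy_rotation, yagooglesearch_search) are hypothetical and not part of the library, and the proxy keyword is assumed to behave as in the multiple-proxies section of the README linked above. Note that each attempt starts the search over from the first page, so results are de-duplicated rather than resumed.

```python
from typing import Callable, Iterable, List, Optional, Tuple

HTTP_429_MARKER = "HTTP_429_DETECTED"


def yagooglesearch_search(query: str, proxy: str) -> List[str]:
    """Run one search through a single proxy (assumes the `proxy` keyword from the README)."""
    # Imported lazily so the rotation logic below can be exercised without the library.
    import yagooglesearch

    client = yagooglesearch.SearchClient(
        query,
        proxy=proxy,
        yagooglesearch_manages_http_429s=False,  # Hand HTTP 429s back to this script
    )
    client.assign_random_user_agent()
    return client.search()


def search_with_proxy_rotation(
    query: str,
    proxies: Iterable[str],
    search_fn: Callable[[str, str], List[str]] = yagooglesearch_search,
) -> Tuple[List[str], Optional[str]]:
    """Try each proxy in order and stop at the first one that is not rate limited.

    Returns (urls, proxy_used). proxy_used is None if every proxy hit an HTTP 429.
    """
    collected: List[str] = []
    for proxy in proxies:
        urls = list(search_fn(query, proxy))
        rate_limited = HTTP_429_MARKER in urls

        # Keep whatever URLs were found, minus the marker and any duplicates.
        for url in urls:
            if url != HTTP_429_MARKER and url not in collected:
                collected.append(url)

        if not rate_limited:
            return collected, proxy

    return collected, None
```

A caller would pass its own proxy list, for example search_with_proxy_rotation("site:twitter.com", my_proxies), and decide what to do when proxy_used comes back as None (wait, fetch new proxies, or give up).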