Nv7-GitHub / googlesearch

A Python library for scraping the Google search engine.

Home Page: https://pypi.org/project/googlesearch-python/

Some requests break the search

arditobryan opened this issue

I found that some queries make the search function stall for over a minute and then return a 429, regardless of the waiting time.
For example, "Malaysia sugar tax, RM0.40 (US$0.086) per litre, more than 5 grams/100ml" takes a few seconds to retrieve the first 2 links, but on the 3rd it makes me wait about 1 minute 30 seconds, then returns a 429, and the IP becomes unusable.
I tried the same query on Google Colab (which should not use my IP). To be sure, I also tried switching my internet connection to a phone hotspot and running it on an EC2 instance. All of them led to the same result, breaking at the 3rd link of the same query, so some queries can break the algorithm.

Ideally, the timeout parameter for the requests should handle this, but I tried it and it does not work in the case above.
While adding delays or a user agent can help prevent 429s in general, I think this specific issue still needs to be addressed.
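
For reference, the "adding delays" idea can be sketched roughly like this. It is only a minimal illustration, not code from the library: `RateLimited`, `call_with_backoff`, and the `fetch` callable are hypothetical names, and the caller would have to raise `RateLimited` itself when it sees a 429. It does not address the stall described above.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical marker exception for an HTTP 429 response."""


def call_with_backoff(fetch, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() and retry with exponential backoff while it raises RateLimited.

    `fetch` is any zero-argument callable supplied by the caller (an assumption,
    not part of the library). `sleep` is injectable so the delays can be tested.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller see the rate-limit error.
                raise
            # Wait 2s, 4s, 8s, ... plus up to 1s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

The jitter is there so that several clients retrying at once do not all hit the server at the same moment.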

commented

When I search for this on Google, there are no results. Do you think this could be causing the issue?
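
If that guess is right, one possible explanation is a pagination loop that keeps requesting pages until it has collected enough links, which would never finish when Google has nothing more to return. The sketch below is only an illustration of guarding against that, not the library's actual code: `collect_results`, `fetch_page`, and `max_pages` are hypothetical names.

```python
def collect_results(fetch_page, num_results, max_pages=10):
    """Collect up to num_results links, stopping early on an empty page.

    `fetch_page(start)` is a hypothetical callable that returns the list of
    links found at the given result offset (an empty list if there are none).
    """
    results = []
    start = 0
    for _ in range(max_pages):  # hard cap on the number of requests
        page = fetch_page(start)
        if not page:
            # No more results for this query: stop instead of asking again.
            break
        results.extend(page)
        start += len(page)
        if len(results) >= num_results:
            break
    return results[:num_results]
```

With a guard like this, a query that only has 2 links would return those 2 and stop, rather than repeating the request until the server answers with a 429.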

I see the same behavior when running the script in Google Colab, on my local system, and on an AWS EC2 instance.
My search queries are phone model names like "iPhone 14" or "Google Pixel 7 Pro".

Yep, I'm getting a 429 error as well. I have limited the number of results returned, hoping that might resolve the issue. I have even searched for something as simple as "time".

Edited
It's throwing up a CAPTCHA. Is there a suggested timeout that people are having luck with?