error
daixiangzi opened this issue · comments
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
Baidu changed something recently, so the maintainer needs to update the code.
In crawler.py, in baidu_get_image_url_using_api, add a header to res = requests.get(init_url, proxies=proxies):
headers = {
'Accept-Encoding': 'gzip, deflate, sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
}
init_url="https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&lm=7&fp=result&ie=utf-8&oe=utf-8&st=-1&word=%25E7%258E%25A9%25E6%2589%258B%25E6%259C%25BA&queryWord=%25E7%258E%25A9%25E6%2589%258B%25E6%259C%25BA&face=0&pn=0&rn=30"
-195 res = requests.get(init_url,proxies=proxies)
+196 res = requests.get(init_url,proxies=proxies,headers=headers)
Unfortunately, it does not work for me:
Exceeded 30 redirects.
Exceeded 30 redirects.
Exceeded 30 redirects.
Exceeded 30 redirects.
Exceeded 30 redirects.
Exceeded 30 redirects.
Exceeded 30 redirects.
== 0 out of 0 crawled images urls will be used.
Ok, there's another line at 215 that needs to be changed. So overall this will work:
-195 res = requests.get(init_url, proxies=proxies)
+195 res = requests.get(init_url, proxies=proxies, headers=headers)
-215 response = requests.get(url, proxies=proxies)
+215 response = requests.get(url, proxies=proxies, headers=headers)
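The two changes above amount to sending the same browser-like headers on every request. A minimal sketch of that idea, assuming the `requests` library is installed: `BROWSER_HEADERS` and `get_with_headers` are hypothetical names that do not exist in crawler.py, and the `get` parameter is only there so the helper can be exercised with a stub instead of a live request.

```python
# Browser-like headers taken from the comment above (trimmed to the essentials).
BROWSER_HEADERS = {
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,'
              'image/webp,*/*;q=0.8',
    'Connection': 'keep-alive',
}


def get_with_headers(url, proxies=None, get=None, **kwargs):
    """Issue a GET with BROWSER_HEADERS merged into any caller-supplied headers.

    Hypothetical helper: `get` defaults to requests.get and can be replaced
    with a stub for testing.
    """
    if get is None:
        import requests  # imported lazily so a stub can be used without it
        get = requests.get
    # Caller-supplied headers override the defaults on key collisions.
    headers = {**BROWSER_HEADERS, **(kwargs.pop('headers', None) or {})}
    return get(url, proxies=proxies, headers=headers, **kwargs)
```

With a helper like this, both call sites (lines 195 and 215) could call `get_with_headers(init_url, proxies=proxies)` and `get_with_headers(url, proxies=proxies)`, so a future header change would only need to be made in one place.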
Could you tell me where exactly this change should be added?
Refer to my earlier addition: line 215 also needs to change.
-215 response = requests.get(url, proxies=proxies)
+215 response = requests.get(url, proxies=proxies, headers=headers)
Fixed in 7013bfd
@ald2004 @mapattacker Thanks for the fix code.