whusnoopy / renrenBackup

A backup tool for renren.com

Getting an "urlopen chunked=chunked" error

sfs00784 opened this issue

Traceback (most recent call last):
File "C:\Users\xxx.virtualenvs\renrenBackup-master-RLOICigy\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Users\xxx.virtualenvs\renrenBackup-master-RLOICigy\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "", line 2, in raise_from
File "C:\Users\xxx.virtualenvs\renrenBackup-master-RLOICigy\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "C:\Users\xxx\AppData\Local\Programs\Python\Python37-32\Lib\http\client.py", line 1321, in getresponse
response.begin()
File "C:\Users\xxx\AppData\Local\Programs\Python\Python37-32\Lib\http\client.py", line 296, in begin
version, status, reason = self._read_status()
File "C:\Users\xxx\AppData\Local\Programs\Python\Python37-32\Lib\http\client.py", line 257, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Users\xxx\AppData\Local\Programs\Python\Python37-32\Lib\socket.py", line 589, in readinto
return self._sock.recv_into(b)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.

Traceback (most recent call last):
File "fetch.py", line 129, in
fetched = fetch_user(fetch_uid, cmd_args)
File "fetch.py", line 98, in fetch_user
fetch_album(uid)
File "fetch.py", line 71, in fetch_album
album_count = crawl_album.get_albums(uid)
File "I:\renrenBackup-master\crawl\album.py", line 118, in get_albums
total += get_album_list_page(cur_page, uid)
File "I:\renrenBackup-master\crawl\album.py", line 106, in get_album_list_page
get_album_summary(aid, uid)
File "I:\renrenBackup-master\crawl\album.py", line 66, in get_album_summary
'src': get_image(p['large']),
File "I:\renrenBackup-master\crawl\utils.py", line 31, in get_image
resp = crawler.get_url(img_url)
File "I:\renrenBackup-master\crawl\crawler.py", line 97, in get_url
return self.get_url(url, params, method, retry)
File "I:\renrenBackup-master\crawl\crawler.py", line 97, in get_url
return self.get_url(url, params, method, retry)
File "I:\renrenBackup-master\crawl\crawler.py", line 97, in get_url
return self.get_url(url, params, method, retry)
[Previous line repeated 2 more times]
File "I:\renrenBackup-master\crawl\crawler.py", line 82, in get_url
raise Exception("network error, exceed max retry time")
Exception: network error, exceed max retry time

Cause:
The script runs too fast: urlopen fetches pages so frequently that renren.com flags the traffic as an attack. A later urlopen() call then hangs and the exception above is eventually raised.

Solution:
Increase the time.sleep interval (I raised it to 60 seconds); the default 1 second is too short.

Reference:
https://blog.csdn.net/illegalname/article/details/77164521
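
A minimal sketch of that fix, assuming a session-based request helper with a single sleep constant; the actual names in crawl/crawler.py may differ:

```python
import time

# Hypothetical illustration of the fix described above; the constant and
# helper are assumptions, not the actual renrenBackup code.
SLEEP_TIME = 60  # raised from the default 1 second


def polite_get(session, url, **kwargs):
    resp = session.get(url, **kwargs)
    time.sleep(SLEEP_TIME)  # pause so renren.com does not flag the traffic
    return resp
```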

This didn't show up when I tested against my own and my friends' data; our data volume is probably not large enough to trigger the anti-crawling logic (judging from the traceback you posted, it happens while fetching album photos).

You can tweak the timeout parameter yourself, or bump the time.sleep interval in bigger steps: start at 1 second and, whenever it errors out, keep increasing the interval. I'll optimize this properly when I have time. A quick sketch of the timeout part is below.
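
A minimal sketch of the timeout suggestion, assuming the crawler issues requests via the requests library; the URL and values here are placeholders, not the real crawler call:

```python
import requests

# Illustrative only: an explicit (connect, read) timeout makes a stalled
# response fail fast instead of hanging until the connection is reset.
resp = requests.get("http://photo.renren.com/", timeout=(10, 60))
```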

I changed the code on my side so that the sleep time grows as a geometric sequence, and it's running fine so far.

@xuan-w a PR would be welcome, I'm too lazy to change it myself (though it looks like all it takes is changing retry in crawl/crawler.py:96 to 1 << retry or 2 ** retry)
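
A rough sketch of what that one-line change amounts to: sleep for a geometrically growing interval before each retry. The get_url name and the error message come from the traceback above; MAX_RETRY and the requests call are assumptions, not the actual crawler.py code.

```python
import time

import requests

MAX_RETRY = 5  # assumed limit; the real crawler defines its own


def get_url(url, params=None, retry=0):
    # Paraphrased retry loop; the suggested change is the sleep expression.
    if retry > MAX_RETRY:
        raise Exception("network error, exceed max retry time")
    try:
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        return resp
    except requests.RequestException:
        time.sleep(1 << retry)  # 1, 2, 4, 8, 16 seconds instead of a flat 1s
        return get_url(url, params, retry + 1)
```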


Ended up changing it myself after all :)

Sorry about that, I went to bed right after posting it yesterday.