Limits for scraping followers?

zeglin opened this issue 5 years ago

Are there any limits for scraping followers? If so, how can I get around them?

I am trying to scrape followers for profiles that have around 30k-50k followers each. At first it works smoothly, but after scraping 198 accounts it freezes and stops. When I press CTRL+C and try to launch the script again, I see this error:

0.6459672451019287
0.6380813121795654
0.6969287395477295
0.5864517688751221
10:55^CERROR [2019-03-09 10:55:34] Follower is empty
INFO [2019-03-09 10:55:34] alias name: Surfakademin
INFO [2019-03-09 10:55:34] bio: Surfakademin is a passion driven surf travels company with surfcamps all over the world, year around. The summer never ends. Live. Love. Surf.
INFO [2019-03-09 10:55:34] url: www.surfakademin.se
INFO [2019-03-09 10:55:34] Posts: 1541
INFO [2019-03-09 10:55:34] Follower: 17603
INFO [2019-03-09 10:55:34] Following: {'count': 1723}
INFO [2019-03-09 10:55:34] isPrivate: False
Error with user surfakademin
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

When I re-launch the script, this is what I get:

user@vps:/var/crawl# python3 crawl_profile.py surfakademin
Extracting information from surfakademin
security code accepted
logged in
INFO [2019-03-09 10:39:04] Extracting information from surfakademin
INFO [2019-03-09 10:39:05] Extracting follower from surfakademin
Unexpected error: <class 'IndexError'>
list index out of range
ERROR [2019-03-09 10:39:36] Cannot get Follower List

Ray · Answer 1 · Fri Sep 20 2019 13:12:36 GMT+0800 (China Standard Time)

Does anyone know what's wrong with this issue? I have the same problem.

andreagiugio · Answer 2 · Thu Feb 27 2020 20:57:53 GMT+0800 (China Standard Time)

I am also getting the same problem. Can someone help?
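
This thread does not confirm a fix. For anyone who wants the script to survive a failed step instead of stopping, one generic option is to retry that step with a growing pause. Below is a minimal sketch, assuming the failing step can be wrapped in a callable. `retry_with_backoff` and `fetch_followers` are hypothetical names, not part of the crawler, and the sketch only catches the two exception types shown in the logs above. It may not help at all if the site itself is enforcing a limit.

```python
import time


def retry_with_backoff(step, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call step() until it succeeds, doubling the pause after each failure.

    Hypothetical helper: `step` is any zero-argument callable. Only the
    exception types seen in the logs above are retried (ConnectionRefusedError
    is a subclass of the built-in ConnectionError).
    """
    for attempt in range(attempts):
        try:
            return step()
        except (ConnectionError, IndexError):
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

Usage would look like `followers = retry_with_backoff(lambda: fetch_followers("surfakademin"))`, where `fetch_followers` stands in for whatever function performs the extraction. The `sleep` parameter exists only so the pauses can be replaced in tests.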