taspinar / twitterscraper

Scrape Twitter for Tweets


twitter scraper error

mahajnay opened this issue · comments

Hi all,

While using twitterscraper,

I have this code:

    from twitterscraper import query_tweets
    import datetime as dt
    import pandas as pd

    begin_date = dt.date(2020, 3, 1)
    end_date = dt.date(2021, 11, 1)

    limit = 100
    lang = 'english'

    tweets = query_tweets('vaccinesideeffects', begindate=begin_date, enddate=end_date, limit=limit, lang=lang)
    df = pd.DataFrame(t.__dict__ for t in tweets)

    df = df['text']
    df

and I'm getting the error below:


    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    in
    ----> 1 from twitterscraper import query_tweets
          2 import datetime as dt
          3 import pandas as pd
          4
          5 begin_date = dt.date(2020,3,1)

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/__init__.py in
         11
         12
    ---> 13 from twitterscraper.query import query_tweets
         14 from twitterscraper.query import query_tweets_from_user
         15 from twitterscraper.query import query_user_info

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/query.py in
         74     yield start + h * i
         75
    ---> 76 proxies = get_proxies()
         77 proxy_pool = cycle(proxies)
         78

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/query.py in get_proxies()
         47     soup = BeautifulSoup(response.text, 'lxml')
         48     table = soup.find('table', id='proxylisttable')
    ---> 49     list_tr = table.find_all('tr')
         50     list_td = [elem.find_all('td') for elem in list_tr]
         51     list_td = list(filter(None, list_td))

    AttributeError: 'NoneType' object has no attribute 'find_all'

Same for me

Same issue here

It tries to grab the table from https://free-proxy-list.net with id='proxylisttable', but that table no longer exists on the page, so soup.find returns None.
You need to change line 48 from:

    table = soup.find('table', id='proxylisttable')

to:

    table = soup.find('table')
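For reference, here is a minimal sketch of why the old lookup fails and the patched one works, using a small inline HTML sample instead of the live page (whose layout may well have changed again since):

```python
from bs4 import BeautifulSoup

# Sample markup standing in for the free-proxy-list.net response; note
# the <table> no longer carries the id 'proxylisttable'.
html = """
<table>
  <tr><td>1.2.3.4</td><td>8080</td></tr>
  <tr><td>5.6.7.8</td><td>3128</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# The old lookup returns None, which is why .find_all('tr') then
# raises AttributeError: 'NoneType' object has no attribute 'find_all'.
assert soup.find('table', id='proxylisttable') is None

# The patched lookup grabs the first table on the page instead.
table = soup.find('table')
rows = table.find_all('tr')
proxies = [':'.join(td.get_text() for td in row.find_all('td'))
           for row in rows]
print(proxies)  # ['1.2.3.4:8080', '5.6.7.8:3128']
```

This only works as long as the proxy list is the first table on the page, so it is just as fragile as the original id-based lookup.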

fixed this error using Pandas (note get_proxies has to return the list, since query.py does proxy_pool = cycle(get_proxies())):

    import pandas as pd
    ...
    def get_proxies():
        resp = requests.get(PROXY_URL)
        df = pd.read_html(resp.text)[0]
        list_ip = list(df['IP Address'].values)
        list_ports = list(df['Port'].values.astype(str))
        list_proxies = [':'.join(elem) for elem in zip(list_ip, list_ports)]
        return list_proxies
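The pandas approach above can be sketched end-to-end on a toy table. The column names 'IP Address' and 'Port' are assumed to match the free-proxy-list.net layout at the time; the live page may differ:

```python
import io
import pandas as pd

# Toy HTML standing in for the free-proxy-list.net response.
html = """
<table>
  <tr><th>IP Address</th><th>Port</th></tr>
  <tr><td>1.2.3.4</td><td>8080</td></tr>
  <tr><td>5.6.7.8</td><td>3128</td></tr>
</table>
"""

# pd.read_html parses every <table> into a DataFrame; take the first.
df = pd.read_html(io.StringIO(html))[0]
list_ip = list(df['IP Address'].values)
list_ports = list(df['Port'].values.astype(str))  # ports parse as ints
list_proxies = [':'.join(pair) for pair in zip(list_ip, list_ports)]
print(list_proxies)  # ['1.2.3.4:8080', '5.6.7.8:3128']
```

pd.read_html needs an HTML parser backend (lxml or html5lib) installed, the same kind of dependency the BeautifulSoup version has.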

However, this still does not work.

    list_of_tweets = query_tweets("Trump OR Clinton", 10)

returns:

    Exception: Traceback (most recent call last):
      File "/Users/rmartin/Desktop/Envs/crypto_env/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
        raise WorkerLostError(
    billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV) Job: 0.

Same error here on python 3.9

> It tries to grab the table from https://free-proxy-list.net with id='proxylisttable', but that table doesn't exist anymore. You need to change line 48 from table = soup.find('table', id='proxylisttable') to table = soup.find('table')

Thanks, it solved my problem.

@NafiGit How did you edit the code in their repository?