taspinar / twitterscraper

Scrape Twitter for Tweets

Scraper does not collect tweets when querying specifically for 'covid' or 'Covid'

erb13020 opened this issue

I'm trying to gather a dataset of tweets containing the word 'covid' using this library. I've used it for a while without any issues, but when I search specifically for 'covid', I can't scrape any tweets. Queries for coronavirus, bitcoin, mcdonalds, etc. all work; only 'covid' fails. This is what my output looks like:

https://gyazo.com/6fb6bd3a9dc85ff912b087a456b371c0

I had also added this to my code before I ran into this issue:

    HEADERS_LIST = [
        'Mozilla/5.0 (Windows; U; Windows NT 6.1; x64; fr; rv:1.9.2.13) Gecko/20101203 Firebird/3.6.13',
        'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
        'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201',
        'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
        'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre',
    ]

so I'm sure that my issue isn't related to #316 or #296
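
For readers unfamiliar with this kind of list, here is a minimal sketch of how a set of user-agent strings is commonly used: one entry is picked at random and sent as the `User-Agent` request header. The helper name `pick_headers` is hypothetical, and only two of the strings above are repeated to keep it short. Whether a list defined in your own script actually reaches the library depends on how it is wired in, which the post does not show.

```python
import random

# Two of the user-agent strings from the post, repeated here so the sketch is self-contained.
HEADERS_LIST = [
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201',
    'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
]


def pick_headers(user_agents):
    """Build a request-headers dict with a randomly chosen User-Agent (hypothetical helper)."""
    return {'User-Agent': random.choice(user_agents)}


headers = pick_headers(HEADERS_LIST)
print(headers['User-Agent'])
```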

Here is what one of the query URLs looks like in the console output when I run my program:

https://twitter.com/search?f=tweets&vertical=default&q=covid%20since%3A2020-02-26%20until%3A2020-02-27&l=
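
To make it easier to see what is actually being requested, the URL above can be decoded with the standard library. This is only a sketch; `decode_search_url` is a hypothetical helper name, not something from the library.

```python
from urllib.parse import parse_qs, urlsplit


def decode_search_url(url):
    """Return the query-string parameters of a search URL as a plain dict."""
    # keep_blank_values=True preserves the empty 'l' (language) parameter.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs returns a list per key; each key appears once here, so take the first value.
    return {key: values[0] for key, values in params.items()}


url = ('https://twitter.com/search?f=tweets&vertical=default'
       '&q=covid%20since%3A2020-02-26%20until%3A2020-02-27&l=')
params = decode_search_url(url)
print(params['q'])  # the decoded search string, including the since:/until: operators
```

The decoded `q` value shows that the search term and the one-day date window are sent together as a single search string.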

My guess is that the 'Know the Facts' popup is preventing the scraper from querying 'covid' tweets properly, because my program does work with any other search term.
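
One way to check this guess by hand is to generate the same search URL for 'covid' and for a term that works, then open both in a browser and compare the pages. The sketch below only mirrors the URL shape from the console output above; `build_search_url` is a hypothetical helper and is not the library's own URL-building code.

```python
import datetime as dt
from urllib.parse import quote


def build_search_url(term, since, until):
    """Build a search URL in the same shape as the one in the console output."""
    search = f'{term} since:{since.isoformat()} until:{until.isoformat()}'
    # quote() percent-encodes the spaces (%20) and colons (%3A) in the search string.
    return ('https://twitter.com/search?f=tweets&vertical=default&q='
            + quote(search) + '&l=')


since = dt.date(2020, 2, 26)
until = dt.date(2020, 2, 27)
for term in ('covid', 'coronavirus'):
    print(build_search_url(term, since, until))
```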

I'm not sure if this is helpful, since my file is 140 lines of code, but here is the function that gets called when I need to scrape:

    import datetime as dt

    import pandas as pd
    from twitterscraper import query_tweets

    # __calculate_days() and detector() are defined elsewhere in the file (not shown).


    def scrape(y, m, query):
        '''
        Return a DataFrame of all tweets and metadata for a query in a given
        month, keeping only English tweets.

            Parameters:
                    y (int): The 4-digit year.
                    m (int): The month (1-12).
                    query (str): The Twitter query.

            Returns:
                    df (DataFrame): DataFrame containing all tweets and metadata for the query.
        '''
        # Number of days in the month; also used as the pool size (one worker per day).
        d = __calculate_days(m, y)
        begin_date = dt.date(y, m, 1)
        end_date = dt.date(y, m, d)

        tweets = query_tweets(query, begindate=begin_date, enddate=end_date, poolsize=d)

        df = pd.DataFrame(t.__dict__ for t in tweets)
        if df.empty:
            # No tweets came back (as with 'covid'), so there is no 'text' column to filter on.
            return df

        # Tag each tweet with its detected language and keep only English ones.
        df['lang'] = df['text'].apply(detector)
        df = df[df['lang'] == 'en']

        return df
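
The date handling in this function can be sketched with the standard library alone. `__calculate_days` is not shown in the post, so this assumes it returns the number of days in the month; `month_bounds` is a hypothetical name. It also returns the first day of the following month, which may be the safer end date if the search's `until:` bound is exclusive (an assumption worth verifying, since it would otherwise drop the last day of the month).

```python
import calendar
import datetime as dt


def month_bounds(year, month):
    """Return (first day, first day of the next month, number of days) for a month."""
    # monthrange() returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    begin = dt.date(year, month, 1)
    # Adding the month length lands on the 1st of the next month, including across a year boundary.
    end_exclusive = begin + dt.timedelta(days=days_in_month)
    return begin, end_exclusive, days_in_month


begin, end_exclusive, days = month_bounds(2020, 2)
print(begin, end_exclusive, days)
```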

Any thoughts or hints?