shaikhsajid1111 / twitter-scraper-selenium

Python package to scrape Twitter's front-end easily

Home Page: https://pypi.org/project/twitter-scraper-selenium


Example for scraping a Twitter topic

rachmadaniHaryono opened this issue

I just tried this program to scrape a Twitter topic.

Here is the final result:

import json
import textwrap

from twitter_scraper_selenium.keyword import Keyword

URL = 'https://twitter.com/i/topics/1415728297065861123'
headless = False
keyword = 'steamdeck'
browser = 'firefox'
keyword_bot = Keyword(keyword, browser=browser, url=URL, headless=headless, proxy=None, tweets_count=1000)
data = keyword_bot.scrap()  # scrap() returns a JSON string
tweets = json.loads(data)  # decode once and reuse the dict below
with open('steamdeck.json', 'w') as f:
    json.dump(tweets, f, indent=2)

# print the result, sorted by posted_time
width = 120
for item in sorted(tweets.values(), key=lambda x: x['posted_time']):
    wrap_text = '\n'.join(textwrap.wrap(item['content'], width=width))
    print(f"{item['posted_time']} {item['tweet_url']}\n{wrap_text}")
    print('-' * width)

Some notes on this:

  • I got an error when initializing the webdriver, similar to scrapy/scrapy#5635
    • pip install 'pyOpenSSL==22.0.0' should fix it, according to the linked issue
    • this is a little bit confusing because all import errors are caught with a general exception; see also example 1 below
      • if possible, just let the error happen and end the program
  • Saving to JSON replaces the old data, so be careful
    • it is possible to update the JSON data by loading the existing data first if the file exists
    • the same thing happens with CSV
  • Selenium can use a custom profile folder; currently I have to edit either set_properties or set_driver_for_browser in driver_initialization.Initializer
  • Is there any reason why Keyword.scrap has to return a JSON string? Why not just return it as a dict? When saving the data as CSV, it has to be decoded back to a dict
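The "load the data first" idea from the notes above can be sketched as follows. This is a minimal sketch, assuming the file holds a single JSON object whose keys uniquely identify tweets (the example above iterates over `.values()`, so the scraped data is a dict, but what its keys are is an assumption here). `save_json_merged` is a hypothetical helper, not part of twitter-scraper-selenium.

```python
import json
import os
from typing import Any, Dict


def save_json_merged(path: str, new_tweets: Dict[str, Any]) -> Dict[str, Any]:
    """Merge new_tweets into the JSON object stored at path instead of overwriting it.

    Hypothetical helper: assumes the file contains one JSON object whose keys
    uniquely identify tweets, like the dict from json.loads(keyword_bot.scrap()).
    """
    merged: Dict[str, Any] = {}
    if os.path.exists(path):
        # Load whatever a previous run saved
        with open(path, encoding="utf-8") as f:
            merged = json.load(f)
    # If a key appears in both old and new data, the new entry wins
    merged.update(new_tweets)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2)
    return merged
```

In the script above, `save_json_merged('steamdeck.json', tweets)` would take the place of the plain `json.dump` call, so repeated runs accumulate tweets instead of replacing them.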

Example 1

try:
    # assume this import fails, e.g. because importing the webdriver failed
    from inspect import currentframe
except Exception as ex:
    # the error is only printed, so the program keeps running
    print(ex)

# a second error (NameError) happens here because currentframe was never imported
frameinfo = currentframe()

Thanks for the review @rachmadaniHaryono.

This is a little bit confusing because all import errors are caught with a general exception; see also example 1 below.
If possible, just let the error happen and end the program.

Yeah, moving the import outside of the try/except will help catch the bug.
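The difference between the two approaches can be shown side by side. This is a minimal sketch using `importlib.import_module` so the failing import can be triggered on demand; both function names and the module name `no_such_module_for_demo` are hypothetical, chosen only for illustration. In real code the "fail fast" version is simply a top-level import with no try/except around it.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_swallowing(name: str) -> Optional[ModuleType]:
    """The pattern from example 1: any error is printed and execution continues."""
    try:
        return importlib.import_module(name)
    except Exception as ex:  # too broad: hides the real cause from the caller
        print(ex)
        return None


def import_failing_fast(name: str) -> ModuleType:
    """No try/except: a missing module raises ModuleNotFoundError right here."""
    return importlib.import_module(name)
```

With the swallowing version, the failure only shows up later (as a `NameError` or an `AttributeError` on `None`), far away from the import that actually broke; the fail-fast version stops with a traceback pointing at the missing module.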


Saving to JSON will replace the old data, so be careful.

Writing the data in write mode is definitely misleading; it has already caused a misunderstanding for a user of my other library. Here's the issue. I was even thinking of implementing a feature that checks whether the file already exists and switches the writing mode accordingly.
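For CSV, the "check if the file exists and switch the writing mode" idea might look like the sketch below. It assumes each tweet is a flat dict and that you pass the column names yourself (for example `tweet_url` and `content`, as used in the script above); `save_csv` is a hypothetical helper, not the library's implementation.

```python
import csv
import os
from typing import Dict, List


def save_csv(path: str, rows: List[Dict[str, str]], fieldnames: List[str]) -> None:
    """Append rows to path, writing the header only when the file is new or empty."""
    file_exists = os.path.exists(path) and os.path.getsize(path) > 0
    # Switch between append mode and write mode depending on whether data exists
    mode = "a" if file_exists else "w"
    with open(path, mode, newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not file_exists:
            writer.writeheader()
        writer.writerows(rows)
```

Note that appending does not deduplicate: scraping the same tweets twice writes them twice, unlike the JSON merge approach where a repeated key overwrites the old entry.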


Selenium can use a custom profile folder; currently I have to edit either set_properties or set_driver_for_browser in driver_initialization.Initializer.

Yeah, Selenium can use a custom profile, but I don't think it is going to help much with scraping as long as you're scraping in an unauthenticated way. Would it help in any way?


Is there any reason why Keyword.scrap has to return a JSON string? Why not just return it as a dict? When saving the data as CSV, it has to be decoded back to a dict.

Otherwise, the output would be invalid JSON.
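One plausible reading of this answer (an assumption, the thread does not spell it out): if a Python dict is printed or written with `str()`, the result is its repr, which uses single-quoted strings and is not valid JSON, whereas `json.dumps` produces a document other tools can parse. A caller who needs a dict, for example to write CSV, can decode the string once with `json.loads`. The sample data below is hypothetical.

```python
import json

# Hypothetical scraped data: tweet records keyed by an identifier
tweets = {"1": {"content": "hello"}}

# str() gives the Python repr: single-quoted strings, not valid JSON
as_repr = str(tweets)

# json.dumps() gives a valid JSON document
as_json = json.dumps(tweets, indent=2)

# A caller that wants a dict decodes the JSON string back
decoded = json.loads(as_json)
```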

Yeah, Selenium can use a custom profile, but I don't think it is going to help much with scraping as long as you're scraping in an unauthenticated way. Would it help in any way?

I'm thinking of scraping my Twitter front page with this and using some Firefox extensions while scraping the data.


Maybe I will create a PR to scrape Twitter topics later.

Oh, okay. Thanks!