example for twitter topic

Question

rachmadaniHaryono opened this issue 2 years ago · comments

rachmadani haryono commented 2 years ago

I just tried this program to scrape a Twitter topic.

Here is the final result:

import json

from twitter_scraper_selenium.keyword import Keyword

URL = 'https://twitter.com/i/topics/1415728297065861123'
headless = False
keyword = 'steamdeck'
browser = 'firefox'
keyword_bot = Keyword(keyword, browser=browser, url=URL, headless=headless, proxy=None, tweets_count=1000)
data = keyword_bot.scrap()  # returns a JSON string
# note: 'w' mode overwrites any existing steamdeck.json
with open('steamdeck.json', 'w') as f:
    json.dump(json.loads(data), f, indent=2)

# print the result, sorted by posting time
import textwrap
width = 120
for item in sorted(json.loads(data).values(), key=lambda x: x['posted_time']):
    wrap_text = '\n'.join(textwrap.wrap(item['content'], width=width))
    print(f"{item['posted_time']} {item['tweet_url']}\n{wrap_text}")
    print('-'*width)

Some notes on this:

- I got an error when initializing the webdriver, similar to scrapy/scrapy#5635.
  - According to the linked issue, pip install 'pyOpenSSL==22.0.0' should fix it.
  - This was a bit confusing because every import error is caught by a general exception handler (see example 1 below).
    - If possible, just let the error happen and end the program.
- Saving the JSON replaces the old data, so be careful.
  - The JSON data could be updated instead by loading the existing file first, if it exists.
  - The same thing happens with CSV.
- Selenium can use a custom profile folder. Currently I have to edit either set_properties or set_driver_for_browser on driver_initialization.Initializer.
- Is there any reason why Keyword.scrap has to return a JSON string? Why not return a dict? To save the data as CSV, it has to be decoded back to a dict.
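
The "load the data first if the file exists" idea from the notes could be sketched roughly as follows. This is only an illustration, not part of twitter_scraper_selenium: the helper name merge_into_json_file is hypothetical, and it assumes the file holds a JSON object keyed by tweet id, as in the snippet above.

```python
import json
from pathlib import Path


def merge_into_json_file(path, new_data: dict) -> dict:
    """Merge new_data into the JSON object stored at path and write it back."""
    file = Path(path)
    existing = {}
    if file.exists():
        # Assumption: the file contains a single JSON object (dict).
        existing = json.loads(file.read_text(encoding="utf-8"))
    existing.update(new_data)  # newer entries win when a key already exists
    file.write_text(json.dumps(existing, indent=2), encoding="utf-8")
    return existing
```

With the snippet above, this would be called as merge_into_json_file('steamdeck.json', json.loads(data)) in place of the plain json.dump.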

example 1

try:
    # assume this line fails, standing in for the failed webdriver import
    from inspect import currentframe
except Exception as ex:
    print(ex)

# a second error (NameError) happens here because currentframe was never imported
frameinfo = currentframe()
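
For contrast, a minimal sketch of the "just let the error happen" variant: the import sits at module level, outside any try block, so a failed import stops the program right there, and only a narrow, expected exception is handled afterwards. The function name current_line is made up for this illustration.

```python
# Module-level import: if this fails, ImportError ends the program immediately
# instead of showing up later as a confusing NameError.
from inspect import currentframe


def current_line() -> int:
    """Return the caller's line number, or -1 if no frame is available."""
    try:
        frame = currentframe()
        return frame.f_back.f_lineno
    except AttributeError:
        # currentframe() can return None on some Python implementations.
        return -1
```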

Sajid Shaikh · Answer 1 · Sat Oct 01 2022 15:45:24 GMT+0800 (China Standard Time)

Thanks for the review @rachmadaniHaryono.

This was a bit confusing because every import error is caught by a general exception handler (see example 1 below).
If possible, just let the error happen and end the program.

Yeah, moving import outside of try/catch will help catch the bug.

Saving the JSON replaces the old data, so be careful.

Opening the file in write mode is definitely misleading; the same behaviour confused a user of my other library. Here's the issue. I was even thinking of implementing a check for whether the file already exists and switching the write mode accordingly.

Selenium can use a custom profile folder. Currently I have to edit either set_properties or set_driver_for_browser on driver_initialization.Initializer.

Yeah, Selenium can use a custom profile, but I don't think it is going to help much as long as you're scraping in an unauthenticated way. Is it going to help in any way?

Is there any reason why Keyword.scrap has to return a JSON string? Why not return a dict? To save the data as CSV, it has to be decoded back to a dict.

The output will be invalid JSON.
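
For reference, the decoding step mentioned in the question could look roughly like this. It is a sketch under assumptions, not the library's own CSV writer: json_string_to_csv is a hypothetical helper, and it assumes the string decodes to an object keyed by tweet id whose values are flat dicts (as the posted_time, content and tweet_url fields used earlier suggest).

```python
import csv
import json


def json_string_to_csv(data: str, csv_path: str) -> int:
    """Write the tweets in a JSON string to csv_path and return the row count."""
    rows = list(json.loads(data).values())
    if not rows:
        return 0
    # Collect column names across all rows, keeping first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
    return len(rows)
```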

rachmadani haryono · Answer 2 · Sat Oct 01 2022 19:14:49 GMT+0800 (China Standard Time)

Yeah, Selenium can use a custom profile, but I don't think it is going to help much as long as you're scraping in an unauthenticated way. Is it going to help in any way?

I'm thinking of scraping my Twitter front page with this and using some Firefox extensions while scraping the data.
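
A rough sketch of starting Firefox with an existing profile folder through Selenium directly, assuming Selenium 4 and geckodriver are installed. Both helper names are hypothetical, and nothing here is part of twitter_scraper_selenium; wiring it into driver_initialization.Initializer would still need a patch.

```python
from pathlib import Path


def firefox_profile_args(profile_dir) -> list:
    """Return the Firefox command-line arguments that select a profile folder."""
    return ["-profile", str(Path(profile_dir).expanduser())]


def make_firefox_driver(profile_dir, headless=False):
    """Start Firefox with an existing profile (hypothetical helper)."""
    # Imported lazily so firefox_profile_args works without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for arg in firefox_profile_args(profile_dir):
        options.add_argument(arg)
    if headless:
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

Extensions already installed in that profile should then be available in the scraping session, since Firefox runs against the given folder.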

Maybe I will create a PR to scrape Twitter topics later.

Sajid Shaikh · Answer 3 · Sat Oct 01 2022 20:16:12 GMT+0800 (China Standard Time)

Oh, Okay. Thanks