5hirish / tweet_scrapper

Scrape the Twitter frontend API without any authentication or restrictions.

Home Page: http://www.shirishkadam.com/


Scrape Profile Image URL

farisalasmary opened this issue

I've been using this library for a while, but unfortunately I could not find the profile image URL in the scraped data. I've tried to modify the code, with no success. My real problem is Twitter's class name obfuscation. For example, class="css-1dbjc4n r-1j3t67a" is the CSS class used on the div of each tweet, but in your code the pattern is as simple as

_tweet_content_pattern_ = '''./div[@class="content"]'''

How did you find the real name of the class? Also, how can I add a new feature like the profile image URL?

@farisalasmary this library uses XPath to scrape data. To get the profile picture, one could use the XPath query //*[@id="page-container"]/div[1]/div/div[1]/div[2]/div[1]/div/a/img and read the src attribute of the matched img element. You can simplify this XPath query further. If you do add this, please raise a PR and I will merge it.
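
As a rough illustration of the XPath approach above, here is a minimal sketch using only Python's standard library. It is not this library's code: the SAMPLE markup, the example.com URL, the constant names, and the profile_image_url helper are all hypothetical, and the simplified query is only one possible shortening. The standard-library ElementTree supports just a subset of XPath and needs well-formed markup, so a real Twitter page would need an HTML-aware parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup that mimics the nesting the XPath expects.
# A real Twitter page will differ and is usually not valid XML.
SAMPLE = """
<html><body>
<div id="page-container">
  <div>
    <div>
      <div>
        <div>banner</div>
        <div>
          <div>
            <div>
              <a href="/example_user"><img src="https://example.com/avatar.jpg" alt=""/></a>
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>
</body></html>
"""

# The query from the reply, written in the relative form ElementTree requires.
PROFILE_IMAGE_XPATH = (
    ".//*[@id='page-container']/div[1]/div/div[1]/div[2]/div[1]/div/a/img"
)

# One possible simplification (an assumption): any a/img below the container.
SIMPLIFIED_XPATH = ".//*[@id='page-container']//a/img"


def profile_image_url(markup, xpath=PROFILE_IMAGE_XPATH):
    """Return the src of the first img matched by xpath, or None if absent."""
    root = ET.fromstring(markup)
    img = root.find(xpath)
    return img.get("src") if img is not None else None


print(profile_image_url(SAMPLE))
print(profile_image_url(SAMPLE, SIMPLIFIED_XPATH))
```

The shorter query is less sensitive to changes in the intermediate div nesting, but it may also match unrelated images, so it is worth checking against real pages before relying on it.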