u0nel / tweety

Twitter Scraper

tweety

Twitter's API is annoying to work with and has lots of limitations. Luckily, their frontend (JavaScript) has its own API, which I reverse-engineered. No API rate limits. No restrictions. Extremely fast.

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Internet Connection
  • Python 3.6+
  • BeautifulSoup (Python Module)
  • Requests (Python Module)
  • openpyxl (Python Module)
  • wget (Python Module) [Optional; you need to install it manually]

Installation:

pip install tweety-ns

Exceptions

  • UserNotFound : Raised when the queried user is not found
  • GuestTokenNotFound : Raised when the script is unable to get the guest token from Twitter
  • InvalidTweetIdentifier : Raised when getting a standalone tweet's detail and the tweet identifier is invalid
  • UnknownError : Raised when an error occurs that is unknown to the module
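
A minimal sketch of how these exceptions might be handled. This README does not say which module exports the exception classes, so the sketch takes them as an argument instead of importing them. call_or_default is a hypothetical helper and is not part of tweety.

```python
from typing import Any, Callable, Tuple, Type


def call_or_default(
    fetch: Callable[[], Any],
    handled: Tuple[Type[BaseException], ...],
    default: Any = None,
) -> Any:
    """Run fetch() and return `default` if one of the handled exceptions is raised.

    `handled` is meant to hold tweety's exception classes (for example
    UserNotFound). They are passed in because their import path is not
    stated in this README.
    """
    try:
        return fetch()
    except handled as error:
        # Report the failure and fall back instead of crashing the script.
        print(f"request failed: {type(error).__name__}: {error}")
        return default


# Hypothetical usage, once Twitter and the exception classes are imported:
# user = call_or_default(
#     lambda: Twitter("someuser").get_user_info(),
#     (UserNotFound, GuestTokenNotFound),
# )
```

Exceptions that are not listed in `handled` still propagate, so an UnknownError stays visible unless you choose to include it.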

Using tweety

Getting Tweets:

Description:

Get 20 Tweets of a Twitter User

Required Parameter:

  • Username or user profile URL, passed when creating the Twitter object

Optional Parameter:

  • pages : int (default is 1; use 2 or more for multiple pages) -> number of pages of tweets to get
  • replies : boolean (default is False) -> whether to get replies as well as tweets
  • wait_time : int (default is 2) -> seconds to wait between multiple requests

Output:

class UserTweets (iterable)

Example:

python
Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweety.bot import Twitter
>>> all_tweet = Twitter("Username or URL").get_tweets(pages=2)
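
The optional parameters can be combined with the export methods described under UserTweets. The sketch below is one way to do that. fetch_user_tweets is a hypothetical wrapper, and its twitter_cls argument exists only so the function can be tried without a network connection. The parameter names pages, replies, wait_time and filename come from this README; passing filename as a keyword argument is an assumption.

```python
def fetch_user_tweets(username, pages=2, replies=False, wait_time=2,
                      csv_name=None, twitter_cls=None):
    """Fetch tweets for `username` and optionally export them to a CSV file."""
    if twitter_cls is None:
        # Imported lazily so this module loads even where tweety is not installed.
        from tweety.bot import Twitter as twitter_cls

    tweets = twitter_cls(username).get_tweets(
        pages=pages, replies=replies, wait_time=wait_time
    )
    if csv_name is not None:
        # Assumes to_csv accepts `filename` as a keyword argument.
        tweets.to_csv(filename=csv_name)
    return tweets


# Hypothetical usage against the real library:
# tweets = fetch_user_tweets("Username or URL", pages=3, csv_name="tweets.csv")
# for tweet in tweets:
#     print(tweet.created_on, tweet.tweet_body)
```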

Getting Trends:

Description:

Get 20 Local Trends

Output:

List of Trends objects

Example:

python
Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweety.bot import Twitter
>>> trends = Twitter().get_trends()
>>> for i in trends:
...   print(i.name)

Searching a keyword:

Description:

Get 20 Tweets for a specific Keyword or Hashtag

Required Parameter:

  • keyword : str -> keyword being searched

Optional Parameter:

  • pages : int (default is 1; use 2 or more for multiple pages) -> number of pages to get
  • filter_ : str -> filter your search results by type; see Search Filters
  • wait_time : int (default is 2) -> seconds to wait between multiple requests

Output:

class Search (iterable)

Example:

python
Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweety.bot import Twitter
>>> results = Twitter().search("Pakistan")

Example with filter:

python
Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweety.bot import Twitter
>>> from tweety.filters import SearchFilters
>>> results = Twitter().search("Pakistan", filter_=SearchFilters.Videos())

Getting USER Info:

Description:

Get the information about the user

Required Parameter:

  • Username or user profile URL, passed when creating the Twitter object

Optional Parameter:

  • banner_extensions : boolean (Default is False) -> get more information about user banner image
  • image_extensions : boolean (Default is False) -> get more information about user profile image

Output:

class User

Example:

python
Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweety.bot import Twitter
>>> user = Twitter("Username or URL").get_user_info()
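
As a rough illustration of working with the returned object, the sketch below reads a few of the attributes listed under the User class. user_summary is a hypothetical helper, and it assumes followers_count and friends_count are numbers.

```python
def user_summary(user):
    """Build a small dict from attributes the User class documents."""
    followers = user.followers_count
    friends = user.friends_count
    return {
        "screen_name": user.screen_name,
        "verified": user.verified,
        "followers": followers,
        "friends": friends,
        # Guard against division by zero for accounts that follow nobody.
        "followers_per_friend": followers / friends if friends else None,
    }


# Hypothetical usage against the real library:
# from tweety.bot import Twitter
# print(user_summary(Twitter("Username or URL").get_user_info()))
```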

Getting a Tweet Detail:

Description:

Get the detail of a tweet including its replies

Required Parameter:

  • Identifier of the Tweet: Either Tweet URL OR Tweet ID

Output:

class Tweet

Example:

python
Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweety.bot import Twitter
>>> tweet = Twitter().tweet_detail("https://twitter.com/Microsoft/status/1442542812197801985")
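
The Tweet class lists a threads attribute that is either a list of Tweet objects or None. A small sketch of collecting the text of a tweet together with its thread could look like this. thread_bodies is a hypothetical helper that relies only on the tweet_body and threads attributes named in this README.

```python
def thread_bodies(tweet):
    """Return the text of a tweet followed by the text of its thread tweets."""
    bodies = [tweet.tweet_body]
    # `threads` is documented as a list of Tweet objects or None.
    for reply in tweet.threads or []:
        bodies.append(reply.tweet_body)
    return bodies


# Hypothetical usage against the real library:
# from tweety.bot import Twitter
# tweet = Twitter().tweet_detail("Tweet URL or Tweet ID")
# for body in thread_bodies(tweet):
#     print(body)
```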

Object Type Classes

  • UserTweets

    This Object is Iterable and Subscriptable

    Representation:

    UserTweets(user=username, count=number_of_results)
    

    Methods:

    • to_xlsx -> returns nothing and creates an Excel sheet
      • the optional 'filename' parameter sets the name of the Excel file; if not passed, the default name is tweet-{username}.xlsx
    • to_csv -> returns nothing and creates a CSV file
      • the optional 'filename' parameter sets the name of the CSV file; if not passed, the default name is tweet-{username}.csv
    • to_dict -> returns a tweet dict

    Attributes:

    • user -> User ID of the queried user
    • dict -> List of Tweet Results

    All the Tweets included in the result are of class Tweet

  • Tweet

    Representation

    Tweet(id=id_of_tweet , author=author_of_tweet, created_on=tweet_creation_time)
    

    Methods:

    • to_dict -> returns a tweet dict

    Attributes:

    • id -> id of the tweet
    • author -> author of the tweet (class User)
    • created_on -> creation time of tweet
    • is_retweet -> whether the tweet is a retweet
    • is_reply -> whether the tweet is a reply
    • tweet_body -> content of tweet
    • language -> language of the tweet
    • retweet_counts -> number of retweets on the tweet
    • media -> list of media (class Media) attached to the tweet
    • user_mentions -> list of users (class ShortUser) mentioned in the tweet
    • urls -> list of urls in the tweet
    • hashtags -> list of hashtags in the tweet
    • symbols -> list of symbols in the tweet
    • reply_to -> username of the user this tweet is a reply to (if is_reply is true)
    • threads -> list of class Tweet associated with the tweet or None

    This Object is Iterable if the threads attribute is not None

  • Media

    Representation

    Media(id=id_of_media , type=type_of_media)
    

    Methods:

    • to_dict -> returns a list of dict
    • download -> downloads the given media to disk

      The download method requires a filename parameter

    Attributes:

    • id -> id of the media
    • display_url -> url of the media which is used for preview
    • expanded_url -> full url of the media which is used for preview
    • indices -> list of indices of tweet body at which the link was found
    • media_url_https -> full https url of the media
    • type -> type of the media
    • url -> url of the media
    • features -> features of the media
    • sizes -> size of the media
    • original_info -> original dimensions of the media preview
    • media_key -> internal key of the following media
    • mediaStats -> stats of the media (usually available only when the type of media is video)
    • file_format -> file format of the media if the type of media is photo else None
    • streams -> list of all the video types available class Stream (if the type of media is video)
  • ShortUser

    Representation

    ShortUser(id=id_of_user , name=name_of_user)
    

    Methods:

    • to_dict -> returns a list of dict

    Attributes:

    • id -> id of the user
    • name -> name of the user
    • screen_name -> screen name of the user
  • User & UserLegacy

    Representation

    User(id=id_of_user , name=name_of_user , followers=follower_count_of_user , verified=is_user_verified)
    

    Methods:

    • to_dict -> returns a list of dict

    Attributes:

    • id -> Alpha-Id of the user
    • rest_id -> Numeric id of the user
    • name -> name of the user
    • screen_name -> screen name of the user
    • created_at -> time the user account was created
    • default_profile -> whether the user has the default profile
    • default_profile_image -> image of default_profile
    • description -> description of the user
    • entities -> entities of the user
    • fast_followers_count -> number of fast followers
    • favourites_count -> number of favourites of user
    • followers_count -> number of followers of the user
    • friends_count -> number of friends of the user
    • has_custom_timelines -> whether the user has custom timelines
    • listed_count -> number of lists of user
    • location -> location of the user
    • media_count -> number of media items posted by the user
    • normal_followers_count -> number of normal_followers
    • protected -> is profile protected
    • statuses_count -> number of statuses posted by the user
    • verified -> is user verified
    • profile_url -> The profile link of the user
  • Search

    This Object is Iterable and Subscriptable

    Representation:

      Search(keyword=keyword_being_searched , count=number_of_tweets_in_result)
    

    Methods:

    • to_xlsx -> returns nothing and creates an Excel sheet
      • the optional 'filename' parameter sets the name of the Excel file; if not passed, the default name is search-{keyword}.xlsx
    • to_csv -> returns nothing and creates a CSV file
      • the optional 'filename' parameter sets the name of the CSV file; if not passed, the default name is search-{keyword}.csv

      to_csv and to_xlsx are not available when using a filter

    The available filters are described under Search Filters

    • to_dict -> returns a tweet dict

    Attributes:

    • keyword -> Keyword which is being queried
    • dict -> Dictionary of Tweet Results

    All the Tweets included in the result are of class Tweet

    If the Users filter is used, all the users included in the result are of class User

  • Trends

    Representation:

      Trends(name=name_of_trend)
    

    Methods:

    • to_dict -> returns a trend dict

    Attributes:

    • name -> Name of the trend
    • url -> URL of the trend
    • tweet_count -> Number of tweets of the trend
  • Stream

    Representation:

      Stream(content_type=content_type_of_media, length=length_of_media, bitrate=bitrate_of_media, res=resolution_of_media)
    

    Methods:

    • download -> downloads the stream to disk

      The download method requires a filename parameter

    Attributes:

    • bitrate -> Audio bitrate of stream
    • content_type -> Content Type of stream
    • url -> URL of stream
    • length -> Length of the stream in milliseconds
    • aspect_ratio -> Aspect Ratio of the stream
    • res -> Resolution of the stream
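
The Tweet, Media and Stream classes above can be combined to locate downloadable videos. The following is a sketch under stated assumptions: best_stream and video_streams are hypothetical helpers, the media type string "video" is assumed (this README only says "if the type of media is video"), and a missing bitrate is treated as 0.

```python
def best_stream(media):
    """Return the stream with the highest bitrate, or None if there are no streams."""
    streams = getattr(media, "streams", None) or []
    # Some streams may lack a bitrate; treating a missing value as 0 is an assumption.
    return max(streams, key=lambda stream: stream.bitrate or 0, default=None)


def video_streams(tweets):
    """Yield (tweet_id, stream) for each video found in an iterable of tweets."""
    for tweet in tweets:
        for media in tweet.media or []:
            # The literal "video" is an assumed value of Media.type.
            if media.type == "video":
                stream = best_stream(media)
                if stream is not None:
                    yield tweet.id, stream


# Hypothetical usage against the real library:
# for tweet_id, stream in video_streams(Twitter("Username or URL").get_tweets()):
#     stream.download(f"{tweet_id}.mp4")  # download requires a filename
```

Picking by bitrate is only one possible choice; the res attribute could be used instead if resolution matters more than bitrate.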

Search Filters

Description

You can filter your search results using these filters

Filter Types

  • Filter Latest Tweet

    Get the latest tweets for the keyword instead of Twitter's default popular tweets

    To use this filter, pass latest directly to the filter_ parameter of search, or pass SearchFilters.Latest() from the filters module

  • Filter Users

    Search only for users matching the keyword

    To use this filter, pass users directly to the filter_ parameter of search, or pass SearchFilters.Users() from the filters module

  • Filter Only Photos

    Search only for tweets that contain a photo and match the keyword

    To use this filter, pass photos directly to the filter_ parameter of search, or pass SearchFilters.Photos() from the filters module

  • Filter Only Videos

    Search only for tweets that contain a video and match the keyword

    To use this filter, pass videos directly to the filter_ parameter of search, or pass SearchFilters.Videos() from the filters module
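
Since filter_ also accepts a plain name, a typo in that name is easy to miss. The sketch below validates the name before calling search. search_kwargs and KNOWN_FILTERS are hypothetical, and the lowercase spellings are an assumption based on the names used in this section.

```python
# Filter names listed in this README; the exact lowercase spelling is assumed.
KNOWN_FILTERS = ("latest", "users", "photos", "videos")


def search_kwargs(pages=1, filter_name=None, wait_time=2):
    """Build keyword arguments for Twitter().search(), validating the filter name."""
    kwargs = {"pages": pages, "wait_time": wait_time}
    if filter_name is not None:
        name = filter_name.lower()
        if name not in KNOWN_FILTERS:
            raise ValueError(
                f"unknown filter {filter_name!r}; expected one of {KNOWN_FILTERS}"
            )
        kwargs["filter_"] = name
    return kwargs


# Hypothetical usage against the real library:
# from tweety.bot import Twitter
# results = Twitter().search("Pakistan", **search_kwargs(pages=2, filter_name="videos"))
```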

Adding Proxy

Proxy Support is still experimental and isn't recommended

Why it isn't recommended: Twitter often blocks access from known proxies

Only http/https proxies are supported

In order to add a proxy, pass the proxy dict to the Twitter class. Valid proxy dict formats:

With authentication:

  {
    "http": "username:password@host:port",
    "https": "username:password@host:port"
  }

Without authentication:

  {
    "http": "host:port",
    "https": "host:port"
  }

Example:

python
Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweety.bot import Twitter
>>> proxy = {"http": "127.0.0.1:8080", "https": "127.0.0.1:8080"}
>>> user = Twitter("elonmusk",proxy=proxy).get_user_info()
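
Building the proxy dict by hand is error-prone, so a small helper can assemble it. This is only a sketch: build_proxy is a hypothetical function, not part of tweety, and it produces the two dict formats shown above.

```python
def build_proxy(host, port, username=None, password=None):
    """Return a proxy dict for an http/https proxy in the format shown above."""
    address = f"{host}:{port}"
    if username and password:
        # Credentials go before the host, separated by '@'.
        address = f"{username}:{password}@{address}"
    return {"http": address, "https": address}


# Hypothetical usage against the real library:
# from tweety.bot import Twitter
# user = Twitter("elonmusk", proxy=build_proxy("127.0.0.1", 8080)).get_user_info()
```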

Updates:

Update 0.1:

  • Get Multiple Pages of tweets using pages parameter in get_tweets() function
  • output of get_tweets has been reworked.

Update 0.2:

Update 0.2.1:

  • Fixed Hashtag Search

Update 0.2.2:

  • Fixed get_tweets() with multiple pages
  • Added Simplify Parameter in get_tweets() , to get simplified results instead of Twitter's cluttered results

Update 0.3:

  • Added getting multiple pages while searching keyword
  • searching a keyword now supports simplify parameter

Update 0.3.1:

  • Fixed the issue where searching more than 2 pages of a keyword's tweets gave an empty dict
  • Fixed the issue where get_tweets with a username threw an exception if the user had fewer pages of tweets than the requested number of pages

Update 0.3.5:

  • Again reworked and simplified tweets in the get_tweets and search functions 😜
  • get_tweets and search now return a TweetDict object
  • Tweets can now be exported as an Excel workbook

Update 0.3.9:

  • Tweets can now be exported as CSV too
  • The project is live on the PyPI repository

Update 0.4:

  • Module version on PYPI Repository is bumped to 0.1.2
  • Fixed the issue of 'No Guest Token Found'

Update 0.4.1:

  • Module version on PYPI Repository is bumped to 0.1.3
  • Fixed Tweet Formatting Issues

Update 0.5:

  • Module version on PYPI Repository is bumped to 0.2
  • Every function now returns its own type of object class by default; see the classes section
  • Reworked and further simplified the results of get_tweets and search

Update 0.5.1:

  • Module version on PYPI Repository is bumped to 0.2.1
  • A simple but important update , fixed the issue of KeyError while looking for tweets

Update 0.5.2:

  • Module version on PYPI Repository is bumped to 0.2.11
  • Fixed the issue of multiple pages not being scraped for user tweets

Update 0.5.3:

  • Module version on PYPI Repository is bumped to 0.2.21
  • Results of get_tweets and search are now iterable even without calling the to_dict() method
  • tweet_detail method returns a Tweet object

Update 0.6:

Update 0.7:

  • Module version on PYPI Repository is bumped to 0.3.5
  • wait_time parameter added to wait between multiple requests on ResultSet classes (Search, UserTweets)
  • Fixed a bug in getting replies where the module was getting only the first page of replies of Elon Musk
  • Structural improvements
  • ResultSet classes (Search, UserTweets) are now subscriptable

Update 0.7.1:

  • Module version on PYPI Repository is bumped to 0.3.6
  • Partial Support for proxies
  • Fixed the delay when getting tweets or searching a keyword even if pages is set to 1
  • Fixed KeyError when getting the user info
