answerquest / tgscrape

Quick and dirty public Telegram group message scraper

Usage

To dump messages from a public group

$ python3 tgscrape.py <groupname> [minid] [maxid]

Examples

To dump all messages in the group fun_with_friends, type:

$ python3 tgscrape.py fun_with_friends

You can specify the message IDs at which to start and stop. For instance, to dump messages with IDs 1000 through 2000, type:

$ python3 tgscrape.py fun_with_friends 1000 2000

To start at message ID 1000 and dump all messages after it, omit the last parameter:

$ python3 tgscrape.py fun_with_friends 1000

Retrieved messages are stored in JSON format in the conversations folder.
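The optional minid and maxid arguments shown in the usage line can be handled with optional positional arguments. The following is only a minimal sketch of that idea, not the code tgscrape.py actually uses: the parse_args helper, the default minid of 1, and the range check are assumptions made for illustration.

```python
import argparse


def parse_args(argv):
    """Parse '<groupname> [minid] [maxid]' (hypothetical helper, not from tgscrape)."""
    parser = argparse.ArgumentParser(prog="tgscrape.py")
    parser.add_argument("groupname", help="public group name")
    # nargs="?" makes a positional argument optional.
    # Assumption: scraping starts at message ID 1 when minid is omitted.
    parser.add_argument("minid", nargs="?", type=int, default=1)
    # Assumption: None means "no upper bound", i.e. dump everything after minid.
    parser.add_argument("maxid", nargs="?", type=int, default=None)
    args = parser.parse_args(argv)
    if args.maxid is not None and args.maxid < args.minid:
        parser.error("maxid must not be smaller than minid")
    return args


args = parse_args(["fun_with_friends", "1000"])
print(args.groupname, args.minid, args.maxid)
```

Passing an explicit list to parse_args keeps the sketch testable; a real script would pass sys.argv[1:] instead.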

To read and search dumped messages

$ python3 tgscrape_cli.py <groupname>

The following is the list and description of available commands:

Commands:
    search <terms>              search words or strings (in quotes) in messages and names
    all                         returns all dumped messages
    last <num>                  returns last <num> messages (default: 10)
    date <date>                 returns all messages for a date (format: YYYY-MM-DD)
    wordcloud                   returns the top 20 words (wordlen > 3)
    exit                        exits the program
    help                        this
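As an illustration of what the wordcloud command describes (the top 20 words longer than 3 characters), here is a minimal sketch built on collections.Counter. It is not taken from the tool itself: the top_words function, the lowercasing, and the \w+ tokenization are assumptions.

```python
import re
from collections import Counter


def top_words(texts, limit=20, min_len=4):
    """Return the `limit` most common words with at least `min_len` characters.

    Hypothetical helper; min_len=4 corresponds to "wordlen > 3".
    """
    counts = Counter()
    for text in texts:
        # Assumption: words are runs of word characters, compared case-insensitively.
        for word in re.findall(r"\w+", text.lower()):
            if len(word) >= min_len:
                counts[word] += 1
    return counts.most_common(limit)


sample = ["Hello there friends", "hello again, friends and foes", "the end"]
for word, count in top_words(sample):
    print(word, count)
```

Short words such as "the", "and" and "end" are dropped by the length filter, which is a crude stand-in for a stop-word list.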

Examples

To search for all messages and names containing either "foo" or "bar", type:

> search foo bar

To search for all messages and names containing the exact string "foo bar", type:

> search "foo bar"
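The difference between the two searches above comes down to how the query is split into terms: unquoted words become separate terms, while a quoted string stays one term. A minimal sketch of that behavior, using shlex.split from the standard library, might look as follows. The parse_terms and matches helpers, the case-insensitive comparison, and the substring matching are assumptions, not the tool's confirmed implementation.

```python
import shlex


def parse_terms(query):
    """Split a query into terms, keeping quoted strings together.

    Hypothetical helper. shlex.split raises ValueError on unbalanced quotes.
    """
    return [term.lower() for term in shlex.split(query)]


def matches(terms, name, text):
    """Return True if any term occurs in the sender name or the message text."""
    # Assumption: matching is a case-insensitive substring test.
    haystack = f"{name} {text}".lower()
    return any(term in haystack for term in terms)


print(parse_terms("foo bar"))
print(parse_terms('"foo bar"'))
print(matches(parse_terms('"foo bar"'), "alice", "foo and bar"))
```

With this approach, the unquoted query matches a message containing only "bar", whereas the quoted query requires the two words to appear together.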

To read all messages written on January 3rd, 2018, type:

> date 2018-01-03
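Because the date command expects YYYY-MM-DD, the month comes before the day: January 3rd, 2018 is written 2018-01-03. The sketch below shows one way such a filter could work. The message layout (a dict with a "datetime" field holding an ISO 8601 timestamp) and the messages_on helper are assumptions for illustration only; the actual structure of the dumped JSON may differ.

```python
from datetime import date, datetime


def messages_on(messages, day_text):
    """Return the messages whose timestamp falls on the given YYYY-MM-DD day.

    Hypothetical helper; the "datetime" key is an assumed field name.
    """
    day = date.fromisoformat(day_text)  # raises ValueError on malformed input
    return [
        message
        for message in messages
        if datetime.fromisoformat(message["datetime"]).date() == day
    ]


# Made-up sample data, not real scraper output.
sample = [
    {"datetime": "2018-01-03T09:15:00", "text": "written on January 3rd"},
    {"datetime": "2018-03-01T18:40:00", "text": "written on March 1st"},
]
for message in messages_on(sample, "2018-01-03"):
    print(message["text"])
```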

Requirements

BeautifulSoup4

To install dependencies:

$ pip install -r requirements.txt

License

MIT License

