Scrapes webpages in (almost) any web format, then sends Discord notifications based on extracted data and customizable logic.
Some code borrowed from Noel Vissers' "Site-watcher" project.
- Supports a variety of formats, including RSS, PDF, HTML, JSON, and more.
- Extracts specific values from a site and triggers alerts based on those values.
- Watches multiple sites at once.
- Checks on a specified interval (a cron job, currently configured in the source code).
- Tracked sites can be updated by editing the underlying JSON or via bot commands.
- Open source!
- Create a Discord bot at discord.com/developers/applications.
- Install the npm packages and compile the TypeScript project.
Configuring the bot:
- Open the `.env` file.
- Add `DISCORDJS_BOT_TOKEN=` followed by your Discord bot's token. You can get the token from discord.com/developers/applications.
- If you want to change the prefix (default `s!`), you can change it in `./src/types.ts` (`export const PREFIX = 's!';`).
- Invite the bot to your Discord server by replacing `123456789012345678` in the following link with your bot's client id: `https://discord.com/oauth2/authorize?client_id=123456789012345678&scope=bot&permissions=8`.
- Create a site config file called `sites.json` at the path `src/json/sites.json`. Follow the example shown in `sample-sites.json` to populate the file, or skip this step and use the `!add` command below. Note: since the original version of this bot there have been major refactors that changed what constitutes a valid site file. JSON arrays now have to be saved as strings so that Redis can properly store and retrieve them; Redis arrays are not used in order to preserve numeric indices wherever those are set.
- Run the bot (with node), then add a website with the `!add <URL>` command. (Still needs to be implemented.)
For all other options, see Commands.
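The array-as-string convention mentioned in the setup notes can be sketched as follows. This is an illustrative example, not the bot's actual code: the `Site` shape here is a trimmed subset of the real config, and the function names are hypothetical. Because Redis hash fields only store flat strings, the array is serialized with `JSON.stringify` before saving and revived with `JSON.parse` after loading, which preserves numeric indices.

```typescript
// Hypothetical trimmed-down site shape for illustration.
interface Site {
  id: string;
  url: string;
  index: number | null;
}

// Serialize an array of site configs into a single Redis-safe string.
function serializeSites(sites: Site[]): string {
  return JSON.stringify(sites);
}

// Parse the stored string back into an array, preserving element order.
function deserializeSites(raw: string): Site[] {
  return JSON.parse(raw) as Site[];
}

const sites: Site[] = [
  { id: "jobless", url: "https://www.dol.gov/ui/data.pdf", index: null },
  { id: "example", url: "https://example.com/", index: 2 },
];

const stored = serializeSites(sites); // the value that would be written to Redis
const revived = deserializeSites(stored); // identical array, indices intact
```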
Shows all the available commands.
Parameters
None.
Adds a website to the list.
Parameters
Required:
URL
The URL of the site you want to track.
Example
!add https://google.com/
This tracks changes on https://google.com/.
Note that some sites, including Google.com, have dynamic elements (like ads) that cause a change every time the site is checked. To make sure these dynamic elements are filtered out, use the CSS selector parameter.
!add https://example.com/ "body > div > h1"
This tracks changes in header 1 of the site https://example.com/.
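Change detection on the selected fragment can be sketched as a content hash. This is a minimal illustration, not the bot's actual implementation (the function names are hypothetical), though the sample site config later in this README stores a `hash` field of this MD5 shape. Hashing only the selected text is what keeps dynamic elements outside the selector from triggering spurious updates.

```typescript
import { createHash } from "node:crypto";

// Hash the extracted text so two checks can be compared cheaply.
function hashContent(text: string): string {
  return createHash("md5").update(text).digest("hex");
}

// Hypothetical check: returns true when the selected content has changed
// since the hash that was stored on the previous run.
function hasChanged(previousHash: string, currentText: string): boolean {
  return hashContent(currentText) !== previousHash;
}
```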
Removes a website from the list.
Parameters
Required:
INDEX
The index of the site you want to remove. Use `!list` to see the site numbers. NOTE: the list is displayed 1-indexed, but you must pass a zero-indexed value to remove a site.
Example
!remove 0
This removes the first site in the list (`!list`).
Sends the list of websites being watched.
Parameters
None.
Example
!list
This sends the list of websites being watched.
Sends a verbose message with details for each of the websites being watched.
Parameters
None.
Example
!listv
This sends a verbose list of the websites being watched; the verbose list includes the full JSON configuration for each site.
Manually updates the sites that are being watched.
Parameters
None.
Example
!update
This manually updates the sites that are being watched.
If a site is updated, it will push the standard update message to the default update channel.
Set the interval/refresh rate of the watcher. Default: 5 minutes.
Parameters
MINUTES
The interval in minutes (minimum of 1, maximum of 60).
Example
!interval 10
Sets the interval to 10 minutes.
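The 1–60 minute bounds above can be enforced with a small validator. This is a hedged sketch with illustrative function names, not the bot's actual code; the resulting five-field expression matches the cron-based scheduling the watcher uses.

```typescript
// Reject anything outside the documented 1–60 minute range.
function validateInterval(minutes: number): number {
  if (!Number.isInteger(minutes) || minutes < 1 || minutes > 60) {
    throw new Error("Interval must be a whole number of minutes between 1 and 60.");
  }
  return minutes;
}

// Build a standard five-field cron expression that fires every N minutes.
function intervalToCron(minutes: number): string {
  return `*/${validateInterval(minutes)} * * * *`;
}
```

For example, `!interval 10` would correspond to the cron expression `*/10 * * * *`.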
Start the watcher with the specified interval (default ON with an interval of 5 minutes).
This uses cron.
Parameters
None.
Example
!start
This starts the watcher with the specified interval.
Stops the watcher from automatically checking the tracked websites. The watcher can be resumed with `!start`.
Parameters
None.
Example
!stop
This stops the watcher from automatically checking the tracked websites.
I'm not actively maintaining this, but if you think of some interesting use cases, let me know and we can see about collaborating.
This project is licensed under the MIT License - see the LICENSE file for details.
A config/site object looks something like this:
{
"id": "jobless",
"url": "https://www.dol.gov/ui/data.pdf",
"contentSelector": "body",
"lastChecked": "7/22/2021, 3:09:44 AM",
"lastUpdated": "7/22/2021, 3:09:44 AM",
"regex": "(?<= initial claims was ).*(?=, a ..crease)",
"hash": "412adf44f97b7ac387ae276edbd1b8c3",
"match": "360,000",
"sendAnyChange": "true",
"index": null,
"format": "pdf"
}
The arguments to care about are:
- `id`: identifier for the site that is being tracked.
- `url`: the address of the data source.
- `contentSelector`: used for CSS queries, retrieving nested JSON, or whatever is appropriate for the case you've defined.
- `regex`: used to clean up the string retrieved by earlier logic.
- `sendAnyChange`: if true, sends the result of any update (yes or no); if false, only sends an alert when the designated condition is met. You would have to define a null return (or false/undefined) for the failure condition to avoid triggering the send.
- `index`: when using cssall, selects the nth element as specified.
- `format`: options are currently json, pdf, rss, css, and cssall. They are what they say; cssall simply distinguishes whether the query selector will have multiple returns and whether `index` is therefore necessary.
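The `regex` field from the sample config is applied as a JavaScript RegExp against the text retrieved by the selector/format logic. A sketch (the helper name and the sample sentence are illustrative, not the bot's actual code or data):

```typescript
// Run a config-supplied regex string against extracted text and return
// the first match, or null when nothing matches.
function extractMatch(text: string, pattern: string): string | null {
  const match = text.match(new RegExp(pattern));
  return match ? match[0] : null;
}

// Illustrative sentence shaped like the DOL unemployment-claims report.
const sample =
  "In the week ending July 17, the advance figure for seasonally adjusted " +
  "initial claims was 360,000, a decrease of 26,000 from the previous week.";

// The lookbehind/lookahead pair from the sample config isolates the number.
const value = extractMatch(sample, "(?<= initial claims was ).*(?=, a ..crease)");
// value === "360,000"
```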