wikipediastats.js

The bot powered by this code, @wikipediastats, is now inactive due to frequent HTTPS errors/timeouts while fetching the various Wikipedias' stats pages – clearly, scraping them (even if it's only every four hours, with plenty of time between requests) isn't an intended use case.

A Node.js-powered Twitter bot that posts milestones and statistics of various Wikipedias.

While the main purpose of building this Twitter bot was to get myself acquainted with Node.js, it's actually doing semi-interesting stuff. Whenever you run this program, it

downloads and parses a list of all the different-language Wikipedias,
scrapes some of the more interesting statistics for each of them,
compares these stats to previously scraped and cached values (unless the cache doesn't exist, in which case goto 5),
posts a tweet if a milestone has been reached, i.e. the first digit of a stat has changed (e.g. 49894 → 50002), and
refreshes the now-stale cache with the newly scraped values.

Now witness the firepowerresults of this fully armed and operational battle stationTwitter bot and check out @wikipediastats!

Note that I've previously implemented this bot in Haskell. As it turned out, the otherwise-excellent shared hosting plan I'm running all of my Twitter bots on limits RAM use to 1.5 GB per user, which was insufficient for building some of the Haskell variant's dependencies – hence this reimplementation (which also comes with improved logging versus the original).

Setup

Fairly typical for a modern server-side JavaScript thing, I believe. First, install a reasonably recent release of Node.js – it's almost certainly available through your package manager. Then:

$ git clone https://github.com/doersino/wikipediastats.js
$ cd wikipediastats.js
$ npm install

If that's been successful, navigate to config/default.yaml and configure your instance of this bot. Simply follow the instructions in the comments! This will involve entering your Twitter API credentials, ideally in a separate YAML file located at config/production.yaml – which will make sure the credentials never make their way into source control.

Run the bot for the first time:

$ NODE_CONFIG_ENV=production node .

This will populate the stats cache, based on which newly reached milestones are determined on each successive run.

If you're actually intending to use this as a Twitter bot, set up a cronjob to execute node . every hour or so, roughly like this:

0 * * * * cd PATH_TO_WIKIPEDIASTATSJS && NODE_CONFIG_ENV=production node .

(NODE_CONFIG_ENV=production is only needed if the Twitter API credentials are kept in config/production.yaml.)

Development

The source code should conform to JavaScript Standard Style. To enforce this, standard is listed as a development dependency in package.json – you might need to run npm install --production=false to set it up. Then, make sure to run standard --fix before any commit.

You can configure your text editor to do this automatically, running a formatting pass either on save or via keystroke. For Sublime Text (which I use), install the StandardFormat package. Plugins for other editors are listed here.

Debugging using Chrome's built-in developer tools is super useful and comes for free with recent versions of Node: Simply run node --inspect-brk ., navigate to chrome://inspect/ and click "Open dedicated DevTools for Node".

Notes

This two-afternoon project was my first foray into Node.js, so don't expect elegance or adherence to best practices. Don't expect JavaScript from 2008, either – I've sprinkled on a healthy amount of async, await and .then().
An improvement I didn't care to implement: Store the largest tweeted value (for each stat, for each Wikipedia) in the cache in order to avoid duplicate tweets when the stat reaches a milestone, falls below it again due to article deletions or similar, then reaches the milestone again.

doersino / wikipediastats.js

wikipediastats.js

Now witness the firepowerresults of this fully armed and operational battle stationTwitter bot and check out @wikipediastats!

Setup

Development

Notes

About

Languages