web-monitoring-db

Still just a proof-of-concept at the moment!

This is essentially a more automated version of the page monitoring workflow currently managed through a combination of versionista-outputter, Google spreadsheets, and lots of manual work. This repository is part of the EDGI Web Monitoring Project.

It’s a Rails app that:

Acts as a database of tracked pages and revisions that have been made to them
Can automatically update itself from Versionista
[Not yet done] Provides an API to get revision data and allow analysts to update metadata about the revision

Installation

Ensure you have Ruby 2.4.0+
If you don’t have the bundler Ruby gem, install it:

$ gem install bundler

Clone this repo
Go to the directory where you cloned the repo and run:
```
$ bundle install
$ rails server
```
You should now have a server running and can visit it at http://localhost:3000/

To pull down revisions made in the last 24 hours from Versionista:

$ VERSIONISTA_EMAIL=login-email-here VERSIONISTA_PASSWORD=login-password-here rake update_from_versionista[24]

Just be sure to replace login-email-here and login-password-here with the appropriate values :)
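
If you're curious how a task could pick those two variables up, here's a minimal sketch. It is not the app's actual code: `versionista_credentials` is a hypothetical helper, and the only things taken from this README are the two variable names.

```ruby
# Hypothetical helper (not part of this repo): reads the Versionista login
# from the environment and fails early with a readable message if either
# variable is missing.
def versionista_credentials(env = ENV)
  {
    email: env.fetch('VERSIONISTA_EMAIL'),
    password: env.fetch('VERSIONISTA_PASSWORD')
  }
rescue KeyError => e
  raise ArgumentError, "Missing environment variable: #{e.message}"
end

# Example with an explicit hash instead of the real environment:
creds = versionista_credentials(
  'VERSIONISTA_EMAIL' => 'login-email-here',
  'VERSIONISTA_PASSWORD' => 'login-password-here'
)
puts creds[:email]
```

Passing the login on the command line this way keeps it out of the repo.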

The [24] at the end is how many hours before now to start looking for revisions. You can add a second argument to exclude revisions newer than that many hours ago. For example, to only retrieve revisions created between 24 and 12 hours ago:

$ rake update_from_versionista[24,12]
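
To make the two arguments concrete, here's a rough sketch of how "hours ago" values map to a time window. `revision_window` is a hypothetical helper, not the app's real implementation; the one thing it does reflect is that Rake hands bracketed arguments to a task as strings, so they need converting to numbers first.

```ruby
# Hypothetical helper (not part of this repo): turns the "from" and "until"
# hour arguments into a range of Time objects. Rake passes task arguments
# as strings, so they are converted with Float().
def revision_window(from_hours, until_hours = 0, now: Time.now)
  from_hours = Float(from_hours)
  until_hours = Float(until_hours)
  unless from_hours > until_hours
    raise ArgumentError, 'from_hours must be greater than until_hours'
  end

  (now - from_hours * 3600)..(now - until_hours * 3600)
end

# Equivalent of [24,12], evaluated at a fixed "now" so the result is stable:
window = revision_window('24', '12', now: Time.utc(2017, 3, 1, 12, 0, 0))
puts window.begin # 2017-02-28 12:00:00 UTC
puts window.end   # 2017-03-01 00:00:00 UTC
```

Leaving off the second argument, as in `[24]`, means the window runs all the way up to now.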

If you just want to scrape the info from Versionista and store it in a JSON file for later use without updating the DB:

$ VERSIONISTA_EMAIL=login-email-here VERSIONISTA_PASSWORD=login-password-here rake scrape_from_versionista[24]

This will create a JSON file named scraped_data-[from hours]-[until hours].json (e.g. scraped_data-24-0.json in the example above) and put it in the tmp directory. You can specify a different file path to use as a third argument in square brackets.
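
The naming rule above can be sketched like this. `scraped_data_path` is a hypothetical helper written just to illustrate the pattern; the real task may build the path differently.

```ruby
# Hypothetical helper (not part of this repo): builds the default output
# path described above, unless a custom path is given as the third argument.
def scraped_data_path(from_hours, until_hours = 0, custom_path = nil)
  return custom_path if custom_path && !custom_path.empty?

  File.join('tmp', "scraped_data-#{from_hours}-#{until_hours}.json")
end

puts scraped_data_path(24)                      # tmp/scraped_data-24-0.json
puts scraped_data_path(24, 12)                  # tmp/scraped_data-24-12.json
puts scraped_data_path(24, 0, './my_data.json') # ./my_data.json
```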

And to update the DB from that scraped data:

$ rake update_from_json['./tmp/scraped_data-24-0.json']

The argument in square brackets should be the path to your JSON file.
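
If you want to check that a file is readable before handing it to the rake task, something like the following would do. This is only a sketch: `load_scraped_data` is a hypothetical helper, and it makes no assumptions about what the scraped data looks like inside, only that the file exists and parses as JSON.

```ruby
require 'json'

# Hypothetical pre-flight check (not part of this repo): confirms the file
# exists and contains valid JSON, then returns the parsed data.
def load_scraped_data(path)
  raise ArgumentError, "No such file: #{path}" unless File.file?(path)

  JSON.parse(File.read(path))
rescue JSON::ParserError => e
  raise ArgumentError, "#{path} is not valid JSON: #{e.message}"
end

# Run as a script with a path argument to check a file by hand.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  data = load_scraped_data(ARGV[0])
  puts "Parsed #{ARGV[0]} (top-level type: #{data.class})"
end
```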

License & Copyright

This project is licensed under the GPL v3 license, found in the LICENSE file.
