pegler / revere

general purpose monitoring and alerting

Revere

Disclaimer

Revere executes Python code entered through its web interface and currently makes no attempt to sandbox it. Always run Revere as a non-privileged user and make sure authentication is set up.

This project was inspired by LivingSocial's Rearview. We have been using it without issue for several months, but it is far from stable.

Optional Google Apps OAuth authentication is built in, but because Revere executes arbitrary code, it is still highly recommended that you keep Revere behind your firewall.

Terms

  • source - a source of data: a database, a Graphite server, or a third-party monitoring API
  • alert - a way of notifying you when certain criteria are met: Campfire, AWS SNS, email, text message, etc.
  • monitor - a script that runs on a schedule and pulls data from one or more sources. A monitor can indicate that it is in the ALARM state, in which case its alerts are fired.

Features

  • Pluggable sources of data
  • Pluggable alerts to notify you
  • Write your monitors in Python and specify their schedules using crontab syntax
  • Store the return value of the monitor (numbers or strings) for each run
  • Automatic purging of old data (day granularity)
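Day-granularity purging can be pictured as deleting every stored run of a monitor that is older than its retention period. The sketch below assumes a hypothetical SQLite table `runs(monitor_id, run_at)` with `run_at` stored as an ISO-8601 string; Revere's actual schema and purge job may differ.

```python
import sqlite3
from datetime import date, timedelta


def purge_old_runs(conn, monitor_id, retention_days, today=None):
    """Delete runs of one monitor that fall outside its retention window.

    Hypothetical schema: runs(monitor_id INTEGER, run_at TEXT), where run_at is
    an ISO-8601 string so that string comparison matches time order.
    """
    today = today or date.today()
    # Day granularity: everything from the cutoff day onwards is kept.
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM runs WHERE monitor_id = ? AND run_at < ?",
        (monitor_id, cutoff),
    )
    conn.commit()
    return cur.rowcount  # number of rows removed
```

Comparing ISO strings avoids parsing timestamps in Python and lets SQLite do the filtering in a single statement.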

Revere is a general purpose monitoring and alerting system with pluggable data sources and alerts, so you can pull data from anywhere you want and trigger alarms when certain thresholds are crossed, all while using pure Python for your calculations.

Installation

pip install git+https://github.com/pegler/revere.git

# create a config file. Defaults are used for any setting missing from the file
touch config.py

# create the SQLite database
revereserver.py init

# run Revere (listens on port 5000 by default)
revereserver.py run

Configuration

Revere uses a python file named config.py in the current working directory. The configuration variables are:

  • DATABASE_PATH - the path to the SQLite file
  • REVERE_SOURCES - a dict specifying the sources. The key can be anything and is used by monitors to access the source. The value is a configuration dict with a description, a type (the dotted path of the source class), and a config dict for the source.
  • REVERE_ALERTS - a dict specifying the alerts, structured the same way as REVERE_SOURCES
  • GOOGLE_APPS_DOMAIN - if specified, Google Apps OAuth authentication is enabled and enforced on all views, and only users from that domain are permitted access. This requires setting SECRET_KEY in your config file as well.
  • SECRET_KEY - a random secret string, required when GOOGLE_APPS_DOMAIN is set
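One way to get the "defaults for anything missing" behaviour is to execute config.py and overlay its upper-case names on a table of defaults. This is a minimal sketch: the `DEFAULTS` values and the `load_config` helper are hypothetical, and Revere may load its configuration differently.

```python
import os
import runpy

# Hypothetical defaults; Revere's real defaults may differ.
DEFAULTS = {
    "DATABASE_PATH": "revere.db",
    "REVERE_SOURCES": {},
    "REVERE_ALERTS": {},
    "GOOGLE_APPS_DOMAIN": None,
}


def load_config(path="config.py"):
    """Return DEFAULTS overlaid with the upper-case names defined in `path`."""
    config = dict(DEFAULTS)
    if os.path.exists(path):
        # run_path executes the file and returns its module globals.
        namespace = runpy.run_path(path)
        # Only upper-case names count as settings (skips __name__, helpers, etc.).
        config.update({k: v for k, v in namespace.items() if k.isupper()})
    if config.get("GOOGLE_APPS_DOMAIN") and not config.get("SECRET_KEY"):
        raise ValueError("GOOGLE_APPS_DOMAIN requires SECRET_KEY to be set")
    return config
```

Filtering on upper-case names lets config.py contain temporary variables or imports without them leaking into the settings.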

Example config.py file:

DATABASE_PATH = 'revere.db'

SECRET_KEY = 'something random and secret'

#GOOGLE_APPS_DOMAIN = 'example.com' # optional

REVERE_SOURCES = {
    'graphite': {
        'description': 'Graphite Server',
        'type': 'revere.sources.graphite.GraphiteSource',
        'config': {
            'url': 'http://dashing.example.com/render',
            'auth_username': 'username',
            'auth_password': 'password',
        }
    },
    'mysql': {
        'description': 'Local MySQL Database',
        'type': 'revere.sources.database.DatabaseSource',
        'config': {
            'connection_string': 'mysql://readonlyuser:password@localhost/production',
        }
    }
}

REVERE_ALERTS = {
    'campfire-engineering': {
        'description': 'Post a message to Campfire - Engineering',
        'type': 'revere.alerts.campfire.CampfireAlert',
        'config': {
            'api_token': 'xxxxxx',
            'subdomain': 'example',
            'room_id': '123456',
        }
    },
    'operations-sns': {
        'description': 'Publish a message to AWS SNS Topic operations',
        'type': 'revere.alerts.sns.SNSAlert',
        'config': {
            'region': 'us-east-1',
            'topic_arn': 'xxxxx',
            'access_key_id': 'xxxxx',
            'secret_key': 'xxxxx',
        }
    }
}
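The type value in each entry is a dotted path to a class. A common way to turn such a path into objects is `importlib.import_module` plus `getattr`, shown below. The helper names are hypothetical, and passing the config dict to the constructor as a single argument is an assumption about how Revere's plugin classes are built.

```python
import importlib


def load_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def build_plugins(specs):
    """Instantiate every entry of a REVERE_SOURCES/REVERE_ALERTS-style dict.

    Assumption: each class takes its 'config' dict as one positional argument.
    """
    return {
        key: load_class(spec["type"])(spec.get("config", {}))
        for key, spec in specs.items()
    }
```

With this approach, adding a new source or alert only requires an importable class; no registry needs to be edited.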

Monitors

Monitors are written in plain Python. Navigate to the "Create Monitor" page, specify the schedule using crontab syntax, specify the retention period, and then write the Python that performs the check. The script is executed with a dictionary named sources in scope containing the configured sources, keyed by the same names used in the configuration file.

If the monitor has "failed" and should be in the ALARM state, the code should raise a MonitorFailure exception. The message passed into the exception will be included in any alerts triggered from the ALARM state.

Any other exception raised will change the monitor to the ERROR state and trigger any enabled alerts.

Any data assigned to the variable return_value will be recorded. The data must be an int, float, long, string, or unicode.
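The contract above (raise MonitorFailure for ALARM, any other exception for ERROR, read return_value afterwards) can be sketched as follows. `run_monitor` and the local `MonitorFailure` stand-in are hypothetical, and, like Revere per the disclaimer above, the sketch does no sandboxing.

```python
class MonitorFailure(Exception):
    """Stand-in for Revere's exception; raising it signals the ALARM state."""


# Python 3 stand-ins for int, float, long, string and unicode.
ALLOWED_TYPES = (int, float, str)


def run_monitor(code, sources):
    """Execute monitor code and return (state, message, return_value)."""
    scope = {"sources": sources, "MonitorFailure": MonitorFailure}
    try:
        exec(code, scope)  # no sandboxing: only run trusted code
    except MonitorFailure as exc:
        state, message = "ALARM", str(exc)
    except Exception as exc:  # any other error puts the monitor in ERROR
        state, message = "ERROR", "%s: %s" % (type(exc).__name__, exc)
    else:
        state, message = "OK", "Monitor Passed"
    value = scope.get("return_value")
    if value is not None and not isinstance(value, ALLOWED_TYPES):
        value = None  # unsupported types are not recorded
    return state, message, value
```

Reading return_value after the exception handling means a value assigned before MonitorFailure is raised is still recorded.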

An example monitor:

total_requests = sources['graphite'].get_sum('sum(stats_counts.response.*)', '-10min')
error_requests = sources['graphite'].get_sum('stats_counts.response.500', '-10min')
# avoid dividing by zero when there was no traffic in the window
error_percentage = float(error_requests) / total_requests if total_requests else 0.0
return_value = error_percentage

if error_percentage >= .005:
    raise MonitorFailure('High number of error responses. %.2f%%' % (error_percentage * 100))

Alerts typically include the return value, the message passed into MonitorFailure, and the current state of the monitor.

Sources

revere.sources.graphite.GraphiteSource

Pull data from a Graphite server.

Configuration

Parameters:

  • url - the url of the graphite server
  • auth_username (optional) - username for basic authentication
  • auth_password (optional) - password for basic authentication

Usage

All three methods take the same parameters:

  • path - the dotted path of the data to retrieve. Graphite functions can be used.
  • from_date - any valid Graphite starting time, e.g. '-5d'
  • to_date - any valid Graphite ending time, e.g. '-2d'

Methods:

  • get_datapoints(path, from_date=None, to_date=None) - return a list of (value, timestamp) pairs for the path within the given timeframe
  • get_sum(path, from_date=None, to_date=None) - return the sum of the values. null values are counted as 0
  • get_avg(path, from_date=None, to_date=None) - return the average of the values. null values are counted as 0
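The sketch below shows one plausible way these methods could work: build a request against Graphite's render API (target, from, until and format=json are real query parameters of that API) and aggregate the returned datapoints, treating nulls as 0 as described above. The helper names are hypothetical, the HTTP request and basic authentication are omitted so it runs offline, and flattening all series matched by a wildcard into one list is an assumption about Revere's behaviour.

```python
from urllib.parse import urlencode


def build_render_url(base_url, path, from_date=None, to_date=None):
    """Build a Graphite render API URL requesting JSON output."""
    params = [("target", path), ("format", "json")]
    if from_date:
        params.append(("from", from_date))
    if to_date:
        params.append(("until", to_date))
    return base_url + "?" + urlencode(params)


def datapoints_from_json(payload):
    """Flatten render API JSON into (value, timestamp) pairs across all series."""
    return [tuple(point) for series in payload for point in series["datapoints"]]


def get_sum(datapoints):
    # Null (None) values count as 0.
    return sum(value or 0 for value, _ in datapoints)


def get_avg(datapoints):
    # Nulls count as 0 but still add to the number of points.
    if not datapoints:
        return 0
    return get_sum(datapoints) / float(len(datapoints))
```

Note that counting nulls as 0 in the average pulls it down when Graphite has gaps, which may matter for sparse metrics.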

revere.sources.database.DatabaseSource

Connect to any database. It uses SQLAlchemy for connections, which supports most databases.

Configuration

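Parameters:

  • connection_string - an SQLAlchemy database URL, e.g. mysql://readonlyuser:password@localhost/production
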
Usage

The only method is execute(sql, as_dict=False), which accepts raw SQL and returns a list of tuples. If as_dict is True, it instead returns a list of dicts keyed on the column names.
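The as_dict option amounts to zipping each row with the cursor's column names. This sketch uses the standard-library sqlite3 module instead of SQLAlchemy so it runs without extra dependencies; the `execute` helper is hypothetical, not Revere's code.

```python
import sqlite3


def execute(conn, sql, as_dict=False):
    """Run raw SQL; return a list of tuples, or dicts keyed on column names."""
    cursor = conn.execute(sql)
    rows = cursor.fetchall()
    if not as_dict:
        return rows
    # cursor.description holds one 7-tuple per column; index 0 is the name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in rows]
```

Dicts are handier in monitor code (row['status']) while tuples are lighter for simple aggregates such as SELECT COUNT(*).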

Alerts

Alerts can be configured to fire only when a monitor transitions to particular states. For example, you can get a phone call when a monitor enters the ALARM state but only an email when it returns to the OK state.
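Per-state alerting reduces to a filter over the configured alerts whenever a monitor's state changes. `AlertRule` and `alerts_to_fire` are hypothetical names used only for this sketch; Revere stores this mapping in its own database.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    """Hypothetical pairing of an alert key with the states it fires on."""
    alert_key: str
    states: set = field(default_factory=set)


def alerts_to_fire(rules, old_state, new_state):
    """Return the alert keys to trigger for a state change."""
    if old_state == new_state:
        return []  # no transition, nothing to send
    return [rule.alert_key for rule in rules if new_state in rule.states]
```

For example, a phone alert subscribed to {"ALARM"} and an email alert subscribed to {"OK", "ALARM"} both fire on OK -> ALARM, but only the email fires on ALARM -> OK.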

revere.alerts.campfire.CampfireAlert

Send a message to a Campfire room of the form:

[Revere Alarm]
Monitor: Mail Queue Length
State Change: ALARM -> OK
Message: Monitor Passed
Return Value: 67

Configuration

  • api_token - the API token for the user to send the message as
  • room_id - the id for the room to post to. Find this in the URL of the room
  • subdomain - the subdomain for the room to post to. Find this in the URL of the room
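The message body shown in the example above can be produced with a small formatting helper. `format_campfire_message` is a hypothetical name, and posting to the Campfire room itself is not shown.

```python
def format_campfire_message(monitor_name, old_state, new_state, message, return_value):
    """Build the plain-text alert body in the layout of the Campfire example."""
    lines = [
        "[Revere Alarm]",
        "Monitor: %s" % monitor_name,
        "State Change: %s -> %s" % (old_state, new_state),
        "Message: %s" % message,
        "Return Value: %s" % return_value,
    ]
    return "\n".join(lines)
```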

revere.alerts.sns.SNSAlert

Send a message to an Amazon Web Services' Simple Notification Service (AWS SNS) Topic. It will include a subject and body for emails as well as a shortened message to be sent to SMS subscribers.

Configuration

  • region - the AWS region of the topic, e.g. us-east-1
  • topic_arn - the ARN of the topic to post to, of the form arn:aws:sns:us-east-1:1234567890:topic-name
  • access_key_id - the API Access Key ID to post to the topic
  • secret_key - the API Secret Key to post to the topic
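SNS can deliver different text per protocol when a message is published with MessageStructure set to json: the message is a JSON object that must contain a "default" key, and keys such as "email" and "sms" override it for those subscribers. The sketch below builds such a payload; the subject and SMS wording and the 140-character SMS cap are assumptions, not Revere's exact output.

```python
import json

SMS_LIMIT = 140  # assumed cap for the shortened SMS text
SUBJECT_LIMIT = 100  # SNS email subjects must be under 100 characters


def build_sns_message(monitor_name, old_state, new_state, message, return_value):
    """Return (subject, message_json) for an SNS publish with MessageStructure='json'."""
    subject = ("[Revere] %s is %s" % (monitor_name, new_state))[:SUBJECT_LIMIT]
    body = "Monitor: %s\nState Change: %s -> %s\nMessage: %s\nReturn Value: %s" % (
        monitor_name, old_state, new_state, message, return_value)
    sms = ("%s: %s -> %s" % (monitor_name, old_state, new_state))[:SMS_LIMIT]
    # "default" is required; protocol keys override it per subscriber type.
    payload = {"default": body, "email": body, "sms": sms}
    return subject, json.dumps(payload)
```

With boto3, the result would be sent via client.publish(TopicArn=..., Subject=subject, Message=message_json, MessageStructure="json").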

Screenshots

A list of monitors with their current state and time since last run

The overview page for a monitor. It lists the past state changes including the return value from the monitor and alarm message.

Full history for a monitor

The list of alerts and which states they get triggered for.

Thanks

This project is mostly just cobbling together several other excellent projects.

License

GNU General Public License v2.0