Revere runs Python code entered through a web page and currently makes no attempt to sandbox it. Always run Revere as an unprivileged user and make sure authentication is set up.
This project was inspired by LivingSocial's Rearview. We have been using it without issue for several months, but it is far from stable.
Optional Google Apps OAuth authentication is built in, but because of what this application does, it is still highly recommended that you keep Revere behind your firewall.
- source - a source of data, such as a database, a Graphite server, or a third-party monitoring API
- alert - a way of notifying you when certain criteria are met, such as Campfire, AWS SNS, email, or text message
- monitor - a script that runs on a schedule and pulls data from one or more sources. A monitor can indicate that it is in the ALARM state, in which case its alerts are fired.
- Pluggable sources of data
- Pluggable alerts to notify you
- Write your monitors in Python and specify the schedule using crontab syntax
- Store the return value of the monitor (numbers or strings) for each run
- Automatic purging of old data (day granularity)
Revere is a general-purpose monitoring and alerting system with pluggable data sources and alerts. You can pull data from anywhere, do your calculations in pure Python, and trigger alarms when certain thresholds are crossed.
pip install git+https://github.com/pegler/revere.git
# create a config file; defaults are used for any setting missing from it
touch config.py
# create the SQLite database
revereserver.py init
# run Revere (defaults to port 5000)
revereserver.py run
Revere reads its configuration from a Python file named config.py in the current working directory. The configuration variables are:
- DATABASE_PATH - the path to the SQLite file
- REVERE_SOURCES - a dict specifying the sources. The key can be anything and is used by monitors to access the source. The value is a configuration dict for the source, holding a description, the dotted path of the source class (type), and the source's own parameters (config).
- REVERE_ALERTS - a dict specifying the alerts, in the same format as REVERE_SOURCES
- GOOGLE_APPS_DOMAIN - if specified, Google Apps OAuth authentication is enabled and enforced on all views, and only this domain is permitted access. This also requires setting SECRET_KEY in your config file.
Example config.py file:
DATABASE_PATH = 'revere.db'
SECRET_KEY = 'something random and secret'
# GOOGLE_APPS_DOMAIN = 'example.com'  # optional

REVERE_SOURCES = {
    'graphite': {
        'description': 'Graphite Server',
        'type': 'revere.sources.graphite.GraphiteSource',
        'config': {
            'url': 'http://dashing.example.com/render',
            'auth_username': 'username',
            'auth_password': 'password',
        }
    },
    'mysql': {
        'description': 'Local MySQL Database',
        'type': 'revere.sources.database.DatabaseSource',
        'config': {
            'connection_string': 'mysql://readonlyuser:password@localhost/production',
        }
    }
}

REVERE_ALERTS = {
    'campfire-engineering': {
        'description': 'Post a message to Campfire - Engineering',
        'type': 'revere.alerts.campfire.CampfireAlert',
        'config': {
            'api_token': 'xxxxxx',
            'subdomain': 'example',
            'room_id': '123456',
        }
    },
    'operations-sns': {
        'description': 'Publish a message to AWS SNS Topic operations',
        'type': 'revere.alerts.sns.SNSAlert',
        'config': {
            'region': 'us-east-1',
            'topic_arn': 'xxxxx',
            'access_key_id': 'xxxxx',
            'secret_key': 'xxxxx',
        }
    }
}
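Each entry's type is a dotted path to a Python class. The sketch below shows one plausible way such a path can be turned into an instance using importlib. load_plugin and load_all are hypothetical names, and passing the config dict as keyword arguments is an assumption, not Revere's documented constructor signature.

```python
import importlib


def load_plugin(dotted_path, config):
    """Hypothetical: turn a 'type' string such as
    'revere.sources.graphite.GraphiteSource' into an instance.

    Assumes the class accepts its 'config' dict as keyword arguments.
    """
    module_path, _, class_name = dotted_path.rpartition('.')
    module = importlib.import_module(module_path)  # e.g. revere.sources.graphite
    cls = getattr(module, class_name)               # e.g. GraphiteSource
    return cls(**config)


def load_all(definitions):
    """Build {key: instance} from a REVERE_SOURCES / REVERE_ALERTS style dict."""
    return {
        key: load_plugin(entry['type'], entry.get('config', {}))
        for key, entry in definitions.items()
    }
```

Under these assumptions, load_all(REVERE_SOURCES) would yield a dict keyed by 'graphite' and 'mysql', which is the shape of the sources dictionary monitors see.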
Monitors are written in plain Python. Navigate to the "Create Monitor" page, specify the schedule using crontab syntax, specify the retention period, and then write the Python that does the checking. The script is executed with a dictionary named sources in scope that holds the configured sources, keyed by the same names used in the configuration file.
If the monitor has "failed" and should be in the ALARM state, the code should raise a MonitorFailure exception. The message passed to the exception will be included in any alerts triggered by the ALARM state.
Any other exception raised will change the monitor to the ERROR state and trigger any enabled alerts.
Any data assigned to the variable return_value will be recorded. The data must be an int, float, long, string, or unicode.
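The rules above can be summarized as a small state machine. The following is a minimal sketch, not Revere's actual runner: run_monitor and RECORDABLE_TYPES are hypothetical names, and it assumes MonitorFailure is injected into the script's scope alongside sources.

```python
class MonitorFailure(Exception):
    """Raised by a monitor script to put the monitor into the ALARM state."""


# The README lists int, float, long, string, and unicode; long and unicode are
# Python 2 types, so on Python 3 int and str cover them.
RECORDABLE_TYPES = (int, float, str)


def run_monitor(script, sources):
    """Hypothetical sketch of one monitor run.

    Returns (state, message, return_value) with state 'OK', 'ALARM' or 'ERROR'.
    """
    scope = {'sources': sources, 'MonitorFailure': MonitorFailure}
    state, message = 'OK', 'Monitor Passed'
    try:
        exec(script, scope)
    except MonitorFailure as exc:
        # Deliberate failure: the message is passed on to the alerts.
        state, message = 'ALARM', str(exc)
    except Exception as exc:
        # Any other exception means the monitor itself is broken.
        state, message = 'ERROR', repr(exc)
    value = scope.get('return_value')
    if not isinstance(value, RECORDABLE_TYPES):
        value = None  # only int/float/str values are recorded in this sketch
    return state, message, value
```

For instance, run_monitor("return_value = sources['x'] * 2", {'x': 21}) returns ('OK', 'Monitor Passed', 42).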
An example monitor:
total_requests = sources['graphite'].get_sum('sum(stats_counts.response.*)', '-10min')
error_requests = sources['graphite'].get_sum('stats_counts.response.500', '-10min')
# fraction of responses that were 500s; float() avoids integer division on Python 2
# and the guard avoids dividing by zero when there was no traffic
error_rate = float(error_requests) / total_requests if total_requests else 0.0
return_value = error_rate
if error_rate >= .005:
    raise MonitorFailure('High number of error responses. %.2f%%' % (error_rate * 100))
Alerts will often include the return value, the message passed into MonitorFailure, and the current state of the monitor.
The Graphite source (revere.sources.graphite.GraphiteSource) pulls data from a Graphite server.
Parameters:
- url - the url of the graphite server
- auth_username (optional) - username for basic authentication
- auth_password (optional) - password for basic authentication
It has 3 methods, all taking the same parameters:
- path - the dotted path of the data to retrieve. Graphite functions can be passed in.
- from_date - any valid Graphite start time, for example '-5d'
- to_date - any valid Graphite end time, for example '-2d'
Methods:
- get_datapoints(path, from_date=None, to_date=None) - return a list of (value, timestamp) pairs for the path within the given timeframe
- get_sum(path, from_date=None, to_date=None) - return the sum of the values. Null values are counted as 0
- get_avg(path, from_date=None, to_date=None) - return the average of the values. Null values are counted as 0
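To make the null handling concrete, here is a sketch of how the three methods could be derived from Graphite's render API when it is called with format=json. That response is a list of series, each with a "datapoints" list of [value, timestamp] pairs, where value is null for gaps. The helper names are hypothetical. Counting nulls in the average's denominator is one reading of "null values are counted as 0", not confirmed Revere behavior.

```python
import json


def parse_render_json(body):
    """Flatten a Graphite /render?format=json response into (value, timestamp) pairs.

    If the target expands to several series, their points are concatenated.
    """
    pairs = []
    for series in json.loads(body):
        for value, timestamp in series['datapoints']:
            pairs.append((value, timestamp))
    return pairs


def sum_datapoints(pairs):
    # Nulls (None) count as 0.
    return sum(value or 0 for value, _ in pairs)


def avg_datapoints(pairs):
    # Nulls count as 0 and still count toward the number of points,
    # so gaps pull the average down. An empty series averages to 0 (assumption).
    if not pairs:
        return 0
    return float(sum_datapoints(pairs)) / len(pairs)
```

With datapoints [[1.0, 100], [null, 110], [3.0, 120]], the sum is 4.0 and the average is 4.0 / 3 rather than 2.0.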
The database source (revere.sources.database.DatabaseSource) connects to any database supported by SQLAlchemy, which covers most common databases.
Parameters:
- connection_string - the SQLAlchemy connection string to the database. See: http://pythonhosted.org/Flask-SQLAlchemy/config.html#connection-uri-format
- pool_recycle (default: 3600) - number of seconds before a connection in the pool should be recycled
The only method is execute(sql, as_dict=False), which accepts raw SQL and returns a list of tuples. If as_dict is True, it returns a list of dicts keyed on the column names instead.
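The difference between the two return shapes is easiest to see with a small example. To keep it runnable without a database server, this sketch swaps in the standard-library sqlite3 module for SQLAlchemy. The standalone execute function and the q table are illustrative only and are not Revere's implementation.

```python
import sqlite3


def execute(conn, sql, as_dict=False):
    """Illustrative stand-in for DatabaseSource.execute using sqlite3."""
    cursor = conn.execute(sql)
    rows = cursor.fetchall()
    if not as_dict:
        return [tuple(row) for row in rows]
    # cursor.description has one entry per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in rows]
```

Given a table q with a single row ('mail', 67), the default call returns [('mail', 67)], while as_dict=True returns [{'name': 'mail', 'size': 67}].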
Alerts can be configured to only fire when a monitor transitions to a particular state. So you can get a phone call when a monitor is in the ALARM state, but only get an email when it goes back to the OK state.
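A minimal sketch of per-state alert selection follows. It assumes each alert is subscribed to a set of states and that nothing fires when the state does not change. alerts_to_fire and the 'phone'/'email' keys are hypothetical.

```python
def alerts_to_fire(alerts, old_state, new_state):
    """Hypothetical: pick the alerts subscribed to the state being entered.

    `alerts` maps an alert key to the set of states it fires on, e.g.
    {'phone': {'ALARM'}, 'email': {'OK', 'ALARM'}}.
    """
    if old_state == new_state:
        return []  # no transition, nothing to send (assumption)
    return sorted(key for key, states in alerts.items() if new_state in states)
```

With the mapping in the docstring, a transition from OK to ALARM selects both alerts, while a transition back from ALARM to OK only selects 'email'.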
The Campfire alert (revere.alerts.campfire.CampfireAlert) sends a message to a Campfire room of the form:
[Revere Alarm]
Monitor: Mail Queue Length
State Change: ALARM -> OK
Message: Monitor Passed
Return Value: 67
- api_token - the API token for the user to send the message as
- room_id - the id for the room to post to. Find this in the URL of the room
- subdomain - the subdomain for the room to post to. Find this in the URL of the room
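The message layout shown above can be reproduced with a few lines of string formatting. format_campfire_message is a hypothetical helper that only builds the text; it does not post anything to Campfire.

```python
def format_campfire_message(monitor_name, old_state, new_state, message, return_value):
    """Build a message in the same shape as the Campfire example."""
    lines = [
        '[Revere Alarm]',
        'Monitor: %s' % monitor_name,
        'State Change: %s -> %s' % (old_state, new_state),
        'Message: %s' % message,
        'Return Value: %s' % return_value,
    ]
    return '\n'.join(lines)
```

Calling it with ('Mail Queue Length', 'ALARM', 'OK', 'Monitor Passed', 67) yields exactly the five lines of the example.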
The SNS alert (revere.alerts.sns.SNSAlert) sends a message to an Amazon Web Services Simple Notification Service (AWS SNS) topic. It includes a subject and body for email subscribers as well as a shortened message for SMS subscribers.
- region - the AWS region of the topic, for example us-east-1
- topic_arn - the ARN of the topic to post to, of the form arn:aws:sns:us-east-1:1234567890:topic-name
- access_key_id - the API Access Key ID to post to the topic
- secret_key - the API Secret Key to post to the topic
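SNS can deliver different text per protocol when a message is published with MessageStructure='json'. In that case, Message must be a JSON object with a required "default" key and optional keys such as "email" and "sms". The sketch below assembles such a payload. build_sns_publish_args and the exact subject and body wording are hypothetical, and it is not stated here how Revere itself calls SNS.

```python
import json


def build_sns_publish_args(topic_arn, monitor_name, old_state, new_state, message):
    """Hypothetical: keyword arguments for an SNS Publish call that sends a
    full body to email subscribers and a short text to SMS subscribers."""
    subject = '[Revere] %s: %s -> %s' % (monitor_name, old_state, new_state)
    body = 'Monitor: %s\nState Change: %s -> %s\nMessage: %s' % (
        monitor_name, old_state, new_state, message)
    short = '%s %s' % (monitor_name, new_state)  # SMS messages should stay short
    return {
        'TopicArn': topic_arn,
        'Subject': subject,  # used by email subscribers
        'MessageStructure': 'json',
        # "default" is required; protocol-specific keys override it.
        'Message': json.dumps({'default': body, 'email': body, 'sms': short}),
    }
```

With boto3 installed, the result could be passed as boto3.client('sns', region_name='us-east-1').publish(**args), using the region and credentials from the alert's config.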
A list of monitors with their current state and time since last run
The overview page for a monitor. It lists the past state changes including the return value from the monitor and alarm message.
The list of alerts and which states they get triggered for.
This project mostly cobbles together several other excellent projects:
- Flask - the web front-end
- SQLAlchemy - excellent lightweight database wrapper
- APScheduler - managing the schedule for the monitors
- Tornado - lightweight web server
- Google Federated Logins for Flask - Google Apps Authentication