Enhydris aggregator

The Enhydris aggregator helps you create an Enhydris instance (the "target" database) that only contains copies of data that exist in other Enhydris instances (the "source" databases).

For example, the Hydroscope project has four databases (four Enhydris instances) that belong to four organizations: kyy.hydroscope.gr, emy.hydroscope.gr, deh.hydroscope.gr, and ypaat.hydroscope.gr. There is also a database, main.hydroscope.gr, that serves as an alternative entry point to all these; it contains the stations of all four, but if you actually want the time series data, it redirects you to the actual database. This database, main.hydroscope.gr, is powered by Enhydris and the Enhydris aggregator.

The aggregator does two things:

It has a management command, ./manage.py aggregate, which should be run from a cron job, say once per day. It currently works in a relatively naïve way: it deletes all data from the target database, and copies it from scratch from the source databases through their web APIs.
It provides a modified template for the timeseries detail page, which, instead of a link to download the data, contains a link that points to the equivalent page of the originating database.
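
The "delete everything, then copy from scratch" strategy of the management command can be illustrated with a minimal, self-contained sketch. This is not the aggregator's actual code: `rebuild_target`, the `fetch` callable, the `origin` field, and the example URLs are all hypothetical names chosen for illustration, and plain lists and dicts stand in for the database and for the sources' web APIs.

```python
def rebuild_target(target, sources, fetch):
    """Empty `target`, then refill it with the records of every source.

    `target` is a list standing in for the target database, `sources` is a
    list of source URLs, and `fetch` is a hypothetical callable that returns
    the records of one source (the real aggregator uses the web API).
    """
    target.clear()  # Step 1: delete all existing data in the target.
    for source_url in sources:
        for record in fetch(source_url):
            # Step 2: copy the record, remembering where it came from so
            # that a detail page could link back to the originating database.
            target.append({**record, "origin": source_url})
    return target


# Canned data keyed by URL, standing in for real web API responses.
FAKE_API = {
    "http://a.example.com/": [{"name": "Station A1"}],
    "http://b.example.com/": [{"name": "Station B1"}, {"name": "Station B2"}],
}

target_db = [{"name": "stale station", "origin": "http://old.example.com/"}]
rebuild_target(target_db, list(FAKE_API), FAKE_API.__getitem__)
```

Because the target is rebuilt in full on every run, nothing stale survives from a previous run, at the cost of re-copying data that has not changed.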

Installation and configuration

Clone the enhydris-aggregator repository and make sure that the enhydris_aggregator package is in the Python path when Enhydris is executed.
Activate the virtualenv for Enhydris and install the aggregator's requirements with pip install -r requirements.txt.
Make the following adjustments to the settings of Enhydris:
```
# Install enhydris_aggregator immediately before enhydris.hcore, so
# that it can override its templates.
INSTALLED_APPS.insert(INSTALLED_APPS.index('enhydris.hcore'),
                      'enhydris_aggregator')

ENHYDRIS_AGGREGATOR = {
    'SOURCE_DATABASES': [
        {
            'URL': 'http://some.enhydris.instance.com/',
            'ID_OFFSET': 1000000,
        },
        {
            'URL': 'http://some.other.enhydris.instance.com/',
            'ID_OFFSET': 2000000,
        },
    ]
}
```
Because an object with a certain id in one source database may be different from the object with the same id in another source database, these ids must be changed before the objects are entered into the target database. The Enhydris aggregator achieves this by adding the ID_OFFSET to the source id. It does this for all objects, including lookups, and it changes the foreign keys accordingly. You must select the ID_OFFSET of each source database so that there is no id conflict (e.g. in the above example, make sure the first source database does not use an id larger than 999999).
Execute ./manage.py aggregate, and also set up a cron job to execute it regularly (say once per day). Run ./manage.py aggregate --help to see the available options.
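
The ID_OFFSET arithmetic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the aggregator's implementation: `remap_ids` and `offsets_conflict` are hypothetical helpers, records are plain dicts, and the `owner` foreign key is an invented example field.

```python
def remap_ids(record, id_offset, fk_fields=()):
    """Return a copy of `record` with its id and foreign keys shifted."""
    shifted = dict(record)
    shifted["id"] = record["id"] + id_offset
    for field in fk_fields:
        # Foreign keys point at objects of the same source database, so
        # they get the same offset; empty (None) keys are left alone.
        if record.get(field) is not None:
            shifted[field] = record[field] + id_offset
    return shifted


def offsets_conflict(sources):
    """Tell whether the shifted id ranges of any two sources overlap.

    `sources` is a list of (id_offset, largest_source_id) pairs; each
    source occupies target ids id_offset .. id_offset + largest_source_id.
    """
    ranges = sorted((off, off + max_id) for off, max_id in sources)
    return any(
        prev_high >= low
        for (_, prev_high), (low, _) in zip(ranges, ranges[1:])
    )


station = {"id": 42, "name": "Example station", "owner": 7}
remapped = remap_ids(station, 1000000, fk_fields=("owner",))
# remapped == {"id": 1000042, "name": "Example station", "owner": 1000007}

ok = offsets_conflict([(1000000, 999999), (2000000, 500)])    # False
bad = offsets_conflict([(1000000, 1000000), (2000000, 500)])  # True
```

With the offsets from the settings example, the first source is safe as long as its largest id stays at or below 999999; an id of 1000000 would map to 2000000, which falls in the range reserved for the second source.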
