devinit / digital-platform

PostgreSQL/analyst → MongoDB → Development Data Hub

Home Page:http://data.devinit.org:8888/#!/ & http://data.devinit.org/#!/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

New DDW DB uploaded to server + some background info

dw8547 opened this issue · comments

ddw_20160127084339 is now on 213.168.251.124 & modified the relevant user permissions to match those for ddw.

Please let me know when I can drop ddw and rename ddw_20160127084339.

The DH uses about 285 different data series/files (including those belonging to the category reference) each of which lives here: https://github.com/devinit/digital-platform/tree/master/country-year

The goal is to automate the generation of all of those data series. At the moment & in the past, the analysts put these data series/files together by hand. They are sourced from a number of external data providers, the major ones being OECD, WB & IMF.

We have currently automated 77 out of 285. We read in the remaining data series into tables from the flat files on: https://github.com/devinit/digital-platform/tree/master/country-year. The analysts are currently working on updating the data series/files that have not been automated. They are not ready yet, but when they I, I will repopulate the relevant tables. They haven't provided a date for when I can do this.

So, ideally, we would like the 'ddw' DB to only have two major schemas, 'dimension' & 'fact', from which the DH would read files. Because we haven't automated all of the series, the set up is a bit different. It is:

Name Owner
dac_country_deflator x
data x
data_series x
deflator x
dimension x
fact x
non_dac_country_deflator x
public postgres
reference x
series x

The DH will need to read the data from:

Name Owner
data_series x
dimension x
fact x
reference x

All the automated data series are in 'fact'. All the data series that have not been automated are in 'data_series'. All the automated reference files are in 'dimension'. All the reference files that have not been automated are in 'reference'.

We have kept the names of all the data series/files with the exception of changing the '-' to '_' in the names. Hence DH's 'oda.csv' is now ddw table oda.

A couple of things to look out for:

  • ddw_20160127084339 now contains 2014 data where applicable where it was previously missing. This affects the data series that are sourced from the OECD. They are:

    Schema Name Type Owner
    fact oda table x
    fact oda_donor table x
  • We get the individual ODA donor data series from oda_donor table, so DH's https://github.com/devinit/digital-platform/blob/master/country-year/oda-donor/oda-AE.csv is obtained by filtering oda_donor on donor_code = 'AE'.

  • Many of the data series/files contain information pertaining to money which can be given in either current or constant prices. As most of the data we get from the OECD, WB & IMF is given in current prices, it needs to be converted. As the latest direction from DI are to keep everything in 2012 constant prices, the relevant data series have been converted to 2012 constant prices. The are in:

    Schema Name Type Owner
    fact oda_2012 table x
    fact oda_donor_2012 table x
  • The same goes for the data series from the WB that needed to be price converted. They current & constant data series from the WB are:

    Schema Name Type Owner
    fact gdp_usd_current table x
    fact gdp_usd_current_2012 table x
    fact gni_pc_usd_current table x
    fact gni_pc_usd_current_2012 table x
    fact gni_usd_current table x
    fact gni_usd_current_2012 table x

@xriss & @notshi, I've just noticed that the price conversion function leaves out some of the columns for 'oda'. I will fix it & push to 213.168.251.124 again when ready. My apologies! Once I've done that I'll get on with checking the files.

@xriss & @notshi ddw_20160127143241 is now on 213.168.251.124 with the corrected price conversion function & oda_2012 table.

commented

Thanks, @dw8547. Duly noted.

commented

You can rename and drop whenever you want, the only bit that will cause any problems is that I need a readonly user.

currently I've setup a user called, "readonly" with appropriate access to ddw.

However if you add a new database or schema or table then you will need to make sure that the following have been granted for each new one.

GRANT CONNECT ON DATABASE dbname TO readonly;

GRANT USAGE ON SCHEMA schema TO readonly;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA schema TO readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA schema TO readonly;

@xriss readonly now has the above permissions on ddw_20160127143241.

@notshi I can't drop ddw (and rename ddw_20160127143241)

ERROR: database "ddw" is being accessed by other users
DETAIL: There is 1 other session using the database.

postgres=# SELECT usesysid, usename, datname FROM pg_stat_activity WHERE usename != 'x';

usesysid usename datname
16384 shi ddw
commented

Thanks, @dw8547. I've logged out now so should be ok to rename.

@notshi, dropped and renamed.

Dead issue.