worldbank / povcalnet

Stata client to the Povcalnet API

Home Page: https://worldbank.github.io/povcalnet/

Suggestion to create unique country-year sample marker in povcalnet bulk download

akraay opened this issue

(This repeats an email sent to Andres and team on June 21, copied to GitHub at Andres' request.)

Andres and colleagues,

Can I make one other suggestion – not so much for the API wrapper but for the data itself?

You will have lots of (somewhat superficial) users like me who just want one entry for each country-year. For example, for China/India/Indonesia some people will want just the national aggregate and not the separate rural/urban entries, and for a number of LAC and ECA countries there are many country-years with both income- and consumption-based measures.

Having just spent a slightly tedious half-hour deduplicating these records manually (my decision rules below), I wonder if you might want to introduce one or more dummy variables in the dataset that define samples uniquely identified by country/year? What I did below for example is pick either the consumption or the income measure to ensure the longest possible time series (either all consumption, or all income), and I dropped the separate rural/urban entries.

If you had a dummy like this, then superficial users like me who want just one data point per country/year can simply filter on this dummy and then have a dataset that is amenable to merging based on country-year.

Aart

// Load data, define integer year variable by rounding fractional data_year
povcalnet
gen year=round(data_year)
gen wbcode=country_code

// Deduplicate lots of entries where there are multiple data points for same country/year
// Note data_type=1 for "Consumption", 2 for "Income"
// Note coverage_type=1 for Rural, 2 for Urban
drop if wbcode=="BGR" & year==2007 & data_type==1
// Parentheses matter here: & binds tighter than |, so without them
// every coverage_type==2 row would be dropped for all countries
drop if wbcode=="CHN" & (coverage_type==1 | coverage_type==2)
drop if wbcode=="EST" & year>=2003 & data_type==1
drop if wbcode=="HRV" & year>=2009 & data_type==1
drop if wbcode=="HTI" & year==2012 & data_type==1
drop if wbcode=="HUN" & year>=2004 & data_type==1
drop if wbcode=="IDN" & (coverage_type==1 | coverage_type==2)
drop if wbcode=="IND" & (coverage_type==1 | coverage_type==2)
drop if wbcode=="LTU" & year>=2004 & data_type==1
drop if wbcode=="LVA" & year>=2004 & data_type==1
drop if wbcode=="MEX" & data_type==1
drop if wbcode=="NIC" & data_type==1
drop if wbcode=="PHL" & data_type==2
drop if wbcode=="POL" & year>=2004 & data_type==2
drop if wbcode=="ROU" & year>=2006 & data_type==2
drop if wbcode=="SRB" & data_type==2
drop if wbcode=="SVK" & year>=2004 & data_type==1
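
The manual drops above could also be expressed as a rule-based marker of the kind suggested in this post. What follows is only a minimal sketch, not the team's implementation: it assumes the `wbcode` and `year` variables created above and the `data_type` / `coverage_type` codings noted in the comments, and every new variable name (`cand`, `n_type`, `n_max`, `best_type`, `unique_cy`) is hypothetical. It uses the row count per welfare type as a proxy for series length, so it will not reproduce the year-specific choices above exactly.

```stata
// Sketch only. Assumes data_type: 1 = consumption, 2 = income and
// coverage_type: 1 = rural, 2 = urban, as noted above.

// Candidate rows: skip rural/urban splits for China, India, Indonesia
gen byte cand = !(inlist(wbcode, "CHN", "IND", "IDN") & inlist(coverage_type, 1, 2))

// Number of candidate rows per country and welfare type
bysort wbcode data_type: egen n_type = total(cand)

// Welfare type with the most rows; ties go to consumption (1)
bysort wbcode: egen n_max = max(n_type)
bysort wbcode: egen best_type = min(cond(cand & n_type == n_max, data_type, .))

// Marker (hypothetical name): 1 for the rows to keep
gen byte unique_cy = cand & data_type == best_type

// Check for leftover duplicates, e.g. two surveys rounding to the same year
duplicates report wbcode year if unique_cy
```

With a marker like this, `keep if unique_cy` would give one welfare type per country, and the `duplicates report` line shows whether any country-year still appears more than once.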

Thank you so much, Aart. We will keep you posted

Dear Aart,

Thank you so much for your comment.

In the team, we all agreed that your comment makes a lot of sense and that we should provide users with samples uniquely identified by country/year. We discussed the feasibility of including dummy variables in the povcalnet output as per your suggestion, but for the sake of consistency with how we handle (and intend to handle) other modifications, we’ve implemented this request in a slightly different manner. From the user’s standpoint, we’re hoping it will be just as convenient.

In the help file of the new release of the povcalnet command (v1.0.0-beta.3), we have included two routines that deliver on your proposed enhancement. The first routine (example 3.1) creates a sample uniquely identified by country/year with the longest possible time series for each country, even if the welfare type changes from one year to another. The second routine (example 3.2) creates a sample uniquely identified by country/year with the longest possible time series for each country, either all consumption or all income. Each routine is clickable, so the user won’t even need to copy and paste.

We hope this solution meets your needs.
We would greatly appreciate your feedback.

Best,

Thanks Andres and colleagues. What would be the Stata syntax to retrieve option 3.1 vs 3.2 so I can try it out? Also, are there any special instructions to ssc install the current version with this feature? Aart