a-fent / microsim-ipf

Spatial Microsimulation by IPF in R: functions, examples and papers

Spatial Microsimulation by Iterative Proportional Fitting in R: Household Income in London

Introduction

This repository presents an implementation and demonstration of spatial microsimulation (SMS) by iterative proportional fitting (IPF) in R. The implementation includes multi-level fitting and parallel processing of the fitting procedure.

The use of the code is demonstrated by a spatial microsimulation of household income distributions in London’s 33 boroughs in the years 2001 and 2011. The package therefore includes not only the R functions, but also detailed examples of the fitting procedure, and source code and supporting data for two research papers presenting analyses of the estimated income distributions. These are published as working papers at the Centre for Analysis of Social Exclusion (CASE) at the London School of Economics.

Intended uses

Doing microsimulation in R

R users who wish to carry out their own simulations using IPF may want some starting code with which to do this. At the time of writing, no complete implementation was found. Particular features of the implementation provided include:

  • “Multi-level” IPF, using constraints and survey data from two observation levels (for example, people and households) to produce weights
  • Versions of the fitting functions that run in parallel, which significantly speeds up the fitting process on modern multi-core computers
  • A set of functions for creating tables of estimated statistics (e.g. means, quartiles, totals) using weights for many different areas

Supporting material for the analytic papers

The paper sources document how the analyses they present were performed, how the graphics were produced, and how the variables in the simulation are defined.

Doing analysis of local income distributions of London

The output weights from the simulations can be used to carry out analyses of borough-level income distributions in 2001 and 2011. For convenience, the fitted weights for London boroughs are included in the repository, under /ipf/weights/. The functions in /r/ipf_functions.r can be used to set up spatial simulations for each borough and to estimate statistics from them.

Users who wish to use the provided weights to do their own analysis of income distributions in London boroughs will require at least a copy of the Households Below Average Income case-level survey data. Some analyses, and any adaptations to the simulation itself, will require a copy of the Family Resources Survey case-level survey data. Both of these are available from the UK Data Archive.

Package Contents

R

Contains the single file ipf_functions.r, which provides a set of R functions for carrying out and using iterative proportional fitting (IPF) techniques in a spatial microsimulation.

  • Firstly, it provides a set of functions for creating new weights for lists of areas from survey case data and local population totals. Variants include functions that carry out the reweighting in parallel across multiple cores, and functions for “multi-level” reweighting, where constraints and survey cases at two different levels of observation (e.g. people and households) are combined to create new weights.
  • Secondly, it provides a fairly simple set of convenience functions for using the simulation weights produced to estimate a variety of statistics across a set of areas. These are heavily based on the survey package.
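To illustrate the second group of functions, the sketch below builds a table of weighted statistics with one row per area. It is an assumption-laden sketch in base R only, whereas the repository's functions are based on the survey package; the names weighted_quantile and area_stats, the simple quantile definition, and the data are all hypothetical.

```r
# Hypothetical helper: weighted quantile, taken as the smallest value whose
# cumulative weight share reaches the requested probability.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  share <- cumsum(w) / sum(w)
  vapply(probs, function(p) x[which(share >= p)[1]], numeric(1))
}

# Hypothetical helper: one row of statistics per area.
#   income:          numeric vector, one value per survey case
#   weights_by_area: named list of weight vectors, one per area
area_stats <- function(income, weights_by_area) {
  rows <- lapply(weights_by_area, function(w) {
    q <- weighted_quantile(income, w, c(0.25, 0.5, 0.75))
    data.frame(mean   = weighted.mean(income, w),
               q25    = q[1],
               median = q[2],
               q75    = q[3],
               total  = sum(w * income))
  })
  data.frame(area = names(weights_by_area),
             do.call(rbind, rows),
             row.names = NULL)
}

# Made-up incomes and made-up weights for two areas
income <- c(100, 200, 300, 400)
example_weights <- list(area_a = c(1, 1, 1, 1),
                        area_b = c(0, 0, 2, 2))
income_table <- area_stats(income, example_weights)
```

The same survey cases are reused for every area; only the weights differ, which is why the table can be produced by looping over the list of weight vectors.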

IPF

The R and data files here present the use of IPF weighting to carry out a spatial microsimulation of household income in London in 2001 and 2011, using the Family Resources Survey together with Census and administrative data.

  • The make_ipf_weights_london_la files are the main files for carrying out the fitting and generating survey weights for HBAI for the respective years. Different techniques are demonstrated including multi-level fitting, as described in the paper.
  • The constraints_ and the frs_load_recode files contain the mappings from FRS survey case variables to the Census and administrative data to which they are to be fitted.
  • The constraints/ directory holds the Census and administrative data for London boroughs used in the fitting.
  • The weights/ directory holds the generated weights, which may be used to carry out estimation of HBAI variables for London boroughs.

Paper

Two papers, one technical and one substantive, discuss and exemplify the use of microsimulation to analyse the changing spatial distribution and composition of poverty in London boroughs from 2001 to 2011.

Both papers are written as org-mode files for Emacs. They are plain text and should be readable in any editor. They contain the executable R code to produce the various tables and graphics which the published papers contain.

Addendum - further information

This section collects various bits of information, relevant to these methods in R and to small-area income estimation in the UK, that have appeared since this research was done.

Other implementations of IPF in R

When I started on this research in 2013, I didn’t find any existing IPF implementations. There are now (2016) some dedicated packages on CRAN that offer (I haven’t checked) implementations of IPF. I found them through Robin Lovelace’s draft book on spatial microsimulation, which discusses the use of these packages for that purpose.

  • IPFP: A fast implementation of IPF in C
  • MIPFP: Multidimensional Iterative Proportional Fitting and Alternative Models

Since it’s under a different name, I also didn’t find the rake function within the survey package, which can do IPF.
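For comparison, a minimal sketch of raking with the survey package is given below. It is not taken from this repository, and it requires the survey package to be installed. The data, the variable names (tenure, hh_size, w0) and the population totals are made up for illustration.

```r
library(survey)

# Made-up survey cases with a starting weight of 1 each
cases <- data.frame(
  tenure  = factor(c("own", "own", "rent", "rent", "own", "rent")),
  hh_size = factor(c("one", "two_plus", "one", "two_plus", "two_plus", "one")),
  w0      = 1
)
design <- svydesign(ids = ~1, weights = ~w0, data = cases)

# Made-up population margins for one area; rake() expects a Freq column
pop_tenure <- data.frame(tenure  = factor(c("own", "rent")),
                         Freq    = c(60, 40))
pop_size   <- data.frame(hh_size = factor(c("one", "two_plus")),
                         Freq    = c(30, 70))

# Rake the design to both margins in turn until the weights settle
raked <- rake(design,
              sample.margins     = list(~tenure, ~hh_size),
              population.margins = list(pop_tenure, pop_size),
              control = list(maxit = 100, epsilon = 1e-8))

raked_weights <- weights(raked)
```

The raked design can then be passed to the survey package's estimation functions, such as svymean, in the usual way.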

Small-area income estimates for England from the ONS

Around the time that the papers were published (July 2016), the Office for National Statistics (ONS) announced that it would be continuing and extending its programme of small-area income estimates. A description of the method that the ONS plans to use can be found in a paper from an ONS methodological meeting.

Licence

Public domain, for the time being.
