Segregation Measures Framework in PySAL

Analytics for spatial and aspatial segregation in Python.

Easily estimate several segregation measures:

Perform comparative segregation:

What is segregation?

The PySAL segregation module lets users estimate several segregation measures, perform inference on a single value or on a comparison between two values, and decompose comparative segregation.

It is organized into three frameworks:

Point Estimation: point estimation of many aspatial and spatial segregation indexes.
Inference Wrappers: functions that perform inference for a single measure or for a comparison between two measures.
Decompose Segregation: decomposition of comparative segregation into spatial and attribute components.

Installation

i) Using pip directly from the command prompt:

pip install segregation

ii) Using the conda-forge channel as described in https://github.com/conda-forge/segregation-feedstock:

conda config --add channels conda-forge
conda install segregation

iii) Another recommended method for installing segregation is with Anaconda. Clone this repository or download it manually, then cd into the directory and run the following commands (this installs the development version):

$ conda env create -f environment.yml
$ source activate segregation
$ python setup.py develop

iv) Using pip directly from this repository, from the command prompt (if you experience an issue installing this way, take a look at this discussion: pysal#15):

$ pip install git+https://github.com/pysal/segregation

Segregation uses:

pandas
geopandas
matplotlib
scikit-learn
seaborn
numpy
scipy
libpysal
osmnx
pandana
urbanaccess
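
After installing, it can be handy to confirm that the packages listed above are visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of segregation, and the import names are assumed to match the package names except for scikit-learn, which is imported as sklearn.

```python
import importlib.util

# Import names for the dependencies listed above.
# Note: scikit-learn is imported as "sklearn".
DEPENDENCIES = [
    "pandas", "geopandas", "matplotlib", "sklearn", "seaborn", "numpy",
    "scipy", "libpysal", "osmnx", "pandana", "urbanaccess",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment.

    Hypothetical helper for illustration; it only looks modules up,
    it does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing:", missing_modules(DEPENDENCIES + ["segregation"]))
```

An empty list means every dependency, and segregation itself, was found.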

Getting started

Single group measures

All input data for this module rely on pandas DataFrames for the aspatial measures and geopandas GeoDataFrames for the spatial ones. In a nutshell, the user passes the DataFrame as the first argument, followed by two strings: the name of the column holding the population frequency of the group of interest (group_pop_var) and the name of the column holding the total population of each unit (total_pop_var).

For example, to fit a dissimilarity index (D) on a DataFrame called df, for a group whose frequency is in the column freq and with each unit's total population in the column population, a typical call would be:

from segregation.aspatial import Dissim
index = Dissim(df, "freq", "population")
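
To make the quantity behind this call concrete, here is a minimal sketch of the textbook dissimilarity index, D = 1/2 * sum(|x_i/X - y_i/Y|), written in plain Python. It is an illustration under stated assumptions, not the module's implementation: the function name dissimilarity is hypothetical, the two lists stand in for the freq and population columns, and the "other" group is assumed to be everyone in the unit who is not in the group of interest.

```python
def dissimilarity(group, total):
    """Textbook dissimilarity index for one group versus the rest.

    group: per-unit counts of the group of interest (like df["freq"])
    total: per-unit total population (like df["population"])
    Hypothetical helper for illustration only.
    """
    rest = [t - g for g, t in zip(group, total)]
    group_sum, rest_sum = sum(group), sum(rest)
    # Half the summed absolute gap between each unit's share of the
    # group and its share of the rest of the population.
    return 0.5 * sum(
        abs(g / group_sum - r / rest_sum) for g, r in zip(group, rest)
    )


# Two units of 50 people each.
print(dissimilarity([50, 0], [50, 50]))   # group lives in one unit only
print(dissimilarity([25, 25], [50, 50]))  # group spread evenly
```

The first call describes complete segregation and the second perfect evenness, the two ends of the index's range.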

Similarly, to fit a spatial dissimilarity index (SD) on a geopandas GeoDataFrame called gdf, for a group whose frequency is in the column freq and with each unit's total population in the column population, a typical call would be:

from segregation.spatial import Spatial_Dissim
spatial_index = Spatial_Dissim(gdf, "freq", "population")

Every segregation class has a statistic attribute and a core_data attribute. The first gives direct access to the point estimate of the segregation measure; the second gives access to the main data that the module uses internally to compute the estimate. To see the estimated D in the first example above, the user just needs to run index.statistic.

For point estimation, the available measures are summarized in the following table:

Measure	Class/Function	Spatial?	Function Inputs
Dissimilarity (D)	Dissim	No	-
Gini (G)	Gini_Seg	No	-
Entropy (H)	Entropy	No	-
Isolation (xPx)	Isolation	No	-
Exposure (xPy)	Exposure	No	-
Atkinson (A)	Atkinson	No	b
Correlation Ratio (V)	Correlation_R	No	-
Concentration Profile (R)	Con_Prof	No	m
Modified Dissimilarity (Dct)	Modified_Dissim	No	iterations
Modified Gini (Gct)	Modified_Gini_Seg	No	iterations
Bias-Corrected Dissimilarity (Dbc)	Bias_Corrected_Dissim	No	B
Density-Corrected Dissimilarity (Ddc)	Density_Corrected_Dissim	No	-
Spatial Proximity Profile (SPP)	Spatial_Prox_Prof	Yes	m
Spatial Dissimilarity (SD)	Spatial_Dissim	Yes	w, standardize
Boundary Spatial Dissimilarity (BSD)	Boundary_Spatial_Dissim	Yes	standardize
Perimeter Area Ratio Spatial Dissimilarity (PARD)	Perimeter_Area_Ratio_Spatial_Dissim	Yes	standardize
Spatial Isolation (SxPx)	Spatial_Isolation	Yes	alpha, beta
Spatial Exposure (SxPy)	Spatial_Exposure	Yes	alpha, beta
Spatial Proximity (SP)	Spatial_Proximity	Yes	alpha, beta
Absolute Clustering (ACL)	Absolute_Clustering	Yes	alpha, beta
Relative Clustering (RCL)	Relative_Clustering	Yes	alpha, beta
Delta (DEL)	Delta	Yes	-
Absolute Concentration (ACO)	Absolute_Concentration	Yes	-
Relative Concentration (RCO)	Relative_Concentration	Yes	-
Absolute Centralization (ACE)	Absolute_Centralization	Yes	-
Relative Centralization (RCE)	Relative_Centralization	Yes	-
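
As an illustration of two of the aspatial measures in the table, the sketch below computes the classic isolation (xPx) and exposure (xPy) indexes in plain Python. The function names and the three-list input are hypothetical and chosen for readability; they do not mirror the signatures of the Isolation and Exposure classes, and every unit is assumed to have a positive total population.

```python
def isolation(group, total):
    """xPx: average share of the group in the unit of a typical group member."""
    group_sum = sum(group)
    return sum((g / group_sum) * (g / t) for g, t in zip(group, total))


def exposure(group, other, total):
    """xPy: average share of `other` in the unit of a typical group member."""
    group_sum = sum(group)
    return sum(
        (g / group_sum) * (o / t) for g, o, t in zip(group, other, total)
    )


# Two units; `other` is everyone who is not in the group of interest.
group = [30, 10]
other = [20, 40]
total = [50, 50]

x_p_x = isolation(group, total)
x_p_y = exposure(group, other, total)
print(x_p_x, x_p_y)
```

When the two groups exhaust the population, as in this example, the two values sum to 1.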

Once the segregation indexes are fitted, the user can perform inference to assess their statistical significance in regional analysis. The inference framework is summarized in the table below:

Inference Type	Class/Function	Function main Inputs	Function Outputs
Single Value	Infer_Segregation	seg_class, iterations_under_null, null_approach, two_tailed	p_value, est_sim, statistic
Two Value	Compare_Segregation	seg_class_1, seg_class_2, iterations_under_null, null_approach	p_value, est_sim, est_point_diff
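
The idea behind single-value inference can be sketched as a simulation: generate many datasets under a null hypothesis, recompute the index on each, and compare the observed value with the simulated ones. The code below is a minimal sketch under stated assumptions, not what Infer_Segregation does internally. The function infer_under_evenness is hypothetical, the null is assumed to be evenness (every person has the same chance of belonging to the group, whatever the unit), the test is one-tailed, and the return names simply mirror the outputs listed in the table.

```python
import random


def dissimilarity(group, total):
    """Dissimilarity index of one group versus the rest of the population."""
    rest = [t - g for g, t in zip(group, total)]
    group_sum, rest_sum = sum(group), sum(rest)
    return 0.5 * sum(
        abs(g / group_sum - r / rest_sum) for g, r in zip(group, rest)
    )


def infer_under_evenness(group, total, iterations=200, seed=0):
    """Return (p_value, est_sim, statistic) for a one-tailed test.

    Hypothetical helper: simulates group counts assuming the overall
    group share applies to every person in every unit.
    """
    rng = random.Random(seed)
    statistic = dissimilarity(group, total)
    share = sum(group) / sum(total)
    est_sim = []
    while len(est_sim) < iterations:
        # One Bernoulli draw per person in each unit.
        fake = [sum(rng.random() < share for _ in range(t)) for t in total]
        # Skip degenerate draws where the group or the rest is empty.
        if 0 < sum(fake) < sum(total):
            est_sim.append(dissimilarity(fake, total))
    # Share of simulated values at least as large as the observed one.
    p_value = sum(sim >= statistic for sim in est_sim) / iterations
    return p_value, est_sim, statistic


p_value, est_sim, statistic = infer_under_evenness(
    [90, 80, 10, 5], [100, 100, 100, 100]
)
print(statistic, p_value)
```

A small p_value suggests that the observed segregation is unlikely to arise from chance alone under the assumed null.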

Another useful analysis available in the segregation module is a decomposition approach, in which the difference between two indexes can be broken down into a spatial component (c_s) and an attribute component (c_a). This framework is summarized in the table below:

Framework	Class/Function	Function main Inputs	Function Outputs
Decomposition	Decompose_Segregation	index1, index2, counterfactual_approach	c_a, c_s
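
One common way to split a difference into two components is a two-factor Shapley-style decomposition, sketched below. Whether Decompose_Segregation uses exactly this scheme is an assumption here; the function shapley_two_factor and the toy index f are hypothetical and only illustrate that the two components add up to the total difference.

```python
def shapley_two_factor(f, s1, a1, s2, a2):
    """Split f(s1, a1) - f(s2, a2) into a spatial and an attribute part.

    f takes a spatial input and an attribute input and returns a number.
    Each component averages the effect of swapping one factor while the
    other is held at either of its two values.
    """
    c_s = 0.5 * ((f(s1, a1) - f(s2, a1)) + (f(s1, a2) - f(s2, a2)))
    c_a = 0.5 * ((f(s1, a1) - f(s1, a2)) + (f(s2, a1) - f(s2, a2)))
    return c_s, c_a


def toy_index(s, a):
    """Toy stand-in for a segregation index (assumption, not a real measure)."""
    return s * a


c_s, c_a = shapley_two_factor(toy_index, 0.8, 0.5, 0.6, 0.4)
difference = toy_index(0.8, 0.5) - toy_index(0.6, 0.4)
print(c_s, c_a, difference)
```

By construction, c_s + c_a equals the difference between the two index values, so no part of the gap is left unexplained.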

Multigroup measures

It is also possible to estimate multigroup measures. This framework also relies on pandas DataFrames for the aspatial measures.

Suppose you have a DataFrame called df that has populations of some groups, for example, Group A, Group B and Group C. A usual call for a multigroup Dissimilarity index would be:

from segregation.aspatial import Multi_Dissim
index = Multi_Dissim(df, ['Group A', 'Group B', 'Group C'])
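
For intuition, the multigroup dissimilarity index is usually defined as the population-weighted sum of absolute deviations between each unit's group shares and the overall group shares, divided by 2 * T * I, where T is the total population and I is Simpson's interaction index. The plain-Python sketch below follows that definition; the function name and the list-of-rows input are hypothetical, and it is not claimed to match the internals of Multi_Dissim.

```python
def multigroup_dissimilarity(units):
    """Multigroup dissimilarity for per-unit group counts.

    units: list of rows, one per unit, each row holding one count per
    group (like df[['Group A', 'Group B', 'Group C']] row by row).
    Assumes every unit has a positive total population.
    """
    n_groups = len(units[0])
    unit_totals = [sum(row) for row in units]
    grand_total = sum(unit_totals)
    # Overall share of each group in the whole population.
    overall = [
        sum(row[k] for row in units) / grand_total for k in range(n_groups)
    ]
    # Simpson's interaction index of the overall composition.
    interaction = sum(p * (1 - p) for p in overall)
    weighted_dev = sum(
        t * abs(row[k] / t - overall[k])
        for row, t in zip(units, unit_totals)
        for k in range(n_groups)
    )
    return weighted_dev / (2 * grand_total * interaction)


segregated = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
even = [[5, 5, 5], [5, 5, 5]]
print(multigroup_dissimilarity(segregated), multigroup_dissimilarity(even))
```

The first dataset places each group in its own unit and the second gives every unit the same composition, so the two calls show the extremes of the index.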

The statistic attribute will then contain the value of this index.

Currently, these indexes are summarized in the table below:

Measure	Class/Function	Spatial?	Function Inputs
Multigroup Dissimilarity	Multi_Dissim	No	-
Multigroup Gini	Multi_Gini_Seg	No	-
Multigroup Normalized Exposure	Multi_Normalized_Exposure	No	-
Multigroup Information Theory	Multi_Information_Theory	No	-
Multigroup Relative Diversity	Multi_Relative_Diversity	No	-
Multigroup Squared Coefficient of Variation	Multi_Squared_Coefficient_of_Variation	No	-
Multigroup Diversity	Multi_Diversity	No	normalized
Simpson's Concentration	Simpsons_Concentration	No	-
Simpson's Interaction	Simpsons_Interaction	No	-
Multigroup Divergence	Multi_Divergence	No	-

If you are new to segregation and PySAL, the best place to get started is our documentation! We also encourage you to take a look at some examples of this module in the notebooks repo!

Contribute

PySAL-segregation is under active development and contributors are welcome.

If you have any suggestion, feature request, or bug report, please open a new issue on GitHub. To submit patches, please follow the PySAL development guidelines and open a pull request. Once your changes get merged, you’ll automatically be added to the Contributors List.

Support

If you are having issues, please talk to us in the Gitter room.

License

The project is licensed under the BSD license.

Funding

Award #1831615 RIDIR: Scalable Geospatial Analytics for Social Science Research

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Process number 88881.170553/2018-01
