GeoNet 2022

GeoNet analyzes stream networks to detect statistically significant changes between background and potentially impacted sites.

References

Agarwal, A., Wen, T., Chen, A., Zhang, A.Y., Niu, X., Zhan, X., Xue, L. and Brantley, S.L., 2020. Assessing contamination of stream networks near shale gas development using a new geospatial tool. Environmental Science & Technology, 54(14), pp.8632-8639.

Dependencies

This package was last tested in April 2022. The testing environment is listed below:

| Required Package | Version |
| --- | --- |
| R | 4.0.5 |
| tidyverse | 1.3.0 |
| geosphere | 1.5-10 |
| network | 1.16.1 |
| igraph | 1.2.6 |
| mapdata | 2.3.0 |
| intergraph | 2.0-2 |
| sna | 2.6 |
| maps | 3.3.0 |
| GGally | 2.1.1 |
| MASS | 7.3-53.1 |
| foreach | 1.5.1 |
| doParallel | 1.0.16 |
| data.table | 1.14.0 |
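
Before running GeoNet, it can help to check which of the listed packages are missing from the local R library. The following is a minimal sketch in base R, not part of the GeoNet code; the helper name `missing_packages` is hypothetical, and the commented-out `install.packages()` call would install current CRAN releases rather than the tested versions listed above.

```r
# Packages from the dependency list (R itself is not a package).
required <- c("tidyverse", "geosphere", "network", "igraph", "mapdata",
              "intergraph", "sna", "maps", "GGally", "MASS", "foreach",
              "doParallel", "data.table")

# Hypothetical helper: return the names of packages that are not installed.
# system.file() returns "" when a package cannot be found in the library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

missing <- missing_packages(required)
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
  # install.packages(missing)  # installs current CRAN versions
} else {
  message("All required packages are installed.")
}
```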

How to run the code:

Make sure all of the dependencies above are installed before running the code.
Put your data under the data folder. There are three main files required by GeoNet:
1. Shape file for the stream network (shape.RData).
2. Analyte location and concentration (analyte_raw.csv). The analyte CSV file should have at least latitude, longitude, date, and concentration columns. Sample analyte file:
```
date,	latitude,	longitude,	Specific Conductance (conc)
6/8/2000 8:00,	40.3747167,	-78.8516,	1660
1/2/2001 9:45,	40.3747167,	-78.8516,	1700
```
  Rename the headers to date, lat, lon, and conc.
3. Polluter locations (polluter_raw.csv). The polluter file should have at least latitude, longitude, and date columns. Sample polluter file:
```
ID,	latitude,	longitude,	date
2,	39.8284,	-80.323389,	8/26/14
6,	41.560381,	-76.263944,	8/4/14
```
  Rename the headers to lat, lon, and date.
Run code/Cl_spill_clust.R.
Output files will be generated in the inference folder.
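
The header renaming described in the steps above can be sketched in base R as follows. This is an illustration under assumptions, not code from the repository: the raw headers are taken from the sample analyte file, the inline text stands in for a real export, `rename_map` is a hypothetical name, and a temporary directory stands in for the data folder.

```r
# Raw analyte table with the headers from the sample analyte file
# (inline text stands in for a real export file).
raw_text <- "date,latitude,longitude,Specific Conductance (conc)
6/8/2000 8:00,40.3747167,-78.8516,1660
1/2/2001 9:45,40.3747167,-78.8516,1700"

# check.names = FALSE keeps the raw headers exactly as written.
analyte <- read.csv(text = raw_text, check.names = FALSE,
                    stringsAsFactors = FALSE)

# Map each raw header to the header GeoNet expects (date, lat, lon, conc).
rename_map <- c("date" = "date",
                "latitude" = "lat",
                "longitude" = "lon",
                "Specific Conductance (conc)" = "conc")
stopifnot(all(names(rename_map) %in% names(analyte)))

# Keep the four required columns, in this order, under the new names.
analyte <- analyte[, names(rename_map)]
names(analyte) <- unname(rename_map)

# Basic sanity checks before handing the file to GeoNet.
stopifnot(!anyNA(analyte),
          all(abs(analyte$lat) <= 90),
          all(abs(analyte$lon) <= 180))

# Assumption: a temporary directory stands in for the repository's data folder.
data_dir <- tempdir()
write.csv(analyte, file.path(data_dir, "analyte_raw.csv"), row.names = FALSE)
```

The polluter file can be prepared the same way, with a map that ends in lat, lon, and date.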

Output Files

All output files are generated in the inference folder. The most important file to check is polluter_test_matrix.RData, which holds the statistical inference test results for each polluter provided. In summary, it contains the upstream and downstream concentration values, along with t-test and Wilcoxon test results that indicate whether the values differ.
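
One way to inspect an .RData output without knowing the object names stored inside it is to load it into a separate environment. The sketch below is an assumption-laden illustration: `load_rdata` is a hypothetical helper, and the demo file, its object name, and its values are placeholders created only so the example runs on its own. They do not reflect the actual contents of polluter_test_matrix.RData.

```r
# Hypothetical helper: load an .RData file into its own environment and
# return the contents as a named list, so object names need not be known.
load_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  as.list(env)
}

# Placeholder file standing in for inference/polluter_test_matrix.RData.
# The object name and the values are made up for this sketch.
demo_path <- file.path(tempdir(), "polluter_test_matrix.RData")
demo_matrix <- matrix(c(1, 2), nrow = 1,
                      dimnames = list("row_1", c("col_a", "col_b")))
save(demo_matrix, file = demo_path)

results <- load_rdata(demo_path)
print(names(results))   # which objects the file contains
str(results[[1]])       # structure of the first object
```

For a real run, `demo_path` would be replaced by the path to the file in the inference folder.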

Caveats

Make sure you provide the datasets in the exact format shown in the examples, including the names and order of the columns.
Make sure you update the file_path variable to point to the absolute path of the base directory of this repository on your computer.
If the dataset is large, try running each section of the code separately and check the intermediate output variables for NAs.
Refer to the data flow diagram for the expected size of the output data frames after each step.
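
The NA check on intermediate variables mentioned in the caveats could look like the following base R sketch. The helper name `na_summary` and the `intermediate` data frame are hypothetical; the values are made up to show one missing concentration and are not GeoNet output.

```r
# Hypothetical helper: count NA values per column of a data frame.
na_summary <- function(df) {
  colSums(is.na(df))
}

# Made-up intermediate table with one missing concentration.
intermediate <- data.frame(lat  = c(40.37, 40.37, 41.56),
                           lon  = c(-78.85, -78.85, -76.26),
                           conc = c(1660, NA, 1700))

# Size of the table, for comparison with the data flow diagram.
print(dim(intermediate))

# Columns that contain at least one NA, with their NA counts.
na_counts <- na_summary(intermediate)
print(na_counts[na_counts > 0])
```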

For more information about the code, see https://drive.google.com/file/d/1AFr1qGLGhAfZwWw8E_BCVhmYF6ohJmus/view?usp=sharing

Contact

For any questions about the source code or example datasets, please reach out to Dr. Tao Wen at Syracuse University (https://jaywen.com/).
