Analysis of Wildfire Impact on Redmond, OR

The goal of this work is to explore the impact of wildfires on Redmond, Oregon. This project focused on the healthcare impact of wildfires on Redmond, OR, Deschutes County, and the greater Oregon area. Specifically, I investigated the respiratory health of citizens by tracking the county-level premature mortality rate, asthma-related hospitalizations, and COPD-related hospitalizations from 2000–2021. Additionally, I investigated the prevalence, incidence, and mortality of chronic respiratory illnesses at the state level. By combining data from both the state & county levels, I approximated the impact on the city itself, since data at city-level resolution was not available. I found that the rates of occurrence for asthma, COPD, and all chronic respiratory conditions have increased over the past 30 years and are expected to continue increasing significantly over the next 30 years. This is in line with my initial expectation that more smoke exposure leads to more respiratory illnesses. Surprisingly, the county’s premature mortality rate and asthma/COPD hospitalization rates have steadily dropped and are projected to keep dropping. I hypothesize that this is likely due to advances in modern medicine & better medical infrastructure in Redmond. In my analysis, I validate a link between the respiratory health of a city and the smoke effect of wildfires but can’t validate the effect size.

A full report can be found here.

A secondary goal of this work is to develop the reproducibility & professionalism skills required for real-world data-driven analysis as part of the Fall 2023 DATA 512 course at the University of Washington.

Data Sources

Wildfire Data

Cleaned, collated wildfire data was generated by the US Geological Survey as the Combined wildland fire datasets for the United States and certain territories, 1800s-Present (combined wildland fire polygons dataset). For this project, we used the GeoJSON data format stored under GeoJSON Files.zip. This folder contains both a raw merged dataset containing duplicates and a "combined", duplicate-free dataset that comprises both wildfires and prescribed fires from the mid-1800s to 2021, collated from 40 different original wildfire datasets. For our analysis, we use only the combined data stored at ./input/USGS_Wildland_Fire_Combined_Dataset.json. Note that the data set is too large to track with Git and is therefore not available in this repository.

The data is listed as public and can be cited as the following:

Welty, J.L., and Jeffries, M.I., 2021, Combined wildland fire datasets for the United States and certain territories, 1800s-Present: U.S. Geological Survey data release, https://doi.org/10.5066/P9ZXGFY3.

Note that much of this data was originally generated from an ArcGIS server and shares many variable names with the ArcGIS software. For example, the default geometry type is esriGeometryPolygon because Esri is the developer of ArcGIS. As a result, some features, such as curved polygons stored in a format proprietary to ArcGIS, proved difficult to access.

Data Description

The used GeoJSON file contained the following keys:

displayFieldName: an empty string that would otherwise denote the name of the dataset.
fieldAliases: a dictionary that converts the variable name in the file to a more human-readable format for the 30 fields.
geometryType: esriGeometryPolygon is the default geometry format.
spatialReference: the well-known ID (WKID) of the spatial reference (as well as the latest WKID). This data uses ESRI:102008, which refers to the Albers equal-area map projection of North America.
fields: A list of dictionaries identifying the name, type, and alias of 30 attributes. The alias is the same as those in fieldAliases. Also, the types are based on the Esri specifications. A full, detailed explanation of each of the attributes can be found here.
features: the list of all observations stored in JSON format. Note that each observation is saved as a dictionary with keys for the attributes (same as those in fields) and geometry, which contains sets of tuples denoting the coordinate points of the wildfire polygon in the WKID projection space. In the case of this data, most wildfires are polygons represented by 'rings' in ArcGIS. A ring is a list of points denoting the path of the ring. The exterior ring of a polygon is denoted by a list of points oriented clockwise while internal rings are points oriented in a counterclockwise motion. The rings key for geometry has multiple polygons decreasing in area. The first 'ring' denotes the largest boundary of the fire. Subsequent rings follow an even-odd fill rule (first fills, 2nd removes, etc). A few wildfires are represented by curveRings, which are similar to rings except that they define certain portions of a ring using parameters for preset curve functions rather than individual data points.
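As a rough illustration of the structure described above, the sketch below pulls the attributes and the exterior ring out of a single feature. The sample feature and the helper name `outer_ring` are made up for illustration; real features carry all 30 attributes and far longer rings.

```python
# Hypothetical, heavily trimmed feature in the layout described above.
sample_feature = {
    "attributes": {"OBJECTID": 1, "Fire_Year": 1987, "GIS_Acres": 120.5},
    "geometry": {
        "rings": [
            # Exterior ring (clockwise): the fire perimeter.
            [[0.0, 0.0], [0.0, 10.0], [10.0, 10.0], [10.0, 0.0], [0.0, 0.0]],
            # Interior ring (counterclockwise): an unburned hole.
            [[2.0, 2.0], [4.0, 2.0], [4.0, 4.0], [2.0, 4.0], [2.0, 2.0]],
        ]
    },
}


def outer_ring(feature):
    """Return the first ring (the fire perimeter), or None if absent.

    Features stored as curveRings have no plain 'rings' key, so they
    come back as None here and need separate handling.
    """
    rings = feature.get("geometry", {}).get("rings")
    if not rings:
        return None
    return rings[0]


perimeter = outer_ring(sample_feature)
fire_year = sample_feature["attributes"]["Fire_Year"]
```

Returning None for missing rings keeps the curveRings cases explicit instead of failing partway through the file.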

Of the 30 initial attributes, we select the following variables:

OBJECTID: A unique ID for each fire polygon.
Assigned_Fire_Type: The attributed type of the fire. Contains the following values: 'Wildfire', 'Likely Wildfire', 'Unknown - Likely Wildfire', 'Unknown - Likely Prescribed Fire', & 'Prescribed Fire'.
Fire_Year: The year of the fire season (int).
GIS_Acres: The overall area burned by the fire(s) in acres.
GIS_Hectares: The overall area burned by the fire(s) in hectares.
Listed_Fire_Names: A string of comma-separated values for the fire(s) attributed to the polygon.
Shape_Length: A proprietary ESRI raw measurement of the longest section of the polygon in the ESRI projection space units (assumed m)
Shape_Area: A proprietary ESRI raw measurement of the polygon area in the ESRI projection space units (assumed m^2).
rings: A list of polygon rings defining the overall fire polygon, with the first one denoting the fire perimeter.
curveRings: A list of polygon curveRings defining the overall shape, with the first one denoting the fire perimeter.

We further subset the data on two main criteria:

Fires from 1963 onwards (1963 inclusive).
Fires within 1250 miles of Redmond, Oregon.

The first is easy enough to filter. We do this to avoid potentially bad data estimated before the advent of satellite imaging. The second requires a couple of assumptions:

We take (44.272621, -121.173920) as the latitude & longitude coordinates of the center of the city. Source
We denote "within 1250 miles" to mean fires whose closest boundary point has a straight-line ellipsoid distance of less than 1250 miles to the center of the city. The first polygon ring is used as the boundary of the fire. To calculate distance, we must convert the ring data from the current equal-area Albers projection (ESRI:102008) to WGS84 geographic coordinates in decimal degrees (EPSG:4326), which are better suited to distance calculations.
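The two filters can be sketched as follows. This sketch assumes the ring points have already been converted to (longitude, latitude) pairs in EPSG:4326, and it swaps in the haversine great-circle formula on a sphere for the ellipsoid distance used in the analysis, so results will differ slightly. The function names and the mean Earth radius constant are illustrative choices, not taken from the notebook.

```python
import math

REDMOND_LAT, REDMOND_LON = 44.272621, -121.173920
EARTH_RADIUS_MILES = 3958.8  # mean radius; spherical approximation (assumption)
MAX_DISTANCE_MILES = 1250
MIN_YEAR = 1963


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def closest_distance_miles(ring_lonlat):
    """Distance from the city center to the nearest point of a ring."""
    return min(
        haversine_miles(REDMOND_LAT, REDMOND_LON, lat, lon)
        for lon, lat in ring_lonlat
    )


def keep_fire(fire_year, ring_lonlat):
    """Apply both filters: 1963 onward and closest point under 1250 miles."""
    return (fire_year >= MIN_YEAR
            and closest_distance_miles(ring_lonlat) < MAX_DISTANCE_MILES)
```

For the ellipsoid distance itself, a geodesic library would replace `haversine_miles`; the filtering logic stays the same.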

After filtering, we store the desired subset in ./intermediate/redmond_fire_subset.csv. Note that the file remains too large to be tracked by Git but can be generated by the code in ./analysis-part1-SmokeEstimates.ipynb.

Air Quality Index Data

A portion of this analysis required historical Air Quality Index (AQI) data for Redmond, Oregon, which is located in Deschutes County, during the fire season (May 1 - October 31) for each year from 1963 onwards. The Air Quality Index is a measure designed to tell us how healthy the air is on any given day and is commonly used to track pollutants such as smog or smoke. Generally, a rating of 0-50 indicates healthy, clean air, while 500 is the highest tracked value for hazardous air. A thorough explanation of how AQI is calculated can be found here.

In this project, we used the US Environmental Protection Agency (EPA) Air Quality Service (AQS) API. The documentation for the API provides definitions of the different call parameters and examples of the various calls that can be made to the API. Additional information on the Air Quality System can be found in the EPA FAQ. Note that terms of use can be found here. All data accessed through the API lies in the public domain.

Specifically, we used the maximal daily average sensor data for monitoring stations in Deschutes County, all of which were <17 miles from Redmond, OR. These daily max values were then averaged over the fire season to get the annual estimate for 1983-2023. There was no data available before 1983. Finding nearby monitoring stations requires the Federal Information Processing Series (FIPS) codes of the desired city, county, and state. Information was gathered from here. A detailed walkthrough of the data collection process can be found in ./analysis_part1-AQI.ipynb. The final data can be found as ./output/final_annual_AQI_1983-2023.csv.

Respiratory Health Data

I incorporate health data aggregated from three main sources:

FRED is an online database of 1000s of economic time series data aggregated from national, international, public, and private sources. It is maintained by the Research Department at the Federal Reserve Bank of St. Louis. The terms of use for this database can be found here but generally, all data is meant for personal, non-commercial, or educational use. From this database, I accessed the annual Age-Adjusted Death Rate Data for Deschutes County from 1999 to 2020. This data was aggregated from the Centers for Disease Control and Prevention (CDC) and more information can be found here. This specific data set was provided under the public domain with citation. The premature death rate is defined as the total number of deaths where the deceased is younger than 75 years of age per 100,000.

The Oregon Tracking Data Explorer is part of a larger National Environmental Public Health Tracking Network funded by the CDC. As such, data from this source is provided under the public domain with citation. The database asks that users only use the data for statistical analysis and reporting purposes without attempting to learn or disclose the identities of individuals within the data. From this source, I downloaded two annual data sets:

The first tracks the annual age-adjusted asthma-related hospitalization rate per 10,000 and the raw counts for adults older than 25 from 2000 to 2021. Additionally, I have access to the crude rates for 20-year age bins of 25-44, 45-64, and 65-84. I also have access to similar data for emergency department visits, but because that record is short (2018 onward), I ignore it in my analysis.
The second tracks the same metrics for COPD-related annual hospitalizations.

Age-adjusted rates allow for fairer comparisons to be made between groups with different age distributions.

The IHME data is provided under the IHME FREE-OF-CHARGE NON-COMMERCIAL USER AGREEMENT. The data provided by IHME comes from the Global Burden of Disease Study 2019 and is served through an interactive query tool. From this query tool, I accessed the raw annual count and rate per 100,000 for the deaths, incidence, and prevalence of asthma, COPD, and all chronic respiratory diseases from 1990 to 2019 for the state of Oregon. Incidence monitors the number of new cases in a given year whereas prevalence tracks the total cases in a given year. In the data set, I have access to both the estimated metric as well as lower and upper bound values.

List of Data Sets:

Asthma Hospitalizations and Emergency Department Visits:
- Citation: McGeehin MA, Qualters JR, Niskar AS. National Environmental Public Health Tracking Program: bridging the information gap. Environ Health Perspect. 2004;14:1409–1413.
- Storage Location: ./input/deschutes_hospitalizations_asthma.csv
Chronic Obstructive Pulmonary Disorder (COPD) Hospitalizations & ER Visits:
- Citation: McGeehin MA, Qualters JR, Niskar AS. National Environmental Public Health Tracking Program: bridging the information gap. Environ Health Perspect. 2004;14:1409–1413.
- Storage Location: ./input/deschutes_hospitalizations_copd.csv
IHME Query Tool for Oregon Respiratory Illness Incidence, Prevalence, & Mortality Rates
- Citation: Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 (GBD 2019) Results. Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2020. Available from https://vizhub.healthdata.org/gbd-results/
- Storage Location: ./input/IHME-GBD_2019_DATA-Chronic_Respiratory_Illness_Oregon.csv
Age-Adjusted Death Rate Data:
- Citation: Centers for Disease Control and Prevention, Age-Adjusted Premature Death Rate for Deschutes County, OR [CDC20N2UAA041017], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/CDC20N2UAA041017, November 16, 2023.
- Storage Location: ./input/deshutes_age_adjusted_premature_death_rate.csv

API Documentation

EPA AQS API

Code

The code for accessing and analyzing the wildfire data can be found in the following files:

analysis-part1-SmokeEstimates.ipynb: An interactive Jupyter Notebook providing a detailed walkthrough of the entire data exploration, analysis, and storage process of the Wildfire data. Additionally, includes code for the smoke estimates & forecasting as well as the comparison of the estimate and the AQI data.
analysis-part1-AQI.ipynb: An interactive Jupyter Notebook providing a detailed walkthrough of the entire data acquisition, analysis, and storage process of the EPA AQS AQI data. Note that some portions of the code were developed by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. This code was provided under the Creative Commons CC-BY license. The rest of the code lies under the standard MIT license.
Extension-Data_Exploration.ipynb: An interactive Jupyter Notebook providing a detailed walkthrough of the data exploration, analysis, and storage process of the county and state respiratory health data. Additionally, includes code for the EDA plots.
Extension-HospitalizationModeling.ipynb: An interactive Jupyter Notebook providing a detailed walkthrough of forecasting the respiratory health indicators and linking it with the earlier smoke estimates.

Output Files

Data

./output/final_annual_AQI_1983-2023.csv: Contains the final average max daily summary AQI during the fire season from 1983 onward. The data has 2 variables:
- year: the year
- aqi: the annual average AQI value for the fire season.
./output/redmond_smoke_estimates.csv: Contains the final calculated smoke estimates from 1963 onward. The data has 3 variables:
- Fire_Year: the year of the fire season
- annual_smoke_intake: the cumulative smoke intake from all fires in that year.
- daily_avg_fire_szn: the average daily smoke intake during the fire season. Derived from annual_smoke_intake by dividing by the 184 days of the fire season.
./output/wildfire_smoke_forecast_1963-2050.csv: Contains the final annual smoke index from 1963-2020 and the forecasted annual smoke index until 2050. The data has 2 variables:
- Fire_Year: the year of the fire season
- Annual Smoke Index: the average daily smoke intake during the fire season. The forecast is done through an ARIMA model.

Images

./output/figure1-fires_freq_by_distance.png: a histogram showing the number of fires occurring at every 50-mile increment from Redmond, OR up to 1250 miles.
./output/figure2-annual_burn_over_time.png: a time series graph of total acres burned per year for the fires occurring within 1250 miles from Redmond, OR.
./output/figure3-comparing_smoke_AQI.png: a time series graph containing my annual fire smoke estimate & AQI estimate for Redmond, OR.
./output/ARIMA_forecast_smoke_estimate.png: a time series graph containing my ARIMA forecast of the smoke estimate until 2050 along with a 95% confidence interval.
./output/Forecasted_Disease_Patterns_2020-2050.png: a set of subplots each containing the VARMAX forecast of the respiratory health indicators (rates per 10,000) until 2050 along with a 95% confidence interval
./output/Respiratory_Health_Indicators_Correlation_Smoke.png: a heatmap tracking the Pearson and Spearman correlation coefficients for each respiratory health indicator with respect to the estimated annual smoke index.
./output/Respiratory_Health_State_Occurence.png: a set of 3 time series subplots tracking the state incidence and prevalence rates (per 10,000) across dual-axis plots for Asthma, COPD, and all Chronic RIs in Oregon
./output/Respiratory_Health_State_Mortality.png: a set of 3 time series subplots tracking the state mortality rates (per 10,000) for Asthma, COPD, and all Chronic RIs in Oregon
./output/Respiratory_Health_County_Premature_Mortality.png: a time series graph showcasing the premature death rate (rate per 10,000) for Deschutes County.
./output/Asthma_County_Hospitalization.png: time series plot showing the overall crude, overall age-adjusted, and individual hospitalization rates for each age group per 10,000 for asthma
./output/COPD_County_Hospitalization.png: time series plot showing the overall crude, overall age-adjusted, and individual hospitalization rates for each age group per 10,000 for COPD

Intermediate Files

Wildfire Data

We store the DataFrames containing the JSON outputs of the API calls for particulate and gaseous AQI data from 1963-2023 just in case.

./intermediate/gaseous_AQI_1963-2023.csv: Each row denotes an AQI/pollutant measurement by a sensor in Deschutes County for ozone & carbon monoxide. Sulfur dioxide and nitrogen dioxide would also appear here, but no readings were present for them. The data contains 32 attributes as present in the "Data" section of the JSON response. Example JSON output can be seen here.
./intermediate/particulate_AQI_1963-2023.csv: Each row denotes an AQI/pollutant measurement by a sensor in Deschutes County for the pollutants Acceptable PM2.5 AQI & Speciation Mass, PM2.5 - Local Conditions, & PM10 Total 0-10um STP. The data contains 32 attributes as present in the "Data" section of the JSON response. Example JSON output can be seen here.

We also store the subset of wildfire polygons for fires within 1250 miles of Redmond, OR from 1963 onward in ./intermediate/redmond_fire_subset.csv. As stated above, this file is too large to be tracked by Git.

Respiratory Health Data

./intermediate/combined_health_data.csv: The data contains rows from 1990 to 2021 with 23 other columns denoting the cleaned annual rates per 10,000 for the following variables (with OR denoting state-level data and DC denoting data for Deschutes County). Note that all state data exists from 1990-2019 while all county data exists from 2000-2021 except for the premature death rate, which also exists for 1999.
- 'Deaths: Asthma (OR)': rate of people dying due to Asthma in Oregon
- 'Deaths: Chronic obstructive pulmonary disease (OR)': rate of people dying due to COPD in Oregon
- 'Deaths: Chronic respiratory diseases (OR)': rate of people dying due to any chronic respiratory illness in Oregon
- 'Incidence: Asthma (OR)': rate of new asthma cases in Oregon
- 'Incidence: Chronic obstructive pulmonary disease (OR)': rate of new COPD cases in Oregon
- 'Incidence: Chronic respiratory diseases (OR)': rate of new cases of any chronic respiratory illness in Oregon
- 'Prevalence: Asthma (OR)': rate of people with asthma in Oregon
- 'Prevalence: Chronic obstructive pulmonary disease (OR)': rates of people with COPD in Oregon
- 'Prevalence: Chronic respiratory diseases (OR)': rate of people with any chronic respiratory illness in Oregon
- 'Asthma: Age-Adjusted HR (DC)': Asthma hospitalization rates for everyone adjusted by age
- 'Asthma: Crude HR (DC)': Asthma hospitalization rates for all ages
- 'Asthma: Crude HR Ages 0-4 (DC)': Asthma hospitalization rates for ages 0-4
- 'Asthma: Crude HR Ages 5-14 (DC)': Asthma hospitalization rates for ages 5-14
- 'Asthma: Crude HR Ages 15-34 (DC)': Asthma hospitalization rates for ages 15-34
- 'Asthma: Crude HR Ages 35-64 (DC)': Asthma hospitalization rates for ages 35-64
- 'Asthma: Crude HR Ages 65+ (DC)': Asthma hospitalization rates for ages 65+
- 'Asthma: Total Hospitalizations (DC)': total Asthma hospitalizations across all ages
- 'COPD: Age-Adjusted HR (DC)': COPD hospitalization rates for all ages adjusted by age
- 'COPD: Crude HR (DC)': COPD hospitalization rates for all ages
- 'COPD: Crude HR Ages 25-44 (DC)': COPD hospitalization rates for ages 25-44
- 'COPD: Crude HR Ages 45-64 (DC)': COPD hospitalization rates for ages 45-64
- 'COPD: Crude HR Ages 65-84 (DC)': COPD hospitalization rates for ages 65-84
- 'PDR Age-Adjusted (DC)': the premature death rate defined as those dying before 75.

Special Considerations

Linking Metrics with Smoke

To establish potential links between these metrics and my annual smoke index, I look at the pairwise correlations between each measure. Specifically, I look at both the Pearson correlation coefficients and the more robust Spearman correlation coefficients. The Pearson correlation test relies on the assumption that the data is normally distributed without outliers and that there is a linear relationship between the two indicators. Meanwhile, the Spearman correlation coefficient relaxes these assumptions, allowing for a more robust test. For both tests, I use a significance level of 0.05. To report values, I bin the correlation coefficients according to the following scale of the absolute coefficient: 0.0–0.3: Weak, 0.3–0.5: Moderate, 0.5–0.7: Strong, 0.7–1.0: Very Strong.
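A minimal pure-Python sketch of the two coefficients and the binning scale is given below. It is not the notebook's code: it computes Spearman as Pearson on average ranks, omits the significance tests (a statistics library such as SciPy would normally supply the p-values), and assumes each bin's upper bound is exclusive, which the scale above does not specify. All function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strength_label(r):
    """Bin |r| using the scale above (upper bounds exclusive: assumption)."""
    a = abs(r)
    if a < 0.3:
        return "Weak"
    if a < 0.5:
        return "Moderate"
    if a < 0.7:
        return "Strong"
    return "Very Strong"
```

A monotonic but nonlinear pair such as x and x squared gives a Spearman coefficient of 1 while the Pearson coefficient stays below 1, which is the robustness the paragraph above refers to.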

Forecasting Respiratory Health Metrics

To forecast the respiratory health indicators, I use a Vector Autoregressive Moving Average with eXogenous regressors model (VARMAX) to model multivariate time series using my smoke forecast and time as the exogenous variables. I use a multivariate approach as many of the health indicators are understandably strongly linked. Unlike ARIMA, VARMAX allows us to model the effects of one of these indicators on other variables. Specifically, I forecast the future for sets of variables at a time as the model cannot handle more than five endogenous variables at a time due to the small number of data points. To forecast asthma rates, I consider the number of deaths, incidence, and prevalence at the state level and the overall age-adjusted hospitalization rate & crude hospitalization rates for those above 65 at the county level at once. I use the same corresponding variables to forecast COPD. Additionally, I also forecast the overall mortality, incidence, and prevalence rates for all chronic respiratory illnesses at the state level.

When building the model, I split the data into 80% train and 20% test and conducted a grid search for the best hyperparameter values of p & q using the Euclidean norm of the Root Mean Squared Error to track model fit. I don’t use BIC as it is only valid when n is greater than the number of parameters in the model; in this case, the model has many parameters and very few data points. I take the Euclidean norm to ensure that the model predictions are valid for all endogenous variables and not just one at the expense of another.
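The selection criterion can be sketched without committing to a particular modeling library. In the sketch below, `forecast_fn(p, q)` is a hypothetical callable standing in for "fit the model on the training data with orders p and q and return test-period forecasts"; the other names are also illustrative, and the split is assumed to be chronological.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered list into leading train and trailing test parts."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean squared error for one variable."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_norm(actual_by_var, predicted_by_var):
    """Euclidean norm of the per-variable RMSEs.

    One poorly forecast variable raises the score even when the
    others fit well, so no variable is traded off for another.
    """
    return math.sqrt(sum(
        rmse(actual_by_var[v], predicted_by_var[v]) ** 2
        for v in actual_by_var
    ))


def grid_search(candidates, forecast_fn, actual_by_var):
    """Return the (p, q) pair whose forecasts have the lowest RMSE norm.

    forecast_fn(p, q) is a hypothetical callable returning test-period
    forecasts as {variable_name: [values]}.
    """
    return min(
        candidates,
        key=lambda pq: rmse_norm(actual_by_var, forecast_fn(*pq)),
    )
```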

Limitations

My analysis is severely limited in scope due to the inherent complexity of the task and the limitations in resources & time. To begin, I must contend with the fact that there are a series of confounding factors that all contribute to the trends in asthma & COPD prevalence, incidence, and hospitalizations. Advancements in medical care, medical accessibility, and changes in societal behavior concerning exercise, nutrition, and smoking all play a part in impacting the trends of asthma, COPD, and other chronic respiratory diseases. These are just a few of the external factors – it becomes increasingly complex trying to account for all of them.

Moreover, the reliance on historical data introduces a dependency on the consistency and reliability of records, which may be subject to variations in reporting practices, data collection methodologies, and technological advancements over time. Incomplete or inconsistent datasets may limit the depth and reliability of my findings, presenting another potential hurdle in achieving comprehensive insights. In fact, the medical coding for reporting the primary cause of hospitalization changed on October 1st, 2015 for both Asthma & COPD in Oregon’s county-level health tracker. This can potentially confound the seeming decrease in hospitalization rates for both diseases after 2015. Additionally, the set of medical diagnosis codes attributed to each disease differs between Oregon’s Tracking Data Explorer and the IHME dataset – thereby impacting the validity of drawing comparisons between the two datasets.

Typically, smoke is dependent on wind patterns over several days, the intensity of the fire, its duration, and the distance from the city. However, for the sake of this assignment, I only have access to the fire area, distance, and type. So, my smoke model is fundamentally flawed. My initial smoke estimate was also generated without being refined with a well-known target variable. AQI can be used to proxy the smoke estimate for a day but can’t be the end-all-be-all for validating a smoke estimate from distance, area, and fire type – which in and of itself cannot model the inherent complexity of smoke drift from 1000s of miles away. With a flawed baseline, any subsequent comparisons should be taken with a grain of salt. Furthermore, the data set only comprises US national wildfires yet portions of Canada and Mexico lie within the designated 1250 mi of Redmond, OR. These international wildfires most definitely contribute to Redmond’s smoke levels and should be accounted for.

I also only have access to fires during the fire season (May 1st through October 31st) without any information about the duration of each fire. As such, I’m forced to aggregate each fire’s smoke estimate and average each contribution by the number of days in a fire season. This average assumes that each fire's duration and inception were equal, which is a fundamentally erroneous assumption and should be corrected in the future. Lastly, my analysis stems from an observational study of historical data. At most, I can present potential associations between predictor and outcome variables but I cannot infer causation from my analysis. Though there is a significant association between the annual smoke index and the health indicators, I can’t judge the validity of the effect size without considering all the uncertainties of estimating the metrics themselves. A thorough, valid, deep statistical analysis of the task at hand lies beyond the scope of a month-long school project. Instead, I recommend that the city council hire knowledgeable statisticians with domain expertise to follow up on my analysis.

Additionally, I face limitations in my use of ARIMA and VARMAX to forecast annual smoke index and respiratory health indicators, respectively. Specifically, the size of my data set is extremely small – only 20 annual data points from 2000 to 2020 for the VARMAX forecasting. As such, any forecasts are highly susceptible to outliers like those in 2020. Some may even be influenced by poor data collection during the global pandemic. Additionally, with low-resolution data sampling (1 per year), it’s hard to capture the anomalies presented from wildfires – these often show themselves at the daily or weekly level. Furthermore, VARMAX makes an underlying assumption of the stationarity of the data. This condition likely isn’t met, and with so few data points, stationarity tests cannot be trusted to check it.

Notes

Smoke Estimation

We created an initial annual estimate of wildfire smoke in Redmond, OR to better understand the impact of wildfires on residents inside the city. Throughout the project, we will consider other socio-economic impacts as well. For this section, we only estimated the smoke seen by the city during each annual fire season from just the Wildfire data and recognized its limitations. Specifically, our final estimate is as follows: $$s = \beta_0 + \beta_1 \frac{a}{d^2} + \beta_2 t,$$ where we let $s$ be the smoke experienced by the city due to a single fire, $a$ be the area, $d$ be the distance, and $t$ be the fire type. Additionally, we set $\beta$ as a tunable set of weights. Note that $\beta_0$ is the baseline amount of smoke present not attributed to wildfires, $\beta_1$ is the tunable fire-dispersal weight, and $\beta_2$ articulates the difference in baselines between different fire types. Ideally, if we know the levels of the known quantity we aim to model, we can tune the weights further. Currently, we use the following simple values for $\beta$: $\beta_0 = 0$, $\beta_1 = 1$, & $\beta_2 = 1$.

The above equation denotes the smoke effect for a single fire. To get an annual estimate, we sum the total smoke effects of all fires in a given year and divide by the number of days in the fire season. Summing the values approximates the total smoke intake by the city throughout the fire season. Dividing this by the number of days (184) in the fire season (May 1st through October 31st) could in turn approximate the average daily smoke quality for the city during the fire season. Note that this simplified average assumes that each fire's duration and inception were equal which is a fundamentally erroneous assumption but might make for a decent estimate.
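The per-fire estimate and the annual aggregation can be sketched as below. The numeric value used to encode each fire type is not specified above, so `fire_type_value` is a placeholder and the function names are illustrative; the default weights are the simple values stated above.

```python
FIRE_SEASON_DAYS = 184  # May 1st through October 31st


def fire_smoke(area, distance, fire_type_value, beta=(0.0, 1.0, 1.0)):
    """Smoke from one fire: s = b0 + b1 * a / d^2 + b2 * t.

    fire_type_value is a hypothetical numeric encoding of the fire type.
    """
    b0, b1, b2 = beta
    return b0 + b1 * area / distance ** 2 + b2 * fire_type_value


def annual_smoke(fires, beta=(0.0, 1.0, 1.0)):
    """Return (annual total, daily average over the fire season).

    fires is an iterable of (area, distance, fire_type_value) tuples
    for a single year.
    """
    annual = sum(fire_smoke(a, d, t, beta) for a, d, t in fires)
    return annual, annual / FIRE_SEASON_DAYS
```

For example, a fire of area 100 at distance 10 with a type value of 1 contributes 0 + 100/100 + 1 = 2 under the default weights.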

Rationale

Typically, smoke is dependent on wind patterns over several days, the intensity of the fire, its duration, and the distance from the city. However, for the sake of this assignment, we only have access to the fire area & distance. Additionally, we can distinguish between the type of fire as a proxy for fire intensity: prescribed fires and true wildfires. Prescribed burns are conducted on days where weather conditions are optimal as a way to mitigate safety risks and the spread of smoke. [1] As such, prescribed fires can be assumed to contribute less to the smoke drifting over nearby cities than wildfires. We also know that larger fires near the city will contribute more to smoke quantity over a city than small fires further away. However, how much do we estimate each factor, area & distance, to contribute to the overall quantity?

Smoke is generated from incomplete combustion, denoted by the formula, $\text{Fuel} + O_2 \rightarrow CO_2 + H_2O + \text{byproducts}$. [2] This tells us that smoke is linearly proportional to the amount of fuel burned, which in turn is linearly proportional to the area burned. Meanwhile, the intensity of energy, force, or flux evenly radiated from a source follows an inverse-square law with distance as commonly observed with light. [3] We can model smoke as flux originating from the fire and evenly radiating outward from the burned area. So, the smoke estimate of the city can be an inverse square of the distance between the city and the fire. Putting the above assumptions together yields our initial estimate for smoke from a single fire: $$\text{smoke} \propto \frac{\text{area}}{\text{distance}^2}$$

However, we know that the type of fire drastically changes its dispersal over a city. We can model this as a varying baseline for each fire type: $$\text{smoke} \propto \frac{\text{area}}{\text{distance}^2} + \text{fire type}$$

Note that while area and distance might have different imperial units, we can ignore the conversion as this is something that can be tuned with $\beta$.

AQI Aggregation

Summary AQI data was provided for a series of pollutants (3 particulate & 2 gaseous).

Particulate: Acceptable PM2.5 AQI & Speciation Mass, PM2.5 - Local Conditions, & PM10 Total 0-10um STP
Gaseous: Carbon Monoxide & Ozone
- Nitrogen dioxide and sulfur dioxide sensors weren't available.

Wildfire smoke is mainly composed of fine (PM2.5) particles (>90% by mass) but also contains some percentage of coarse (PM10) particles and a small percentage of gaseous pollutants. Source 1, Source 2

Since we don't have a good understanding of the exact proportion of each pollutant's contribution to smoke, and since the overall reported AQI is the maximum of the AQI values for each subcategory, we take our estimate to be the highest AQI for any given day from any of the five stations near the city.
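A minimal sketch of this aggregation follows, assuming the readings have already been reduced to (date, AQI) pairs pooled across stations and pollutants. The function names and input shape are illustrative and do not reflect the API's response format.

```python
from collections import defaultdict
from datetime import date


def in_fire_season(day):
    """True for dates from May 1 through October 31 of their year."""
    return date(day.year, 5, 1) <= day <= date(day.year, 10, 31)


def annual_fire_season_aqi(readings):
    """Average the daily maximum AQI over each year's fire season.

    readings: iterable of (date, aqi) pairs from any station or pollutant.
    Returns {year: mean of daily max AQI}.
    """
    daily_max = {}
    for day, aqi in readings:
        # Skip missing readings and anything outside the fire season.
        if aqi is None or not in_fire_season(day):
            continue
        daily_max[day] = max(aqi, daily_max.get(day, aqi))

    by_year = defaultdict(list)
    for day, aqi in daily_max.items():
        by_year[day.year].append(aqi)
    return {year: sum(vals) / len(vals) for year, vals in by_year.items()}
```

Taking the maximum first and averaging second mirrors how the overall AQI is reported: one worst-case value per day, then a seasonal summary.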

Smoke Estimate Forecasting

Data was forecasted using an ARIMA model. Portions of the modeling step were taken from this article by Brendan Artley under the MIT license.

Python & Jupyter Set-Up

This work assumes that users have a working Jupyter Notebook & Python 3 setup. Instructions on installing them can be found here. The required Python modules comprise standard modules that are installed with Python and others that are installed through the Anaconda distribution.

If modules are not found, they can be readily installed with the following terminal commands:

    pip install <module name>

    conda install <module name>

EPA AQS API KEY

Please note that an account tied to an email is required to use the API. Steps for setting up the API Key were detailed by Dr. McDonald:

Create an email address & request an API key using the EPA endpoint & function defined below.

import json
import time

import requests

# API_REQUEST_URL, API_ACTION_SIGNUP, AQS_REQUEST_TEMPLATE, and API_THROTTLE_WAIT
# are assumed to be defined earlier in the notebook; they are not shown here.

def request_signup(email_address = None,
                   endpoint_url = API_REQUEST_URL, 
                   endpoint_action = API_ACTION_SIGNUP, 
                   request_template = AQS_REQUEST_TEMPLATE,
                   headers = None):
    """
    Request API access using an email address.
    The parameters are standardized so that this function definition matches all of the others.
    However, the easiest way to call this is to simply pass your preferred email address.

    Parameters:
        email_address (str): The email address to use for the sign-up request.
        endpoint_url (str): The base URL of the API endpoint.
        endpoint_action (str): The specific action or endpoint for the sign-up request.
        request_template (dict): A dictionary containing request parameters and values.
        headers (dict): Optional headers to include in the request.

    Returns:
        dict or None: The JSON response to the sign-up request,
            or None if there is an exception during the request.

    Raises:
        Exception: If no email address is supplied.
    """
    # Work on a copy so the shared default template is not modified between calls
    request_template = dict(request_template)

    # Make sure we have a string - if you don't have access to this email address, things might go badly for you
    if email_address:
        request_template['email'] = email_address
    if not request_template.get('email'):
        raise Exception("Must supply an email address to call 'request_signup()'")

    # Compose the signup url - create a request URL by combining the endpoint_url with the parameters for the request
    request_url = endpoint_url + endpoint_action.format(**request_template)

    # Make the request
    try:
        # Wait first, to make sure we don't exceed a rate limit in the situation where an exception occurs
        # during the request processing - throttling is always a good practice with a free data source
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response

response = request_signup("ymanne@uw.edu")

Validate your email using the link that EPA sends. Once the API key is created, store your EPA AQS username (the email address you signed up with) and API key as USERNAME and APIKEY string variables in a file called my_secrets.py. This file & the two variables are imported into the code but should never be published publicly.
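For reference, a my_secrets.py file following the description above might look like the sketch below. Both values are placeholders, not real credentials, and the exact layout of the file is an assumption based only on the two variable names given here.

```python
# my_secrets.py
# Keep this file out of version control (for example, list it in .gitignore).

# Placeholder: the email address used for the EPA AQS sign-up request.
USERNAME = "your_email@example.com"

# Placeholder: the API key that EPA sends after the email is validated.
APIKEY = "your-api-key"
```

A notebook can then load the two values with `from my_secrets import USERNAME, APIKEY`, provided that my_secrets.py sits in the same directory as the notebook or elsewhere on the Python path.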
