Are you a researcher in the field of environmental science, agriculture, or hydrology? Are you constantly on the lookout for efficient tools to retrieve and analyze soil moisture data?
This Python script retrieves high-temporal-resolution (5-minute) soil moisture data from the Alabama Climatology Office, located at the University of Alabama in Huntsville. The data are stored as CSV files in a local "data" directory, which the script creates in the directory where it is run. It uses asyncio, aiohttp, pandas, and tqdm to fetch the full recorded history of each station concurrently, which keeps download times short.
Here is the University of Alabama press release describing the soil moisture program.
Before running the script, ensure you have the following dependencies installed:
- Python 3.9.5
- aiohttp 3.8.5
- pandas 2.0.3
- requests 2.31.0
- tqdm 4.66.1
You can install Python and all dependencies in a virtual environment using conda and pip:
conda create --name myenv python=3.9.5
conda activate myenv
pip install -r requirements.txt
- Clone this repository to your local machine.
- Open a terminal and navigate to the directory containing the script and the requirements.txt file.
- Install the dependencies as described above.
- Run the script:
python getstations.py
This script performs the following tasks:
- Reads station metadata from a remote server.
- Fetches data for each station listed in the metadata.
- Processes and saves the data as CSV files in a local "data" directory (Read Script Processing below).
- Utilizes asyncio to efficiently handle multiple asynchronous requests.
- Provides terminal progress bars to monitor download speeds.
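The concurrent fetching described in the list above can be illustrated with a minimal sketch. It is not the code in getstations.py: the function names `fetch_all` and `fake_fetch`, the station IDs, and the concurrency limit are all hypothetical, and the HTTP request is replaced by a stand-in coroutine so the example runs without network access.

```python
import asyncio


async def fetch_all(station_ids, fetch_one, max_concurrent=5):
    """Run fetch_one(station_id) for every station, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(station_id):
        async with semaphore:
            try:
                return station_id, await fetch_one(station_id)
            except Exception as exc:
                # Record the failure and keep going with the other stations
                return station_id, exc

    pairs = await asyncio.gather(*(guarded(s) for s in station_ids))
    return dict(pairs)


async def fake_fetch(station_id):
    """Hypothetical stand-in for an HTTP request (the real script uses aiohttp)."""
    await asyncio.sleep(0.01)
    if station_id == "BAD":
        raise ValueError("simulated 404")
    return f"data for {station_id}"


if __name__ == "__main__":
    # "A", "B", and "BAD" are made-up station IDs for illustration only
    results = asyncio.run(fetch_all(["A", "B", "BAD"], fake_fetch))
    for station_id, result in results.items():
        print(station_id, "->", result)
```

Returning the exception instead of raising it lets one failed station be logged without cancelling the remaining downloads.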
The script cleans the data it retrieves from the Alabama Climatology Office server. Values below 900.00 mV or above 2200.00 mV are set to NaN, because readings outside this range are assumed not to represent real soil moisture. Such readings can occur because sensors were tested in the lab before installation, or because clock errors in the sensors sometimes cause erroneous values to be reported. Values labeled -999.99 are also set to NaN.
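As a rough illustration of the cleaning rule, the sketch below applies the same thresholds to a plain list of readings. The function name `clean_millivolts` and the constant names are hypothetical, and the script itself works with pandas rather than plain lists; with a pandas Series `s`, a similar result could be obtained with `s.where(s.between(900.0, 2200.0))`.

```python
import math

VALID_MIN_MV = 900.0   # lower bound stated in this README
VALID_MAX_MV = 2200.0  # upper bound stated in this README


def clean_millivolts(values):
    """Return a new list with out-of-range readings replaced by NaN."""
    cleaned = []
    for value in values:
        if VALID_MIN_MV <= value <= VALID_MAX_MV:
            cleaned.append(float(value))
        else:
            # Covers the -999.99 sentinel, out-of-range readings, and NaN inputs
            cleaned.append(math.nan)
    return cleaned


if __name__ == "__main__":
    print(clean_millivolts([1500.0, 850.0, -999.99, 2300.0]))
```

Because -999.99 is below the lower bound, the range check alone already turns the sentinel into NaN; no separate comparison is needed in this sketch.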
The script will display progress bars using tqdm for each station being retrieved. Once the script finishes execution, all retrieved data will be stored in the ./datadir/ directory.
The script also writes a log file to ./logdir/stemneterror.log if there are bad server requests (404 error) or if a station's data cannot be processed because an exception is raised during processing.
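One way such an error log could be set up with Python's standard logging module is sketched below. This is an assumption about the approach, not the script's actual code: the function name `make_error_logger`, the logger name, and the message format are hypothetical, while the directory and file names come from the description above.

```python
import logging
from pathlib import Path


def make_error_logger(log_dir="logdir", filename="stemneterror.log"):
    """Create a logger that appends error messages to log_dir/filename."""
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)  # create the directory if missing

    logger = logging.getLogger("stemnet_errors")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path / filename)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A failed request could then be recorded with a call such as `logger.error("station %s returned HTTP %s", station_id, status)`, where `station_id` and `status` are whatever values the caller has on hand.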
For questions or issues, please open an issue on the GitHub repository.