☀️ AI-driven weather forecasting model.

📈 Weather forecasting framed as a point multivariate time series forecasting problem and solved with Seq2Seq neural networks.

The goal is to predict temperature, wind speed, etc. as accurately as possible for hours and days ahead, using LSTM, BiLSTM, TCN, Transformer, Spacetimeformer and NBEATSx models together with some data analysis tools.

Note: work is being performed in the MBelniak fork.

📇 Datasets

This project uses 3 different datasets. Depending on the experiment settings, the neural network uses dataset 1 only, datasets 1 and 2, or all three.

1. Synop reports from ground stations: https://danepubliczne.imgw.pl/

   Multiple parameters are fetched and used, see src/synop/consts.py

   Wind velocity, direction and gusts are fetched from https://danepubliczne.imgw.pl/datastore for higher time and value resolution

2. GFS 0.25° archive forecasts from https://rda.ucar.edu

   Multiple parameters are used, see src/wind_forecast/config/train_parameters/CommonGFSConfig.json

3. Maximum Reflectivity images (CMAX): https://danepubliczne.imgw.pl/datastore

The process of obtaining GFS archive data is described in the gfs-archive-0-25 module. Synop data is fetched in src/synop/fetch_synop_data.py. CMAX data is fetched in src/radar/fetch_radar_CMAX.py and processed in radar/preprocess_cmax.py.

💻 Key technologies

PyTorch for creating models

PyTorch Lightning for the training regime

Weights & Biases for logging and plotting results

Hydra for configuration

Optuna for tuning

NumPy, pandas, scikit-learn, Matplotlib, seaborn as tooling

💡 Models

All models work in Seq2Seq fashion, with configurable time window and forecast horizon.
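To make the time window and forecast horizon concrete, the sketch below slices a series into (past window, future horizon) pairs. It is plain Python for illustration only: `make_windows` and its parameter names are hypothetical and not taken from this project, whose data modules may build samples differently.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Split a series into (past window, future horizon) pairs.

    Hypothetical helper for illustration; not part of the project.
    """
    if window <= 0 or horizon <= 0:
        raise ValueError("window and horizon must be positive")
    pairs = []
    # The last valid start index leaves room for a full window plus horizon.
    for start in range(len(series) - window - horizon + 1):
        past = list(series[start:start + window])
        future = list(series[start + window:start + window + horizon])
        pairs.append((past, future))
    return pairs


hourly_temperature = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
samples = make_windows(hourly_temperature, window=3, horizon=2)
for past, future in samples:
    print(past, "->", future)
```

With six values, a window of 3 and a horizon of 2, this yields two samples. A series shorter than window + horizon yields none.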

LSTM - encoder-decoder architecture with stacked LSTMs, as described in Sequence to Sequence Learning with Neural Networks

BiLSTM - same as LSTM, but with bidirectional encoder

TCN - encoder-decoder architecture as described in Temporal Convolutional Networks for the Advance Prediction of ENSO. There are also an encoder-only variant and a variant with additional attention layers

Transformer - model based on Attention is all you need

Spacetimeformer - model based on Long-Range Transformers for Dynamic Spatiotemporal Forecasting

NBEATSx - model based on Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx

🔧 Configuration

An experiment can be configured in several scopes. For tips on configuring a run from the command line, see scripts. Predefined configurations can also be created as config files in src/wind_forecast/config/experiment or src/wind_forecast/config/optim.

config.experiment

training regime - number of epochs, skip training, save checkpoint etc.,

model specific config - model hyperparameters, dropout etc.,

problem specific config - time window length, horizon length, target parameter, target location etc.,

datasets config - val/test split, synop file, weather parameters to use, date range

config.optim

lr, lr scheduler, optimizer, loss

config.lightning

deterministic training, gpus

config.tune - tune config; set of params to check

Multiple configurations (yaml files) are already prepared in src/wind_forecast/config/experiment, but they all use Sequence2SequenceWithCMAXDataModule, which requires CMAX files (reflectivity images). If you don't use CMAX files, use Sequence2SequenceDataModule instead, together with use_cmax_data: False and load_cmax_data: False. Sequence2SequenceWithCMAXDataModule is used in my experiments so that the datasets are identical across all experiments in my thesis.

🏃 Running

Obtaining the datasets is described in the synop readme, GFS readme and CMAX readme.

Prepared synop data (csv file) should be placed in the src/data/synop directory. Some files are already provided there. Prepared GFS and CMAX datasets should each be placed in a pkl directory inside the directory pointed to by the GFS_DATASET_DIR and CMAX_DATASET_DIR environment variables, respectively.
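As a quick sanity check of this layout, the sketch below resolves the expected pkl directories from the two environment variables. It assumes each dataset lives in `<variable value>/pkl`, which is one reading of the description above. `dataset_pkl_dir` is a hypothetical helper, not a function from this project, and the example paths are placeholders.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def dataset_pkl_dir(var_name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return <value of var_name>/pkl, or raise KeyError if the variable is unset.

    Hypothetical helper for illustration; not part of the project.
    """
    env = os.environ if env is None else env
    base = env.get(var_name)
    if not base:
        raise KeyError(f"{var_name} is not set")
    return Path(base) / "pkl"


# Placeholder values only; point these at your own directories.
example_dirs = {"GFS_DATASET_DIR": "/data/gfs", "CMAX_DATASET_DIR": "/data/cmax"}
gfs_dir = dataset_pkl_dir("GFS_DATASET_DIR", example_dirs)
cmax_dir = dataset_pkl_dir("CMAX_DATASET_DIR", example_dirs)
print(gfs_dir)
print(cmax_dir)
```

Calling `dataset_pkl_dir("GFS_DATASET_DIR")` without the second argument reads the real process environment instead of the example mapping.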

First, create the conda environment:

conda env create -f environment.yml

Then, install the dependencies:

pip install -r requirements.txt

To run an experiment, from the src directory:

python -m wind_forecast.main experiment=<experiment_yml_file> [options...]
# e.g. python -m wind_forecast.main experiment=transformer experiment.batch_size=32 lightning.gpus=0
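Options such as experiment.batch_size=32 address nested configuration keys by their dotted path. The sketch below mimics that idea on a plain dictionary. It is not Hydra's parser: values stay strings here, config group selection such as experiment=transformer is not modelled, and `apply_overrides` is a hypothetical name used only for illustration.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'a.b.c=value' overrides to a nested dict in place and return it.

    Illustration only: values are kept as strings, whereas Hydra parses types.
    """
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep or not dotted_key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


config = {"experiment": {"batch_size": "64"}, "lightning": {}}
apply_overrides(config, ["experiment.batch_size=32", "lightning.gpus=0"])
print(config)
```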

Run modes

The RUN_MODE variable in the .env file switches the run mode. Leave it unset to run a basic full training.

RUN_MODE=debug # Disables W&B logging and loads only a small part of the datasets so that training starts and runs faster
RUN_MODE=tune # Performs the tuning process. See [tune](https://github.com/MBelniak/WindForecast/tree/master/src/wind_forecast/config/tune) for example tune configs.
RUN_MODE=tune_debug # Combines the two above
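The three values plus the unset default can be read as two independent switches. The sketch below spells out that mapping. It is an interpretation of the description above, not the project's actual code, and `RunSettings` and `resolve_run_mode` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RunSettings:
    tune: bool   # run the tuning process instead of a single training
    debug: bool  # small data subset and no W&B logging


def resolve_run_mode(env: Optional[Mapping[str, str]] = None) -> RunSettings:
    """Map RUN_MODE to two flags; an unset variable means full training."""
    env = os.environ if env is None else env
    mode = env.get("RUN_MODE", "").strip().lower()
    known = {
        "": RunSettings(tune=False, debug=False),
        "debug": RunSettings(tune=False, debug=True),
        "tune": RunSettings(tune=True, debug=False),
        "tune_debug": RunSettings(tune=True, debug=True),
    }
    if mode not in known:
        raise ValueError(f"unknown RUN_MODE: {mode!r}")
    return known[mode]


print(resolve_run_mode({"RUN_MODE": "tune_debug"}))
```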

Weights & Biases

Add the following to .env to enable logging to W&B:

RESULTS_DIR=<relative to repo root, target dir for logs, checkpoints etc.>
WANDB_ENTITY=<your w&b username>
WANDB_PROJECT=<your w&b project name>
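Before starting a long run, it can help to confirm that all three variables are actually set. The sketch below is a small standalone check: `missing_wandb_vars` is a hypothetical helper rather than part of this project, and it assumes the .env values have already been loaded into the environment.

```python
import os
from typing import List, Mapping, Optional

WANDB_VARS = ("RESULTS_DIR", "WANDB_ENTITY", "WANDB_PROJECT")


def missing_wandb_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in WANDB_VARS if not env.get(name)]


# Placeholder values; WANDB_PROJECT is deliberately left out.
partial_env = {"RESULTS_DIR": "results", "WANDB_ENTITY": "my-user"}
print(missing_wandb_vars(partial_env))
```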

Troubleshooting and tips

Faster dataloaders

The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance

By default, data is not loaded in parallel because of problems on my Windows machine. You can try speeding it up by setting experiment.num_workers to the number of cores on your machine, or to a smaller number if CUDA errors occur.
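One way to pick a value is to start from the CPU count and cap it if problems appear. The sketch below does this with the standard library. `suggest_num_workers` and its `limit` parameter are hypothetical, and the printed override assumes the experiment.num_workers key mentioned above.

```python
import os
from typing import Optional


def suggest_num_workers(limit: Optional[int] = None) -> int:
    """Suggest a worker count: the CPU count, optionally capped by limit."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    if limit is None:
        return cpus
    # Never go below 0, which means loading data in the main process.
    return max(0, min(cpus, limit))


workers = suggest_num_workers(limit=4)
print(f"experiment.num_workers={workers}")
```

The printed string can be appended to the run command as an option.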
