This repository contains the second-place solution to the GEO-AI Challenge for Landslide Susceptibility Mapping by ITU. Below, you'll find information about the data, features, and the solution's workflow.
Please begin by fetching the data from this Google Drive link. It contains the following files:
- Geological Faults: `geological_faults.gpkg`
- Land Use: `land_use_land_cover.gpkg`
- River Network: `river_network.gpkg`
- Road Network: `road_network.gpkg`
- Training Data (GeoPackage): `Train.gpkg`
- Testing Data (GeoPackage): `Test.gpkg`
- Training Data (CSV): `Train.csv`
- Testing Data (CSV): `Test.csv`
- Digital Terrain Model (DTM): `dtm.tif`
- Average Precipitation 2020: `average_precipitation_2020.tif`
- 90th Percentile Precipitation 2020: `90_perc_precipitation_2020.tif`
Various feature extraction and transformation steps produce intermediate data stored in variables and DataFrames (e.g., `dtm_data`, `landslides`, `no_landslides`). The final output is a submission file containing the IDs of the test samples and their corresponding predicted targets; it is stored in a CSV file named `submission.csv`.
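For reference, here is a minimal sketch of how the inputs might be loaded into those variables. The file paths and the `target` column name are assumptions; adjust them to your setup.

```python
import geopandas as gpd
import rasterio

# Labelled sample points (the "target" column name is an assumption).
train = gpd.read_file("Train.gpkg")

# Split samples by label into the DataFrames used downstream.
landslides = train[train["target"] == 1]
no_landslides = train[train["target"] == 0]

# Open the DTM raster; dtm_data holds the first (elevation) band.
with rasterio.open("dtm.tif") as dtm:
    dtm_data = dtm.read(1)
```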
- Geological Features:
- The proximity of each sample to geological faults, river networks, and road networks is computed.
- Geospatial Features:
- Vector data from the GeoPackage files describe geological structures and land use.
- Raster Data Features:
- Values are extracted from raster datasets: the DTM, average precipitation 2020, and 90th percentile precipitation 2020.
- Elevation:
- Elevation is extracted from the DTM raster file for both landslide and non-landslide locations, as sketched below.
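A minimal sketch of that extraction with rasterio, assuming the points share the raster's CRS (the helper name is illustrative, not the repository's):

```python
import geopandas as gpd
import rasterio

def sample_raster(points: gpd.GeoDataFrame, raster_path: str) -> list[float]:
    """Sample a single-band raster at each point's (x, y) coordinate."""
    coords = [(geom.x, geom.y) for geom in points.geometry]
    with rasterio.open(raster_path) as src:
        # rasterio yields one array of band values per coordinate
        return [vals[0] for vals in src.sample(coords)]

# e.g. landslides["elevation"] = sample_raster(landslides, "dtm.tif")
```

The same helper also works for the two precipitation rasters.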
- Distance to Geological Features:
- The distances to the nearest fault, river, and road are calculated for each sample.
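One way to compute those distances with GeoPandas, assuming all layers share a projected CRS so distances come out in map units such as metres:

```python
import geopandas as gpd

def distance_to_nearest(samples: gpd.GeoDataFrame,
                        features: gpd.GeoDataFrame) -> gpd.GeoSeries:
    """Distance from each sample to the nearest geometry in `features`."""
    # Merge all feature geometries once, then measure element-wise.
    merged = features.geometry.unary_union
    return samples.geometry.distance(merged)

# e.g. train["dist_to_fault"] = distance_to_nearest(train, faults)
```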
- Slope and Aspect:
- Slope and aspect values are calculated from the DTM and added as features. Aspect values of -9999 are replaced with the median aspect value to handle erroneous or missing data.
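The repository's exact terrain routine is not shown here, but a common finite-difference approximation looks like the sketch below; the cell size is in map units, and the sentinel cleanup mirrors the step described above with an assumed column name.

```python
import numpy as np

def slope_aspect(dtm: np.ndarray, cell_size: float):
    """Approximate slope and aspect (both in degrees) from a north-up DTM."""
    dz_drow, dz_dcol = np.gradient(dtm, cell_size)
    dz_dx = dz_dcol    # columns increase eastward
    dz_dy = -dz_drow   # rows increase southward, so flip the sign
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect measured clockwise from north; conventions vary between tools.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect

# Replace the -9999 sentinel with the median of the valid aspect values.
valid = train.loc[train["aspect"] != -9999, "aspect"]
train.loc[train["aspect"] == -9999, "aspect"] = valid.median()
```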
- X and Y Coordinates:
- The x and y coordinates of each sample are extracted and used as features.
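With GeoPandas this is a pair of accessor calls (the column names are assumptions):

```python
train["x"] = train.geometry.x
train["y"] = train.geometry.y
```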
- Handling Missing Data:
- Missing values in features like average and 90th percentile precipitation are imputed using the mode value of each feature.
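A sketch of that imputation in pandas; the precipitation column names are assumptions:

```python
# Fill missing values with each column's most frequent value (mode).
for col in ["avg_precip_2020", "p90_precip_2020"]:
    train[col] = train[col].fillna(train[col].mode().iloc[0])
```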
- Data Splitting and Standardization:
- The dataset is split into training and testing sets, and features are standardized using StandardScaler.
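A sketch of the split-and-scale step with scikit-learn; the split ratio and random seed are assumptions:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training split only
X_val = scaler.transform(X_val)          # reuse the training statistics
```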
- Model Training:
- A Random Forest Classifier is trained on the training data. The performance is evaluated using accuracy and a classification report, showcasing precision, recall, and F1-score for each class.
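A sketch of the training and evaluation step; the hyperparameters are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

val_pred = clf.predict(X_val)
print("accuracy:", accuracy_score(y_val, val_pred))
# Per-class precision, recall, and F1-score.
print(classification_report(y_val, val_pred))
```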
- Predictions on Test Data:
- The trained model generates predictions for the test data. The test features are standardized with the scaler fitted on the training data before making predictions.
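The key point is reusing the fitted scaler rather than refitting it on the test set (variable names are assumptions):

```python
# Transform with the scaler fitted above; never fit on test data.
X_test = scaler.transform(test_features)
test_pred = clf.predict(X_test)
```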
- Submission File:
- Predictions are saved along with the ID of each test sample in a CSV file named `submission.csv`.
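A sketch of writing the submission; the exact column names ("ID", "Target") are assumptions and should match the challenge format:

```python
import pandas as pd

submission = pd.DataFrame({"ID": test_ids, "Target": test_pred})
submission.to_csv("submission.csv", index=False)
```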
- The data paths are hardcoded and need to be modified according to the actual data location on your system.