This repository contains the second-place solution to the GEO-AI Challenge for Landslide Susceptibility Mapping by ITU. Below, you'll find information about the data, features, and the solution's workflow.
Please begin by fetching the data from this Google Drive link. It contains the following files:
- Geological Faults: `geological_faults.gpkg`
- Land Use: `land_use_land_cover.gpkg`
- River Network: `river_network.gpkg`
- Road Network: `road_network.gpkg`
- Training Data (GeoPackage): `Train.gpkg`
- Testing Data (GeoPackage): `Test.gpkg`
- Training Data (CSV): `Train.csv`
- Testing Data (CSV): `Test.csv`
- Digital Terrain Model (DTM): `dtm.tif`
- Average Precipitation 2020: `average_precipitation_2020.tif`
- 90th Percentile Precipitation 2020: `90_perc_precipitation_2020.tif`
Various feature extraction and transformation steps produce intermediate data stored in variables and DataFrames (e.g., `dtm_data`, `landslides`, `no_landslides`). The final output is a submission file containing the IDs of the test samples and their corresponding predicted targets; it is stored in a CSV file named `submission.csv`.
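For reference, here is a minimal sketch of how the inputs might be loaded into those variables. The file paths and the `target` column name are assumptions; adjust them to your setup.

```python
import geopandas as gpd
import rasterio

# Labelled sample points (the "target" column name is an assumption).
train = gpd.read_file("Train.gpkg")

# Split samples by label into the DataFrames used downstream.
landslides = train[train["target"] == 1]
no_landslides = train[train["target"] == 0]

# Open the DTM raster; dtm_data holds the first (elevation) band.
with rasterio.open("dtm.tif") as dtm:
    dtm_data = dtm.read(1)
```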
- Geological Features:
- The proximity of each sample to geological faults, river networks, and road networks is computed.
- Geospatial Features:
- Vector data from the GeoPackage files describe geological structures and land use.
- Raster Data Features:
- Values are extracted from raster datasets: the DTM, average precipitation 2020, and 90th percentile precipitation 2020.
- Elevation:
- Elevation is extracted from the DTM raster file for both landslide and non-landslide locations, as sketched below.
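A minimal sketch of that extraction with rasterio, assuming the points share the raster's CRS (the helper name is illustrative, not the repository's):

```python
import geopandas as gpd
import rasterio

def sample_raster(points: gpd.GeoDataFrame, raster_path: str) -> list[float]:
    """Sample a single-band raster at each point's (x, y) coordinate."""
    coords = [(geom.x, geom.y) for geom in points.geometry]
    with rasterio.open(raster_path) as src:
        # rasterio yields one array of band values per coordinate
        return [vals[0] for vals in src.sample(coords)]

# e.g. landslides["elevation"] = sample_raster(landslides, "dtm.tif")
```

The same helper also works for the two precipitation rasters.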
- Distance to Geological Features:
- The distances to the nearest fault, river, and road are calculated for each sample.
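One way to compute those distances with GeoPandas, assuming all layers share a projected CRS so distances come out in map units such as metres:

```python
import geopandas as gpd

def distance_to_nearest(samples: gpd.GeoDataFrame,
                        features: gpd.GeoDataFrame) -> gpd.GeoSeries:
    """Distance from each sample to the nearest geometry in `features`."""
    # Merge all feature geometries once, then measure element-wise.
    merged = features.geometry.unary_union
    return samples.geometry.distance(merged)

# e.g. train["dist_to_fault"] = distance_to_nearest(train, faults)
```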
- Slope and Aspect:
- Slope and aspect values are calculated from the DTM and added as features. Aspect values of -9999 are replaced with the median aspect value to handle erroneous or missing data.
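The repository's exact terrain routine is not shown here, but a common finite-difference approximation looks like the sketch below; the cell size is in map units, and the sentinel cleanup mirrors the step described above with an assumed column name.

```python
import numpy as np

def slope_aspect(dtm: np.ndarray, cell_size: float):
    """Approximate slope and aspect (both in degrees) from a north-up DTM."""
    dz_drow, dz_dcol = np.gradient(dtm, cell_size)
    dz_dx = dz_dcol    # columns increase eastward
    dz_dy = -dz_drow   # rows increase southward, so flip the sign
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect measured clockwise from north; conventions vary between tools.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect

# Replace the -9999 sentinel with the median of the valid aspect values.
valid = train.loc[train["aspect"] != -9999, "aspect"]
train.loc[train["aspect"] == -9999, "aspect"] = valid.median()
```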
- X and Y Coordinates:
- The x and y coordinates of each sample are extracted and used as features.
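With GeoPandas this is a pair of accessor calls (the column names are assumptions):

```python
train["x"] = train.geometry.x
train["y"] = train.geometry.y
```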
- Handling Missing Data:
- Missing values in features like average and 90th percentile precipitation are imputed using the mode value of each feature.
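A sketch of that imputation in pandas; the precipitation column names are assumptions:

```python
# Fill missing values with each column's most frequent value (mode).
for col in ["avg_precip_2020", "p90_precip_2020"]:
    train[col] = train[col].fillna(train[col].mode().iloc[0])
```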
- Data Splitting and Standardization:
- The dataset is split into training and testing sets, and features are standardized using StandardScaler.
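A sketch of the split-and-scale step with scikit-learn; the split ratio and random seed are assumptions:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training split only
X_val = scaler.transform(X_val)          # reuse the training statistics
```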
- Model Training:
- A Random Forest Classifier is trained on the training data. The performance is evaluated using accuracy and a classification report, showcasing precision, recall, and F1-score for each class.
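A sketch of the training and evaluation step; the hyperparameters are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

val_pred = clf.predict(X_val)
print("accuracy:", accuracy_score(y_val, val_pred))
# Per-class precision, recall, and F1-score.
print(classification_report(y_val, val_pred))
```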
- Predictions on Test Data:
- The trained model generates predictions for the test data. The test features are standardized with the scaler fitted on the training data before making predictions.
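The key point is reusing the fitted scaler rather than refitting it on the test set (variable names are assumptions):

```python
# Transform with the scaler fitted above; never fit on test data.
X_test = scaler.transform(test_features)
test_pred = clf.predict(X_test)
```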
- Submission File:
- Predictions are saved along with the ID of each test sample in a CSV file named `submission.csv`.
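A sketch of writing the submission; the exact column names ("ID", "Target") are assumptions and should match the challenge format:

```python
import pandas as pd

submission = pd.DataFrame({"ID": test_ids, "Target": test_pred})
submission.to_csv("submission.csv", index=False)
```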
- The data paths are hardcoded and need to be modified according to the actual data location on your system.