The GoraxGiz ML Competition aims to build models that predict the likelihood and ratio of individual liquidations relative to the amount borrowed (`total_liquidation_to_total_borrow`). The competition uses DeFi transaction history and behavioral data from various lending protocols and chains.
```
Liquidity-Analytics-and-Prediction/
├── notebook/
│   ├── Analytics_EDA/
│   └── Prediction/
├── models/
│   └── liquidity_prediction.pkl
├── test/
│   ├── test_script.py
│   └── liquidity_prediction.csv
└── src/
    └── UML_prediction.png
```
The dataset includes wallet transaction history (borrow, lend, etc.) from the following lending protocols:
- Aave
- Compound
- Cream
- RociFi
- Venus
- MakerDAO
- GMX
- Radiant

and the following chains:
- Ethereum (full transaction history)
- Arbitrum (protocol-specific history)
- Fantom (protocol-specific history)
- Polygon (protocol-specific history)
- Optimism (protocol-specific history)
- BSC (protocol-specific history)
- Avalanche (protocol-specific history)
Historical token prices for conversion to USDT at transaction time are also included.
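As a hedged illustration of how those historical prices can be used, the sketch below converts raw token amounts into USDT values at the time of each transaction. The record layout and field names (`amount`, `price_usdt_at_tx`) are assumptions for the example, not the dataset's actual schema.

```python
# Hypothetical transaction records: each carries the token amount and the
# historical USDT price recorded at the transaction's timestamp.
transactions = [
    {"token": "AAVE", "amount": 2.0, "price_usdt_at_tx": 95.5},
    {"token": "ETH", "amount": 0.5, "price_usdt_at_tx": 3200.0},
]

def to_usdt(tx):
    """Value of a transaction in USDT at the time it occurred."""
    return tx["amount"] * tx["price_usdt_at_tx"]

values = [to_usdt(tx) for tx in transactions]  # per-transaction USDT values
```

Valuing every borrow and liquidation in a common unit like this is what makes amounts comparable across tokens when computing a ratio such as `total_liquidation_to_total_borrow`.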
To load the dataset, follow these steps:
- Install the `giza-datasets` package (version 0.2.2).
- In your Python environment, instantiate a `DatasetsLoader` object and load the "gora-competition-training" dataset:
```python
import certifi
import os

# Point Python at certifi's CA bundle to avoid SSL verification errors.
os.environ['SSL_CERT_FILE'] = certifi.where()

from giza_datasets import DatasetsLoader

loader = DatasetsLoader()  # instantiate the loader
df = loader.load("gora-competition-training")
```
- The `giza-datasets` package requires Python 3.11 or higher, so it can be difficult to run in Google Colab or GCP Colab Enterprise.
- Installing the `giza-datasets` module is required:

```shell
pip install giza-datasets
```
- The provided notebook downloads the data via `giza-datasets` and saves it to Google Drive, which is then mounted for easy access from Google Colab.
The following diagram (`src/UML_prediction.png`) illustrates the workflow of the pipeline:
The pipeline consists of the following steps:
- Loading and Analysis of the Dataset Provided
- Convert the categorical columns into numerical ones to prepare the data for training.
- Train multiple regression models using different algorithms and techniques: Decision Tree Regression, Random Forest Regression, Gradient Boosting Regression, Support Vector Regression, XGBoost Regression, LightGBM Regression, and CatBoost Regression.
- Perform hyperparameter tuning on each model using techniques like GridSearchCV.
- Combine the tuned models with a Stacking Regressor for a more robust result.
- Evaluate the performance of the Stacking Regressor on the test data using mean squared error.
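The tuning, stacking, and evaluation steps above can be sketched with scikit-learn as follows. The base learners, hyperparameter grids, and synthetic data here are illustrative only, not the competition's actual configuration.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the competition data: 5 features, one noisy target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune each base model with GridSearchCV (small grids for illustration).
rf = GridSearchCV(RandomForestRegressor(random_state=0),
                  {"n_estimators": [50, 100]}, cv=3).fit(X_train, y_train)
gb = GridSearchCV(GradientBoostingRegressor(random_state=0),
                  {"learning_rate": [0.05, 0.1]}, cv=3).fit(X_train, y_train)

# Stack the tuned estimators; a Ridge meta-learner combines their outputs.
stack = StackingRegressor(
    estimators=[("rf", rf.best_estimator_), ("gb", gb.best_estimator_)],
    final_estimator=Ridge(),
)
stack.fit(X_train, y_train)

# Evaluate the stacked model on held-out data with mean squared error.
mse = mean_squared_error(y_test, stack.predict(X_test))
```

The same pattern extends to the full model list (XGBoost, LightGBM, CatBoost, etc.) by adding each tuned estimator to the `estimators` list.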
The test data is taken from the `gora-competition-evaluation` dataset, which contains all feature columns except the target column. The resulting predictions are included in the repository as `liquidity_prediction.csv`.
A test script is also provided that applies the trained model to any data containing the features used in training.
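A minimal sketch of what such a test script might look like, assuming the pickled model from the `models/` directory; the helper name and paths here are assumptions based on the repo layout, not the actual contents of `test_script.py`.

```python
import pickle

def predict_liquidations(model_path, features):
    """Load the pickled regressor and predict the target ratio
    (total_liquidation_to_total_borrow) for the given feature rows."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return model.predict(features)

# usage (paths and feature source are assumptions):
# import pandas as pd
# features = pd.read_csv("path/to/evaluation_features.csv")
# preds = predict_liquidations("models/liquidity_prediction.pkl", features)
```

The feature columns passed in must match those used during training, in the same order, or the loaded model will raise an error or produce meaningless output.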
If you have any feedback, please reach out to us at hriskikesh.yadav332@gmail.com.