The GoraxGiz ML Competition aims to build models that predict the likelihood and ratio of individual liquidations relative to the amount borrowed (`total_liquidation_to_total_borrow`). The competition uses DeFi transaction history and behavioral data from various lending protocols and chains.
```
Liquidity-Analytics-and-Prediction/
├── notebook/
│   ├── Analytics_EDA/
│   └── Prediction/
├── models/
│   └── liquidity_prediction.pkl
├── test/
│   ├── test_script.py
│   └── liquidity_prediction.csv
└── src/
    └── UML_prediction.png
```
The dataset includes wallet transaction history (borrow, lend, etc.) from the following lending protocols:
- Aave
- Compound
- Cream
- RociFi
- Venus
- MakerDAO
- GMX
- Radiant

and the following chains:
- Ethereum (full transaction history)
- Arbitrum (protocol-specific history)
- Fantom (protocol-specific history)
- Polygon (protocol-specific history)
- Optimism (protocol-specific history)
- BSC (protocol-specific history)
- Avalanche (protocol-specific history)
Historical token prices for conversion to USDT at transaction time are also included.
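As a hedged illustration of how those historical prices can be used, the sketch below converts raw token amounts into USDT values at the time of each transaction. The record layout and field names (`amount`, `price_usdt_at_tx`) are assumptions for the example, not the dataset's actual schema.

```python
# Hypothetical transaction records: each carries the token amount and the
# historical USDT price recorded at the transaction's timestamp.
transactions = [
    {"token": "AAVE", "amount": 2.0, "price_usdt_at_tx": 95.5},
    {"token": "ETH", "amount": 0.5, "price_usdt_at_tx": 3200.0},
]

def to_usdt(tx):
    """Value of a transaction in USDT at the time it occurred."""
    return tx["amount"] * tx["price_usdt_at_tx"]

values = [to_usdt(tx) for tx in transactions]  # per-transaction USDT values
```

Valuing every borrow and liquidation in a common unit like this is what makes amounts comparable across tokens when computing a ratio such as `total_liquidation_to_total_borrow`.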
To load the dataset, follow these steps:
- Install the `giza-datasets` package (version 0.2.2).
- In your Python environment, instantiate a `DatasetsLoader` object and load the "gora-competition-training" dataset:
```python
import certifi
import os

# Point Python at certifi's CA bundle to avoid SSL verification errors.
os.environ['SSL_CERT_FILE'] = certifi.where()

from giza_datasets import DatasetsLoader

loader = DatasetsLoader()  # instantiate the loader
df = loader.load("gora-competition-training")
```
- The `giza-datasets` package requires Python 3.11 or higher, so it can be difficult to run in Google Colab or GCP Colab Enterprise.
- Installing the `giza-datasets` module is required:

```shell
pip install giza-datasets
```
- The provided notebook downloads the data via `giza-datasets` and saves it to Google Drive, which is then mounted for easy access from Google Colab.
The following diagram (`src/UML_prediction.png`) illustrates the workflow of the pipeline:
The pipeline consists of the following steps:
- Loading and Analysis of the Dataset Provided
- Convert the categorical columns into numerical ones to prepare the data for training.
- Train multiple regression models using different algorithms and techniques: Decision Tree Regression, Random Forest Regression, Gradient Boosting Regression, Support Vector Regression, XGBoost Regression, LightGBM Regression, and CatBoost Regression.
- Perform hyperparameter tuning on each model using techniques like GridSearchCV.
- Combine the tuned models with a Stacking Regressor for a more robust result.
- Evaluate the performance of the Stacking Regressor on the test data using mean squared error.
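The tuning, stacking, and evaluation steps above can be sketched with scikit-learn as follows. The base learners, hyperparameter grids, and synthetic data here are illustrative only, not the competition's actual configuration.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the competition data: 5 features, one noisy target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune each base model with GridSearchCV (small grids for illustration).
rf = GridSearchCV(RandomForestRegressor(random_state=0),
                  {"n_estimators": [50, 100]}, cv=3).fit(X_train, y_train)
gb = GridSearchCV(GradientBoostingRegressor(random_state=0),
                  {"learning_rate": [0.05, 0.1]}, cv=3).fit(X_train, y_train)

# Stack the tuned estimators; a Ridge meta-learner combines their outputs.
stack = StackingRegressor(
    estimators=[("rf", rf.best_estimator_), ("gb", gb.best_estimator_)],
    final_estimator=Ridge(),
)
stack.fit(X_train, y_train)

# Evaluate the stacked model on held-out data with mean squared error.
mse = mean_squared_error(y_test, stack.predict(X_test))
```

The same pattern extends to the full model list (XGBoost, LightGBM, CatBoost, etc.) by adding each tuned estimator to the `estimators` list.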
The test data is taken from the `gora-competition-evaluation` dataset, which contains all feature columns except the target column. The resulting predictions are included in the repository as `liquidity_prediction.csv`.
A test script is also provided that applies the trained model to any data containing the features used in training.
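A minimal sketch of what such a test script might look like, assuming the pickled model from the `models/` directory; the helper name and paths here are assumptions based on the repo layout, not the actual contents of `test_script.py`.

```python
import pickle

def predict_liquidations(model_path, features):
    """Load the pickled regressor and predict the target ratio
    (total_liquidation_to_total_borrow) for the given feature rows."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return model.predict(features)

# usage (paths and feature source are assumptions):
# import pandas as pd
# features = pd.read_csv("path/to/evaluation_features.csv")
# preds = predict_liquidations("models/liquidity_prediction.pkl", features)
```

The feature columns passed in must match those used during training, in the same order, or the loaded model will raise an error or produce meaningless output.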
If you have any feedback, please reach out to us at hriskikesh.yadav332@gmail.com.