sevketsayin / SubsampledLogisticRegression

Repository of the AAAI paper

A Provably Accurate Randomized Sampling Algorithm for Logistic Regression

Code repository for the paper:

Agniva Chowdhury and Pradeep Ramuhalli. A Provably Accurate Randomized Sampling Algorithm for Logistic Regression. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024.

Technical Appendix

The technical appendix of the paper is in TechnicalAppendix.pdf.

Datasets

  1. Cardiovascular disease dataset (cardio): cardio_train.csv (sourced from here)
  2. Bank customer churn prediction dataset (churn): Bank Customer Churn Prediction.csv (sourced from here)
  3. Default of credit card clients dataset (default): default of credit card clients.csv (sourced from here)

Code

  1. To compute row leverage scores of a matrix: leverage_scores.py
  2. To perform leverage score, l2s, or uniform sampling: row_sampling.py

The code for l2s sampling has been sourced from here.
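
As an illustration of the two steps listed above, here is a minimal NumPy sketch of row leverage scores and leverage-score row sampling. It is not the code in leverage_scores.py or row_sampling.py: the function names `row_leverage_scores` and `sample_rows` are hypothetical, and the 1/sqrt(s * p_i) rescaling is the standard sampling-and-rescaling construction from randomized linear algebra, assumed here rather than taken from the repository. The l2s scheme is not covered.

```python
import numpy as np


def row_leverage_scores(X):
    """Row leverage scores of X: squared row norms of the left singular vectors.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    U, sing, _ = np.linalg.svd(X, full_matrices=False)
    # Drop directions with negligible singular values (numerical rank).
    tol = sing.max() * max(X.shape) * np.finfo(sing.dtype).eps
    U = U[:, sing > tol]
    return np.sum(U ** 2, axis=1)


def sample_rows(probs, s, seed=None):
    """Draw s row indices with replacement according to probs.

    Returns the indices and the rescaling weights 1 / sqrt(s * p_i),
    which is an assumed (standard) choice, not confirmed by the source.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    idx = rng.choice(probs.size, size=s, replace=True, p=probs)
    weights = 1.0 / np.sqrt(s * probs[idx])
    return idx, weights


# Usage on synthetic data (the data here is random, purely for illustration).
X = np.random.default_rng(0).standard_normal((1000, 5))
lev = row_leverage_scores(X)
idx, w = sample_rows(lev / lev.sum(), s=100, seed=0)  # leverage-score sampling
X_sub = w[:, None] * X[idx]                           # rescaled subsample
```

Passing `np.full(n, 1.0 / n)` as `probs` gives uniform sampling with the same interface, which makes the two schemes easy to compare on the same data.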

Notebooks

To reproduce the experiments in the paper, run the following Jupyter Notebooks:

  1. For Cardiovascular disease dataset: cardio_train.ipynb
  2. For Bank customer churn prediction dataset: Bank_Customer_Churn_Prediction.ipynb
  3. For Default of credit card clients dataset: default_of_credit_card_clients.ipynb

Please contact Agniva Chowdhury for questions or comments.

License: MIT License

