ankishb / ml-projects

This repo contains various Data Science projects involving image, text, tabular and graph datasets, using classical ML as well as Deep Learning.

ml-projects

This repository contains my data science and ML contest projects, along with my personal fun projects. I have dealt with a diverse set of problems, data types, and metrics. Below is a brief summary of each project: the type of dataset, the type of problem, and my approach to handling it. More details can be found in each subdirectory.

If you want to view the following summary in table format, click here

Flipkart Object Detection

  • DataSet:
    • Image
  • Objective:
    • Bounding Box prediction
  • My Approach:
    • Designed a visual feature pipeline with attention on the object in the image
    • Data augmentation applied to each image together with its bounding box
    • Used a single-stage detector approach
    • Focal loss with YOLO and SSD
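
As a rough illustration of the focal loss mentioned above, here is a minimal per-prediction sketch in plain Python. The function name and the defaults `alpha=0.25`, `gamma=2.0` are assumptions taken from common practice, not necessarily the settings used in this project.

```python
import math

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one prediction: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class, y is 0 or 1.
    alpha and gamma are assumed defaults, not values from this repo.
    """
    p = min(max(p, eps), 1.0 - eps)          # clip so log() stays finite
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes far less than a badly wrong one.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.10, 1)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick way to sanity-check an implementation.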

Amazon Product Review classification

  • DataSet:
    • Text
  • Objective:
    • Classification
  • My Approach:
    • Data cleaning and feature engineering
    • Linear and non-linear models
    • Deep learning attention model
    • Pretrained BERT model
    • Ensemble

HDFC Risk Prediction

  • DataSet:
    • 2500 unknown predictors
  • Objective:
    • Classification
  • My Approach:
    • Feature understanding (EDA)
    • Feature engineering
    • Designed feature interaction tools
    • Ensemble model using XGBoost/LightGBM/CatBoost and simple linear/non-linear models
    • Statistical model to understand feature importance using p-values
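
One simple form a feature interaction tool can take is generating pairwise products of numeric columns. The sketch below is an assumption about what such a tool might look like; the function name, the `_x_` naming scheme, and the sample columns are all hypothetical.

```python
from itertools import combinations

def add_pairwise_interactions(rows, columns):
    """Return copies of rows with a product feature for every pair of columns."""
    out = []
    for row in rows:
        new_row = dict(row)                      # keep the original features
        for a, b in combinations(columns, 2):    # each unordered pair once
            new_row[f"{a}_x_{b}"] = row[a] * row[b]
        out.append(new_row)
    return out

# Hypothetical anonymised predictors.
rows = [{"f1": 2.0, "f2": 3.0, "f3": 0.5}]
result = add_pairwise_interactions(rows, ["f1", "f2", "f3"])
```

With thousands of predictors the number of pairs grows quadratically, so in practice the pairs would need to be restricted to a preselected subset of columns.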

Hike Friend Recommendation

  • DataSet:
    • Very large dataset (45M observations, graph edge representation)
    • Relational Feature
    • Category + Numerical
  • Objective:
    • Link Prediction
  • My Approach:
    • Graph-based features such as (Adamic-Adar, common-resource-allocation, ...)
    • SVD features for each user
    • Community clustering
    • Subsemble (I did this after the competition was over, to understand more about sampling and model building)
    • Neighbour-based features (removed high-cardinality features)
    • Also tried a deep learning approach (graph embedding), but couldn't get it working properly at the time
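
To make the graph-based link-prediction features concrete, here is a small sketch that computes the Adamic-Adar score and the resource-allocation index for a candidate pair of users. It assumes an undirected graph given as an edge list and assumes that "common-resource-allocation" refers to the standard resource-allocation index; the function names and the toy graph are illustrative only.

```python
import math
from collections import defaultdict

def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over common neighbours of u and v."""
    common = adj[u] & adj[v]
    # Skip degree-1 nodes to avoid dividing by log(1) = 0.
    return sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)

def resource_allocation(adj, u, v):
    """Sum of 1 / degree over common neighbours of u and v."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])

# Toy graph: a and b share neighbours c (degree 2) and d (degree 3).
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
adj = build_adjacency(edges)
aa = adamic_adar(adj, "a", "b")
ra = resource_allocation(adj, "a", "b")
```

Both scores down-weight common neighbours that are connected to many users, on the intuition that sharing a highly connected contact is weaker evidence of a future link.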

Club Mahindra Hotel Room Price Prediction

  • DataSet:
    • Category + Numerical
    • Relational Dataset
  • Objective:
    • Regression
  • My Approach:
    • Feature engineering
      1. Date-time based features
      2. Aggregation-based features
      3. Relational features
    • Ensemble over models trained on differently transformed target spaces
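
The idea of ensembling across transformed target spaces can be sketched as follows: fit one model per transform of the target, map each prediction back to the original scale, and average. The transforms chosen here and the constant "mean" model are stand-in assumptions to keep the example self-contained; a real regressor would take their place.

```python
import math

# (forward, inverse) pairs; each pair defines one target space.
TRANSFORMS = {
    "identity": (lambda y: y, lambda z: z),
    "log1p": (math.log1p, math.expm1),
    "sqrt": (math.sqrt, lambda z: z * z),
}

def fit_mean_model(targets):
    """Stand-in for a real regressor: always predicts the training mean."""
    mean = sum(targets) / len(targets)
    return lambda: mean

def ensemble_prediction(y_train):
    """Average the back-transformed predictions from every target space."""
    preds = []
    for forward, inverse in TRANSFORMS.values():
        model = fit_mean_model([forward(y) for y in y_train])
        preds.append(inverse(model()))       # back to the original price scale
    return sum(preds) / len(preds)

prices = [100.0, 400.0, 900.0]               # hypothetical room prices
blended = ensemble_prediction(prices)
```

Each target space penalises errors differently (the log space, for example, emphasises relative error), so the back-transformed predictions disagree in useful ways before they are averaged.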

Cifar-10 Classification using Conditional Feature

  • DataSet:
    • Image
  • Objective:
    • Comparison between ResNet and my modified feature pipeline
    • Classification
  • My Approach:
    • Developed a weighted feature pipeline using global and local features
    • Global features constrain the local features so that the model focuses specifically on features of the object in the image
    • Produces a better attention map around the object, reflecting its learned features
    • Improved score by 1.37% over ResNet

Facenet

  • DataSet:
    • Image
  • Objective:
    • Face Verification
  • My Approach:
    • Matching network approach
    • Built student-attendance hardware using Arduino
    • Hard mining approach (generated all permutations between classes to handle the small dataset)
    • Network-in-network approach to handle overfitting, as I had a very small dataset
    • Achieved 93% accuracy
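
One way to stretch a small face dataset, in the spirit of the pair generation described above, is to enumerate every same-person and different-person image pair. This sketch uses unordered pairs (combinations) rather than ordered permutations, which is an assumption; the sample identifiers are hypothetical.

```python
from itertools import combinations

def make_pairs(samples):
    """samples: list of (image_id, person_label).

    Returns (positives, negatives): pairs of image ids showing the same
    person and pairs showing different people.
    """
    positives, negatives = [], []
    for (img_a, label_a), (img_b, label_b) in combinations(samples, 2):
        if label_a == label_b:
            positives.append((img_a, img_b))
        else:
            negatives.append((img_a, img_b))
    return positives, negatives

samples = [("img0", "alice"), ("img1", "alice"), ("img2", "bob"), ("img3", "bob")]
positives, negatives = make_pairs(samples)
```

Four images already yield six training pairs, and the count grows quadratically with the number of images, which is what makes this useful when data is scarce.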

Few Shot Learning (Prototype Network)

  • DataSet:
    • Image
  • Objective:
    • Classification (training on very small dataset)
  • My Approach:
    • Prototype Algorithm implementation
    • There is more to this (will update in the future)
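
The core of the prototype approach can be sketched without any deep learning framework: average the embeddings of each class's support examples into a prototype, then assign a query to the nearest prototype. The embeddings below are made-up 2-D vectors; in practice they would come from a trained encoder network, which this sketch leaves out.

```python
import math
from collections import defaultdict

def class_prototypes(support):
    """support: list of (embedding, label). Prototype = mean embedding per class."""
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(embs) for dim in zip(*embs)]
        for label, embs in grouped.items()
    }

def classify(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-D embeddings for a 2-way, 2-shot episode.
support = [([0.0, 0.0], "cat"), ([0.0, 2.0], "cat"),
           ([4.0, 4.0], "dog"), ([6.0, 4.0], "dog")]
prototypes = class_prototypes(support)
prediction = classify([1.0, 1.0], prototypes)
```

Because a new class only needs a few support embeddings to form its prototype, no retraining is required to recognise it, which is why the method suits very small datasets.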

JP.Morgan House Price Prediction

  • DataSet:
    • Category + Numerical
  • Objective:
    • Regression
  • My Approach:
    • Date-based features and dummy features
    • Interaction-based features
    • Bayesian optimization
    • Out-of-fold predictions to generate meta-features for the ensemble
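
Out-of-fold prediction can be sketched as follows: each row receives a prediction from a model trained only on the other folds, so the resulting column can be fed to a second-level model without leaking the target. The interleaved fold assignment and the constant "mean" model are simplifying assumptions; any fit function that returns a predictor would fit the same interface.

```python
def out_of_fold_predictions(X, y, fit, n_folds=3):
    """Return one prediction per row, made by a model that never saw that row."""
    n = len(X)
    oof = [None] * n
    for fold in range(n_folds):
        # Interleaved folds: row i belongs to fold i % n_folds.
        valid_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in valid_idx:
            oof[i] = model(X[i])
    return oof

def fit_mean(X_train, y_train):
    """Stand-in base model: predicts the mean of its training targets."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean

X = [[0], [1], [2], [3], [4], [5]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
meta_feature = out_of_fold_predictions(X, y, fit_mean)
```

The `meta_feature` list has one entry per training row and can be appended as a new column for the ensemble model.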

Hackerearth Platform Recommendation System

  • DataSet:
    • Text
  • Objective:
    • User-Problem Rating Prediction
  • My Approach:
    • My main concern was to handle the following questions carefully:
      1. What are the strongest and weakest areas of the user?
      2. What is the level of the problem?
      3. Which problem has the user just solved?
      4. If the user gets stuck on the current problem, which problem would help them (to gain confidence and improve skill in that area)?
      5. Exploration and exploitation strategy in recommending problems
      6. And many more
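
The exploration/exploitation trade-off in question 5 can be illustrated with an epsilon-greedy rule: usually recommend the problem with the highest estimated score, and occasionally recommend a random one. Epsilon-greedy is swapped in here as a simple, well-known strategy; it is an assumption, not necessarily the strategy used in this project, and the problem ids and scores are hypothetical.

```python
import random

def recommend(problem_scores, epsilon=0.1, rng=random):
    """problem_scores: dict of problem_id -> estimated usefulness for a user.

    With probability epsilon pick a random problem (explore); otherwise
    pick the highest-scoring one (exploit).
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(problem_scores))        # explore
    return max(problem_scores, key=problem_scores.get)   # exploit

scores = {"dp_easy": 0.4, "dp_medium": 0.9, "graphs_easy": 0.6}
best = recommend(scores, epsilon=0.0)                    # pure exploitation
```

Passing a seeded `random.Random` instance as `rng` makes the recommendations reproducible for testing.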

LTFS Loan Status prediction

  • DataSet:
    • Category + Numerical
  • Objective:
    • Classification
  • My Approach:

Segmentation

  • DataSet:
    • Image
  • Objective:
    • Segmentation
  • My Approach:
    • Implemented a U-Net architecture on a blood cell dataset
    • Fully convolutional network on a traffic-street dataset
    • Finally, experimented with a generative adversarial network for better generalization given the limited dataset

Future Sales Prediction

  • DataSet:
    • Relational feature
    • Time-Series Feature
    • Categorical + Numerical
  • Objective:
    • Future sales prediction for different stores in different cities
  • My Approach:

Gartner Retention Status Prediction

  • DataSet:
    • Image
  • Objective:
    • Classification
  • My Approach:
    • EDA
    • Feature Engineering

Stock Prediction

  • DataSet:
    • Time-Series stock prices
  • Objective:
    • Future price prediction
    • Regression
  • My Approach:
    • Deep learning approach using RNN and LSTM
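
Before an RNN or LSTM can be trained on prices, the series is usually reframed as fixed-length windows of past values paired with the next value. The sketch below shows only that framing step; the window length and the sample prices are assumptions, and the model itself is left out.

```python
def make_windows(prices, window=3):
    """Split a price series into (past window, next price) training pairs."""
    inputs, targets = [], []
    for start in range(len(prices) - window):
        inputs.append(prices[start:start + window])   # the last `window` prices
        targets.append(prices[start + window])        # the price to predict
    return inputs, targets

prices = [10.0, 11.0, 12.0, 13.0, 14.0]               # hypothetical closing prices
inputs, targets = make_windows(prices, window=3)
```

Each entry of `inputs` becomes one input sequence for the recurrent model, and the matching entry of `targets` is the regression label.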

Languages

Jupyter Notebook 99.5%, HTML 0.2%, Python 0.2%