avanti-bhandarkar / ECE143_FinalProject_ProductCategorization

Repository for the ECE 143 final project of Group 17, Fall 2023

ECE 143 Project: Makeup Product Categorization for E-Commerce Applications

Team Members

  • Swapnil Sinha
  • Pragnya Pathak
  • Xin Pan
  • Avanti Bhandarkar
  • Yuyang Wu

File Structure

--- Data/
|   +-- makeup_original.csv
|   +-- cleaned_makeup.csv
|   +-- withUSE.csv
|   +-- ingredients.csv
|   +-- ingredients.txt
|   +-- colorants.csv
|   +-- colorants.txt
--- Scripts/
|   +-- utils.py
|   +-- preprocessing.py
|   +-- lda.py
|   +-- models_SVM_TfIdf.ipynb
|   +-- models_SVM_GPT3.ipynb
--- ECE143_Group17_Project Proposal.pdf
--- ECE143_Team17_Presentation.pdf
--- ECE143_ProductCategorization_Visualizations.ipynb
--- LDAvis.html
--- README.md
  • Data stores all datasets for analysis.
    • makeup_original.csv - dataset from Heroku /makeup API
    • cleaned_makeup.csv - dataset after preprocessing
    • withUSE.csv - cleaned dataset with saved Universal Sentence Encoder (USE) embeddings
    • ingredients.csv / ingredients.txt - FDA-approved cosmetic ingredients dataset
    • colorants.csv / colorants.txt - FDA-approved cosmetic colorants dataset
  • Scripts stores all Python scripts and model notebooks.
    • utils.py contains helper functions for cleaning data and performing certain feature engineering operations.
    • preprocessing.py contains all functions used to preprocess the description column from makeup_original.csv
    • lda.py contains the LDA topic modelling
    • models_SVM_TfIdf.ipynb contains the SVM + TF-IDF model for categorization
    • models_SVM_GPT3.ipynb contains the SVM + GPT-3 model for categorization
  • ECE143_Group17_Project Proposal.pdf is our project proposal
  • ECE143_Team17_Presentation.pdf is the pdf of our presentation
  • ECE143_ProductCategorization_Visualizations.ipynb is our visualization notebook. LDA modelling is excluded from it (see Scripts/lda.py).
  • LDAvis.html is the HTML visualization of the Latent Dirichlet Allocation (LDA) based topic modelling
  • README.md
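
As an illustration of the kind of cleaning that preprocessing.py applies to the description column, here is a minimal sketch using only the standard library. It is not the project's actual code: the function name `clean_description`, the tiny stop-word list, and the assumption that descriptions may contain HTML tags and entities are all hypothetical, and the real script may well rely on NLTK or SpaCy instead.

```python
import html
import re

# Hypothetical helpers for illustration; not taken from Scripts/preprocessing.py.
TAG_RE = re.compile(r"<[^>]+>")          # matches HTML tags such as <p> or </li>
NON_ALPHA_RE = re.compile(r"[^a-z\s]")   # matches anything except a-z and whitespace

# A deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "with", "to", "in", "is"}


def clean_description(text):
    """Return a lowercased, tag-free, punctuation-free version of `text`."""
    text = html.unescape(text)           # turn entities like &amp; into characters
    text = TAG_RE.sub(" ", text)         # drop HTML tags
    text = text.lower()
    text = NON_ALPHA_RE.sub(" ", text)   # replace digits and punctuation with spaces
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_description("<p>A long-lasting Matte lipstick &amp; liner!</p>"))
```

The cleaned strings are what a vectorizer such as TF-IDF would then consume; stemming or lemmatization could be added as a further step.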

Installation

Make sure you have Python (version 3.9 or lower) installed on your machine. Then, follow these steps:

  1. Clone the repository:

    git clone https://github.com/avanti-bhandarkar/ECE143_FinalProject_ProductCategorization
  2. Install dependencies:

    Install the libraries listed in the 3rd Party Modules Required section below. Note that some of these libraries may require additional supporting modules.

3rd Party Modules Required

  • Pandas - 1.5.3
  • Numpy - 1.23.5
  • Matplotlib - 3.7.1
  • Seaborn - 0.12.2
  • NLTK - 3.8.1
  • SpaCy - 3.6.1
  • Gensim - 4.3.2
  • Scikit-learn (sklearn) - 1.2.2
  • pyLDAvis - 2.1.2
  • Wordcloud - 1.9.2
  • Tensorflow - 2.14.0
  • OpenAI - 0.27.2
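
The pinned versions above can be checked against the current environment with a short script. The sketch below is not part of the repository. It assumes the libraries are installed under their usual PyPI distribution names (for example `scikit-learn` for Sklearn) and uses `importlib.metadata`, which requires Python 3.8 or newer. The helper names `check_versions` and `pip_command` are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above. The keys are assumed PyPI distribution
# names, which can differ from the import name (scikit-learn vs. sklearn).
PINNED = {
    "pandas": "1.5.3",
    "numpy": "1.23.5",
    "matplotlib": "3.7.1",
    "seaborn": "0.12.2",
    "nltk": "3.8.1",
    "spacy": "3.6.1",
    "gensim": "4.3.2",
    "scikit-learn": "1.2.2",
    "pyLDAvis": "2.1.2",
    "wordcloud": "1.9.2",
    "tensorflow": "2.14.0",
    "openai": "0.27.2",
}


def check_versions(pinned):
    """Return {name: (expected, installed or None)} for each mismatch."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the distribution is not installed at all
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


def pip_command(pinned):
    """Build a single pip command that installs every pinned version."""
    return "pip install " + " ".join(f"{n}=={v}" for n, v in pinned.items())


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    print(pip_command(PINNED))
```

Running the script lists every library whose installed version differs from the pinned one, followed by a pip command that would install the pinned set.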
