wolfsinem / product-tagging

Part of an internal project for my internship

Automated Product Tagging

This Automated Product Tagging project is part of an internal project for my internship at Onestdata.

Every product carries several tags that describe its characteristics. These tags can cover anything about the product, e.g. color, size and type. They allow visitors to filter products by the categories they want to explore.

The tagging algorithm is largely based on NLTK (Natural Language Toolkit), a leading platform for building Python programs that work with human language data. Because the dataset has a description column containing natural language, this package is well suited to producing tags for products. For more documentation, see: NLTK
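As a rough illustration of how tags could be derived from a description with NLTK, the sketch below tokenizes the text, drops common words and keeps the most frequent remaining terms. It is not the project's actual algorithm: the function name `candidate_tags`, the tiny `STOPWORDS` set and the sample description are all assumptions made for this example.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
# NLTK ships a fuller list in its stopwords corpus, which needs a separate download.
STOPWORDS = {"a", "an", "and", "the", "with", "for", "in", "of", "is", "this", "it", "to"}


def candidate_tags(description, top_n=3):
    """Return the top_n most frequent non-stopword terms of a description."""
    # Keep alphabetic tokens only; this tokenizer needs no extra NLTK data.
    tokenizer = RegexpTokenizer(r"[a-z]+")
    tokens = tokenizer.tokenize(description.lower())
    words = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    # FreqDist counts occurrences; most_common sorts them by frequency.
    return [word for word, _ in FreqDist(words).most_common(top_n)]


# Made-up product description, for illustration only.
description = "Blue cotton shirt with a slim fit. This cotton shirt is soft and the shirt is blue."
tags = candidate_tags(description)
print(tags)
```

Frequency counting is only one possible heuristic; part-of-speech filtering or stemming could be layered on top of the same token list.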

The machine learning model, on the other hand, is based on scikit-learn's TfidfVectorizer. It tokenizes the documents, learns the vocabulary and the inverse document frequency (IDF) weights, and can then encode new documents with that vocabulary. For more documentation, see: TFIDF
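A minimal sketch of those three steps (tokenize, learn the vocabulary and IDF weights, encode new text) with scikit-learn's `TfidfVectorizer`. The sample descriptions are made up for illustration and the default vectorizer settings are an assumption, not necessarily what the project uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up product descriptions, for illustration only.
descriptions = [
    "blue cotton shirt",
    "red cotton dress",
    "blue denim jeans",
]

vectorizer = TfidfVectorizer()

# fit_transform tokenizes the texts, learns the vocabulary and IDF weights,
# and returns one TF-IDF row per description.
X = vectorizer.fit_transform(descriptions)
vocabulary = sorted(vectorizer.vocabulary_)

# transform encodes unseen text with the learned vocabulary;
# words that were not seen during fitting (here "silk") are ignored.
new_doc = vectorizer.transform(["blue silk shirt"])

print(vocabulary)
print(X.shape, new_doc.shape)
```

Words that occur in fewer descriptions receive a higher IDF weight, so a rare term such as "denim" counts for more than a common one such as "blue".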

Alongside it, I chose LinearSVC (Linear Support Vector Classification) as the classifier. This model fits the data you provide and returns a "best fit" hyperplane that divides, or categorizes, your data. Once the hyperplane is fitted, you can feed features to the classifier to get the predicted class. See: NLTK. Because a product can carry multiple tags, the task is a multilabel classification problem, and LinearSVC handles it well when one classifier is trained per tag (one-vs-rest).
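One way to combine TF-IDF features with LinearSVC for multilabel tagging is sketched below. It assumes a one-vs-rest setup via scikit-learn's `OneVsRestClassifier` and `MultiLabelBinarizer`; the README does not say how the project wires these together, and the descriptions and tags are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Made-up training data: each description carries several tags.
descriptions = [
    "blue cotton shirt with short sleeves",
    "red cotton dress for summer",
    "blue denim jeans with slim fit",
    "red wool shirt with long sleeves",
    "blue summer dress",
    "red slim jeans",
]
tags = [
    ["blue", "shirt"],
    ["red", "dress"],
    ["blue", "jeans"],
    ["red", "shirt"],
    ["blue", "dress"],
    ["red", "jeans"],
]

# Turn the tag lists into a binary indicator matrix, one column per tag.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(tags)

# One LinearSVC is trained per tag column (one-vs-rest) on TF-IDF features.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
model.fit(descriptions, Y)

# Predict indicator rows for new text and map them back to tag names.
prediction = model.predict(["blue cotton shirt"])
predicted_tags = binarizer.inverse_transform(prediction)
print(predicted_tags)
```

Each tag gets its own hyperplane, so a description can receive any number of tags, including none when no classifier fires.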

Workflow

[Image: workflow diagram]

UI Home page to use the machine learning model

UI Upload CSV page to upload a file

Installation

Use the package manager pip to install the required libraries.

pip install -r requirements.txt

Run

flask run

or

python app.py

License: Apache License 2.0


Languages

Jupyter Notebook 91.7%, Python 5.1%, HTML 2.8%, CSS 0.3%, JavaScript 0.1%