abdelrhmanwahba / product_recommender

Product recommender system using the SVD algorithm, with an API built using Flask.



product_recommender

The Data Sets


  • This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, it includes reviews (ratings, text, helpfulness votes) and product metadata (descriptions, category information, price, brand, and image features).

  • I used datasets from the Amazon product reviews and chose Arts, Crafts and Sewing (ratings only, meta & 5-core) because its size is relatively small.

  • The datasets were in zipped format and needed to be parsed and cleaned.
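Parsing can be sketched as follows, assuming each downloaded file is a gzip-compressed JSON-lines file (one JSON object per line). The helper names `parse_jsonl_gz` and `keep_fields` are hypothetical, not taken from the repository.

```python
import gzip
import json
from typing import Iterator, List


def parse_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines file.

    Assumption: every non-empty line holds exactly one JSON object.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def keep_fields(record: dict, fields: List[str]) -> dict:
    """Project a record onto the given fields; missing fields become None."""
    return {name: record.get(name) for name in fields}
```

For example, `parse_jsonl_gz("Arts_Crafts_and_Sewing_5.json.gz")` (a hypothetical file name) would stream the 5-core reviews one record at a time instead of loading the whole file, and `keep_fields` can trim each record to the columns used later, such as `asin`, `reviewerID`, `reviewerName` and `overall`.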

The Data Sets in Detail

  1. Ratings-only dataset: 9,447,882 records, 4 columns.
  • Important columns:
    [image]
  2. Meta: 302,988 records, 18 columns.
  • Important columns:
    [image]
  3. 5-core (core5) dataset: 494,485 records, 12 columns.
  • Important columns:
    [image]

The datasets, parsed and extracted from the zipped format, are available here!
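The cleaning queries operate on tables named `ratings`, `meta` and `core5`. One way to get the parsed records into such tables is a small loader like the one below. This is a sketch under assumptions: the README does not say which database was used (SQLite is assumed here because the queries are valid SQLite), and the helper name `load_table` is hypothetical.

```python
import json
import sqlite3
from typing import Iterable, List


def load_table(conn: sqlite3.Connection, table: str,
               columns: List[str], records: Iterable[dict]) -> None:
    """Create `table` with the given (untyped) columns and insert the records.

    List/dict values, such as the `image` field in the metadata, are stored
    as JSON text. Table and column names are trusted input here, since they
    are interpolated into the SQL string.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")

    def to_row(rec: dict) -> tuple:
        values = []
        for name in columns:
            value = rec.get(name)  # missing fields become NULL
            if isinstance(value, (list, dict)):
                value = json.dumps(value)
            values.append(value)
        return tuple(values)

    conn.executemany(
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
        (to_row(r) for r in records),
    )
    conn.commit()
```

Storing nested fields as JSON text keeps every column a plain SQLite value, so the `GROUP BY` and `JOIN` queries in the cleaning steps work without further conversion.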

Cleaning Data

  1. Delete products whose title is shared by more than one product ID (asin):
```sql
-- titles that map to more than one distinct asin
WITH meta_titles (title, counts) AS (
    SELECT title, COUNT(DISTINCT asin)
    FROM meta
    GROUP BY title
    HAVING COUNT(DISTINCT asin) > 1
)
DELETE FROM meta
WHERE meta.title IN (SELECT title FROM meta_titles);
```
  2. Delete users whose name is associated with more than one reviewer ID:
```sql
-- names that map to more than one distinct reviewerID
WITH core_names (reviewerName, counts) AS (
    SELECT reviewerName, COUNT(DISTINCT reviewerID)
    FROM core5
    GROUP BY reviewerName
    HAVING COUNT(DISTINCT reviewerID) != 1
)
DELETE FROM core5
WHERE reviewerName IN (SELECT reviewerName FROM core_names);
```
  3. Delete users (reviewer IDs) that have more than one name:
```sql
-- reviewerIDs that appear with more than one distinct name
WITH core_ids (reviewerID, counts) AS (
    SELECT reviewerID, COUNT(DISTINCT reviewerName)
    FROM core5
    GROUP BY reviewerID
    HAVING COUNT(DISTINCT reviewerName) != 1
)
DELETE FROM core5
WHERE reviewerID IN (SELECT reviewerID FROM core_ids);
```
  4. Create a new table (the final dataset) by joining the ratings, meta and core5 tables after cleaning:
```sql
CREATE TABLE arts_crafts AS
SELECT ratings.asin, meta.title, core5.overall AS rating,
       meta.brand, meta.main_cat, meta.price, meta.image,
       core5.reviewerID AS userId, core5.reviewerName AS username
FROM ratings
JOIN meta ON meta.asin = ratings.asin
JOIN core5 ON core5.asin = ratings.asin
          AND core5.reviewerID = ratings.userid;
```
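To see what the four cleaning steps do together, they can be run end to end on a tiny in-memory SQLite database. The toy rows below are invented for illustration, and only the columns the queries touch are created; the real tables have more columns. Running the queries through SQLite is an assumption (see the loader sketch above), not necessarily how the author executed them.

```python
import sqlite3

CLEANING_STEPS = [
    # 1. drop products whose title is shared by more than one asin
    """WITH meta_titles (title, counts) AS (
           SELECT title, COUNT(DISTINCT asin) FROM meta
           GROUP BY title HAVING COUNT(DISTINCT asin) > 1)
       DELETE FROM meta WHERE title IN (SELECT title FROM meta_titles)""",
    # 2. drop names used by more than one reviewerID
    """WITH core_names (reviewerName, counts) AS (
           SELECT reviewerName, COUNT(DISTINCT reviewerID) FROM core5
           GROUP BY reviewerName HAVING COUNT(DISTINCT reviewerID) != 1)
       DELETE FROM core5 WHERE reviewerName IN (SELECT reviewerName FROM core_names)""",
    # 3. drop reviewerIDs that appear with more than one name
    """WITH core_ids (reviewerID, counts) AS (
           SELECT reviewerID, COUNT(DISTINCT reviewerName) FROM core5
           GROUP BY reviewerID HAVING COUNT(DISTINCT reviewerName) != 1)
       DELETE FROM core5 WHERE reviewerID IN (SELECT reviewerID FROM core_ids)""",
    # 4. join the three tables into the final dataset
    """CREATE TABLE arts_crafts AS
       SELECT ratings.asin, meta.title, core5.overall AS rating,
              meta.brand, meta.main_cat, meta.price, meta.image,
              core5.reviewerID AS userId, core5.reviewerName AS username
       FROM ratings
       JOIN meta ON meta.asin = ratings.asin
       JOIN core5 ON core5.asin = ratings.asin AND core5.reviewerID = ratings.userid""",
]


def build_toy_db() -> sqlite3.Connection:
    """Create made-up ratings/meta/core5 rows that trigger every cleaning rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ratings (asin, userid);
        CREATE TABLE meta (asin, title, brand, main_cat, price, image);
        CREATE TABLE core5 (asin, reviewerID, reviewerName, overall);
    """)
    conn.executemany("INSERT INTO meta VALUES (?, ?, 'b', 'Arts', '$5', NULL)",
                     [("P1", "Brush"), ("P2", "Paint"), ("P3", "Paint")])  # P2/P3 share a title
    conn.executemany("INSERT INTO core5 VALUES (?, ?, ?, ?)",
                     [("P1", "U1", "Ann", 5.0),
                      ("P1", "U2", "Bob", 4.0), ("P1", "U3", "Bob", 3.0),  # one name, two IDs
                      ("P1", "U4", "Cy", 2.0), ("P2", "U4", "Cyd", 1.0)])  # one ID, two names
    conn.executemany("INSERT INTO ratings VALUES (?, ?)",
                     [("P1", "U1"), ("P1", "U2"), ("P2", "U1")])
    return conn


conn = build_toy_db()
for step in CLEANING_STEPS:
    conn.execute(step)
print(conn.execute("SELECT asin, userId, username, rating FROM arts_crafts").fetchall())
# [('P1', 'U1', 'Ann', 5.0)]
```

The toy data also shows why steps 2 and 3 are both needed: "Cy" and "Cyd" each map to a single ID, so step 2 keeps them, but ID `U4` carries two names, so step 3 removes it.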

Final Dataset

  • 53,956 records, 9 columns
  • 15,589 products
    [image]
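How sparse the final table is can be measured directly from it. A minimal pandas sketch, assuming the `arts_crafts` table has been loaded into a DataFrame with the `asin` and `userId` columns produced by the `CREATE TABLE` query; the function name `rating_stats` is hypothetical, and the density formula assumes at most one rating per (user, product) pair.

```python
import pandas as pd


def rating_stats(df: pd.DataFrame, item_col: str = "asin",
                 user_col: str = "userId") -> dict:
    """Summarise how sparse the user x product rating matrix is."""
    n_ratings = len(df)
    n_items = df[item_col].nunique()
    n_users = df[user_col].nunique()
    per_item = df.groupby(item_col).size()  # number of ratings per product
    return {
        "ratings": n_ratings,
        "products": n_items,
        "users": n_users,
        # fraction of the users x products matrix that holds a rating
        "density": n_ratings / (n_users * n_items),
        "median_ratings_per_product": float(per_item.median()),
    }
```

With an SQLite connection `conn`, `rating_stats(pd.read_sql("SELECT * FROM arts_crafts", conn))` would return these figures for the final dataset; a density close to zero and a low median number of ratings per product are what "sparse" means here.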

Integrated datasets used in the model:

  1. arts_crafts
  2. arts_crafts_result (the result after building the model)

This plot shows the density of the dataset. As we can see, most products have only a few ratings, so the rating matrix is sparse; this is why I used the SVD algorithm.

[image: dataset density plot]
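The README does not include the model code. As a hedged illustration of why an SVD-style model suits sparse data, here is a minimal NumPy sketch of biased matrix factorization trained with stochastic gradient descent on the observed ratings only (often called Funk SVD; it is the model the Surprise library exposes as `SVD`). It is not necessarily the author's implementation: the function names, the hyperparameter values, and the assumption that `userId` and `asin` have already been mapped to integer indices are all illustrative.

```python
import numpy as np


def train_funk_svd(ratings, n_users, n_items, n_factors=10, lr=0.01,
                   reg=0.05, n_epochs=50, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i on (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))  # global mean rating
    bu = np.zeros(n_users)                          # user biases
    bi = np.zeros(n_items)                          # item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))    # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))    # item factors
    order = np.arange(len(ratings))
    for _ in range(n_epochs):
        rng.shuffle(order)
        for idx in order:
            u, i, r = ratings[idx]
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # update both factor vectors from their values before this step
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q


def predict(model, u, i):
    """Predicted rating of item i by user u."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + P[u] @ Q[i]


def recommend(model, u, rated, top_n=5):
    """Indices of the top_n items with the highest predicted rating, skipping `rated`."""
    mu, bu, bi, P, Q = model
    scores = mu + bu[u] + bi + Q @ P[u]
    if rated:
        scores[list(rated)] = -np.inf  # never recommend already-rated items
    return [int(j) for j in np.argsort(-scores)[:top_n]]
```

Only observed (user, item, rating) triples enter the training loop, so the many empty cells of the sparse matrix are never treated as zero ratings; the low-rank factors then fill them in as predictions.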

Snapshot of the API:
[image]
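The API is shown only as a screenshot. A minimal Flask sketch of a recommendation endpoint might look like the following; the route `/recommend/<user_id>`, the `n` query parameter, and the in-memory `RECOMMENDATIONS` dictionary are hypothetical placeholders for whatever the real API serves.

```python
from flask import Flask, jsonify, request

# Hypothetical precomputed recommendations: userId -> ranked list of asins.
# In a real deployment these would come from the trained model's output.
RECOMMENDATIONS = {
    "U1": ["P3", "P7", "P2"],
}

app = Flask(__name__)


@app.route("/recommend/<user_id>")
def recommend_endpoint(user_id: str):
    # ?n=... limits the number of results; invalid values fall back to 5
    n = request.args.get("n", default=5, type=int)
    recs = RECOMMENDATIONS.get(user_id)
    if recs is None:
        return jsonify({"error": f"unknown user {user_id}"}), 404
    return jsonify({"userId": user_id, "recommendations": recs[:n]})
```

Saved as `app.py` (a hypothetical file name), it can be served locally with `flask run`, and `GET /recommend/U1?n=2` would return the first two recommended products for that user. Precomputing recommendations keeps each request to a dictionary lookup; scoring with the model on every request is the alternative when results must reflect new ratings immediately.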


Languages

Jupyter Notebook 98.2%, Python 1.8%