thingiverse cosine-similarity python data-science data data-modeling flask bootstrap

ThingiRec

ThingiRec is a content-based recommendation system for thingiverse.com users. The web app can be accessed here.

Overview

ThingiRec uses item data from thingiverse.com, a website where 3D-printing hobbyists share their creations, to recommend other users to connect with and parts that a user may be interested in building. Recommendations are made by content-based filtering: cosine similarity is calculated between each of the input user's parts and all other parts in the database. Once the most similar parts are found, the users who uploaded them are recommended to the input user.

The goal of using content-based filtering is to connect users based on printing complications they might share. For example, User A, who is interested in ornate iPhone cases, and User B, who is interested in automotive transmissions, may not connect based on their outwardly stated interests, but both are interested in functional gears. Content-based filtering may match them.

[Images: User A's iPhone case and User B's automotive transmission]

The Process

The project can be broken down into four steps, detailed below:

Data Collection
Data Transformation
Model Creation/Code Refactoring
App Creation and Deployment

1) Data Collection

Item ids on thingiverse.com run up to ~1,500,000, so roughly 1.5 million items have been uploaded to the site. All potential item pages were inspected, and the scraping yielded ~500,000 records, because many items have been deleted or hidden since the site's inception. The item id, name, description, and associated username were scraped from each page using BeautifulSoup, requests, and pandas, and were stored in a PostgreSQL database using psycopg2.
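As a rough illustration of the per-page scraping described above, the sketch below fetches one item page and extracts a record. It is not the code in the repository: the URL pattern, the function names `parse_item` and `fetch_item`, and the assumption that the name and description can be read from the `<title>` and `<meta name="description">` tags are all hypothetical, and the real page layout may differ. The username and the database write are omitted.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup

# Assumed URL pattern for an item page (not taken from the project code).
BASE_URL = "https://www.thingiverse.com/thing:{item_id}"


def parse_item(item_id: int, html: str) -> dict:
    """Extract one record from the HTML of an item page.

    Assumption: the name is in <title> and the description is in a
    <meta name="description"> tag.
    """
    soup = BeautifulSoup(html, "html.parser")
    name = soup.title.get_text(strip=True) if soup.title else ""
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta.get("content", "") if meta else ""
    return {"item_id": item_id, "name": name, "description": description}


def fetch_item(item_id: int) -> Optional[dict]:
    """Return a record for item_id, or None if the page is unavailable."""
    response = requests.get(BASE_URL.format(item_id=item_id), timeout=10)
    if response.status_code != 200:
        return None  # deleted or hidden item: skip this id
    return parse_item(item_id, response.text)
```

Returning `None` for unavailable pages is one simple way to account for the gap between the ~1.5 million ids inspected and the ~500,000 records kept.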

The script used for scraping is /thingiscrape/item_scrape_thingiverse.py. In practice, the scraping was parallelized over 3 AWS instances to speed up collection.
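One simple way to divide the work among instances is to give each a contiguous block of item ids. The helper below is a hypothetical sketch of that idea (the name `split_id_range` and the even-split strategy are assumptions, not the project's actual scheme).

```python
from typing import List


def split_id_range(max_id: int, n_workers: int) -> List[range]:
    """Split item ids 1..max_id into n_workers contiguous, near-equal ranges."""
    base, extra = divmod(max_id, n_workers)
    ranges = []
    start = 1
    for i in range(n_workers):
        # The first `extra` workers take one additional id each.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Three instances, ids 1 through 1,500,000.
chunks = split_id_range(1_500_000, 3)
```

Each instance would then scrape only the ids in its own range.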

2) Data Transformation

When the web app launches, all part names and descriptions are vectorized with scikit-learn's TfidfVectorizer. The number of features is limited to 1,000 to speed up recommendations.
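A minimal sketch of this step is shown below. The sample names and descriptions are invented stand-ins for database rows, and both the choice to join name and description into one document and the `stop_words="english"` setting are assumptions; only the 1,000-feature limit comes from the project description.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for rows read from the database.
names = ["Planetary gear set", "Phone case with gears", "Flower vase"]
descriptions = [
    "A functional planetary gear set for a gearbox.",
    "An ornate phone case with moving gears.",
    "A simple spiral vase.",
]

# Combine each part's name and description into a single document.
documents = [f"{name} {desc}" for name, desc in zip(names, descriptions)]

# max_features=1000 keeps only the 1,000 most frequent terms in the corpus.
vectorizer = TfidfVectorizer(max_features=1000, stop_words="english")
tfidf = vectorizer.fit_transform(documents)  # sparse matrix, one row per part
```

Capping the vocabulary bounds the width of the matrix, which in turn bounds the cost of every later similarity computation.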

3) Model Creation/Code Refactoring

When a username is entered, cosine similarity is calculated between each of the user's parts and all other parts in the database. The usernames associated with the most similar parts are then recommended for connection.
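The recommendation step can be sketched on toy data as follows. The usernames, part texts, and the `recommend_users` function are hypothetical, and ranking each candidate part by its best match against any of the user's parts is one plausible reading of the description rather than the project's confirmed logic.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: one (username, text) pair per part.
parts = [
    ("user_a", "ornate phone case with moving gears"),
    ("user_b", "automotive transmission with functional gears"),
    ("user_c", "spiral flower vase"),
    ("user_d", "gears for a clock mechanism"),
]
owners = np.array([owner for owner, _ in parts])
tfidf = TfidfVectorizer().fit_transform([text for _, text in parts])


def recommend_users(username, top_n=2):
    """Rank other users by their most similar part to any of the user's parts."""
    mine = np.flatnonzero(owners == username)
    others = np.flatnonzero(owners != username)
    # Rows: the user's parts; columns: every other user's parts.
    sims = cosine_similarity(tfidf[mine], tfidf[others])
    best_per_part = sims.max(axis=0)  # best match for each candidate part
    ranked = []
    for idx in np.argsort(best_per_part)[::-1]:
        owner = str(owners[others[idx]])
        if owner not in ranked:  # keep each user once, at their best rank
            ranked.append(owner)
    return ranked[:top_n]


recommended = recommend_users("user_a")
```

In this toy data, user_a and user_b share the term "gears" even though a phone case and a transmission look unrelated, which mirrors the motivating example in the overview.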

The most challenging aspect of the project was creating recommendations in a manner that was both memory- and time-efficient. Through many iterations of code refactoring, the memory required for recommendations was reduced from <64 GB to <16 GB, and the time required for a baseline recommendation was reduced from 20 minutes to 7 seconds.
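The specific refactoring steps are not described here. As an illustration of one common way to keep memory bounded in this kind of problem, the sketch below compares the query rows against the corpus in batches and keeps only a running top-k, so the full similarity matrix is never materialized. The function name `top_k_similar`, the batch size, and the technique itself are assumptions, not the project's documented approach.

```python
import numpy as np
from scipy import sparse


def top_k_similar(queries, corpus, k=5, batch_size=10_000):
    """Return indices and scores of the k most similar corpus rows per query.

    Assumes both inputs are sparse matrices with L2-normalised rows (the
    TfidfVectorizer default), so a dot product equals cosine similarity.
    """
    n_queries = queries.shape[0]
    best_scores = np.full((n_queries, k), -np.inf)
    best_index = np.zeros((n_queries, k), dtype=np.int64)
    for start in range(0, corpus.shape[0], batch_size):
        block = corpus[start:start + batch_size]
        # Only an (n_queries x batch_size) block is ever dense in memory.
        sims = (queries @ block.T).toarray()
        ids = np.arange(start, start + block.shape[0])
        # Merge the running top-k with this block's candidates and re-select.
        scores = np.hstack([best_scores, sims])
        index = np.hstack([best_index, np.tile(ids, (n_queries, 1))])
        keep = np.argsort(-scores, axis=1)[:, :k]
        best_scores = np.take_along_axis(scores, keep, axis=1)
        best_index = np.take_along_axis(index, keep, axis=1)
    return best_index, best_scores
```

With this layout, peak memory depends on the batch size and the number of query rows rather than on the total number of parts in the database.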

4) App Creation and Deployment

The web app is written in Python using the Flask framework and is designed with a Start Bootstrap theme. The app is hosted on AWS.
