This is a recommendation system based on Django.
Three algorithms are mainly used in recommendation systems:
- UserCF (User-based Collaborative Filtering)
- ItemCF (Item-based Collaborative Filtering)
- LFM (Latent Factor Model)
This project mainly uses ItemCF as its recommendation algorithm.
The project follows the traditional MVC (Model/View/Controller) architecture and uses the Highcharts library for charts.
The recommendation principle is: if many people like both item A and item B, then item A and item B are likely to be similar.
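One common way to write this principle down is the co-occurrence similarity below. This is only a sketch with assumed notation: $N(i)$, the set of users who like item $i$, does not appear in the original text.

$$
w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}
$$

The numerator counts the users who like both items, and the denominator keeps very popular items from looking similar to everything else.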
In short, the similarity is computed from item co-occurrence counts, normalized by how popular each item is. The core of the implementation is:
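
    import math

    counter = {}    # item -> number of users who have the item
    simMatrix = {}  # item -> {other item -> co-occurrence count, then similarity}

    # Count item popularity and item-item co-occurrences across all users.
    for userID, items in trainset.items():
        for i in items:
            counter.setdefault(i, 0)
            counter[i] += 1
            simitems = simMatrix.setdefault(i, {})
            for j in items:
                if i != j:
                    simitems.setdefault(j, 0)
                    simitems[j] += 1

    # Normalize the co-occurrence counts by the popularity of both items.
    for i, simitems in simMatrix.items():
        for j in simitems:
            simMatrix[i][j] /= math.sqrt(counter[i] * counter[j])
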
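To see the similarity computation run end to end, here is a minimal self-contained sketch. The function names `item_similarity` and `recommend`, the toy data, and the ranking rule (sum the similarities to the items a user already has) are assumptions made for illustration, not necessarily what the project does.

```python
import math
from collections import defaultdict


def item_similarity(trainset):
    """Build an item-item similarity matrix from {user: set of items}."""
    counter = defaultdict(int)                     # item -> number of users
    sim = defaultdict(lambda: defaultdict(float))  # item -> {item: score}
    for items in trainset.values():
        for i in items:
            counter[i] += 1
            for j in items:
                if i != j:
                    sim[i][j] += 1
    # Normalize co-occurrence counts by the popularity of both items.
    for i, related in sim.items():
        for j in related:
            related[j] /= math.sqrt(counter[i] * counter[j])
    return sim


def recommend(user_items, sim, top_n=3):
    """Rank unseen items by summed similarity to the user's items (assumed rule)."""
    scores = defaultdict(float)
    for i in user_items:
        for j, weight in sim.get(i, {}).items():
            if j not in user_items:
                scores[j] += weight
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data, invented for illustration only.
toy_trainset = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
}
toy_sim = item_similarity(toy_trainset)
```

With this toy data, `recommend({"A"}, toy_sim)` ranks `B` ahead of `C`, because `A` and `B` are liked together by two users while `A` and `C` share only one.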
A very important part of a recommender system is accumulating its initial data from scratch, which is known as the cold start problem.
The project used requests to build a multi-threaded crawler that extracted data from Baidu Music, Kugou Music, and Kuwo Music. After data cleaning (removing URLs that were already dead links), a total of 100,000 records were used as the training dataset.
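The dead-link cleaning step could look roughly like the sketch below. It is not the project's crawler: the helper names `is_alive` and `filter_live_urls`, the use of HEAD requests, the timeout, and the worker count are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def is_alive(url, timeout=5):
    """Return True if the URL answers with a non-error HTTP status."""
    import requests  # imported lazily so filter_live_urls also works with a custom check

    try:
        response = requests.head(url, timeout=timeout, allow_redirects=True)
        return response.status_code < 400
    except requests.RequestException:
        return False


def filter_live_urls(urls, check=is_alive, max_workers=8):
    """Check URLs concurrently and keep the live ones in their original order."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, urls))
    return [url for url, ok in zip(urls, results) if ok]
```

Threads fit this job because the work is dominated by waiting on the network. Some servers reject HEAD requests, so a real crawler might need to fall back to GET.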
Each music record includes UserID, music, url, and rating. The rating is based on the number of comments:
- `0-500`: `1 point`
- `500-1000`: `2 points`
- `1000-2000`: `3 points`
- `2000-3000`: `4 points`
- `3000-more`: `5 points` (i.e. most popular)
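The mapping from comment counts to ratings can be sketched as a small lookup. The ranges above share their endpoints, so this sketch assumes each upper bound is exclusive (500 comments gives 2 points); the function name `rating_from_comments` is also hypothetical.

```python
import bisect

# Lower bounds of the 2-, 3-, 4- and 5-point buckets.
# Assumption: each listed range excludes its upper endpoint.
THRESHOLDS = [500, 1000, 2000, 3000]


def rating_from_comments(comment_count):
    """Map a comment count to a rating from 1 to 5."""
    if comment_count < 0:
        raise ValueError("comment_count must be non-negative")
    # bisect_right counts how many thresholds are <= comment_count.
    return bisect.bisect_right(THRESHOLDS, comment_count) + 1
```

For example, `rating_from_comments(1500)` returns 3 and `rating_from_comments(3000)` returns 5.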
- BackEnd: Django
- FrontEnd: Bootstrap + jQuery (Ajax)
- Deploy: Nginx + Gunicorn + Supervisor
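
For the deployment stack, Gunicorn reads its settings from a Python file, so a minimal `gunicorn.conf.py` might look like the sketch below. Every value here is hypothetical and not taken from the project; only the setting names (`bind`, `workers`, `timeout`) are Gunicorn's own.

```python
# gunicorn.conf.py (hypothetical values, not taken from the project)
import multiprocessing

bind = "127.0.0.1:8000"                        # address Nginx would proxy requests to
workers = multiprocessing.cpu_count() * 2 + 1  # rule of thumb from the Gunicorn docs
timeout = 30                                   # seconds before a silent worker is restarted
```

In this arrangement Supervisor would keep the Gunicorn process running, and Nginx would sit in front of it as a reverse proxy.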