srmds / recommendation-engine-spark

A recommendation engine written in Scala with Spark

Recommendations engine

A recommendations engine written in Scala with Spark

Have a look at the wiki for Scala and Spark fundamentals

Note: This project is still Work In Progress (WIP)

This codebase is the result of following Frank Kane's course: Apache Spark 2 with Scala - Hands On with Big Data

Prerequisites

Build, compile & run

Clone the Repo

$ git clone https://github.com/srmds/recommendation-engine-spark

Configure

Logging is done via log4j

A log4j properties template is included at src/main/resources/log4j.properties.template.

  • Create a custom log4j.properties file from the template. From the root of the project, run:
$ cp src/main/resources/log4j.properties.template src/main/resources/log4j.properties 

In order to have less verbose logging and only log our own explicit log lines, change the default logging settings.

  • Set the logging level from INFO to ERROR:

Change the following line:

log4j.rootCategory=INFO, console

to:

log4j.rootCategory=ERROR, console

Note: the custom log4j.properties file should not be checked into version control and is therefore added to the .gitignore file.

Build

$ ./gradlew clean build

Run

$ ./gradlew run

All Together

$ ./gradlew clean run

Dependencies

  • Spark - 2.1.0

Resources

Benchmark of recommendations

Get all movie ratings

rating (stars) count (votes)
1 6110
2 11370
3 27145
4 34174
5 21201

See here for full analysis

Source file (100,000 rows): datasets/movielens/ml-100k/u.data

Elapsed time: 298 ms
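
The histogram above comes down to counting how often each rating value occurs. The following is a minimal sketch in plain Scala collections, not the project's actual Spark job. It assumes the standard MovieLens u.data layout (tab-separated userId, movieId, rating, timestamp), and the names RatingsHistogram and histogram are hypothetical. On a Spark RDD the same shape would be a map over the rating column followed by countByValue.

```scala
object RatingsHistogram {
  // Assumed line layout (MovieLens 100k): userId \t movieId \t rating \t timestamp
  def histogram(lines: Seq[String]): Seq[(Int, Int)] =
    lines
      .map(_.split("\t")(2).toInt)                       // keep only the rating column
      .groupBy(identity)                                 // rating -> all its occurrences
      .map { case (rating, hits) => rating -> hits.size }
      .toSeq
      .sortBy(_._1)                                      // ascending by rating, as in the table
}
```

Calling RatingsHistogram.histogram with the lines of u.data should yield one (rating, count) pair per star value.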

Get the average number of friends by age

age average number of friends
18 343
19 213
26 242
27 228
28 209
34 245
35 211
36 246
37 249
38 193
39 169
67 214
68 269
69 235

See here for full analysis

Source file (500 rows): datasets/friends/fakefriends.csv

Elapsed time: 173 ms
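
As a rough illustration of the aggregation behind this table, the sketch below groups rows by age and takes an integer average of the friend counts. It is written with plain Scala collections and is not the repository's code. The column layout (id,name,age,numFriends) is an assumption about fakefriends.csv, and FriendsByAge and averageByAge are hypothetical names.

```scala
object FriendsByAge {
  // Assumed line layout: id,name,age,numFriends
  def averageByAge(lines: Seq[String]): Seq[(Int, Int)] =
    lines
      .map { line =>
        val fields = line.split(",")
        (fields(2).toInt, fields(3).toInt)               // (age, numFriends)
      }
      .groupBy(_._1)                                     // age -> all rows for that age
      .map { case (age, rows) =>
        age -> rows.map(_._2).sum / rows.size            // integer average, truncated
      }
      .toSeq
      .sortBy(_._1)                                      // ascending by age
}
```

In Spark the same result is typically reached by mapping each row to (age, (numFriends, 1)), summing both parts with reduceByKey, and dividing.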

Weather stations

Get minimum of temperatures

stationId Temperature (Fahrenheit)
EZE00100082 7.700001
ITE00100554 5.3600006

Get maximum of temperatures

stationId Temperature (Fahrenheit)
EZE00100082 16.52
ITE00100554 18.5

See here for full analysis

Source file (1825 rows): datasets/weather/temperatures.csv

Elapsed time: 506 ms
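
A minimal sketch of the per-station minimum and maximum, using plain Scala collections rather than the project's Spark code. It assumes each line of temperatures.csv has the layout stationId,date,entryType,value,... where entryType is TMIN or TMAX and value is in tenths of a degree Celsius; that layout, and the names StationTemperatures, Reading, minByStation and maxByStation, are assumptions made for illustration.

```scala
object StationTemperatures {
  final case class Reading(stationId: String, entryType: String, fahrenheit: Float)

  // Assumed line layout: stationId,date,entryType,value,... (value in tenths of a degree Celsius)
  def parse(line: String): Reading = {
    val fields = line.split(",")
    val fahrenheit = fields(3).toFloat * 0.1f * (9.0f / 5.0f) + 32.0f
    Reading(fields(0), fields(2), fahrenheit)
  }

  // Keep one entry type, group by station, and fold each group with `pick`.
  private def extreme(lines: Seq[String], entryType: String)(
      pick: (Float, Float) => Float): Map[String, Float] =
    lines
      .map(parse)
      .filter(_.entryType == entryType)
      .groupBy(_.stationId)
      .map { case (station, readings) => station -> readings.map(_.fahrenheit).reduce(pick) }

  // Lowest TMIN reading per station.
  def minByStation(lines: Seq[String]): Map[String, Float] =
    extreme(lines, "TMIN")((a, b) => math.min(a, b))

  // Highest TMAX reading per station.
  def maxByStation(lines: Seq[String]): Map[String, Float] =
    extreme(lines, "TMAX")((a, b) => math.max(a, b))
}
```

Because the conversion is done in Float arithmetic, results can carry small rounding tails similar to those visible in the tables above.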

Word occurrences

count (occurrences) word
2 refer
3 compared
4 forces
560 is
616 in
649 it
747 that
934 and
970 of
1191 a
1292 the
1420 your
1828 to
1878 you

See here for full analysis

Source file (~46,249 words): datasets/book/book.txt

Elapsed time: 377 ms
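
The word counts above can be approximated by normalising the text, counting each word, and sorting by count. This is a hedged sketch with plain Scala collections; splitting on non-word characters and lowercasing are assumptions about how the original job tokenises the book, and WordCount and countWords are hypothetical names.

```scala
object WordCount {
  // Returns (count, word) pairs sorted ascending by count, then by word.
  def countWords(text: String): Seq[(Int, String)] =
    text
      .split("\\W+")                                     // split on runs of non-word characters
      .filter(_.nonEmpty)                                // drop the empty token a leading separator produces
      .map(_.toLowerCase)                                // treat "The" and "the" as the same word
      .groupBy(identity)
      .toSeq                                             // leave the Map before re-keying by count
      .map { case (word, hits) => (hits.length, word) }
      .sorted
}
```

Converting to a Seq before swapping to (count, word) matters: mapping a Map directly would collapse all words that share a count into one entry.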

Spending amount per customer

amount (spent) customerId
3309.3804 45
4316.3 47
4327.7305 77
4367.62 13
4836.86 20
4851.4795 89
4876.8394 95
4898.461 38
5206.3994 87
5245.0605 52

See here for full analysis

Source file (10,000 rows): datasets/spending/customer_orders.csv

Elapsed time: 267 ms
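
To illustrate the aggregation, here is a small sketch that sums the order amounts per customer and sorts by total. It uses plain Scala collections and is not taken from the repository. The column layout (customerId,itemId,amount) is an assumption about customer_orders.csv, and CustomerSpending and totalByCustomer are hypothetical names.

```scala
object CustomerSpending {
  // Assumed line layout: customerId,itemId,amount
  def totalByCustomer(lines: Seq[String]): Seq[(Double, Int)] =
    lines
      .map { line =>
        val fields = line.split(",")
        (fields(0).toInt, fields(2).toDouble)            // (customerId, amount)
      }
      .groupBy(_._1)                                     // customerId -> all its orders
      .toSeq
      .map { case (customerId, orders) => (orders.map(_._2).sum, customerId) }
      .sortBy(_._1)                                      // ascending by amount spent
}
```

The sketch sums in Double; the totals in the table above look like Float sums, so the trailing digits may differ.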

Popularity of movies by ratings

count movieId
1 1494
1 1414
2 1585
2 907
2 1547
3 1361
3 1391
4 1223
4 1423
5 1489
5 1333
507 181
508 100
509 258
583 50

See here for full analysis

Source file (100,000 rows): datasets/movielens/ml-100k/u.data

Elapsed time: 219 ms

Popularity of superheroes in a social network

Most popular superhero

friendsCount (id,name)
1933 (859,CAPTAIN AMERICA)

Least popular superhero

friendsCount (id,name)
0 (467,BERSERKER II)

friendsCount name
106 RATTLER
238 SUPREME INTELLIGENCE
121 LEWIS
84 UNICORN/MYLOS MASARY
966 ICEMAN/ROBERT BOBBY
147 EEL II/EDWARD LAVELL
109 BLACK KNIGHT IV/PROF
668 SILVER SURFER/NORRIN
198 STANKOWICZ
1014 HERCULES [GREEK GOD]

See here for full analysis

Source files:

Elapsed time: 1515 ms
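
A sketch of how the friends count per hero could be derived, using plain Scala collections instead of the project's Spark job. It assumes a graph file in which each line starts with a hero id followed by the ids of connected heroes, separated by whitespace, and in which one hero may be spread over several lines. That layout and the names SuperheroPopularity, connectionCounts and mostPopular are assumptions, and the lookup from id to name is left out.

```scala
object SuperheroPopularity {
  // Assumed graph line layout: heroId id1 id2 ... (whitespace-separated)
  // A hero can appear on more than one line, so counts are summed per id.
  def connectionCounts(graphLines: Seq[String]): Map[Int, Int] =
    graphLines
      .map { line =>
        val ids = line.trim.split("\\s+")
        (ids(0).toInt, ids.length - 1)                   // (heroId, connections on this line)
      }
      .groupBy(_._1)
      .map { case (heroId, parts) => heroId -> parts.map(_._2).sum }

  // Returns (heroId, friendsCount) for the hero with the most connections.
  def mostPopular(graphLines: Seq[String]): (Int, Int) =
    connectionCounts(graphLines).maxBy(_._2)
}
```

Swapping maxBy for minBy gives the least popular hero under the same assumptions.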

License

MIT License

Copyright (c) 2018 srmds

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Languages

Scala 95.0%, Perl 2.6%, Shell 2.4%