Ellie2020 / Data-Science-MSc-Individual-project

Data-Science-MSc-Individual-project

FOLDERS:

  • Notebook
  • ResultsMetrics
  • S-curves (for pairs)
  • S-curves_triplets
  • Time_Memory (experiments run on the server)
  • Source code (Python files and a shell script, run on the server)
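
For readers unfamiliar with the S-curves folders: in the standard banding scheme for MinHash LSH, a pair with similarity s becomes a candidate with probability 1 - (1 - s^r)^b, where b is the number of bands and r the rows per band. The sketch below only illustrates that textbook formula for pairs. The function name and parameter values are hypothetical, and it is not taken from the notebooks in this repository.

```python
def s_curve(s, rows, bands):
    """Probability that a pair with similarity s collides in at least one band.

    Textbook banding formula for pairs; `rows` and `bands` are illustrative
    parameter names, not names used in this repository.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return 1.0 - (1.0 - s ** rows) ** bands


if __name__ == "__main__":
    # Tabulate the curve for one hypothetical configuration (r=5, b=20).
    for step in range(0, 11):
        s = step / 10
        print(f"s={s:.1f}  P(candidate)={s_curve(s, rows=5, bands=20):.4f}")
```

Increasing `rows` pushes the steep part of the curve toward higher similarities, while increasing `bands` pushes it toward lower ones.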

Files in Source code:

DLSH.py: main source code for LSH on pairs via MinHash similarity

TLSH.py: main source code for LSH on triplets via MinHash similarity

WJS_JS.py: code for Jaccard similarity and weighted Jaccard similarity

LSH_SIM_MH.py: code measuring the computational time and memory of LSH and MinHash similarity for song pairs

runSparkMongo.sh: shell script to run the Python files on the server

README.txt: detailed description of each Python file

Several notebooks with the plots of chapter 3
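
As a quick reminder of the two similarity measures named above, here is a minimal sketch using their usual definitions: Jaccard similarity as the intersection over the union of two sets, and weighted Jaccard similarity as the sum of element-wise minima over the sum of element-wise maxima. The function names and the toy token lists are assumptions for illustration; this is not the code in WJS_JS.py.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(union)


def weighted_jaccard(x, y):
    """Weighted Jaccard: sum of minima over sum of maxima of non-negative weights.

    `x` and `y` map items to weights, e.g. token counts.
    """
    keys = set(x) | set(y)
    numerator = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    denominator = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)
    if denominator == 0:
        return 1.0  # same convention as above for two empty inputs
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical token lists standing in for two songs.
    song_a = ["la", "la", "love", "you"]
    song_b = ["la", "love", "love", "me"]
    print(jaccard(song_a, song_b))
    print(weighted_jaccard(Counter(song_a), Counter(song_b)))
```

The weighted version takes repetition into account, so two songs sharing the same vocabulary but with different word counts score below 1.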

Languages

  • Jupyter Notebook 97.0%
  • Python 3.0%
  • Shell 0.1%