IML Project
Project for the subject 'Introduction to Machine Learning'
Prerequisites
The packages used in this project are listed in requirements.txt.
You can install them by running: pip install -r requirements.txt
Quick Roadmap
To scrape and preprocess your dataset, place it in the root directory, go to the src directory, and run the scripts in this order: scraper.py, date_formatter.py, lemmatizer.py, and finally preprocessing.py.
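The run order above can be sketched as a small driver script. This is only an illustration, not part of the project: the function name run_pipeline and the runner parameter are hypothetical, and it assumes each script can be started without arguments from inside the src directory, as the instructions describe.

```python
import subprocess
import sys
from pathlib import Path

# Script order taken from the roadmap above.
PIPELINE = [
    "scraper.py",
    "date_formatter.py",
    "lemmatizer.py",
    "preprocessing.py",
]


def run_pipeline(src_dir="src", runner=subprocess.run):
    """Run each preprocessing script in order, stopping at the first failure.

    src_dir is assumed to be the project's src directory; runner is a
    hypothetical hook that defaults to subprocess.run.
    """
    src = Path(src_dir)
    for script in PIPELINE:
        print(f"Running {script} ...")
        # cwd=src mirrors "go to the src directory"; check=True raises
        # CalledProcessError if a script exits with a non-zero status.
        runner([sys.executable, script], cwd=src, check=True)
```

Calling run_pipeline() from the repository root would then run all four steps in sequence. Because of check=True, a failure in one step prevents the later steps from running on incomplete output.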
Notebooks/scripts:
- time series LDA analysis - lda.py / lda_modified.ipynb
- cluster analysis - clusters.ipynb
- words as time series analysis - time_series_clustering.ipynb
- amount of mathematics in computer science - math_in_cs.ipynb
Data
Link to clean data: https://drive.google.com/file/d/1pBihRBnGs6VlFalr4BuxMwXw5xL5ZjY6/view?usp=sharing (367 MB zipped, 1.22 GB decompressed)
Link to LDA models and results: https://drive.google.com/drive/folders/1fG-yuzZq_vhh8hk_PJw1vTZMTMAn_xjH
Final Report
The final report is included in IMLReport.pdf. All details and achieved results are presented in the report.