ElizaLo / Machine-Learning

Awesome list (courses, books, videos, etc.) and implementations of Machine Learning Algorithms

Home Page: https://elizalo.github.io/Machine-Learning/


This repository contains examples of popular machine learning algorithms implemented in Python, with the mathematics behind them explained.

Constantly updated. Subscribe so you don't miss anything.

  • For deep learning algorithms, please check the Deep Learning repository.

Table of Contents

Machine Learning Map

[Figure: machine-learning-map.png]

🎓 Courses

🔹 Introductory Lectures:

These are great courses to get started in machine learning and AI. No prior experience in ML or AI is needed. You should have some knowledge of linear algebra, introductory calculus, and probability. Some programming experience is also recommended.

🔸 Advanced Lectures:

Advanced courses that require prior knowledge of machine learning and AI.

🔹 Online Courses

🟥 YouTube

📚 Books

Conferences

International

North America

Europe

Ukraine

▶️ Websites

GitHub Repositories

  • Top-down learning path: Machine Learning for Software Engineers
  • 100-Days-Of-ML-Code
  • ml-course-msu – Repository with lecture notes, code, and other materials for the machine learning seminars at the CMC faculty of Moscow State University
  • 100-best-github-machine-learning
  • awesome-machine-learning
  • trekhleb/homemade-machine-learning – Python examples of popular machine learning algorithms, with interactive Jupyter demos and the math behind them explained
  • trekhleb/machine-learning-experiments – Interactive machine learning experiments: model training + model demos
  • trekhleb/machine-learning-octave – MatLab/Octave examples of popular machine learning algorithms, with code examples and the mathematics explained
  • Machine Learning Notebooks – A collection of machine learning fundamentals and useful Python notebooks by Diego Inácio
  • Open Source Society University's Data Science course – A solid path for those who want to complete a Data Science curriculum on their own time, for free, with courses from the best universities in the world
  • data-science-blogs
  • Dive into Machine Learning – GitHub repo with Python, Jupyter notebooks, and scikit-learn
  • Recommendations from the instructors of the "Mathematics and Python" course and its specialization
  • Reading list for admission to the Yandex School of Data Analysis (ShAD)
  • Machine learning cheat sheet – soulmachine (2015)
  • Probabilistic Programming and Bayesian Methods for Hackers (free)
  • ml-surveys – Survey papers summarizing advances in deep learning, NLP, CV, graphs, reinforcement learning, recommendations, etc.
  • Machine_Learning_and_Deep_Learning – Getting started with Machine Learning and Deep Learning
  • MachineLearning_DeepLearning – Shared materials on Machine Learning and Deep Learning
  • Machine Learning Guide – A guide covering machine learning, including the applications, libraries, and tools that will make you a better and more efficient machine learning developer

Awesome List

📌 Other

Big Data

Neural Networks

Reinforcement Learning

Mathematics for AI, ML, DL, CV

Linear Algebra

Theory of Probability and Mathematical Statistics

Bayesian Statistics

  • Files from the lecture:

Bayesian statistics and related books:

  • C.P. Robert: The Bayesian Choice (advanced)
  • Gelman, Carlin, Stern, Rubin: Bayesian Data Analysis (a nice, easy older book)
  • Congdon: Applied Bayesian Modelling; Bayesian Statistical Modelling (relatively nice reference books)
  • Casella, Robert: Introducing Monte Carlo Methods with R (a nice book about MCMC)
  • Robert, Casella: Monte Carlo Statistical Methods
  • Some parts of Bishop: Pattern Recognition and Machine Learning (a very nice book for engineers)
  • Kruschke: Doing Bayesian Data Analysis (the "puppy book")

Causal Inference

Correlation does not imply causation

More online lectures, courses, papers, books, etc. on Causality:

Causal Machine Learning (Papers):

Experimental designs for causal learning:

  • Matching
  • Incident user design
  • Active comparator
  • Instrumental variables estimation
  • Difference-in-differences
  • Regression discontinuity design
  • Modeling

Algorithms

Machine Learning System Design

Deploy Machine Learning Model to Production

Python, IPython, Scikit-learn etc.

Code editors

  • PyCharm by JetBrains – a serious IDE for large projects

  • Spyder – the Scientific PYthon Development EnviRonment. Spyder ships with Anaconda (just type spyder at the command line)

  • Canopy – scientific and analytic Python deployment with an integrated analysis environment (recommended in the MITx course)

  • Rodeo — a data science IDE for Python

  • Jupyter – open source, interactive data science and scientific computing across over 40 programming languages. The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text

  • nbviewer – renders notebooks available on other websites

  • Sublime Text 3 – the VIM of the 21st century; works great for Python when used with these plugins:

    • Package Control – for quick and convenient management of add-ons
    • Git – for working with git
    • Jedi – makes Python autocompletion smarter and deeper
    • SublimeREPL – runs a read-eval-print loop in an adjacent tab, handy for step-by-step debugging
    • Auto-PEP8 – brings code into line with the PEP 8 style guide
    • Python Checker – code checking
  • PyCharm vs Sublime Text – a blog post comparing these two popular development tools and text editors.

  • PEP 0008 -- Style Guide for Python Code.

JavaScript libraries for visualization

R

LaTeX

📑 Open Datasets list

The initial list was provided by Kevyn Collins-Thomson from the University of Michigan School of Information.

Reddit

Social Networks (channels, chats, groups, etc.)

What is the difference between the train, validation, and test sets in neural networks?

Training Set: this data set is used to adjust the weights of the neural network.

Validation Set: this data set is used to minimize overfitting. You are not adjusting the weights of the network with this data; you are only verifying that any increase in accuracy over the training data actually yields an increase in accuracy over data the network has never been shown before, or at least has not trained on. If the accuracy over the training data increases but the accuracy over the validation data stays the same or decreases, you are overfitting your neural network and should stop training.

The validation data set is data drawn from the function you want to learn that you do not use directly to train the network. If you train the network with a gradient-based algorithm, the error surface and the gradient at any point depend entirely on the training set, so the training set is what directly adjusts the weights. To make sure you do not overfit, you feed the validation set to the network and check that its error stays within some range. Because the validation set is never used directly to adjust the weights, a good error on the validation set (and on the test set) indicates that the network not only predicts the training examples well but can also be expected to perform well on new examples it was not trained on.

Testing Set: this data set is used only for testing the final solution, to confirm the actual predictive power of the network.
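
As a concrete illustration, here is a minimal sketch of such a three-way split using scikit-learn's train_test_split (the toy data and the 60/20/20 ratio are arbitrary choices for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 1000 samples, 10 features (placeholder for a real dataset)
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

# First split off the test set (20% of the data), then carve a validation
# set (another 20% of the total) out of what remains.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```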

Also, if you do not have enough data for a separate validation set, you can use cross-validation to tune the parameters as well as to estimate the test error.

The cross-validation set is used for model selection: for example, selecting the polynomial model with the fewest errors for a given parameter set. The test set is then used to report the generalization error of the selected model.
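
Here is a minimal sketch of that idea, assuming scikit-learn: the polynomial degree is chosen by cross-validated error on the training data, and only the final, selected model is evaluated on the test set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy 1-D regression problem (placeholder data)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model selection: cross-validate each candidate degree on the training data
best_degree, best_score = None, -np.inf
for degree in range(1, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X_train, y_train, cv=5,
                            scoring="neg_mean_squared_error").mean()
    if score > best_score:
        best_degree, best_score = degree, score

# The generalization error is reported on the untouched test set
final = make_pipeline(PolynomialFeatures(best_degree), LinearRegression())
final.fit(X_train, y_train)
test_mse = np.mean((final.predict(X_test) - y_test) ** 2)
print("chosen degree:", best_degree, "test MSE:", test_mse)
```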

Early stopping is a way to decide when to stop training. There are different variations, but the main outline is: the errors on both the training and the validation sets are monitored; the training error decreases at each iteration (backpropagation and friends), and at first the validation error decreases too. Training is stopped at the moment the validation error starts to rise. The weight configuration at that point yields a model that predicts the training data well, as well as data the network has not seen. However, because the validation data indirectly influenced which weight configuration was selected, it cannot serve as an unbiased measure of generalization. This is where the test set comes in: it is never used in the training process. Once a model is selected based on the validation set, the test set is applied to the model and its error is computed. This error is representative of the error we can expect on completely new data for the same problem.
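
A minimal, framework-agnostic sketch of patience-based early stopping; train_one_epoch and validation_error below are hypothetical stand-ins for a real framework's training step and validation metric:

```python
import copy

def early_stopping_fit(model, train_one_epoch, validation_error,
                       max_epochs=100, patience=5):
    """Train until the validation error has not improved for `patience`
    consecutive epochs, then return the best model seen so far."""
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        model = train_one_epoch(model)       # training error keeps decreasing
        error = validation_error(model)      # validation error eventually rises
        if error < best_error:
            best_error = error
            best_model = copy.deepcopy(model)  # remember the best configuration
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # validation error started rising: stop
    return best_model, best_error

# Toy demo: the "model" is just an epoch counter, and the validation error
# follows a U-shaped curve with its minimum at epoch 30 (a stand-in for a
# real network's validation loss).
best, err = early_stopping_fit(
    model=0,
    train_one_epoch=lambda m: m + 1,
    validation_error=lambda m: (m - 30) ** 2 / 100 + 1.0,
)
print(best, err)  # stops shortly after epoch 30 and keeps the epoch-30 model
```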

⚙️ Models and Algorithms Implementation:

  1. k Nearest Neighbor (see the minimal sketch after this list)

  2. Linear Regression

  3. Logistic Regression

  4. Fully Connected Neural Networks

    • Fully connected neural network that recognizes handwritten digits from the MNIST database (Modified National Institute of Standards and Technology database)
    • MNIST Database
    • Code
  5. Convolutional Neural Network (CNN)

  6. Gated Recurrent Units (GRU)
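
To give a taste of the implementations linked above, here is a minimal k-nearest-neighbor classifier in plain NumPy (the points and labels are made up for the example):

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest
    training points (Euclidean distance)."""
    predictions = []
    for x in X_query:
        distances = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
        nearest = np.argsort(distances)[:k]              # indices of the k closest
        votes = y_train[nearest]
        predictions.append(np.bincount(votes).argmax())  # majority label
    return np.array(predictions)

# Tiny usage example with two made-up clusters
X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.5, 0.5], [5.5, 5.0]])))  # [0 1]
```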

👩‍💻 Projects: