I have made here only a bouquet of other people's flowers, having supplied nothing of my own but the thread that ties them together.

My Machine Learning related stuff!

My Apache Zeppelin and Jupyter notebooks, and more: a collection of useful data analysis and machine learning material in general

ML Resources (with an emphasis on Python)

This document is a curated list of Machine Learning resources, including books, papers, software, libraries, and notebooks. Most of the libraries are for Python, though the rest of the material is generally suited to anyone working with data.

Books and Writings

Foundations of Machine Learning: I strongly suggest reading this book
Readings in Database Systems (The Red Book): I strongly suggest reading this book
A Course in Machine Learning: Good book to start learning ML
Mining Massive Datasets: Great book about Big Data concepts, Data Mining algorithms and their applications
Networks, Crowds, and Markets: Reasoning About a Highly Connected World: A good starter book on Network Science and its applications (e.g. graph analysis, social network analysis)
An Introduction to Statistical Learning
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Arxiv.org/ML
Python Machine Learning
Python Data Science Handbook
Whirlwind Tour Of Python: Good starter book for learning Python
Python Machine Learning (second edition)
Deep Learning Book (MIT Press)
Probability and Statistics Cookbook
An ML Cheat Sheet
Hand-book on STATISTICAL DISTRIBUTIONS for experimentalists
Deep Learning Papernotes: A repository of many of the research papers published about various DL related topics over the years
NRL Papers: A perfect collection of papers on Network Representation Learning and Network Embedding
KRL Papers: A nice collection of papers on Knowledge Representation Learning and Knowledge Embedding
Stanford CS 229 ML Cheatsheets: A nice collection of ML cheat-sheets on various important subject matters
Machine Learning for Business: Machine Learning for Business teaches you how to make your company more automated, productive, and competitive by mastering practical, implementable machine learning techniques and tools
A gentle introduction to Tensors and their uses: An introduction to Tensors and their sample applications
Linear Algebra course book: Jim Hefferon's Linear Algebra book, A good companion book for learning linear algebra fundamentals
Top 10 Data Mining Algorithms: A good article describing how 10 of the more famous Data Mining algorithms work
Representation Learning: A Review and New Perspectives: A good introduction to Representation Learning and its implications
NLTK Book: A great book if you want to process and analyse texts with NLTK
An Introduction to Variable and Feature Selection
Machine Learning Workflow with Python: A collection of useful ML related stuff for people interested in working with the data
Interpretable Machine Learning
GNNPapers: A collection of research papers on Graph Neural Networks
Mining Social Media: The web version of an easy-to-follow introductory book on mining social media data
The Economist data visualisation: A set of articles describing how the Economist uses data visualisation

Dataset Repositories

UCI Machine Learning Repository: Lots of interesting datasets, piled up just for you to use them!
Kaggle: A very active community, a great place to learn from others
Network Repository: Many network/graph datasets, If you like graphs, it's the place for you!
Deep Learning Datasets: DL datasets of course!
MLDatasets: Another nice dataset repository
Open Data for Deep Learning: Deep means big here I guess!
Wikipedia List of Datasets for Machine Learning Research: It's Wikipedia!
GHTorrent: GHTorrent project is an attempt to make an offline queryable mirror of Github projects' data available for everyone
SOTorrent: A very rich dataset of StackOverflow posts and related contents such as post comments
Datalist.com: A handy list of ML related datasets from all over the web
awesome-twitter-data: A huge collection of datasets from Twitter's data
Dataset for Graph classification: A collection of datasets for classification on graphs

Q&A Websites

Quora Data Science: A superb place to ask and seek answers!
Stack Exchange Data Science: Another friendly Q&A community with an emphasis on the technical side
Kaggle: Kaggle again:)
Quora Machine Learning: Quora again:)
Stack Overflow: General Q&A for developers; if you need help with your code, this is the place

Useful Websites

Kaggle: Kaggle again:)
Reddit Machine Learning Community
CrowdAI: A Kaggle alternative, popular among students
Quora: Quora Q&A platform
Github.com: Github contains many useful resources such as the code for many algorithms, all in one centralised platform!
Apache Projects: A few hundred cool software projects, many are related to data management in some way! (e.g. Hadoop ecosystem)
Stanford Machine Learning Course (have a look at the project section!)
NIPS Website: A very prestigious AI conference held every year
Scipy Lectures
Nice website about Data Mining
ML Resources on Github
A list of research on a few interesting topics
Open Machine Learning Course: An ML course covering so many topics
Tanagra - Data Mining and Data Science Tutorials: TANAGRA's tutorials, covering a vast amount of topics
Papers with code: A convenient repository of research papers published together with their code; you can find the code for many recent cutting-edge algorithms here

Twitter datasets: A list of datasets related to social platform Twitter

Editors & IDEs for Python

Spyder: A great Python IDE for scientists in general
Pycharm CE: An excellent IDE for development of anything with Python
GNU Emacs: GNU Emacs is an environment for doing almost anything
IDLE: Default Python IDE, lean and clean environment to develop in Python
Rodeo: A Python IDE for data scientists

Toolboxes & Distributions

Anaconda: A very user friendly environment for scientific Python development
Miniconda
Vowpal Wabbit
StackNet
Sofia ML
LIBLINEAR
LibFM
SVM Rank

Notebook Authoring Environments

Jupyter
Apache Zeppelin: A great notebook environment for data visualization and doing analytics stuff, it can connect to many different databases and data management systems
Beaker
nteract
JupyterLab: Next-generation Jupyter notebook environment
Spark Notebook: Spark Notebook is an interactive notebook authoring environment for working with Scala code on top of Spark clusters
Python(x,y): Python(x,y) is an open-source environment for scientific and numerical computations and analysis
Polynote: A notebook authoring tool with native support for Scala on Spark from Netflix

Python Machine Learning, Data Mining, Statistical Analysis Libraries

Pandas: Python's famous data manipulation library
Scipy: Python's de facto scientific computation library
Numpy: Linear algebra library for fast numerical computation
Scikit Learn: High-level Machine Learning library with tons of features, very easy-to-use and extendable
Bokeh: An interactive high-level data visualization library
Matplotlib: A compelling data visualisation library, more low-level than other visualisation libs
Graph Tool: A fast and powerful library for working with graphs in Python; it's built on top of the Boost C++ libraries, so it's very efficient
NetworkX: A Python module for Complex Network modelling and analysis, very easy to use but may be slow at times because it's in pure Python
TensorFlow: Low-level library for creating deep artificial neural networks, works both on CPU and GPU. Usually, you use TF in conjunction with a library with higher-level API exposing TF's functionalities like Keras
Keras: "Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano" - Keras's website
NLTK: Swiss Army knife tool for text processing in Python
Pattern: Another good text processing library for Python
IPython
Orange: Orange is a general-purpose data mining and analysis tool and library that lets you build machine learning pipelines with just a few drag-and-drop operations
Theano
CatBoost: Yandex's implementation of Gradient Boosting on Decision Trees, It supports categorical features out of the box
XGBoost: The original XGBoost library, a very efficient Gradient Boosting library with extra regularisation
Mlxtend: A great Data Mining and Machine Learning library with
NetworKit: A very high-performance graph processing and analysis toolkit, written in C++ and uses OpenMP, so it is very fast on multicore computers
Eli5
Pandasql
Dask: A fast data manipulation library with out-of-core handling of the data, suited for distributed environments; its DataFrame API closely mirrors (a large subset of) Pandas' API
MLBox
Gensim
Scikit-learn-Contrib/Imbalanced-learn: An extension library for Scikit-learn for handling imbalanced datasets
Patsy: "Camelot!!! ... It's just a model Shhhh!"
Statsmodels: A Python package for building various statistical models
Seaborn: A high-level visualization library for Python
Pandas-profiling
Blaze
Altair
Numba
BigARTM
Gym: An open-source toolkit for reinforcement learning from OpenAI
PyBrain: A Machine Learning library for Python with emphasis on modelling via many types of neural network architectures
Sklearn-pandas
Auto-ML
Scikit-Learn Contrib/Lightning: An extension library to Scikit-learn for large-scale linear classification, regression and ranking problems
GPLearn
Nengo
Scikit-learn Contrib/*: A collection of extension libraries for Scikit-learn adding new (missing) functionalities to it
Koolmogorov: A Python library for hierarchical clustering and visualisation
Lime: A tool for exploring and explaining the output of classifiers
TreeInterpreter
SNAP-Python: Python wrapper library for Stanford Network Analysis Platform (SNAP)
Pycobra: A Python library implementing ensemble methods for regression, classification and visualisation tools including Voronoi tessellations
TF Learn: A library on top of TensorFlow providing a higher-level API than TensorFlow
Featuretools: A Python library for automated feature engineering
spaCy: NLP library with tons of features (like various CNN models)
SymPy: Symbolic computation library for Python, aiming to become a full-fledged CAS
Uniform Manifold Approximation and Projection: A general non-linear dimensionality reduction algorithm implemented in Python
Scikit-learn Contrib/HDBSCAN: A high-performance implementation of HDBSCAN clustering, HDBSCAN is robust and easy-to-use clustering algorithm with minimal parameters, Ideal for exploratory data analysis; It works as an extension to Scikit-learn
Turi Create: A fast tool/library for simplifying various ML tasks
Scikit-learn-Contrib/Categorical-Encoding: An extension library for Scikit-learn that provides additional categorical feature encoding schemes(e.g. LeaveOneOut scheme)
Optunity: A library for hyperparameter optimization
Kmodes
TF-Slim
Pyro: "Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend" - Pyro's website
GEM: A Python library that provides various graph embedding methods like 'node2vec' and 'locally linear embedding.'
DynamicGEM: A dynamic graph embedding library like GEM
GraphSAGE: A graph embedding framework to generate low-dimensional vector representations for nodes, instrumental if you need to use deep learning on graph data
Horovod: A distributed training framework for TensorFlow, Keras, and PyTorch by Uber
NetLSD: Python implementation of NetLSD, a scalable graph embedding algorithm for representing a graph via a low-dimensional vector

SHAP: A tool for exploring and explaining the outcome of an arbitrary model
NLPre: Another cool Python NLP library
GCN: Python implementation of graph convolutional networks in TensorFlow
AllenNLP: "An open-source NLP research library, built on PyTorch" - AllenNLP's repository documentation
TensorLy: A Python Library for efficient Tensor operations
CuPy: A Python matrix library accelerated by Nvidia CUDA, it's also compatible with Numpy's API
Scikit-Multiflow: A Python library for Stream Mining
MLflow: A software toolbox to manage ML projects' workflow and life-cycle, it aims to make ML software projects easier to implement by providing various helper components for each step
pyGAM: A Python module for building Generalized Additive Models (GAMs)
ggplot: "ggplot is a plotting system for Python based on R's ggplot2 and the Grammar of Graphics. It is built for making professionally looking, plots quickly with minimal code" - ggplot's website
Linkpred: A Python package for link prediction on graphs
SparklingGraph: A Python library to process large scale graphs using Spark and GraphX in a distributed manner
OpenNE: An open-source network embedding library
Galry: A high-performance visualisation library in Python
Dedupe: A Python library for fuzzy entity-resolution and record deduplication
PyText: A deep-learning-based NLP modelling framework built on top of PyTorch
flair: A state-of-the-art NLP framework in Python from Zalando
NearPy: "A Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive hashes" according to its descriptions
fastchunking: A (fast) text chunking algorithm implemented in C++ and Python
Vaex: Vaex is a data manipulation library much like Pandas and Dask with a lazy out-of-core approach to handling the data so you can work with huge tables with it
openTSNE: An extensible, parallel implementation of t-SNE
Faust: A stream processing library for Python
Active Semi-Supervised Clustering: An extension library for scikit-learn that implements a collection of useful active semi-supervised clustering algorithms
TextDistance: A Python library for calculating and comparing the distance between two sequences (such as text documents) with many algorithms
Ray: A scalable, high-performance distributed execution framework for executing arbitrary Python functions on multiple machines, suitable for many ML workloads
Pyitlib: An open-source library for calculating a useful collection of information-theoretic measures (e.g. entropy) for discrete random variables
KDEpy: A collection of useful kernel density estimators in Python 3.5+
Tsfresh: A Python library for (automatic) feature extraction and engineering on time-dependent data
GPy: A Python library for working with Gaussian processes
Tslearn: A machine learning library dedicated to working with time-dependent data
Ludwig: "Ludwig is a toolbox that allows to train and test deep learning models without the need to write code" - Ludwig's website
Record Linkage Toolkit: A Python software toolkit for record deduplication and linkage
PyJanitor: Python port of R's janitor package, for data cleansing and manipulation
FastText: A library for fast and efficient text embedding and classification
Mimesis: A fast and useful fake data generation library
PyOD: A Python software toolbox for scalable Outlier Detection (aka Anomaly Detection)
Creme: A Python library for Online Learning and building incremental models
vg: A linear algebra library much like Numpy with a more human-friendly interface
GraphKernels: A fast library for calculating various graph kernels
GraKeL: A graph kernel calculation library that is using scikit-learn's API so it can be used with other functionalities and routines already present in scikit-learn without much hassle
Graphsim: A graph similarity extension libraries for NetworkX
Textract: A general text extraction tool from many file formats
Sacred: Sacred is a Python library to make an ML workflow easier to reproduce and manage for you!
TextDistance: TextDistance is a Python library for calculating and comparing the distance between two or more sequences of an arbitrary alphabet (e.g., words, DNA sequences), it has got over 30 distance algorithms to use
Py_stringmatching: Py_stringmatching is a Python library consisting of a comprehensive set of string tokenisers (such as alphabetical tokenisers, whitespace tokenisers) and also string similarity measures (e.g., edit distance, Jaccard distance)
JGraph: JGraph is a WebGL graph drawing library for Python
Kedro: A Python library and tool for managing the data analysis workflow in your projects
PySAL: PySAL is a Python package for geolocation-based data analysis
k-Shape: This is a Python implementation of the k-Shape clustering algorithm for clustering the time series data
Pyforest: You can use Pyforest to import all the Python data science libraries lazily, as you need them in your code
ETE Toolkit: ETE Toolkit is a Python toolbox for visualising and analysing tree-structured data
Whoosh: Whoosh is a full-text indexing and search library for Python
Geoplot: Geoplot is a Python visualisation library for geospatial plotting of geo-locational records
GeoPandas: GeoPandas is a high-level library with an API similar to Pandas that makes working with geospatial datasets in Python much easier
Edward: "A library for probabilistic modelling, inference, and criticism" - its website
HyperTools: A Python library for high-dimensional data visualisation and analysis
TextRank: TextRank algorithm implementation for Python 3
pymorton: A Python package for ordinal hashing of multidimensional points into a one-dimensional ordering
PySS3: A Python package implementing SS3 text classifier with visualisations tools for explainable artificial intelligence (XAI)
Lpproj: A Python implementation of Locality Preserving Projections (LPP) with Scikit-Learn compatible API
Multi-Rake: Multilingual rapid automatic keyword extraction (Multi-RAKE) is a Python library for automatic text summarisation and keyword extraction of text in many different languages
PyCaret
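
Several of the libraries above (TextDistance, Py_stringmatching) revolve around string similarity measures such as edit distance and Jaccard distance. As a rough illustration of what those two measures compute, here is a minimal sketch using only the Python standard library. It is not the API of either library: the function names `edit_distance` and `jaccard_distance` are hypothetical, and the Jaccard variant assumes whitespace tokenisation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn `a` into `b`."""
    # prev[j] is the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep on a match)
            ))
        prev = curr
    return prev[-1]


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over whitespace-separated tokens (an assumption)."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 0.0  # two empty inputs are treated as identical
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(jaccard_distance("the quick brown fox", "the lazy brown dog"))
```

For real workloads the libraries listed above are likely the better choice, since they offer many more measures and tokenisers than this sketch.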

Additional Useful Resources

PyPy Python Implementation: An alternative, JIT-compiled implementation of Python's runtime with support for Stackless features
Useful Metrics: A collection of useful ML related scoring and learning metrics
XGboost Benchmarks
Franchise Notebook
Orange
Weka: The famous Data Mining tool from where Kiwis live
ELKI: A Data Mining software framework in Java
Julia Programming Language: New language for Scientific Computing and HPC
SQL Notebook
IPython: An augmented Python shell with lots of features
Incanter: A statistical analysis environment for a Lisp (for Clojure to be exact)
Torch: Scientific Computing framework running on top of Lua's Just in Time compiler, brilliant idea!
BPython: An advanced Python shell
RAnalyticFlow: Great environment for Data Flow Programming in R
SPMF: A Java Data Mining library with tons of cool algorithms
SageMath: Open source math software system, a complete math environment for everyone
H2O AI Platform: A software tool for Big Data Analysis, could be used for both Data Mining or Machine Learning tasks, It has tons of features
Various ML Cheat Sheets
OpenRefine: An open-source data cleansing and refinement tool
Deep Learning Papers
Apache Mxnet: A high performance and scalable ANN framework for Deep Learning
Material for the book 'Python for Data Analysis'
Encog Machine Learning Framework: An ML library for Java and .NET with focus on ANN algorithms
Apache Spark MLlib: An ML library on top of your Spark cluster!
Awesome-Python: A comprehensive list of Pythonic resources (libraries, frameworks, etc.)
GATE: A mature text processing toolkit in Java
MALLET: "MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modelling, information extraction, and other machine learning applications to text." - MALLET's website
MLPack: A fast ML library written in C++ with bindings to Python
t-SNE: Implementation of famous t-distributed stochastic neighbour embedding algorithm for various languages
Caffe
Apache Singa
CompLearn
SNAP
Apache PredictionIO
JGraphT: A Java library for working with graphs with tonnes of features
JGraphX: A Java library for diagramming and visualising graphs
Microsoft Distributed Machine Learning Toolkit
Microsoft Cognitive Toolkit
BIDMat: A CPU- and GPU-accelerated matrix library for data mining tasks
BIDMach
Apache SystemML
Apache Mahout
Accord.NET: Accord.NET is a Machine Learning framework written in C#, its API is available for .NET, it also comes combined with some audio and image processing libraries completely written in C#
BitMAGIC Library
Cassovary
Dex: A nice Java-based tool for Data Analysis and Data Mining
Apache OpenNLP
OpenNN: A C++ library to build complex neural network models
MOA: A tool for mining stream data, by people who also created Weka
MLPACK: C++ Machine Learning library for scalability, speed, and ease-of-use
MOSES: "Moses is a statistical machine translation system that allows you to train translation models for any language pair automatically." - Moses's website
Parallel Python: A Python module for parallel execution of code on SMP and Cluster environment
BeautifulSoup: A handy Python library to digest almost anything from the World Wide Web
Wordbatch: A library for parallel feature extraction on textual data (and potentially other complex data types)
Mypy: Static typing facilities for Python
SKIL: A platform for managing the life cycle of an ML/DS related project or product
An unofficial Python extension package repository for Windows
LIBOL: An online learning library
Smile: "Smile is a fast and comprehensive machine learning system"- Smile's website
Tablesaw: A dataframe and visualisation library for Java
TensorFlow Models: A repository of models and examples built with TensorFlow
Curated list of graph embedding methods: A collection of paper-code pairs for state-of-the-art graph embedding (a.k.a. network representation learning) algorithms
Curated list of resources for Recommender Systems
Pegasus: An open-source system for analysing huge graphs; it seems it has not been developed or maintained for a long time
Dataset: A handy tool to simplify the task of reading and writing to relational databases
Twython: A Twitter API library in pure Python with tonnes of features
Apache TinkerPop: A cool graph storage and computation framework, it can be used both as a graph analytics platform and a graph database system, love the little gremlins!
Graphexp: Graphexp is a visual graph explorer with D3.js for TinkerPop
Scilab: An open-source numerical computation language and environment, great Matlab alternative
Glow: A compiler for Neural Network hardware accelerators for various hardware
GraphJet: A real-time graph processing library in Java
GraphDrawing: A very nice graph analysis and drawing library in Java
Sketch Library: A C++ library for data summarization
The Lemur Project: A collection of search engine, text processing and Data Mining tools and libraries in C++ and Java, such as RankLib for ranking
VisPy: A Python library for interactive scientific visualisation that is designed to be fast, scalable and easy to use
Awesome Machine Learning: A curated list of awesome Machine Learning frameworks, libraries and software, etc
MOA Framework: A fantastic Java software environment and framework for Stream Mining
MEKA: A multi-label classification tool, it works on top of Weka
Mulan: A Java library for learning on multi-label data
Dlib: A fast Machine Learning library implemented in C++ for solving real-world data problems
MITIE: A library and tool for information extraction on text data, it's built on top of Dlib with bindings for languages like Java and Python
GraphStream: GraphStream is a Java library for analysing and visualising dynamic graphs
Cytoscape: A complex network (graph) visualization tool in Java
Gephi: A network visualisation and analysis tool in Java
SocNetV: A handy social network visualisation tool
Visone: Yet another handy social network analysis and visualisation tool
Flashlight: A fast Machine Learning library in C++
Machine Learning with Python: A collection of ML algorithms and their sample use-cases implemented in Python
TANAGRA: "TANAGRA is a free DATA MINING software for academic and research purposes" - its website
KNIME: KNIME is an open-source data analytics, reporting and data integration platform
MG4J: An open-source, high-performance full-text search engine written in Java
WebGraph: A Java framework for working on huge graphs
RTree: Reactive implementations of immutable in-memory R-tree and R*-tree in Java
Recommender Systems: A useful repository of stuff all about the Recommender Systems (e.g. best practices to build Recommender Systems)
Awesome-Graph: A curated list of resources (e.g., libraries, frameworks and databases) related to graphs
Parallel Graph AnalytiX (PGX): A graph processing and analytics toolbox from Oracle which is written in Java
ROOT: A scientific toolbox for data processing and analysis in C++
Stanford Topic Modeling Toolbox (TMT): TMT is a nice Java toolkit for topic modelling on textual data
Java Data Mining Package: An open-source Java package for mining massive datasets implementing a vast collection of algorithms (e.g. clustering, regression, classification and graphical models)
ScalaNLP: A numerical computation and Data Mining library suite written in Scala, with an emphasis on NLP
Vegas: A very flexible declarative data visualisation library in Scala that works with Apache Spark right out of the box
DeepLearning.scala: A simple Scala library for creating complex artificial neural networks by ThoughtWorks
XAPIAN: An open-source search engine library with bindings for many high-level programming languages, for example, Python, Java, and Lua!
DataMelt: "DataMelt is a free software for numeric computation, mathematics, statistics, symbolic calculations, data analysis and data visualisation" - DataMelt's website
Luna: A functional programming language to create data processing friendly programs in a WYSIWYG way
NetLogo: A computational multi-agent development and simulation environment, very cool tool for investigating complex phenomena via implementing simple computational rules for agents!
LabPlot: LabPlot is a lovely application for data analysis and plotting, it is part of KDE Project!
Meta Toolkit: A fast software toolkit implementing many useful ML algorithms, it is written in C++
Record Linkage Tools: A collection of useful resources for record deduplication and linkage
Gunrock: A GPU based graph analytics and processing library, it works with CUDA
Papers on Graph Analytics: A thorough list of publications related to graphs covering many interesting topics
GraphIt: GraphIt - "A High-Performance Domain Specific Language for Graph Analytics" - GraphIt's website
SMORe: A handy tool and library for fast weighted graph embedding in C++
Warp-ctc: A fast parallel implementation of CTC, for both CPU and GPU
Grew: Grew is a graph library and tool written in Ocaml with applications in NLP, it is a companion tool for the book Application of Graph Rewriting to Natural Language Processing
ZVTM: A handy graph visualisation library for Java
mrJob: A Python library to create MapReduce jobs and run them on multiple machines (i.e., in a cluster)
Metanome: A collection of interesting materials (e.g., algorithms, code, articles) related to data profiling
Graphillion: Graphillion is a software library for working with many graphs in a parallel fashion
Awesome graph classification: A very thorough collection of graph embedding, classification and representation learning papers with the code!
VFML: Very Fast ML (hence the name VFML) is a fast C library for mining very huge data streams
Talisman: Talisman is a modular JavaScript library for NLP and Machine Learning activities
StyleGAN: StyleGAN is a TensorFlow implementation of a proposed architecture for GANs from NVIDIA, you can use it to create photo-realistic pictures of people who don't exist!
Java String Similarity: A Java library implementing a collection of useful text similarity/distance measures
Label Studio: Label Studio is a handy tool with a nice UI for labelling your data (e.g., records and documents)
GraphML: GraphML is a graph representation and serialisation file format based on XML that could store many different types of graphs with their attributes without loss of information
Taco: A compiler for compiling and executing general tensor algebra operations on sparse tensors in machine code for CPUs and GPUs
Libspatialindex: Libspatialindex contains many robust geolocational indexing algorithms like R*-tree and TPR-tree
NLP Best Practices: A collection of best practices and their examples in NLP domain from Microsoft
Tulip: Tulip is a nice open-source data visualisation and analysis software toolbox, it is especially good for working with graphs and graph datasets
Juno: Juno is an IDE based on Atom for Julia programming language
BoofCV: A real-time machine vision and image processing library in Java
cuDF: cuDF is a library with API similar to Pandas that is built based on the Apache Arrow columnar memory format, cuDF uses GPU routines for loading, joining, aggregating, filtering, and otherwise manipulating data
LASER toolkit: LASER (Language-Agnostic SEntence Representations) is a software toolkit for sentence embedding for about 100 different languages
Idyll: "A toolkit for creating data-driven stories and explorable explanations" - Idyll's website
DeepLearning4J: A Java-based software toolbox for building and training deep artificial neural networks
NeMo: NeMo is a software toolkit for building AI applications
TRAINS Agent: TRAINS Agent is a DevOps tool for setting up and running an AI experiment on a cluster computing environment
TensorFlow Hub: TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of deep learning models
AIX360: An explainable AI (XAI) toolkit to interpret Machine Learning models
Catalyst: Catalyst is a tool for making Deep Learning experiments on PyTorch reproducible
TensorFlowJS: TensorFlowJS is a JavaScript library to use TensorFlow models in web applications in the browser
Kst: Kst is a handy data visualisation tool from KDE project
AMIDST: AMIDST is a Java software toolbox for probabilistic modelling of data
LIBFFM: "LIBFFM is an open-source tool for field-aware factorisation machines (FFM)"; it has been used to win a few real-world data science challenges on Kaggle
jLDADMM: A Java package for LDA and DMM topic modelling

My Favourites

Ali Rahimi's talk NIPS 2017: Good talk from someone inside the field
Procrustes: How could we live without Wikipedia?
Probably Approximately Correct
Foundations of Machine Learning: A good book to start learning ML, A must for every ML enthusiast
Scikit-Learn website: Scikit-learn's website itself is a great resource to learn!
What Computers Still Can't Do: Some old and still valid criticisms of Strong AI! Are AI and Alchemy the same?
Readings in Database Systems (The Red Book): An enjoyable read. I found it a little hard to follow at first, but a great many resources are mentioned at the end of each chapter, and it gives great insights into the history, trends and future of DBMSs and Data Processing Platforms
Kolmogorov Complexity: Let's compress everything!
Machine Learning Meets Databases: A very informative and also easy to follow article, including a short introduction to Machine Learning and also describing its relation to Data Mining and Databases
A gentle introduction to Tensors and their uses: An introduction to Tensors and their sample applications, Don't let the math scare you off!:0)
Mining Massive Datasets: A very nice blend of theory and application for what can be done to data
Networks, Crowds, and Markets: Reasoning About a Highly Connected World: Very insightful if you'd like to know more about the interconnected world and networks
