Shan Dou's repositories
MLND_capstone
Capstone project implementation, report, and proposal for Udacity Machine Learning Engineer Nanodegree
blog-binary-classification-metrics
Codebase for the blog post "24 Evaluation Metrics for Binary Classification (And When to Use Them)"
Categorical_similarity_measures
Library for the Python community to compute the similarity or distance between two entities containing categorical data
Deep-Semantic-Similarity-Model-PyTorch
My PyTorch implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.
Facial-Similarity-with-Siamese-Networks-in-Pytorch
Implementing Siamese networks with a contrastive loss for similarity learning
feature-engineering-book
Code repo for the book "Feature Engineering for Machine Learning," by Alice Zheng and Amanda Casari, O'Reilly 2018
gensim-data
Data repository for pretrained NLP models and NLP corpora.
kaggle-HomeDepot
3rd Place Solution for HomeDepot Product Search Results Relevance Competition on Kaggle.
LSTM-siamese
Siamese-LSTM PyTorch implementation for CIKM 2018
numerical-linear-algebra
Free online textbook of Jupyter notebooks for the fast.ai Computational Linear Algebra course
preDict
Lightning-fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts
probability_cheatsheet
A comprehensive 10-page probability cheatsheet that covers a semester's worth of introduction to probability.
pydata_nyc2018-intro-to-model-interpretability
Notebook and slides for my talk at PyData NYC 2018
python-cheat-sheets
IPython notebooks demonstrating useful Python code snippets and functionality
pytorch-examples
Starting with deep learning and PyTorch
query-segmenter
Query Segmentation for search
scikit-hts-examples
Example usage of scikit-hts
statistics-in-R-data-sets
Data sets from the book; also available on the Sage website
tidytuesday
Official repo for the #tidytuesday project
XGBoost-lambdaMART
Running LambdaMART using XGBoost