xiuxianxi's starred repositories
awesome-quant
Index of **'s Quant-related resources
deep_learning_notes
Study notes on the Deep Learning book, from http://www.deeplearningbook.org
python_reference
Useful functions, tutorials, and other Python-related things
ThinkBayes
Code repository for Think Bayes.
handson-ml
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
MachineLearningTrick
Machine learning tricks: GBDT features, blending, stacking, CascadeForest
tradeshift-text-classification
This is the 1st place solution to a Kaggle machine learning contest: Tradeshift Text Classification. http://www.kaggle.com/c/tradeshift-text-classification
benchm-ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
teach-data-science-UCLA-master-appl-stats
Materials for STATS 418 - Tools in Data Science course taught in the Master of Applied Statistics at UCLA
entity_resolution_spark
Collection of some algorithms for entity resolution
sunny-side-up
Sentiment Analysis Challenge
DigAndBuried
Digging holes and filling them in
spark-cassandra-connector
DataStax Connector for Apache Spark to Apache Cassandra
info490-sp17
Advanced Data Science, University of Illinois Spring 2017
hdp-datascience-demo
HDP Data Science/Machine Learning demo
Tensorflow-MultiGPU-VAE-GAN
A single jupyter notebook multi gpu VAE-GAN example with latent space algebra and receptive field visualizations.
spark-workshop
Apache Spark™ and Scala Workshops
movie-recommendation-system
A movie recommendation system given by user data, movie data and social data...
bosch-kaggle-competition-spark
Bosch Kaggle competition: Reduce manufacturing failures (https://www.kaggle.com/c/bosch-production-line-performance)
tf-idf-spark-and-python
TF-IDF with Spark for the Kaggle popcorn competition
CTR_Prediction
Click through rate prediction
spark-kaggle
Spark in Kaggle competitions
Spark_Linear_Regression
Spark (PySpark) linear regression on click-through rate (CTR) prediction from Kaggle
Axa-Insurance-Telematics-Kaggle
I developed this case study in only 7 days with PySpark (Spark 1.6.0) SQL and MLlib, using a Databricks cluster and AWS. A Random Forest achieved 90% AUC (without involving the Trip Matching-Repeated Trips feature). Ensembles of RF, GBT and Logistic Regression, together with outlier elimination, could be used to improve this result. There are two versions of my code (test and full execution). Since AWS costs exceeded my budget, I stopped training my model(s) on the full dataset. There is also a ppt that presents my outputs from the test execution. The full data execution code is more production ready and slightly different. I had to use Databricks Table Caching on the TRAIN and TEST data tables to obtain acceptable performance in the production-ready version.