xiuxianxi

xiuxianxi's starred repositories

Language: Python | Stargazers: 3294 | Issues: 0

hao

Portal to Good Stuff (好东西传送门)

Stargazers: 1398 | Issues: 0

awesome-quant

Index of **'s Quant-related resources

License: MIT | Stargazers: 4001 | Issues: 0

deep_learning_notes

Study notes on deeplearningbook, from http://www.deeplearningbook.org

License: GPL-3.0 | Stargazers: 266 | Issues: 0

python_reference

Useful functions, tutorials, and other Python-related things

Language: Jupyter Notebook | Stargazers: 3754 | Issues: 0

ThinkBayes

Code repository for Think Bayes.

Language: TeX | Stargazers: 1646 | Issues: 0

handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 25147 | Issues: 0

MachineLearningTrick

Machine Learning Trick: GBDT_Feature, Blending, Stacking, CascadeForest

Language: Python | Stargazers: 368 | Issues: 0

tradeshift-text-classification

This is the 1st place solution of a Kaggle machine learning contest: Tradeshift Text Classification. http://www.kaggle.com/c/tradeshift-text-classification

Language: Python | Stargazers: 150 | Issues: 0

benchm-ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Language: R | License: MIT | Stargazers: 1869 | Issues: 0

teach-data-science-UCLA-master-appl-stats

Materials for STATS 418 - Tools in Data Science course taught in the Master of Applied Statistics at UCLA

Language: HTML | Stargazers: 135 | Issues: 0

entity_resolution_spark

Collection of some algorithms for entity resolution

Language: HTML | Stargazers: 28 | Issues: 0

sunny-side-up

Sentiment Analysis Challenge

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 521 | Issues: 0

DigAndBuried

Digging holes and filling them in

Language: GCC Machine Description | Stargazers: 696 | Issues: 0

spark-cassandra-connector

DataStax Connector for Apache Spark to Apache Cassandra

Language: Scala | License: Apache-2.0 | Stargazers: 1937 | Issues: 0

datalib

JavaScript data utility library.

Language: JavaScript | License: BSD-3-Clause | Stargazers: 732 | Issues: 0

info490-sp17

Advanced Data Science, University of Illinois Spring 2017

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 56 | Issues: 0

hdp-datascience-demo

HDP Data Science/Machine Learning demo

Language: HTML | Stargazers: 37 | Issues: 0

mean

MEAN.JS - Full-Stack JavaScript Using MongoDB, Express, AngularJS, and Node.js

Language: JavaScript | License: MIT | Stargazers: 4875 | Issues: 0

Tensorflow-MultiGPU-VAE-GAN

A single Jupyter notebook multi-GPU VAE-GAN example with latent space algebra and receptive field visualizations.

Language: Jupyter Notebook | License: MIT | Stargazers: 439 | Issues: 0

spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML | License: Apache-2.0 | Stargazers: 258 | Issues: 0

mrec

A recommender systems development and evaluation package by Mendeley

Language: Python | License: NOASSERTION | Stargazers: 562 | Issues: 0

UI-Flix

A movie recommendation website (course project)

Language: CSS | Stargazers: 4 | Issues: 0

movie-recommendation-system

A movie recommendation system based on user data, movie data and social data...

Language: Java | License: GPL-2.0 | Stargazers: 78 | Issues: 0

bosch-kaggle-competition-spark

Bosch Kaggle competition: Reduce manufacturing failures (https://www.kaggle.com/c/bosch-production-line-performance)

Language: Python | Stargazers: 24 | Issues: 0

tf-idf-spark-and-python

TF-IDF with Spark for the Kaggle popcorn competition

Language: Scala | Stargazers: 10 | Issues: 0

CTR_Prediction

Click-through rate prediction

Stargazers: 21 | Issues: 0

spark-kaggle

Spark in Kaggle competitions

Language: Scala | License: Apache-2.0 | Stargazers: 9 | Issues: 0

Spark_Linear_Regression

Spark (pyspark) linear regression for click-through rate (CTR) prediction, from Kaggle

Language: Python | Stargazers: 8 | Issues: 0

Axa-Insurance-Telematics-Kaggle

I developed this case study in only 7 days with PySpark (Spark 1.6.0) SQL and MLlib, using a Databricks cluster and AWS. A 90% AUC was achieved with Random Forest (without the Trip Matching/Repeated Trips feature). Ensembles of RF, GBT and Logistic Regression, together with outlier elimination, could be used to improve this result. There are two versions of my code (test and full execution). Since AWS costs exceeded my budget, I stopped training my model(s) on the full dataset. There is also a ppt that presents my outputs from the test execution. The full-data execution code is a slightly different, more production-ready version. I had to use Databricks table caching on the TRAIN and TEST data tables to obtain acceptable performance in the production-ready version.

Language: Jupyter Notebook | Stargazers: 16 | Issues: 0