
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Language: R · License: MIT · 1869 · 0 · 0

teach-data-science-UCLA-master-appl-stats

Materials for STATS 418 - Tools in Data Science course taught in the Master of Applied Statistics at UCLA

Language: HTML · 135 · 0 · 0

entity_resolution_spark

A collection of algorithms for entity resolution

Language: HTML · 28 · 0 · 0

sunny-side-up

Sentiment Analysis Challenge

Language: Jupyter Notebook · License: NOASSERTION · 521 · 0 · 0

DigAndBuried

Digging holes and filling them in (挖坑与填坑)

Language: GCC Machine Description · 696 · 0 · 0

spark-cassandra-connector

DataStax Connector for Apache Spark to Apache Cassandra

Language: Scala · License: Apache-2.0 · 1937 · 0 · 0

datalib

JavaScript data utility library.

Language: JavaScript · License: BSD-3-Clause · 732 · 0 · 0

info490-sp17

Advanced Data Science, University of Illinois Spring 2017

Language: Jupyter Notebook · License: NOASSERTION · 56 · 0 · 0

hdp-datascience-demo

HDP Data Science/Machine Learning demo

Language: HTML · 37 · 0 · 0

mean

MEAN.JS - Full-Stack JavaScript Using MongoDB, Express, AngularJS, and Node.js

Language: JavaScript · License: MIT · 4875 · 0 · 0

Tensorflow-MultiGPU-VAE-GAN

A single Jupyter notebook multi-GPU VAE-GAN example with latent space algebra and receptive field visualizations.

Language: Jupyter Notebook · License: MIT · 439 · 0 · 0

spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML · License: Apache-2.0 · 258 · 0 · 0

mrec

A recommender systems development and evaluation package by Mendeley

Language: Python · License: NOASSERTION · 562 · 0 · 0

UI-Flix

A movie recommendation website (course project)

Language: CSS · 4 · 0 · 0

movie-recommendation-system

A movie recommendation system built on user data, movie data and social data...

Language: Java · License: GPL-2.0 · 78 · 0 · 0

bosch-kaggle-competition-spark

Bosch Kaggle competition: Reduce manufacturing failures (https://www.kaggle.com/c/bosch-production-line-performance)

Language: Python · 24 · 0 · 0

tf-idf-spark-and-python

TF-IDF with Spark for the Kaggle popcorn competition

Language: Scala · 10 · 0 · 0

CTR_Prediction

Click-through rate prediction

21 · 0 · 0

spark-kaggle

Spark in Kaggle competitions

Language: Scala · License: Apache-2.0 · 9 · 0 · 0

Spark_Linear_Regression

Spark (PySpark) linear regression for click-through rate (CTR) prediction from Kaggle

Language: Python · 8 · 0 · 0

Axa-Insurance-Telematics-Kaggle

I developed this case study in only 7 days with PySpark (Spark 1.6.0) SQL and MLlib, using a Databricks cluster on AWS. A Random Forest achieves 90% AUC (without the Trip Matching / Repeated Trips feature). Ensembles of RF, GBT and logistic regression, together with outlier elimination, could improve this result. There are two versions of my code: a test execution and a full execution. Because AWS costs exceeded my budget, I stopped training my model(s) on the full dataset in the full execution. There is also a PowerPoint deck that presents my outputs from the test execution. The full-data execution code is a slightly different, more production-ready version; in it, I had to use Databricks table caching on the TRAIN and TEST data tables to obtain acceptable performance.

Language: Jupyter Notebook · 16 · 0 · 0

xiuxianxi

xiuxianxi's starred repositories

algorithm

hao

awesome-quant

deep_learning_notes

python_reference

ThinkBayes

handson-ml

MachineLearningTrick

tradeshift-text-classification

benchm-ml