7125messi / learning-apache-spark

This repository contains Apache Spark tutorials implemented with PySpark. For some machine learning methods, the PySpark results are compared with R results.

Learning Apache Spark

Ming Chen & Wenqiang Feng

Introduction

This repository mainly contains notes by Ming Chen & Wenqiang Feng from learning Apache Spark. We use detailed demo code and examples to show how to use PySpark for big data mining. If your work is used in these notes without being cited, please let us know.

Content

Acknowledgement

We would like to thank Jian Sun and Zhongbo Li at the University of Tennessee at Knoxville for valuable discussions, and the generous anonymous authors who provided detailed solutions and source code on the internet. Without their help, this repository would not have been possible. Wenqiang would also like to thank the Institute for Mathematics and Its Applications (IMA) at the University of Minnesota, Twin Cities, for support during his IMA Data Scientist Fellow visit.

Feedback and suggestions

Your comments and suggestions are highly appreciated. We are more than happy to receive corrections, suggestions, or feedback by email (Ming Chen: mchen33@utk.edu, Wenqiang Feng: wfeng1@utk.edu).

License: GNU General Public License v3.0


Languages

Jupyter Notebook 96.3%, HTML 2.4%, PostScript 0.7%, Python 0.6%