This repo contains examples and projects worked through from the Udemy course "Spark and Python for Big Data with PySpark".
Objectives:
- Use Python and Spark together to analyze Big Data
- Learn to set up Spark on a local machine (Linux), Amazon Web Services EC2, and Databricks
- Use Spark Streaming to analyze tweets in real time
- Learn to apply Linear Regression, Logistic Regression, Decision Trees, Random Forests, K-Means Clustering, Collaborative Filtering, and NLP
- Work on consulting projects that mimic real-world situations, such as:
  - Classify customer churn with Logistic Regression
  - Use Spark with Random Forests for classification
  - Learn how to use Spark's Gradient Boosted Trees
  - Use Spark's MLlib to create powerful machine learning models
  - Create a spam filter using Spark and Natural Language Processing