sejalv / python_spark_bigd_ml

Spark examples, personal projects with Python, Spark Streaming, Machine Learning, Spark DataFrames

This repo collects examples and projects worked through while following the Udemy course "Spark and Python for Big Data with PySpark".

Objectives:

  • Use Python and Spark together to analyze Big Data
  • Learn to set up Spark locally (Linux), on Amazon Web Services EC2, and on Databricks
  • Use Spark Streaming to analyze tweets in real time
  • Learn to apply Linear Regression, Logistic Regression, Decision Trees, Random Forests, K-Means Clustering, Collaborative Filtering, and NLP
  • Work on consulting projects that mimic real-world situations, such as:
    • Classify Customer Churn with Logistic Regression
    • Use Spark with Random Forests for Classification
    • Learn how to use Spark's Gradient Boosted Trees
    • Use Spark's MLlib to create Powerful Machine Learning Models
    • Create a spam filter using Spark and Natural Language Processing


Languages

  • Jupyter Notebook 99.8%
  • Python 0.2%