yvgupta03 / Big_Data_Project_US-Airlines_Tweet_Processing_and_Analysis

Big data application of Machine Learning concepts for sentiment classification of US Airlines tweets. The focus is on the usage of pyspark libraries (ml-lib) on big data to solve a problem using Machine Learning algorithms and not about the choice of algorithm used in the ML model creation. It also involves data pre-processing using NLP techniques, cross-validation and parameter-grid builder.

Repository from Github https://github.comyvgupta03/Big_Data_Project_US-Airlines_Tweet_Processing_and_AnalysisRepository from Github https://github.comyvgupta03/Big_Data_Project_US-Airlines_Tweet_Processing_and_Analysis

Big_Data_Project_US-Airlines_Tweet_Processing_and_Analysis

Big data application of Machine Learning concepts for sentiment classification of US Airlines tweets. The focus is on the usage of pyspark libraries (ml-lib) on big data to solve a problem using Machine Learning algorithms and not about the choice of algorithm used in the ML model creation. It also involves data pre-processing using NLP techniques, cross-validation and parameter-grid builder.

Input and output files used have been attached in the repositories, the urls used are only for the sake of usage in the databricks cluster.

About

Big data application of Machine Learning concepts for sentiment classification of US Airlines tweets. The focus is on the usage of pyspark libraries (ml-lib) on big data to solve a problem using Machine Learning algorithms and not about the choice of algorithm used in the ML model creation. It also involves data pre-processing using NLP techniques, cross-validation and parameter-grid builder.


Languages

Language:Jupyter Notebook 100.0%