dbragdon1 / spark_projects

Some Projects Built with Apache Spark (pyspark specifically)

About

These are PySpark demonstrations of natural language processing (NLP) tasks.

The dataset used for each model is collected from Professor Julian McAuley's Amazon product dataset. This specific subset is titled "Cell Phones and Accessories".

Files and Directories

/models

Serialized form of trained pyspark models and pipelines

/metrics

Evaluation metrics recorded after training the models.

/classification

Contains a series of files demonstrating text classification with Apache Spark using Amazon product reviews.

/collaborative_filtering

Contains files demonstrating collaborative filtering on the Amazon product review data.

helper_functions.py

Contains helper functions for training models and loading data.


