WLun001 / youtube-video-analysis

YouTube video analysis based on datasets on Kaggle

Home page: https://www.kaggle.com/datasnaek/youtube-new

YouTube Video Analysis

How to run

If spark-shell is not running yet:

spark-shell -i file.scala

If spark-shell is already running:

:load file.scala

File descriptions

  • setup.scala - initial setup: reads the CSV file and cleans the dates.
  • saveToParquet.scala - saves the RDD to Parquet. Assumes the Parquet table has already been created in Hive with:
    CREATE EXTERNAL TABLE videos(
      video_id STRING, trending_date STRING, title STRING, channel_title STRING,
      category_id STRING, publish_time STRING, tags STRING,
      views INT, likes INT, dislikes INT, comment_count INT,
      thumbnail_link STRING, comments_disabled BOOLEAN, ratings_disabled BOOLEAN,
      video_error_removed BOOLEAN, description STRING)
    STORED AS PARQUET LOCATION '/user/cloudera/labs';
  • readFromParquet.scala - reads the data back from Parquet after it has been saved.
  • trending.scala - video trending analysis.
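To illustrate the kind of date cleanup setup.scala performs, here is a minimal plain-Scala sketch (no Spark required). It assumes trending_date uses the yy.dd.MM layout found in the Kaggle dataset (for example 17.14.11 for 14 November 2017). The object TrendingDate and the method toIso are hypothetical names, not code from this repository.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object TrendingDate {
  // Assumed raw layout: two-digit year, day, month, e.g. "17.14.11".
  private val rawFormat = DateTimeFormatter.ofPattern("yy.dd.MM")

  // Hypothetical helper: converts a raw trending_date to ISO-8601 (yyyy-MM-dd),
  // returning None when the string does not match the assumed layout.
  def toIso(raw: String): Option[String] =
    Try(LocalDate.parse(raw.trim, rawFormat)).toOption.map(_.toString)
}
```

Inside spark-shell, the same conversion could likely be done without a UDF through Spark's built-in to_date(column, "yy.dd.MM"), which keeps the work inside Spark's optimizer.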
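trending.scala is described only as a video trending analysis. One common question for this dataset is how many days each video stays on the trending list; the sketch below answers it over plain Scala collections. The Row case class and the daysTrending method are hypothetical, and the sketch assumes each input row is one (video, trending date) entry, matching the video_id and trending_date columns of the videos table.

```scala
object TrendingSketch {
  // Hypothetical, simplified record with only the columns this sketch needs.
  final case class Row(videoId: String, trendingDate: String)

  // Counts distinct trending dates per video, sorted by count (descending),
  // with ties broken by video id so the output order is deterministic.
  def daysTrending(rows: Seq[Row]): Seq[(String, Int)] =
    rows
      .groupBy(_.videoId)
      .map { case (id, rs) => id -> rs.map(_.trendingDate).distinct.size }
      .toSeq
      .sortBy { case (id, n) => (-n, id) }
}
```

In spark-shell the same aggregation would be expressed on a DataFrame, for example groupBy("video_id").agg(countDistinct("trending_date")) with countDistinct imported from org.apache.spark.sql.functions.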

Language: Scala (100.0%)