teodoran / youtube-watch-history-converter

A command-line utility to convert watch history from YouTube to JSON Lines format.

youtube-watch-history-converter

Convert watch history from YouTube to JSON Lines format. Part of a walk-through that leads to this Datastudio report.

Exporting from Google Takeout

takeout.google.com

  1. Export YouTube data. Choose "History" (shown as "Logg" in the Norwegian UI) and JSON as the format.
  2. When the export is done, download the results and find "watch-history.json".

Converting data to JSON Lines

github.com/teodoran/youtube-watch-history-converter

  1. Clone the repo and build it with dotnet build.
  2. Then convert "watch-history.json" by running the following from the HistoryConverter directory:
$ HistoryConverter> dotnet run -- ../Exports/watch-history.json
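
For readers who want to see what the conversion amounts to, here is a minimal Python sketch of the idea, not the converter's actual C# implementation. It assumes the Takeout entries carry "title", "titleUrl", "subtitles" and "time" keys, and it borrows the output names Title, Id and ChannelUrl from the queries further down; the Time field and all function names are hypothetical.

```python
import json
from urllib.parse import urlparse, parse_qs


def video_id(url):
    """Return the `v` query parameter of a watch URL, or None if absent."""
    if not url:
        return None
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


def to_record(entry):
    """Flatten one Takeout entry (assumed shape) into a single-level record."""
    # Entries for removed videos may lack "subtitles"; fall back to an empty dict.
    subtitles = entry.get("subtitles") or [{}]
    return {
        "Title": entry.get("title"),
        "Id": video_id(entry.get("titleUrl")),
        "ChannelUrl": subtitles[0].get("url"),
        "Time": entry.get("time"),  # hypothetical output field
    }


def convert(source_path, target_path):
    """Read a JSON array and write one JSON object per line (JSON Lines)."""
    with open(source_path, encoding="utf-8") as source:
        entries = json.load(source)
    with open(target_path, "w", encoding="utf-8") as target:
        for entry in entries:
            target.write(json.dumps(to_record(entry), ensure_ascii=False) + "\n")
    return len(entries)
```

The key point is the output shape: BigQuery expects one complete JSON object per line rather than a single top-level array.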

Creating a new BigQuery dataset and table

console.cloud.google.com

  1. From the UI, create a new dataset. Prefix its name with your CX initials (for example, tae_youtube_data).
  2. Create a new table and upload the converted history, selecting newline-delimited JSON as the file format.

Finding an outlier

From the BigQuery UI, hunt for some outliers. A good start is to group by title:

SELECT COUNT(Title), Title
FROM `computas-nxt-youtube-analyse.tae_youtube_data.views`
GROUP BY Title
ORDER BY COUNT(Title) DESC
LIMIT 100
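
The same grouping can be tried locally on the converted file before anything is uploaded. The sketch below roughly mirrors the query above with collections.Counter; it assumes each line is a JSON object with a Title field, and the function name is hypothetical.

```python
import json
from collections import Counter


def top_titles(jsonl_path, limit=100):
    """Count how often each Title occurs and return the most frequent ones."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as lines:
        for line in lines:
            if not line.strip():
                continue  # tolerate blank lines
            title = json.loads(line).get("Title")
            if title is not None:
                counts[title] += 1
    # most_common sorts by descending count, like ORDER BY ... DESC LIMIT
    return counts.most_common(limit)
```

Titles that occur far more often than any real video could plausibly be watched are the candidates worth inspecting.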

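You might identify two strange cases: views of videos that have since been removed ('Så en video som er fjernet' is Norwegian for 'Watched a video that has been removed'), and views whose title is only the video URL. Let's have a closer look: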

SELECT *
FROM `computas-nxt-youtube-analyse.tae_youtube_data.views`
WHERE Title = 'Så en video som er fjernet'
-- WHERE Title = 'Så på https://www.youtube.com/watch?v=Uwo1KGDVSEk'
-- AND Id IS NOT NULL
LIMIT 100

Make a BigQuery view

Create a view based on the outlier findings that filters out unwanted views. You might end up with something along the lines of:

SELECT *
FROM `computas-nxt-youtube-analyse.tae_youtube_data.views`
WHERE Id IS NOT NULL
AND ChannelUrl IS NOT NULL
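
To preview how many rows such a filter would keep, the same predicate can be applied to the converted file locally. This is only a sketch under the assumption that missing values are stored as JSON null (or omitted); the function names are hypothetical.

```python
import json


def is_wanted(record):
    """Mirror the view's WHERE clause: both Id and ChannelUrl must be present."""
    return record.get("Id") is not None and record.get("ChannelUrl") is not None


def filter_views(jsonl_path):
    """Return only the records that the view would keep."""
    with open(jsonl_path, encoding="utf-8") as lines:
        records = [json.loads(line) for line in lines if line.strip()]
    return [record for record in records if is_wanted(record)]
```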

Enter Datastudio

datastudio.google.com

  1. From the Datastudio UI, create a new data source.
  2. Connect to BigQuery and the view you made.
  3. Create two calculated fields: "No of Channels" as COUNT_DISTINCT(ChannelUrl) and "No of Videos" as COUNT_DISTINCT(Id).
  4. Now create a report. Explore different graphs and filters.
  5. Try to copy the YouTube History report and point its data source at your own view.
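
As a sanity check on the two distinct-count fields in step 3, the expected numbers can be computed from the converted file with plain sets. The sketch assumes the same Id and ChannelUrl fields as the queries above and skips null values, as SQL-style distinct counts do; the function name is hypothetical.

```python
import json


def distinct_counts(jsonl_path):
    """Compute local equivalents of the two COUNT_DISTINCT fields."""
    channels, videos = set(), set()
    with open(jsonl_path, encoding="utf-8") as lines:
        for line in lines:
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("ChannelUrl") is not None:
                channels.add(record["ChannelUrl"])
            if record.get("Id") is not None:
                videos.add(record["Id"])
    return {"No of Channels": len(channels), "No of Videos": len(videos)}
```

If the report shows noticeably different totals, the data source is probably bound to the raw table rather than the filtered view.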

License:MIT License


Languages

Language:C# 100.0%