youtube-watch-history-converter
Converts YouTube watch history to JSON Lines format. Part of a walkthrough that leads to a Data Studio report.
Exporting from Google Takeout
- In Google Takeout, export your YouTube data. Select "Logg"/"History" and choose JSON as the format.
- When the export is done, download it and locate "watch-history.json".
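The converter reads this file as a JSON array with one object per watched video. Below is a minimal Python sketch for inspecting an export before converting it. It assumes the usual Takeout field names (title, titleUrl, subtitles, time), though a real export may contain other keys. The sample entries are made up, and `summarize_takeout` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_takeout(entries):
    """Return the number of entries and how often each top-level key appears."""
    keys = Counter()
    for entry in entries:
        keys.update(entry.keys())
    return len(entries), keys


# Made-up sample shaped like a Takeout watch-history.json array (assumed layout).
sample = json.loads("""
[
  {"title": "Så på Some video",
   "titleUrl": "https://www.youtube.com/watch?v=abc123",
   "subtitles": [{"name": "Some channel",
                  "url": "https://www.youtube.com/channel/UC000"}],
   "time": "2020-01-01T12:00:00.000Z"},
  {"title": "Så en video som er fjernet",
   "time": "2020-01-02T12:00:00.000Z"}
]
""")

count, keys = summarize_takeout(sample)
print(count, dict(keys))
```

To inspect a real export, load `sample` with `json.load` from a file opened with `encoding="utf-8"`.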
Converting data to JSON Lines
github.com/teodoran/youtube-watch-history-converter
- Clone the repo and build it with:
dotnet build
- Then convert "watch-history.json" by running the following from the HistoryConverter directory:
dotnet run -- ../Exports/watch-history.json
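The converter itself is a .NET app. The Python sketch below only illustrates the idea: each Takeout entry is flattened into one JSON object per line. The output columns Title, Id and ChannelUrl are the ones queried later in this walkthrough. The mapping from Takeout fields is an assumption and may not match what HistoryConverter actually does: the video id comes from the `v` parameter of titleUrl, and the channel URL from the first subtitles entry.

```python
import json
from urllib.parse import parse_qs, urlparse


def to_row(entry):
    """Flatten one Takeout entry into a row (hypothetical field mapping)."""
    url = entry.get("titleUrl")
    # Assumption: the video id is the "v" query parameter of the watch URL.
    video_id = parse_qs(urlparse(url).query).get("v", [None])[0] if url else None
    # Assumption: the first subtitles entry describes the channel.
    subtitles = entry.get("subtitles") or []
    channel_url = subtitles[0].get("url") if subtitles else None
    return {"Title": entry.get("title"), "Id": video_id, "ChannelUrl": channel_url}


def to_json_lines(entries):
    """Serialize rows as JSON Lines: one JSON object per line."""
    # ensure_ascii=False keeps Norwegian letters such as "å" unescaped.
    return "".join(json.dumps(to_row(e), ensure_ascii=False) + "\n" for e in entries)


# Made-up entries: one normal view and one removed video.
entries = [
    {
        "title": "Så på Some video",
        "titleUrl": "https://www.youtube.com/watch?v=abc123",
        "subtitles": [{"name": "Some channel", "url": "https://www.youtube.com/channel/UC000"}],
    },
    {"title": "Så en video som er fjernet"},
]
print(to_json_lines(entries), end="")
```

Entries without a titleUrl or subtitles end up with null Id and ChannelUrl. Those nulls are what the outlier queries below run into.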
Creating a new BigQuery dataset and table
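- From the BigQuery UI, create a new dataset. Prefix its name with your CX initials, for example tae_youtube_data.
- Create a new table in the dataset by uploading the converted history file, choosing JSONL (newline-delimited JSON) as the file format. The queries below assume the table is named views.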
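When loading JSON, BigQuery expects one complete JSON object per line. Uploading the original, pretty-printed watch-history.json instead of the converted file will therefore fail. The sketch below is a small local check to run before uploading, written as a hypothetical helper `check_json_lines`. It reports lines that do not parse as standalone objects and lists the column names it finds. Blank lines are simply skipped here, which is a choice of this sketch and not a statement about BigQuery.

```python
import json


def check_json_lines(text):
    """Return (sorted column names, problems) for newline-delimited JSON text."""
    columns = set()
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch ignores blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, str(err)))
            continue
        if not isinstance(row, dict):
            problems.append((number, "not a JSON object"))
            continue
        columns.update(row)
    return sorted(columns), problems


good = '{"Title": "a", "Id": "x"}\n{"Title": "b", "Id": null}\n'
bad = '[\n  {"Title": "a"}\n]\n'  # a pretty-printed array, like the raw Takeout file
print(check_json_lines(good))
print(check_json_lines(bad))
```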
Finding outliers
From the BigQuery UI, look for outliers. A good starting point is to group by title:
SELECT COUNT(Title), Title
FROM `computas-nxt-youtube-analyse.tae_youtube_data.views`
GROUP BY Title
ORDER BY COUNT(Title) DESC
LIMIT 100
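For intuition, here is roughly what the query computes, sketched in Python over a few made-up rows shaped like the converted data. `top_titles` is a hypothetical helper. Note that `most_common` returns (title, count) pairs, the reverse of the query's column order.

```python
from collections import Counter


def top_titles(rows, limit=100):
    """Count rows per Title, most frequent first (like GROUP BY Title ... LIMIT)."""
    # Rows without a Title are skipped; SQL would report that group with count 0.
    counts = Counter(row["Title"] for row in rows if row.get("Title") is not None)
    return counts.most_common(limit)


rows = (
    [{"Title": "Så en video som er fjernet"}] * 3
    + [{"Title": "Så på Some video"}] * 2
    + [{"Title": "Så på Another video"}]
)
for title, count in top_titles(rows, limit=2):
    print(count, title)
```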
You will probably spot two strange cases near the top of the list. Take a closer look at them:
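SELECT *
FROM `computas-nxt-youtube-analyse.tae_youtube_data.views`
-- 'Så en video som er fjernet' = 'Watched a video that has been removed'
WHERE Title = 'Så en video som er fjernet'
-- Second case ('Så på <URL>' = 'Watched <URL>'); swap in these lines to inspect it:
-- WHERE Title = 'Så på https://www.youtube.com/watch?v=Uwo1KGDVSEk'
-- AND Id IS NOT NULL
LIMIT 100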
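Both titles come from the Norwegian-language export. "Så en video som er fjernet" means "Watched a video that has been removed", and "Så på <title>" means "Watched <title>". The second case is therefore a view whose title is a bare watch URL rather than the video's name. The hypothetical helper below sorts Title values into these two kinds. It assumes Title keeps the "Så på " prefix, as the query above suggests.

```python
REMOVED = "Så en video som er fjernet"  # "Watched a video that has been removed"
URL_ONLY_PREFIX = "Så på https://www.youtube.com/watch?v="  # "Watched <watch URL>"


def classify_title(title):
    """Label a Title value as 'removed', 'url_only', 'missing' or 'normal'."""
    if not title:
        return "missing"
    if title == REMOVED:
        return "removed"
    if title.startswith(URL_ONLY_PREFIX):
        return "url_only"
    return "normal"


for t in [REMOVED, "Så på https://www.youtube.com/watch?v=Uwo1KGDVSEk", "Så på Some video"]:
    print(classify_title(t), "|", t)
```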
Make a BigQuery view
Based on the outlier findings, create a view that filters out the unwanted rows. You might end up with something along these lines:
SELECT *
FROM `computas-nxt-youtube-analyse.tae_youtube_data.views`
WHERE Id IS NOT NULL
AND ChannelUrl IS NOT NULL
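The view's filter keeps a row only when both columns are non-NULL. Below is the same rule in Python, treating a missing key like NULL. `keep_row` is a hypothetical name, and the rows are illustrative, not taken from a real export.

```python
def keep_row(row):
    """Mirror of WHERE Id IS NOT NULL AND ChannelUrl IS NOT NULL (missing key = NULL)."""
    return row.get("Id") is not None and row.get("ChannelUrl") is not None


rows = [
    {"Title": "Så på Some video", "Id": "abc123", "ChannelUrl": "https://www.youtube.com/channel/UC000"},
    {"Title": "Så en video som er fjernet", "Id": None, "ChannelUrl": None},
    {"Title": "Så på https://www.youtube.com/watch?v=Uwo1KGDVSEk", "Id": "Uwo1KGDVSEk", "ChannelUrl": None},
]
kept = [row for row in rows if keep_row(row)]
print(len(kept))  # prints 1
```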
Enter Data Studio
- From the Data Studio UI, create a new data source.
- Choose the BigQuery connector and select the view you made.
- Create two calculated fields: "No of Channels" with the formula
COUNT_DISTINCT(ChannelUrl)
and "No of Videos" with the formula
COUNT_DISTINCT(Id)
- Now create a report and explore different charts and filters.
- Try copying the YouTube History report and pointing its data source at your view.
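The "No of Channels" and "No of Videos" fields count distinct values over the rows the view returns. Below is a Python sketch of the same arithmetic on made-up rows. `count_distinct` is a hypothetical helper. It skips nulls, which the view should already have removed.

```python
def count_distinct(rows, field):
    """Number of distinct non-null values of `field` across rows."""
    return len({row[field] for row in rows if row.get(field) is not None})


rows = [
    {"Id": "v1", "ChannelUrl": "https://www.youtube.com/channel/A"},
    {"Id": "v1", "ChannelUrl": "https://www.youtube.com/channel/A"},
    {"Id": "v2", "ChannelUrl": "https://www.youtube.com/channel/A"},
    {"Id": "v3", "ChannelUrl": "https://www.youtube.com/channel/B"},
]
print("No of Channels:", count_distinct(rows, "ChannelUrl"))  # prints 2
print("No of Videos:", count_distinct(rows, "Id"))  # prints 3
```

Watching v1 twice counts once toward "No of Videos". That is why the distinct counts can be much lower than the number of rows in the view.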