comp-journalism / apple-news-scraper

Code used for collecting and saving the Top Stories and Trending Stories in Apple News via Appium.

Great Work! Also: Appearance Duration in Aggregated Data

eric-langenberg opened this issue · comments

Great work on this project! It's a cool data set and an interesting project.

One small note on the aggregated data sets:

It looks like "appearance duration" is calculated by subtracting first_appearance from last_appearance. Depending on what you want to do with the data, that might not be the best definition, since stories sometimes pop out of and back into the trending data set. (This phenomenon seems to explain high outliers in the "appearance duration" column.)
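To make the concern concrete, here is a minimal sketch of the subtraction approach as described above. The story id, the timestamps, and the `span_duration` helper are hypothetical and made up for illustration; they are not taken from the repository's code or data.

```python
from datetime import datetime

# Hypothetical observations: (story id, time the story was seen in the trending list).
observations = [
    ("story-a", datetime(2020, 1, 1, 9, 0)),
    ("story-a", datetime(2020, 1, 1, 9, 5)),
    ("story-a", datetime(2020, 1, 1, 9, 10)),
    # story-a drops out of the list here and reappears eight hours later.
    ("story-a", datetime(2020, 1, 1, 17, 10)),
    ("story-a", datetime(2020, 1, 1, 17, 15)),
]


def span_duration(rows):
    """Return last_appearance - first_appearance for each story."""
    first, last = {}, {}
    for story, ts in rows:
        first[story] = min(ts, first.get(story, ts))
        last[story] = max(ts, last.get(story, ts))
    return {story: last[story] - first[story] for story in first}


spans = span_duration(observations)
print(spans["story-a"])  # 8:15:00
```

In this made-up example the story was observed only five times, yet the subtraction reports 8 hours 15 minutes, because the gap in the middle is counted as if the story had been trending throughout.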

Another way of measuring "duration" would be to calculate the duration of each row of the original data collected (rows appear to be collected roughly every 5 minutes, but not always), and then sum those durations for each unique story. I suspect this would generally be a more meaningful way to measure duration.
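One possible reading of this suggestion, sketched under stated assumptions: each collection run is a "snapshot", a row's duration is the time until the next snapshot, and the final snapshot (which has no successor) is assigned a nominal 5 minutes. The snapshot layout, the `summed_duration` helper, and the `final_interval` parameter are all hypothetical and do not come from the repository.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def summed_duration(snapshots, final_interval=timedelta(minutes=5)):
    """Sum per-snapshot intervals for every story.

    snapshots: list of (timestamp, iterable of story ids seen at that time).
    Each snapshot is credited with the time until the next snapshot; the last
    one has no successor, so it gets final_interval (an assumption).
    """
    ordered = sorted(snapshots, key=lambda snap: snap[0])
    totals = defaultdict(timedelta)
    for i, (ts, stories) in enumerate(ordered):
        if i + 1 < len(ordered):
            interval = ordered[i + 1][0] - ts
        else:
            interval = final_interval
        for story in set(stories):  # count a story once per snapshot
            totals[story] += interval
    return dict(totals)


# Hypothetical snapshots with one irregular gap (9:05 to 9:12).
snapshots = [
    (datetime(2020, 1, 1, 9, 0), ["story-a", "story-b"]),
    (datetime(2020, 1, 1, 9, 5), ["story-a", "story-b"]),
    (datetime(2020, 1, 1, 9, 12), ["story-b"]),
    (datetime(2020, 1, 1, 9, 17), ["story-b"]),
    (datetime(2020, 1, 1, 9, 22), ["story-a", "story-b"]),
]

totals = summed_duration(snapshots)
print(totals["story-a"])  # 0:17:00
print(totals["story-b"])  # 0:27:00
```

Here story-a sums to 17 minutes, whereas subtracting its first appearance from its last would give 22 minutes. If collection sometimes pauses for long stretches, capping each interval at some maximum might be worth considering, since a long pause would otherwise be credited to whichever stories were present just before it.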

But I don't think this metric was very important for the analyses you were running, so it's no big deal. Great work on this project.

Hey Eric, I just got around to seeing this note. You are absolutely right that the time subtraction is probably not ideal for all analyses, and I will be mindful of this in future work. Thanks for pointing this out.