bryanwhiting / generalconference

Scrape General Conference Talks using R

Home Page: https://bryanwhiting.github.io/generalconference


Are all the data being scraped?

bryanwhiting opened this issue

https://www.churchofjesuschrist.org/study/general-conference/2020/10/45andersen.p5?lang=eng#p5
In the scraped data, this paragraph is truncated: only the text before the quote is kept.

Same problem with this:
https://www.churchofjesuschrist.org/study/general-conference/2020/04/26renlund.p21?lang=eng#p21

I'm guessing this is caused by how the <sup> references are being parsed.
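A minimal sketch of how a <sup> reference could cut a paragraph short, assuming the scraper keeps only the first text node of each paragraph. That assumption is unconfirmed; the actual cause may differ. The package itself is written in R, and Python's standard library is used here only to keep the example dependency-free. The sample HTML is made up and is not the real talk text, and `ParagraphText`, `full_text`, and `first_text_node_only` are hypothetical names.

```python
from html.parser import HTMLParser

# Made-up paragraph (NOT the real talk text): a reference marker sits mid-sentence.
SAMPLE = (
    '<p id="p5">He taught:<sup class="marker">1</sup> '
    '"Quoted words here." Then more text.</p>'
)


class ParagraphText(HTMLParser):
    """Collects text nodes in order, skipping anything inside <sup>."""

    def __init__(self):
        super().__init__()
        self.sup_depth = 0  # > 0 while we are inside a <sup> element
        self.chunks = []    # one entry per text node outside <sup>

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self.sup_depth += 1

    def handle_endtag(self, tag):
        if tag == "sup" and self.sup_depth:
            self.sup_depth -= 1

    def handle_data(self, data):
        if self.sup_depth == 0:
            self.chunks.append(data)


def _chunks(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.chunks


def first_text_node_only(html):
    """Mimics the suspected bug: keep only the text before the first child tag."""
    chunks = _chunks(html)
    return chunks[0] if chunks else ""


def full_text(html):
    """Joins every text node, so the text after the <sup> marker survives."""
    return "".join(_chunks(html))


print(first_text_node_only(SAMPLE))
print(full_text(SAMPLE))
```

If the real scraper behaves like `first_text_node_only`, dropping the <sup> nodes before extracting text (or joining all text nodes, as `full_text` does) should restore the missing part of the paragraph.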

A better approach would be to scrape the raw content and process it afterward. That way I don't have to re-scrape everything whenever the parsing changes.
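One way this could look, as a rough sketch rather than the package's actual design: save each page's raw HTML to disk once, then run (and re-run) the parsing step over the saved files. The example is in Python for illustration only, since the package is in R. The `raw_html` directory, the hashed file names, and the function names `cache_path`, `fetch_raw`, and `process_cached` are all assumptions.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def cache_path(url, cache_dir):
    """Map a URL to a stable file name inside cache_dir (hypothetical layout)."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    return Path(cache_dir) / name


def fetch_raw(url, cache_dir="raw_html"):
    """Return the raw HTML for url, downloading only if it is not cached yet."""
    path = cache_path(url, cache_dir)
    if path.exists():
        return path.read_text(encoding="utf-8")
    path.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    path.write_text(html, encoding="utf-8")
    return html


def process_cached(cache_dir, parse):
    """Apply parse to every saved page without touching the network."""
    return {
        path.name: parse(path.read_text(encoding="utf-8"))
        for path in sorted(Path(cache_dir).glob("*.html"))
    }
```

With this split, fixing a parsing bug such as the truncated paragraphs only means changing the `parse` function and calling `process_cached` again; the pages themselves are downloaded once.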