bryanwhiting / generalconference

Scrape General Conference Talks using R

https://bryanwhiting.github.io/generalconference

Are all the data being scraped?

bryanwhiting opened this issue 3 years ago

Bryan Whiting commented 3 years ago

https://www.churchofjesuschrist.org/study/general-conference/2020/10/45andersen.p5?lang=eng#p5
In the scraped data, this paragraph is truncated: only the text before the quote is kept.

Same problem with this:
https://www.churchofjesuschrist.org/study/general-conference/2020/04/26renlund.p21?lang=eng#p21

My guess is that this is caused by how the <sup> references are being parsed.
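To illustrate how a <sup> marker could produce this kind of truncation, here is a minimal sketch. It is written in Python with only the standard library, purely as a language-neutral illustration (the package itself is in R). The markup in SAMPLE, the class name ParagraphText, and the helpers first_text_node and full_text are all hypothetical; the real page structure and the scraper's actual logic may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup: a paragraph whose text is split in two by a <sup> marker.
# The real page structure is an assumption, not copied from the site.
SAMPLE = (
    '<p id="p5">Text before the quote:'
    '<sup class="marker">1</sup>'
    ' "the quoted part" and the rest.</p>'
)


class ParagraphText(HTMLParser):
    """Collect text chunks, skipping anything nested inside <sup>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.sup_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self.sup_depth += 1

    def handle_endtag(self, tag):
        if tag == "sup" and self.sup_depth > 0:
            self.sup_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not part of a reference marker.
        if self.sup_depth == 0:
            self.chunks.append(data)


def _chunks(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.chunks


def first_text_node(html):
    """Mimics the suspected bug: keep only the text before the first <sup>."""
    chunks = _chunks(html)
    return chunks[0] if chunks else ""


def full_text(html):
    """Joins every text chunk, dropping only the <sup> marker itself."""
    return "".join(_chunks(html))
```

If the scraper behaves like first_text_node, any paragraph containing a reference marker would stop at that marker, while paragraphs without a marker before the end of their text would come through whole. That would be consistent with some paragraphs being truncated and others being full, but it remains a guess until checked against the actual pages.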

A better approach would be to scrape the raw content and process it afterward. That way I wouldn't have to re-scrape all the content each time the parsing is fixed.
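A rough sketch of that "scrape raw, process later" idea follows, again in Python as a language-neutral illustration and not the package's actual code. The function names fetch_url and cached_raw_html, the cache directory layout, and the hash-based file naming are all assumptions made for this example.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url):
    """Download a page and return its raw bytes, unparsed."""
    with urlopen(url) as resp:
        return resp.read()


def cached_raw_html(url, cache_dir, fetch=fetch_url):
    """Return the raw HTML for url, downloading only if it is not cached yet.

    The raw bytes are stored on disk under a name derived from the URL,
    so later parsing fixes can re-read the file instead of re-scraping.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = cache_dir / name
    if not path.exists():
        path.write_bytes(fetch(url))
    return path.read_bytes()
```

With something like this in place, the parsing step becomes a separate pass over the cached files, so a fix to the <sup> handling only requires re-running the parser, not the downloads.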

Bryan Whiting commented 3 years ago

Also truncated:

https://www.churchofjesuschrist.org/study/general-conference/2021/04/53dyches.p18?lang=eng#p18

Bryan Whiting commented 3 years ago

But this verse was scraped in full:

https://www.churchofjesuschrist.org/study/general-conference/2020/10/42harkness.p23?lang=eng#p23

Bryan Whiting commented 3 years ago

This verse was also scraped in full: https://www.churchofjesuschrist.org/study/general-conference/2021/04/52rasband.p16?lang=eng#p16