Are all the data being scraped?
bryanwhiting opened this issue · comments
https://www.churchofjesuschrist.org/study/general-conference/2020/10/45andersen.p5?lang=eng#p5
This paragraph is truncated in the scraped data: only the text before the quote is kept.
Same problem with this:
https://www.churchofjesuschrist.org/study/general-conference/2020/04/26renlund.p21?lang=eng#p21
I'm guessing this is caused by how the <sup> references are being parsed.
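If the <sup> guess is right, one possible fix is to walk every text node in the paragraph and skip only the footnote markers, rather than stopping at the first child element. The following is a minimal sketch using Python's standard library. The sample HTML and the `paragraph_text` helper are hypothetical, made up for illustration; the real page markup and the project's scraper may differ.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect all text in an HTML fragment, skipping anything inside <sup>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.sup_depth = 0  # > 0 while we are inside a <sup> element

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self.sup_depth += 1

    def handle_endtag(self, tag):
        if tag == "sup" and self.sup_depth > 0:
            self.sup_depth -= 1

    def handle_data(self, data):
        # Keep text before AND after the marker; drop only the marker itself.
        if self.sup_depth == 0:
            self.parts.append(data)


def paragraph_text(html):
    """Return the paragraph text with footnote markers removed (hypothetical helper)."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markers.
    return " ".join("".join(parser.parts).split())


# Hypothetical markup, loosely modeled on a paragraph with a footnote marker.
sample = (
    '<p id="p5">He said:'
    '<a class="note-ref" href="#note1"><sup class="marker">1</sup></a>'
    ' "Text after the marker."</p>'
)
print(paragraph_text(sample))
```

The point of the sketch is that the text after the marker survives, which is the part that is currently missing from the data.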
A better approach would be to scrape and store the raw content, then process it afterward. That way I don't have to rescrape everything each time the parsing changes.
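A rough sketch of that idea, assuming the raw HTML is cached on disk under a hash of the URL. The `raw_html` directory name and the `cache_path` and `fetch_raw` helpers are assumptions for illustration, not part of the existing scraper.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def cache_path(url, cache_dir):
    """Map a URL to a stable file name inside cache_dir (hypothetical layout)."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    return Path(cache_dir) / name


def fetch_raw(url, cache_dir="raw_html"):
    """Return the raw HTML for url, downloading it only if it is not cached yet."""
    path = cache_path(url, cache_dir)
    if path.exists():
        # Already scraped once: reuse the stored copy, no network request.
        return path.read_text(encoding="utf-8")
    with urlopen(url) as response:
        html = response.read().decode("utf-8")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html
```

With the raw pages stored like this, a parsing bug such as the truncated paragraphs can be fixed and the whole dataset rebuilt from the local files alone.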
But this verse was scraped in full:
https://www.churchofjesuschrist.org/study/general-conference/2020/10/42harkness.p23?lang=eng#p23
This verse is also full: https://www.churchofjesuschrist.org/study/general-conference/2021/04/52rasband.p16?lang=eng#p16