sgsinclair / Voyant

RSS Feed Error in Spyral Notebooks

kaylinland opened this issue.

Spyral does not recognize an RSS feed URL (https://www.cbc.ca/cmlink/rss-topstories) when it is passed to the loadCorpus function.
The error reads: "An error occurred during multi-threaded document expansion."

I'm having trouble reproducing this: https://voyant-tools.org/spyral/rss-test/
Could it be that the CBC's RSS feed was down at the time?

No, I think the issue was that I used loadCorpus without the summary() method. It is working now, thanks!

I'm still having issues making this function work:

loadCorpus("http://www.cbc.ca/cmlink/rss-topstories", {
    inputFormat: 'xml', // force XML (not RSS)
    xmlContentXpath: "//item/description" // grab item description for content
});

Maybe there is an issue with the XML path, but it looks okay to me.

I just had a quick look, but it seems to work if you don't specify inputFormat (not currently sure why).

loadCorpus('https://www.cbc.ca/cmlink/rss-topstories', {
    xmlContentXpath: "//item/description"
}).summary()

gives me:
This corpus (86f8e166b6650ff7c869f484c76f4a17) has 20 documents with 1,870 total words and 711 unique word forms.

That's interesting! I got it to work by adding the .summary() call. You do need inputFormat, though, because the goal is to turn the feed from a corpus of 20 separate documents into one single document.


loadCorpus('https://www.cbc.ca/cmlink/rss-topstories', {
    inputFormat: 'xml', //force xml
    xmlContentXpath: "//item/description" //define xpath 
}).summary()

This does the job!

OK, I misunderstood what you wanted to do (re: single document).
By the way, you shouldn't have to call summary() for loadCorpus to work. Please have a look at the newly edited version of https://voyant-tools.org/spyral/rss-test/
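To make the "20 separate documents vs. one single document" distinction concrete, here is a toy sketch in plain JavaScript (runnable in Node, not in Spyral). It is hypothetical and does not reproduce how Voyant actually ingests feeds. The sample feed and the helpers itemDescriptions, asSeparateDocuments and asSingleDocument are invented for illustration. A regular expression stands in for the XPath //item/description, which is only safe for a tiny, well-formed sample like this one. The sketch simply contrasts the two outcomes reported above: one document per feed item versus all item descriptions merged into one document.

```javascript
// Hypothetical illustration only: this is NOT Voyant/Spyral's parsing logic.
// A regex replaces real XPath evaluation; fine for this toy sample,
// not a general-purpose XML parser.

const sampleRss = `<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>One</title><description>First story text.</description></item>
    <item><title>Two</title><description>Second story text.</description></item>
    <item><title>Three</title><description>Third story text.</description></item>
  </channel>
</rss>`;

// Collect the text of each <description> found inside an <item>,
// roughly what the XPath //item/description selects.
function itemDescriptions(xml) {
  const items = xml.match(/<item>[\s\S]*?<\/item>/g) || [];
  return items
    .map((item) => {
      const m = item.match(/<description>([\s\S]*?)<\/description>/);
      return m ? m[1].trim() : null;
    })
    .filter((text) => text !== null);
}

// Feed-style outcome: one document per item.
function asSeparateDocuments(xml) {
  return itemDescriptions(xml);
}

// Generic-XML-style outcome: all selected content merged into one document.
function asSingleDocument(xml) {
  return [itemDescriptions(xml).join("\n")];
}

console.log(asSeparateDocuments(sampleRss).length); // 3
console.log(asSingleDocument(sampleRss).length); // 1
```

With the real CBC feed, the thread reports 20 documents in the first case; the sketch's sample has only three items.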