How to debug feeds that throw an error?
kylealwyn opened this issue · comments
Trying to pull something like https://www.nature.com/nature.rss and getting an error both locally and in the demo. I ran the address through the W3C validator and it came up valid.
Somewhat related, I'm also trying to use a proxy but to no avail, as http://api_key@proxy.scrapingbee.com:8887 is throwing Invalid URL.
Curious if something similar to extractus/article-extractor#326 is viable for this library - it'd be great to fetch the xml on my own and provide that to this parser
Sorry, last thing, but I think the type for headers in FetchOptions is incorrect. I believe it should be something like Record<string, string>:
export interface FetchOptions {
  /**
   * list of request headers
   * default: null
   */
  headers?: string[];
  /**
   * the values to configure proxy
   * default: null
   */
  proxy?: ProxyConfig;
}
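For illustration, here is a sketch of how the suggested typing might look. The `ProxyConfig` shape and the `exampleOptions` value are assumptions made only so the snippet stands on its own; they are not copied from the library's source.

```typescript
// Hypothetical shape, declared only so this snippet compiles by itself.
interface ProxyConfig {
  target: string;
  headers?: Record<string, string>;
}

// Suggested typing: headers as a name -> value map instead of string[].
interface FetchOptions {
  /** request headers, keyed by header name */
  headers?: Record<string, string>;
  /** the values to configure proxy */
  proxy?: ProxyConfig;
}

// Example value that type-checks against the suggested interface.
const exampleOptions: FetchOptions = {
  headers: {
    'user-agent': 'my-feed-reader/1.0',
    accept: 'application/rss+xml',
  },
};
```

With `string[]`, an object literal like the one above would be rejected by the compiler, which is the mismatch being reported.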
@kylealwyn same idea, this lib should have that method too.
@kylealwyn https://www.nature.com/nature.rss uses RDF. It's been a long time since I've seen this format!
@kylealwyn v6.2.1 has just been released with two new methods for extracting feed data from an XML or JSON string. That may resolve your case.
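A minimal sketch of the "fetch it yourself, then parse the string" flow discussed above. The `parse` argument stands in for whichever string-based method the library exposes; the names `readFeed`, `fetchText`, `FeedParser`, and `TextFetcher` are hypothetical and exist only for this example.

```typescript
// Any function that turns a feed document (as a string) into parsed data.
// Stand-in for the library's string-based method; the name is hypothetical.
type FeedParser<T> = (text: string) => T;

// Any function that downloads a URL and resolves to the response body.
type TextFetcher = (url: string) => Promise<string>;

// Default fetcher built on the global fetch (Node 18+ or browsers).
const fetchText: TextFetcher = async (url) => {
  const res = await fetch(url, {
    headers: { accept: 'application/xml, application/json' },
  });
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.text();
};

// Download the feed yourself, then hand the raw text to the parser.
async function readFeed<T>(
  url: string,
  parse: FeedParser<T>,
  fetcher: TextFetcher = fetchText,
): Promise<T> {
  const text = await fetcher(url);
  return parse(text);
}
```

Keeping the fetcher injectable means a proxy, custom headers, or a cached copy can be swapped in without touching the parsing step.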
Regarding https://www.nature.com/nature.rss, we have no plans to support the RDF format right now, because this format is quite rarely used.
Somewhat related, I'm also trying to use a proxy but to no avail as http://api_key@proxy.scrapingbee.com:8887 is throwing Invalid URL
Could you share more info about your code here? This lib does not modify or validate the proxy URL. It simply uses the URL from the proxy config if one is present.
Awesome! Will check it out. It would be great to expose the utils that check whether a feed is XML or JSON, or to have a unified entrypoint that runs the validation & normalization, but I will copy those over for now!
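A rough sketch of what such a unified entrypoint could look like, assuming the caller supplies the two parsers. `detectFeedFormat` and `parseFeed` are hypothetical names, and the sniffing here is a simple heuristic rather than the library's own validation.

```typescript
type FeedFormat = 'json' | 'xml' | 'unknown';

// Guess the format from the first meaningful character of the payload.
function detectFeedFormat(text: string): FeedFormat {
  // trim() also drops a leading byte-order mark, which counts as whitespace.
  const body = text.trim();
  if (body.charAt(0) === '{') {
    try {
      JSON.parse(body);
      return 'json';
    } catch {
      return 'unknown';
    }
  }
  return body.charAt(0) === '<' ? 'xml' : 'unknown';
}

// Unified entry point: pick the parser based on the detected format.
function parseFeed<T>(
  text: string,
  parsers: { xml: (t: string) => T; json: (t: string) => T },
): T {
  const format = detectFeedFormat(text);
  if (format === 'unknown') {
    throw new Error('Input does not look like an XML or JSON feed');
  }
  return parsers[format](text);
}
```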
Regarding https://www.nature.com/nature.rss, we have no plans to support the RDF format right now, because this format is quite rarely used.
Makes sense
Could you share more info about your code here? This lib does not modify or validate the proxy URL. It simply uses the URL from the proxy config if one is present.
I'm doing something like:
const res = await read(
  feed.xmlUrl,
  {},
  {
    proxy: {
      target: 'http://127.0.0.1:3001',
    },
  },
);
where the target is the URL I initially shared, or any IP/port combination, and I get back an Invalid URL error.
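One way to narrow down where an Invalid URL error comes from is to run each candidate string through the standard `URL` constructor. The sketch below does that for the proxy values mentioned in this thread. The `prefixed` entries assume the target gets used as a prefix for the encoded feed URL; that is a guess for debugging purposes and is not confirmed anywhere in this thread. `urlError` and `candidates` are hypothetical names.

```typescript
// Returns null when the string parses as an absolute URL, else the error message.
function urlError(candidate: string): string | null {
  try {
    new URL(candidate);
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const feedUrl = 'https://www.nature.com/nature.rss';

const candidates = {
  proxyWithKey: 'http://api_key@proxy.scrapingbee.com:8887',
  localProxy: 'http://127.0.0.1:3001',
  // Assumption: the target is used as a prefix for the encoded feed URL.
  prefixed: 'http://127.0.0.1:3001' + encodeURIComponent(feedUrl),
  // Same assumption, but the target ends in a query parameter.
  prefixedWithQuery: 'http://127.0.0.1:3001/?url=' + encodeURIComponent(feedUrl),
};

// Print which candidates parse and which are rejected.
for (const [name, value] of Object.entries(candidates)) {
  console.log(name, urlError(value) ?? 'ok');
}
```

In a WHATWG-compliant runtime such as Node.js, both proxy targets parse on their own, while the `prefixed` string is rejected because the text following the port is not numeric. If the error only shows up once a proxy is configured, the way the target and the feed URL are combined is worth checking first.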
Also, what would be the lift on supporting RDF feeds? https://rss.slashdot.org/Slashdot/slashdotMain is another big one I'm interested in. Seeing the format quite a bit through my explorations.
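To see up front whether a feed is RDF before handing it to a parser, the root element can be inspected. This is a small regex-based sketch, not a real XML parser. `detectXmlFeedKind` is a hypothetical name, and it assumes the conventional `rdf` namespace prefix, which a document is free to change.

```typescript
type XmlFeedKind = 'rss' | 'atom' | 'rdf' | 'unknown';

function detectXmlFeedKind(xml: string): XmlFeedKind {
  // Drop the XML declaration, processing instructions, comments, and DOCTYPE.
  const withoutProlog = xml
    .replace(/<\?[\s\S]*?\?>/g, '')
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/<!DOCTYPE[^>]*>/gi, '');

  // Capture the name of the first element, including an optional prefix.
  const match = withoutProlog.match(/<([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)/);
  const root = match ? match[1].toLowerCase() : '';

  if (root === 'rss') return 'rss'; // RSS 0.9x / 2.0
  if (root === 'feed') return 'atom'; // Atom
  if (root === 'rdf:rdf') return 'rdf'; // RSS 1.0 (RDF), assuming the usual prefix
  return 'unknown';
}
```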
@kylealwyn thank you. RDF can reuse almost all of the logic from the RSS parser. I will try to implement a draft.