Add support for normalized entry "full text content" ?

Question

isunjn opened this issue 4 months ago · comments

Typically, RSS feeds provide the "full text content" directly in the feed file: content:encoded or description in RSS, content or summary in Atom, etc.

So could feed-extractor add a normalized content property to each entry item?

I know you have another package called article-extractor, but I don't want to parse the HTML manually if the feed already provides its full "content". Also, some websites are not server-rendered and thus cannot be parsed correctly.

Dong Nguyen · Answer 1 · Mon Jun 03 2024 14:28:48 GMT+0800 (China Standard Time)

@isunjn content is not included in the default result because websites handle it inconsistently: some provide it, while others don't. The 4 default fields chosen are link, title, description and pubdate, which are the most stable; almost all feeds return them.

If you know for certain that a website includes content in its feed data, you can use getExtraEntryFields() to add it to your extraction result.
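
For illustration, here is a minimal sketch of a callback that could be passed as getExtraEntryFields. The helper names (pickContent, textOf, CONTENT_KEYS) are hypothetical, and the sketch assumes the raw entry exposes tags under their XML names and that a tag carrying attributes is parsed into an object with its text under '#text'. Check the actual shape of the entries in your feed before relying on it.

```javascript
// Candidate tags, in order of preference. These are the tag names
// mentioned in the question; adjust the list for the feeds you target.
const CONTENT_KEYS = ['content:encoded', 'content', 'summary', 'description']

// Assumption: a parsed node is either a plain string, or an object
// that keeps its text under '#text' (e.g. <content type="html">...</content>).
const textOf = (node) => {
  if (typeof node === 'string') return node
  if (node && typeof node === 'object' && typeof node['#text'] === 'string') {
    return node['#text']
  }
  return ''
}

// Returns the extra fields to merge into each entry: a single
// normalized `content` property, empty when no candidate tag has text.
const pickContent = (feedEntry) => {
  for (const key of CONTENT_KEYS) {
    const text = textOf(feedEntry[key]).trim()
    if (text) return { content: text }
  }
  return { content: '' }
}

// Hypothetical usage (needs the feed-extractor package; the URL is a placeholder):
//   import { extract } from '@extractus/feed-extractor'
//   const feed = await extract('https://example.com/feed.xml', {
//     getExtraEntryFields: pickContent
//   })
//   feed.entries.forEach((entry) => console.log(entry.title, entry.content.length))
```

Because the fallback order is just an array, a site-specific preference (for example, summary before content) only requires reordering CONTENT_KEYS.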