extractus / feed-extractor

Simplest way to read & normalize RSS/ATOM/JSON feed data

Home Page: https://extractor-demos.pages.dev/feed-extractor

Add support for normalized entry "full text content"?

isunjn opened this issue

Typically, RSS feeds provide the full text content directly in the feed file: content:encoded or description in RSS, content or summary in Atom, and so on.

So could feed-extractor add a normalized content property to each entry item?

I know you have another package called article-extractor, but I don't want to parse the HTML manually when the feed already provides its full content. Also, some websites are not server-rendered and thus cannot be parsed correctly.

@isunjn The reason content is not included in the default result is that websites handle it inconsistently: some provide it in their feeds, while others don't. The 4 default fields (link, title, description and pubdate) were chosen because they are the most stable; almost all feeds return them.

If you know for certain that a website includes content in its feed data, you can use getExtraEntryFields() to add it to your extraction result.
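
For illustration, here is a rough sketch of a callback you could pass as getExtraEntryFields. It is not part of feed-extractor: the names pickContent, textOf, RawEntry and CONTENT_KEYS are made up for this example. It also assumes the raw entry exposes the feed's own element names as keys (content:encoded, content, content_html, summary, ...) and that a value is either a plain string or an object carrying its text under "#text". Check what your feed actually returns before relying on it.

```typescript
// Hypothetical helper, not part of feed-extractor.
// Assumption: the raw entry is a plain object keyed by the feed's element names.
type RawEntry = Record<string, unknown>;

// Candidate keys in priority order: fullest content first, short summaries last.
const CONTENT_KEYS: string[] = [
  'content:encoded', // RSS (content module)
  'content',         // Atom
  'content_html',    // JSON Feed
  'content_text',    // JSON Feed
  'summary',         // Atom / JSON Feed
  'description',     // RSS
];

// Return the text of a value that is either a string or an object
// holding its text under "#text" (assumed parser output shape).
function textOf(value: unknown): string {
  if (typeof value === 'string') {
    return value.trim();
  }
  if (value !== null && typeof value === 'object') {
    const inner = (value as Record<string, unknown>)['#text'];
    if (typeof inner === 'string') {
      return inner.trim();
    }
  }
  return '';
}

// Pick the first non-empty candidate and expose it as a single "content" field.
function pickContent(entry: RawEntry): { content: string } {
  for (const key of CONTENT_KEYS) {
    const text = textOf(entry[key]);
    if (text !== '') {
      return { content: text };
    }
  }
  return { content: '' };
}
```

You would then pass it in the options argument, something like extract(url, { getExtraEntryFields: pickContent }), and each entry in the result should carry an extra content field, which is an empty string when the feed provides none of the candidate fields.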