ageitgey / node-unfluff

Automatically extract body content (and other cool stuff) from an html document

How can I restore the content with the extracted metadata?

goddyZhao opened this issue

Hi, I tried this tool and it is awesome but I have a question here:

Say I want to extract the main page content (the article) and show it restyled, like Pocket's text view. But unfluff just extracts the whole text into the JSON object, with images and videos split out into separate fields. How can I restore the content? I don't even know how many paragraphs it has or where the images sit within the article. Thanks!

You can't do that with this package, unfortunately. It's not really designed to do that. It's designed to just grab the plain text and discard the original document structure.

You could modify the code to work differently, but that's not really what it's designed to do right now. Sorry :(
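For the paragraph-count part of the question, a partial workaround might be to post-process the extracted `text` field. The sketch below assumes that unfluff separates paragraphs with blank lines in `text`, which is worth checking against your own output. The sample object and the helpers `splitParagraphs`, `escapeHtml`, and `renderArticle` are hypothetical names made up for illustration and are not part of unfluff. The position of images inside the article is not recoverable this way, so the lead image is simply placed at the top.

```javascript
// Hypothetical sample shaped like the object unfluff returns (title, text, image).
// The values are made up for illustration.
const extracted = {
  title: 'Example article',
  text: 'First paragraph.\n\nSecond paragraph,\nwrapped onto two lines.\n\nThird paragraph.',
  image: 'https://example.com/lead.jpg',
};

// Escape characters that are special in HTML so the text is safe to embed.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Assumption: paragraphs in `text` are separated by one or more blank lines.
// Single newlines inside a paragraph are collapsed into spaces.
function splitParagraphs(text) {
  return text
    .split(/\n\s*\n/)
    .map((p) => p.replace(/\s+/g, ' ').trim())
    .filter((p) => p.length > 0);
}

// Build a minimal reading view: title, lead image (if any), then paragraphs.
function renderArticle(data) {
  const parts = [];
  if (data.title) parts.push(`<h1>${escapeHtml(data.title)}</h1>`);
  if (data.image) parts.push(`<img src="${escapeHtml(data.image)}" alt="">`);
  for (const p of splitParagraphs(data.text || '')) {
    parts.push(`<p>${escapeHtml(p)}</p>`);
  }
  return parts.join('\n');
}

console.log(renderArticle(extracted));
```

In real use, `extracted` would be the object returned by unfluff for your page rather than the hard-coded sample. Anything that depends on the original markup (inline images, headings, lists) would still need a different tool or a modified extractor.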

well, ok! Thanks @ageitgey