How can I restore the content with the extracted metadata?

goddyZhao opened this issue 8 years ago

Hi, I tried this tool and it is awesome but I have a question here:

Say I want to extract the main page content (the article) and show a restyled version of it, like the Pocket text view. But unfluff just extracts the whole text into the JSON object, with images and videos split out into separate fields. How can I restore the content? I don't even know how many paragraphs it has or where the image sits in the article. Thanks!

Adam Geitgey · Answer 1 · Tue Mar 29 2016 00:25:16 GMT+0800 (China Standard Time)

You can't do that with this package, unfortunately. It's not really designed to do that. It's designed to just grab the plain text and discard the original document structure.

You could modify the code to work differently, but that's not really what it's designed to do right now. Sorry :(
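
For anyone who only needs a rough approximation, here is a minimal sketch that works on the extracted object after the fact. It assumes the object has `title`, `text`, and `image` fields, and that paragraphs in `text` are separated by blank lines; that separator is an assumption worth checking against your own output. `splitParagraphs`, `escapeHtml`, and `renderArticle` are hypothetical helpers, not part of unfluff. The original image position is not recoverable from the plain text, so the sketch just puts the lead image at the top.

```javascript
// Hypothetical helpers, not part of unfluff.

// Split extracted plain text into paragraphs.
// Assumption: paragraphs are separated by one or more blank lines.
function splitParagraphs(data) {
  return (data.text || "")
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Escape text so it is safe to place inside HTML content or attributes.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a very plain article view: title, lead image, then paragraphs.
// The image is placed at the top because its real position is unknown.
function renderArticle(data) {
  const parts = [];
  if (data.title) parts.push(`<h1>${escapeHtml(data.title)}</h1>`);
  if (data.image) parts.push(`<img src="${escapeHtml(data.image)}" alt="">`);
  for (const p of splitParagraphs(data)) {
    parts.push(`<p>${escapeHtml(p)}</p>`);
  }
  return parts.join("\n");
}
```

This only gives you a paragraph count and a restyled text view. Inline images, headings, and lists from the original page are still lost, so anything closer to the Pocket view would need changes to the extractor itself.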

Goddy Zhao · Answer 2 · Thu Mar 31 2016 16:26:44 GMT+0800 (China Standard Time)

Well, OK! Thanks, @ageitgey