karussell / snacktory

Readability clone in Java

Preserve paragraphs?

ondrejmirtes opened this issue

Hi, is it possible to preserve or restore paragraphs with the Snacktory engine? Extracted articles are hard to read when all the text is joined into one big chunk.

Hmm, the text snacktory returns should be free from HTML. But one could probably add a separate method that returns a list of text blocks instead. I'd be happy to receive a pull request :) !
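To make the "text list" idea concrete, here is a minimal sketch. It assumes the paragraph breaks have already been kept as '\n' in the extracted text, and it treats every line break as a paragraph boundary. `TextListSketch` and `toTextList` are hypothetical names, not part of snacktory's API.

```java
import java.util.ArrayList;
import java.util.List;

final class TextListSketch {

    // Hypothetical helper (not snacktory API): splits article text whose
    // paragraph breaks were kept as '\n' into one string per paragraph.
    static List<String> toTextList(String text) {
        List<String> paragraphs = new ArrayList<>();
        // Every line break, together with any whitespace around it, ends a paragraph.
        for (String part : text.split("\\s*\\n\\s*")) {
            // Collapse remaining runs of spaces/tabs inside the paragraph.
            String p = part.replaceAll("[ \\t]+", " ").trim();
            if (!p.isEmpty()) {
                paragraphs.add(p);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Prints: [First paragraph., Second paragraph.]
        System.out.println(toTextList("First  paragraph.\n\n  Second\tparagraph.\n"));
    }
}
```

Returning a `List<String>` keeps each entry free of markup, which matches the wish that the plain text output should not contain special sequences.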

Hello, you actually only need to remove the `c == '\n'` condition from `if (c == ' ' || (int) c == 9 || c == '\n')` (9 is the tab character) at SHelper.java:81 to keep the \n characters in your text.

However, I'm not sure whether that breaks anything else. It works well for me.
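A variant of the change suggested above, written as a standalone helper so it does not modify snacktory's SHelper. `WhitespaceSketch.innerTrimKeepNewlines` is a hypothetical name and this is not snacktory's actual code. If the newline check is simply deleted, a space before or after a line break can survive; this sketch also drops those, so lines do not start or end with stray spaces.

```java
final class WhitespaceSketch {

    // Hypothetical helper (not snacktory's SHelper): collapses runs of spaces
    // and tabs into a single space but keeps '\n' so paragraph breaks survive.
    static String innerTrimKeepNewlines(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        boolean pendingSpace = false;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == ' ' || c == '\t') {
                // Defer: a space is only emitted before the next visible character.
                pendingSpace = true;
                continue;
            }
            if (c == '\n') {
                // Drop spaces that precede a line break and keep the break itself.
                pendingSpace = false;
                sb.append('\n');
                continue;
            }
            // Emit a single space, but never at the start of the text or of a line.
            if (pendingSpace && sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
                sb.append(' ');
            }
            pendingSpace = false;
            sb.append(c);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String out = innerTrimKeepNewlines("  First   para \n\n  Second\tpara  ");
        System.out.println(out);
    }
}
```

Blank lines between paragraphs are kept as-is; collapsing them as well would be a further design choice.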

Well, as I said: the text should be free from any special sequences, so we should create a new method such as getHtmlText or getTextList. Send me a pull request with a new test, make sure all the old tests still pass, and I'll give it a go :) !
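For the getHtmlText option, one possible shape is to emit only paragraph markup and escape everything else, so the plain text method can stay free of special sequences. This is a sketch only: `HtmlTextSketch`, `toParagraphHtml` and `escapeHtml` are hypothetical names, and the input is assumed to be a list of already-extracted paragraphs.

```java
import java.util.List;

final class HtmlTextSketch {

    // Escapes the characters that would otherwise be read as HTML markup.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical getHtmlText-style helper: wraps each paragraph in <p>...</p>,
    // one per line, with the paragraph text escaped.
    static String toParagraphHtml(List<String> paragraphs) {
        StringBuilder html = new StringBuilder();
        for (String p : paragraphs) {
            html.append("<p>").append(escapeHtml(p)).append("</p>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(toParagraphHtml(List.of("Fish & chips", "a < b")));
    }
}
```

A check like the one in `main` (special characters escaped, one `<p>` per paragraph) could be the starting point for the new test requested above.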

Hello, has there been any progress on this in the last seven months? I have run into the same situation: the HTML formatting is lost, so everything is displayed as one bulk block of text. If no solution like the one karussell suggested above (getTextList or getHtmlText) is available yet, I might as well write my own method.