karussell / snacktory

Readability clone in Java

Can't split getText() into paragraphs

liusiqi43 opened this issue

Hello, I’m trying to extract the main text from articles, but getText() returns a single string without any newline characters. Is there a way to extract the text while retaining the line breaks? Otherwise there will be only one single paragraph per article…

Or is there a switch to retain certain HTML tags during the extraction, such as all <a> and <br> tags?

By the way, thanks for your great work!

Snacktory uses jsoup under the hood for this work. You might look there, but I fear jsoup does not offer an option to preserve newlines.
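One possible workaround, if you have access to the article HTML itself, is to turn block boundaries into newlines before the tags are stripped. The sketch below is only an illustration using the Java standard library: the class `ParagraphText` and the method `toText` are hypothetical names, not part of snacktory or jsoup, and the regex-based tag handling is deliberately crude (it does not decode entities such as `&amp;` and does not skip `<script>` or `<style>` content).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of snacktory or jsoup.
final class ParagraphText {
    // Matches <br>, <br/> and <br />, case-insensitively.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");
    // Closing tags of a few common block-level elements (an assumed, incomplete list).
    private static final Pattern BLOCK_END =
            Pattern.compile("(?i)</(p|div|li|h[1-6]|blockquote)\\s*>");
    // Any remaining tag, e.g. <a href="...">, <p>, </a>.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    static String toText(String html) {
        // Whitespace in the HTML source carries no layout meaning, so collapse it first.
        String s = html.replaceAll("\\s+", " ");
        // A <br> becomes a single line break.
        s = BR.matcher(s).replaceAll("\n");
        // The end of a block element becomes a paragraph break.
        s = BLOCK_END.matcher(s).replaceAll("\n\n");
        // Drop every other tag but keep its text content.
        s = ANY_TAG.matcher(s).replaceAll("");
        // Tidy up: no spaces around line breaks, at most one blank line in a row.
        s = s.replaceAll(" *\n *", "\n");
        s = s.replaceAll("\n{3,}", "\n\n");
        return s.trim();
    }

    public static void main(String[] args) {
        String html = "<div><p>First paragraph.</p>"
                + "<p>Second<br>line with <a href=\"#\">a link</a>.</p></div>";
        System.out.println(toText(html));
    }
}
```

The result can then be split into paragraphs with `text.split("\n\n")`. To keep certain tags such as `<a>` in the output, the `ANY_TAG` pattern would have to exclude them; that is left out here to keep the sketch short.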

Related to #30 so closing