Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/sax/BoilerpipeHTMLContentHandler.java
GoogleCodeExporter opened this issue
Google Code Exporter commented
This is part 1 of a 2-part fix for problems with title detection.
Currently setTitle() is sometimes called many times per file. As a result, the
class can end up reporting no title when one actually exists, because it
erased the value after setting it.
The problem lies in the way the title is detected, using lastStartTag. If
characters() is called again before the next start tag, the title can be
overwritten.
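For illustration, the overwrite described above can be reproduced with a minimal sketch. This is not the attached patch and not the actual boilerpipe code: the Handler interface, both handler classes and the event sequence are hypothetical, and the sketch assumes the title is assigned from characters() whenever lastStartTag is TITLE. The second handler shows one possible alternative, tracking an explicit inTitle flag and keeping the first non-empty title.

```java
import java.util.Locale;

public class TitleDetectionSketch {

    /** Simplified, hypothetical stand-in for the SAX callbacks discussed here. */
    interface Handler {
        void startElement(String name);
        void endElement(String name);
        void characters(String text);
        String getTitle();
    }

    /** Assumed problem pattern: the title is keyed off the last start tag seen. */
    static class LastStartTagHandler implements Handler {
        private String lastStartTag = "";
        private String title;

        public void startElement(String name) {
            lastStartTag = name.toUpperCase(Locale.ROOT);
        }

        public void endElement(String name) {
            // lastStartTag is not reset, so it still reads TITLE after </title>.
        }

        public void characters(String text) {
            if ("TITLE".equals(lastStartTag)) {
                title = text.trim(); // a later whitespace chunk replaces the real title
            }
        }

        public String getTitle() {
            return title;
        }
    }

    /** One possible alternative: only collect text while inside <title>. */
    static class InTitleFlagHandler implements Handler {
        private boolean inTitle;
        private final StringBuilder titleBuffer = new StringBuilder();
        private String title;

        public void startElement(String name) {
            if ("TITLE".equalsIgnoreCase(name)) {
                inTitle = true;
            }
        }

        public void endElement(String name) {
            if ("TITLE".equalsIgnoreCase(name)) {
                inTitle = false;
                String candidate = titleBuffer.toString().trim();
                if (title == null && !candidate.isEmpty()) {
                    title = candidate; // keep the first non-empty title only
                }
                titleBuffer.setLength(0);
            }
        }

        public void characters(String text) {
            if (inTitle) {
                titleBuffer.append(text);
            }
        }

        public String getTitle() {
            return title;
        }
    }

    /** Feeds the events for "<title>My Page</title>", whitespace, then "<meta>". */
    static String run(Handler h) {
        h.startElement("title");
        h.characters("My Page");
        h.endElement("title");
        h.characters("\n  "); // characters() fires again before the next start tag
        h.startElement("meta");
        h.endElement("meta");
        return h.getTitle();
    }

    public static String titleFromLastStartTag() {
        return run(new LastStartTagHandler());
    }

    public static String titleFromInTitleFlag() {
        return run(new InTitleFlagHandler());
    }

    public static void main(String[] args) {
        System.out.println("lastStartTag: [" + titleFromLastStartTag() + "]"); // prints "lastStartTag: []"
        System.out.println("inTitle flag: [" + titleFromInTitleFlag() + "]"); // prints "inTitle flag: [My Page]"
    }
}
```

In this sketch the whitespace between the closing title tag and the next start tag is enough to blank the title in the first handler, while the flag-based handler ignores it.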
Original issue reported on code.google.com by tucker...@gmail.com
on 15 Mar 2012 at 2:33
- Merged into: #41
Google Code Exporter commented
Sorry for the multiple issues; I'm not sure how to suggest a patch that spans
multiple files. Also, it's 3 parts, not 2.
The other 2 related issues are:
http://code.google.com/p/boilerpipe/issues/detail?id=39
http://code.google.com/p/boilerpipe/issues/detail?id=40
Original comment by tucker...@gmail.com
on 15 Mar 2012 at 2:42
Google Code Exporter commented
Original comment by ckkohl79
on 21 Mar 2012 at 9:10
- Changed state: Duplicate