TeamHG-Memex/html-text
Extract text from HTML
Stargazers: 125
Watchers: 15
Issues: 16
Forks: 24
TeamHG-Memex/html-text Issues
Consider switching from lxml's clean_html for enhanced security (and possibly performance)
Updated 9 months ago, 2 comments
Blank lines created by <br> cannot be parsed correctly
Updated a year ago, 3 comments
.extract_text returning incorrect format.
Closed 3 years ago
Preserve space inside <pre> tags
Updated 4 years ago
extract_text fails with misleading error message when given bytes instead of unicode [py3]
Updated 4 years ago, 2 comments
extract_text does not work on lxml XHTML element
Updated 4 years ago, 1 comment
guess_layout does not work on XHTML elements
Updated 4 years ago, 1 comment
Don't always insert spaces around inline tags?
Updated 5 years ago, 4 comments
improve newline handling
Closed 6 years ago, 1 comment
support unicode punctuation better
Updated 6 years ago
Handle non-breaking spaces and other special unicode characters
Updated 7 years ago, 4 comments
Allow passing a selector and extract text only from given selector
Closed 7 years ago
button values?
Updated 7 years ago, 1 comment
img alt handling
Updated 7 years ago
whitespace issues
Closed 7 years ago, 4 comments