iogf / ehp

Easy Html Parser is an AST generator for html/xml documents. You can easily delete/insert/extract tags in html/xml documents as well as look for patterns.

Ehp

Easy Html Parser (EHP) is an AST generator for html/xml documents and a handy tool for parsing html content. It handles broken html nicely.

EHP has a short learning curve compared to other parsers. You don't need to spend time going through massive documentation to do simple things: go through a few examples and within minutes you can implement something useful.
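To make the idea of a tag tree concrete, here is a minimal sketch that uses only Python's standard `html.parser`. It is not EHP's implementation and does not use EHP's API; `Node`, `TreeBuilder` and `parse` are hypothetical names made up for this illustration. It shows one simple way a parser can turn markup into a tree of nodes and still cope with an unclosed tag.

```python
from html.parser import HTMLParser


class Node:
    """A tag node: name, attributes, and an ordered list of children."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = []  # Node instances or plain text strings

    def walk(self):
        """Yield every descendant tag node, depth first."""
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child.walk()


class TreeBuilder(HTMLParser):
    """Builds a Node tree. Hypothetical helper, not part of EHP."""

    def __init__(self):
        super().__init__()
        self.root = Node('#root')
        self.stack = [self.root]  # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Tolerate broken markup: close everything up to the nearest
        # matching open tag, and ignore a close tag that matches nothing.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].name == tag:
                del self.stack[depth:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


tree = parse('<body> <em> foo </body>')  # the <em> tag is never closed
names = [node.name for node in tree.walk()]
print(names)  # prints ['body', 'em']
```

Once the document is a tree, deleting a tag or changing an attribute is just a matter of editing a node's `children` or `attrs`, which is the kind of operation the EHP examples below perform through EHP's own objects.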

Create/Delete elements

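from ehp import Html

html = Html()

data = '''
<body> <em> foo </em> </body>
'''

# Parse the markup into a tree.
dom = html.feed(data)

# find_with_root yields each matching tag together with its parent,
# so the tag can be removed from that parent.
for root, item in dom.find_with_root('em'):
    root.remove(item)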

print(dom)

<body >  </body>

Manipulate attributes

from ehp import Html

data = '''<html><body> <p> It is simple.</p> </body></html>'''

dom = Html().feed(data)

# walk() visits the tags in the tree; attr holds the attributes of each one.
for ind, name, attr in dom.walk():
    attr['size'] = '+2'

print(dom)
<html size="+2" ><body size="+2" > <p size="+2" > It is simple.</p> </body></html>

Install

Note: EHP works on Python 3 only; Python 2 is no longer supported.

pip install ehp

Note: The module is quite well documented; you can find the documentation in the module itself.

License: MIT License

