scrapehero / selectorlib

A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages.


selectorlib


A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages.

Example

>>> from selectorlib import Extractor
>>> yaml_string = """
    title:
        css: "h1"
        type: Text
    link:
        css: "h2 a"
        type: Link
    """
>>> extractor = Extractor.from_yaml_string(yaml_string)
>>> html = """
    <h1>Title</h1>
    <h2>Usage
        <a class="headerlink" href="http://test">¶</a>
    </h2>
    """
>>> extractor.extract(html)
{'title': 'Title', 'link': 'http://test'}
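To make the idea behind the example concrete, here is a minimal standard-library sketch of the same pattern: a declarative mapping from field names to selectors, applied to markup to produce a dict. It is not selectorlib's implementation or API. The `FIELDS` mapping and the `extract` helper are hypothetical names invented for illustration, and the sketch assumes well-formed markup because it relies on `xml.etree.ElementTree` and its limited XPath subset rather than a real HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical field spec: field name -> selector and type.
# It mirrors the shape of the YAML in the example above, but it is an
# assumption made for this sketch, not selectorlib's own format.
FIELDS = {
    "title": {"xpath": ".//h1", "type": "Text"},
    "link": {"xpath": ".//h2/a", "type": "Link"},
}


def extract(markup, fields):
    """Apply each field's XPath to the markup and return a dict of values."""
    # ElementTree needs a single root element, so wrap the fragment.
    root = ET.fromstring(f"<div>{markup}</div>")
    result = {}
    for name, spec in fields.items():
        node = root.find(spec["xpath"])
        if node is None:
            # No match for this selector.
            result[name] = None
        elif spec["type"] == "Link":
            # For links, return the href attribute.
            result[name] = node.get("href")
        else:
            # For text, join all nested text and trim whitespace.
            result[name] = "".join(node.itertext()).strip()
    return result


html = '<h1>Title</h1><h2>Usage <a class="headerlink" href="http://test">¶</a></h2>'
data = extract(html, FIELDS)
print(data)  # {'title': 'Title', 'link': 'http://test'}
```

Keeping the selectors in data rather than in code means that supporting a new page layout only requires editing the field spec, which is the same motivation for keeping selectors in a YAML file.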

About

License: MIT License


Languages

HTML 98.6%, Python 1.2%, Makefile 0.2%