scrapinghub/extruct
Extract embedded metadata from HTML markup
Stargazers: 828
Watchers: 115
Issues: 96
Forks: 114
scrapinghub/extruct Issues
Very slow extraction for specific string
Updated
6 days ago
Comments count
6
Installing with lxml-5.2.1 ImportError: cannot import name '_ElementStringResult' from 'lxml.etree'
Updated
15 days ago
Comments count
1
Latest release on PyPi (0.16.0) breaks with lxml>5.1.0: import extruct throws ImportError: cannot import name '_ElementStringResult'
Closed
a month ago
Comments count
1
cannot import name '_ElementStringResult' from 'lxml.etree
Closed
a month ago
Comments count
2
feat: Add dependabot for github actions
Updated
a month ago
Comments count
3
chore: Remove Python 2 specific code
Updated
a month ago
SyntaxWarning invalid escape sequence '\s'
Closed
2 months ago
Package breaking due to change in lxml
Closed
2 months ago
Comments count
2
Consider switching from lxml's clean_html for enhanced security (and possibly performance)
Updated
2 months ago
Comments count
7
ImportError: cannot import name '_ElementStringResult' from 'lxml.etree'
Closed
2 months ago
Comments count
1
DeprecationWarning: the imp module is deprecated in favour of importlib
Closed
4 years ago
Comments count
8
Unable to get meta tag value from inside body
Updated
7 months ago
lxml.etree.ParserError: Document is empty
Updated
8 months ago
Comments count
5
Selectolax benchmarks
Updated
9 months ago
" in application/ld+json gives exception
Updated
9 months ago
Should not Depends on python3 (<< 3.7)
Updated
a year ago
Comments count
6
Extruct not matching up with Schema.org structured data testing tool (Incorrect image Urls)
Updated
a year ago
Comments count
3
[suggestion] adding type hints?
Updated
2 years ago
Comments count
7
error extracting json-ld for validated json
Updated
2 years ago
Some websites put meta tags outside the head.
Updated
2 years ago
Comments count
2
Adding twitter tags
Updated
2 years ago
Comments count
5
Crash on JSONDecodeError from body of YouTube page
Updated
2 years ago
Comments count
2
LD+JSON outside HTML element
Updated
2 years ago
Comments count
1
ModuleNotFoundError: No module named 'rdflib_jsonld.serializer
Updated
2 years ago
Comments count
9
Example from the README does not work any more
Updated
2 years ago
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
Updated
2 years ago
Comments count
5
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
Closed
2 years ago
Comments count
6
JSONDecodeError: Expecting value: line 1 column 1 (char 0) for URL https://www.drogaraia.com.br/nivea-desodorante-aerosol-deep-original-150ml.html
Updated
3 years ago
Comments count
2
Installation error: "rdflib-jsonld setup command: use_2to3 is invalid"
Closed
3 years ago
Comments count
1
Extruct - 0.13.0 is not compatible with the latest rdflib
Closed
3 years ago
Comments count
5
The emitter opengraph produces rigid, fragile structures
Updated
3 years ago
Comments count
4
rdflib 6.0.0 does not always return bytes, breaking extruct.rdfa.RDFaExtractor
Closed
3 years ago
Comments count
3
Error's with non-escaped quotes and multiline JSON-LD
Updated
3 years ago
Comments count
5
Matching Order in LxmlMicrodataExtractor._extract_property_value
Updated
3 years ago
Comments count
1
page.click() is not working if covered by overlay
Closed
3 years ago
ImportError: No module named 'rdflib.plugins.parsers.pyRdfa'
Closed
3 years ago
Comments count
8
Description missing on PyPI
Closed
3 years ago
Add support for compacted form of json-ld when using RDFaExtractor.
Updated
4 years ago
Comments count
1
Empty return on webpages
Closed
4 years ago
Comments count
3
Slow microdata extraction
Closed
4 years ago
Comments count
1
Random order in RDFA’s root list
Updated
4 years ago
JSONDecodeError: Extra data: line 21 column 1 (char 572) for URL https://lubelska.co.uk/
Updated
4 years ago
Comments count
2
RDFa ordering not preserved on duplicated properties
Closed
4 years ago
Comments count
7
constrains rdflib to 4.2.2?
Closed
4 years ago
Comments count
2
ModuleNotFoundError: No module named 'rdflib.plugins.parsers.pyRdfa' for RDFLib v5.0.0
Closed
4 years ago
Comments count
4
Install issues due to rdflib version 5.0.0
Closed
4 years ago
Comments count
1
Support passing pre-parsed html in extruct.extruct
Updated
4 years ago
itemprop=image not extracted from the Microdata example
Updated
5 years ago
First non-empty result should be extracted in case of OpenGraph
Closed
5 years ago
Comments count
2
Dockerfile
Closed
5 years ago
Comments count
1