cldellow/datasette-scraper
Add website scraping abilities to Datasette
Stargazers: 59
Watchers: 1
Issues: 53
Forks: 1
cldellow/datasette-scraper Issues
500 Error with dss_crawl and a few other tables after installing
Updated
2 months ago
javascript interaction?
Updated
6 months ago
table overload: consider hiding tables and using views or other affordances
Closed
a year ago
usability: if discover-urls returns a str, raise an exception
Updated
a year ago
dss fails when it's not the primary database
Closed
a year ago
permit installing into a database with pre-existing tables
Closed
a year ago
plugin: extract-ecommerce-shopify
Updated
a year ago
support scheduled crawls
Updated
a year ago
Comments count
1
plugin: extract-ecommerce-json-ld
Closed
a year ago
make plugins pluggable
Closed
a year ago
link dss_extract_stats table column to actual table
Closed
a year ago
link dss_job_stats hostname column to dss_crawl_queue_history
Closed
a year ago
make ops db configurable
Closed
a year ago
Comments count
1
be able to rewrite existing fetch_cache entries
Closed
a year ago
be able to train dictionaries automatically
Closed
a year ago
Comments count
1
support zstd dictionaries
Closed
a year ago
consider a knob to hide _dss_ tables
Closed
a year ago
Comments count
1
be able to edit host rates in datasette
Closed
a year ago
Comments count
2
take over row page for _dss_crawl
Closed
a year ago
take over table page for _dss_crawl table
Closed
a year ago
crawl_queue_history should have fkey to fetch_cache
Closed
a year ago
consider setting default sort orders
Closed
a year ago
plugin: extract-links
Closed
a year ago
Comments count
1
hook: extract_from_response
Closed
a year ago
Comments count
1
Have a way to make an HTTP request that goes through the plugin system
Closed
a year ago
Comments count
1
plugin: seed-sitemaps
Closed
a year ago
be more aggressive in asserting WAL mode
Closed
a year ago
plugins: max-pages, max-pages-per-domain
Closed
a year ago
plugin: canonicalize-shopify-urls
Closed
a year ago
plugin: discover-allow
Closed
a year ago
Comments count
1
plugin: discover-deny
Closed
a year ago
consider: re-insert (or update) a crawl item if has a lower depth
Closed
a year ago
Comments count
1
plugin: discover-redirect-urls
Closed
a year ago
plugin: fetch-cache
Closed
a year ago
Comments count
1
hook: config_schema, config_default_values
Closed
a year ago
Comments count
1
plugin: discover-only-same-origin
Closed
a year ago
hook: canonicalize_urls
Closed
a year ago
hook: discover_urls
Closed
a year ago
plugin: discover-html-urls
Closed
a year ago
hook: fetch_url
Closed
a year ago
add background workers that Do The Thing
Closed
a year ago
Comments count
5
plugin: max-depth
Closed
a year ago
hook: before_fetch_url
Closed
a year ago
add plugin system
Closed
a year ago
plugin: seed-urls
Closed
a year ago
hook: get_seed_urls
Closed
a year ago
plugin: extract-object
Updated
a year ago
plugin: extract-seo
Updated
a year ago
consider dashboards
Updated
a year ago
throw if the default database is _memory, or isn't mutable
Closed
a year ago