turicas/crau
Easy-to-use Web archiver
Stargazers: 53
Watchers: 4
Issues: 17
Forks: 9
turicas/crau Issues
Headers not preserved correctly (Updated a year ago, 2 comments)
Remove custom_settings from spider (Updated 2 years ago)
UnicodeDecodeError: 'ascii' codec can't decode byte (Updated 2 years ago)
Check if redirects are being written to WARC file (Closed 2 years ago, 2 comments)
Black error on Ubuntu 18.04.03 (Closed 5 years ago, 4 comments)
Transfer encoding is not preserved (Updated 5 years ago)
Capture any HTTP code (Closed 5 years ago)
Change User-Agent (Closed 5 years ago)
Remove URL fragment before saving (Closed 5 years ago)
Change default settings to optimize broad crawls (Updated 5 years ago)
Invalid Syntax (Closed 5 years ago, 1 comment)
Check possibility of migrating to scrapy.spiders.CrawlSpider (Updated 5 years ago)
Expose spider configurations to `crau archive` (Closed 5 years ago)
Add option to restrict domains (Updated 5 years ago)
Create a scrapy backed cache based on WARC (Updated 5 years ago)
Create `search` command (Updated 5 years ago)
Implement a browser-based spider (Updated 5 years ago)