internetarchive/Zeno
State-of-the-art web crawler 🔱
Stargazers: 43
Watchers: 9
Issues: 15
Forks: 2
internetarchive/Zeno Issues
Allow free space threshold to be customizable (Updated 19 days ago)
Create option to record DNS responses to WARC records (Updated 8 months ago; 1 comment)
AWS and mismatch for User-Agent Zeno (Closed, 10 months ago; 2 comments)
Investigate mailto: links (Closed, a year ago; 1 comment)
Make WARC temp dir configurable (Closed, a year ago; 1 comment)
No such file or directory panic (Updated 2 years ago)
Flush HQ finished array on shutdown (Closed, 2 years ago; 1 comment)
Being able to get Zeno's version with a command (Closed, 2 years ago; 1 comment)
Investigate adding a version number to `software` field of warcinfo (Closed, 2 years ago; 2 comments)
Ensure invalid HTTPS certificate still get crawled (Closed, 2 years ago)
Reset HQ entries on shutdown (Updated 2 years ago)
Investigate "i/o timeout" and "TLS timeout" stalling workers for multiple minutes (Closed, 2 years ago)
More efficient deduplication hash table (Updated 2 years ago)
Add PDF outlinks extraction (Updated 2 years ago)
Custom headers defined by yml file (Updated 2 years ago)