Zeiver is designed to scrape and download content recursively from ODs (open directories). It also provides a means of recording links and scanning ODs for content.
*Zeiver does not download the entire OD itself, only the files.
For ease of use, check out the Zeiver configurator.
Zeiver currently has 4 major modules:
- Grabber (HTTP)
  - Grabs content from the internet (webpages, files, etc.).
- Scraper
  - Recursively grabs all links from an OD.
- Downloader
  - Downloads content retrieved from the Scraper (or from a file).
- Recorder
  - Saves a record of all files found in the OD.
  - Records are saved to a file called URL_Records.txt. The name can be changed using --output-record.
  - Creates stat files (JSON files containing statistical data about what was retrieved).
All components can be used independently.
The Grabber module repeatedly grabs a webpage for the Scraper to parse (based on parameters). The Scraper takes each webpage and recursively scrapes the links from it. Afterwards, the links are sent to the Recorder (disabled by default; enabled with --record-only or --record) and/or the Downloader (enabled by default). The Downloader uses the Grabber to download the files' data from the internet, then writes the data to newly created files.
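For instance, this hypothetical command (the URL and output path are placeholders) records the scraped links while also downloading the files:
Ex: zeiver --record -o "./downloads" example.com/files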
- Uses an asynchronous runtime.
- Random & fixed delays between HTTP requests.
- Ability to customize which files are retrieved or rejected.
- Scans an OD for content while transparently displaying the traversal process.
Supported ODs can be found in OD.md.
- Install Rust.
  - If Rust is not installed, please follow the instructions here.
- Once Rust is installed, open a CLI & type:
  cargo install --branch main --git https://github.com/ZimCodes/Zeiver
  - This will install Zeiver from GitHub.
- And that's it! To use Zeiver, start each command with zeiver.
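As a quick check that the install succeeded, you can print the version (the -V, --version flag is documented below):
Ex: zeiver --version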
The following code downloads files from example.com/xms/imgs, saves them in a local directory called Cool_Content, & sends a request with the ACCEPT-LANGUAGE header.
zeiver -H "accept-language$fr-CH, en;q=0.8, de;q=0.7" -o "./Cool_Content" example.com/xms/imgs
URLs...
Link(s) to the OD(s) you would like to download content from.
*This is not needed if you are using -i, --input-file.
-h, --help
Prints help information.
-V, --version
Prints version information.
-v, --verbose
Enables verbose output.
--test
Runs a scrape test without downloading or recording.
--scan
Scan ODs
Scans ODs, displaying their content in the terminal. A shortcut for activating --verbose & --test.
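For example, to scan a hypothetical OD (the URL is a placeholder) and display its contents in the terminal without downloading or recording anything:
Ex: zeiver --scan example.com/pub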
-d, --depth
Specify the maximum depth for recursive scraping. Can also be used to traverse subpages (ODs with previous & next buttons). Default: 20. A depth of 1 is the current directory.
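For example, to scrape no more than 5 levels deep into a hypothetical OD (the URL is a placeholder):
Ex: zeiver -d 5 example.com/pub/media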
-A, --accept
Files to accept for scraping
Using Regex, specify which files to accept for scraping. Only the files that match the regex will be acceptable for download. *This option takes precedence over -R, --reject.
Ex: zeiver -A "(mov|mp3|lunchbox_pic1\.jpg|(pic_of_me.gif))"
-R, --reject
Files to reject for scraping
Using Regex, specify which files to reject for scraping. Only the files that match the regex will be rejected for download. *-A, --accept takes precedence over this option.
Ex: zeiver -R "(jpg|png|3gp|(pic_of_me.gif))"
--record
Activates the Recorder
Enables the Recorder, which saves the scraped links to a file. *Option cannot be used with --record-only.
--record-only
Save the links only
After scraping, instead of downloading the files, save the links to them. *The Downloader is disabled when this option is active; the Recorder is enabled instead.
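For example, to save the links found in a hypothetical OD (the URL is a placeholder) to a record file instead of downloading the files:
Ex: zeiver --record-only example.com/pub/books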
--output-record
Changes the name of the record file. This file is where the Recorder will store the links. Default: URL_Records.txt
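For example, to store the recorded links under a custom name (MyLinks.txt is an arbitrary placeholder, as is the URL):
Ex: zeiver --record-only --output-record "MyLinks.txt" example.com/pub/books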
--no-stats
Prevents the Recorder from creating _stat_ files.
The Recorder will no longer create _stat_ files when saving scraped links to a file. Default: false
Ex: stat_URL_Record.txt
--no-stats-list
Prevents the Recorder from writing file names to stat files
Stat files include the names of all files in alphabetical order alongside the number of file extensions. This option prevents the Recorder from adding file names to stat files.
-i, --input-file
Read URLs from a file to be sent to the Scraper. *Each line represents a URL to an OD.
Ex: zeiver -i "./dir/urls.txt"
--input-record
Read URLs from an input file which contains links to other files, and create a stats file based on the results. This option is for those who have a file filled with random, unorganized links to a bunch of other files and want to take advantage of Zeiver's Recorder module.
*Each line represents a URL to a file. Activates the Recorder. Valid with --verbose, --output, --output-record.
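For example, assuming the option takes the input file's path as its value (./dir/file_links.txt is a placeholder file where each line is a direct link to a file):
Ex: zeiver --input-record "./dir/file_links.txt"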
-o, --output
Save Directory.
The local directory path to save files. Files saved by the Recorder are also stored here.
Default: ./
Ex: zeiver -o "./downloads/images/dir"
-c, --cuts
Ignores a specified number of remote directories from being created.
*Only available when downloading. Default: 0
Ex: URL: example.org/pub/xempcs/other/pics
Original Save Location: ./pub/xempcs/other/pics
zeiver --cuts 2 www.example.org/pub/xempcs/other/pics
New Save Location: ./other/pics
--no-dirs
Do not create a hierarchy of directories structured the same as the URL the file came from. All files will be saved to the current output directory instead.
*Only available when downloading.
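For example, to drop all files from a hypothetical OD (the URL and output path are placeholders) straight into one flat directory:
Ex: zeiver --no-dirs -o "./flat" example.com/pub/docs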
--print-headers
Prints all Response Headers to the terminal
Prints all available Response headers received from each request to the terminal. *This option takes precedence over all other options!
--print-header
Prints a Response Header to terminal
Prints a specified Response Header to the terminal for each URL. This option takes precedence over all other options.
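For example, assuming the option takes the header's name as its value, to print only the content-type header for each URL (the URL is a placeholder):
Ex: zeiver --print-header "content-type" example.com/pub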
--https-only
Use HTTPS only
Restricts Zeiver to sending all requests through HTTPS connections only.
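For example, to force every request to a hypothetical OD (placeholder URL) over HTTPS:
Ex: zeiver --https-only example.com/pub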
-H, --headers
Sets the default headers to use for every request. *Must use the 'header$value' format. Each header must also be separated by a comma.
Ex: zeiver -H content-length$128,"accept$ text/html, application/xhtml+xml, image/webp"
-U
The User-Agent header to use. Default: Zeiver/VERSION
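For example, to identify with a custom User-Agent string (both the string and URL are illustrative placeholders):
Ex: zeiver -U "Mozilla/5.0 (X11; Linux x86_64)" example.com/pub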
-t, --tries
The number of times to retry a failed connection/request. Default: 20
-w, --wait
Wait a specified number of seconds between each scraping & download request.
--retry-wait
The wait time between each failed request. Default: 10
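For example, to retry each failed request up to 5 times, waiting 30 seconds between attempts (the URL is a placeholder):
Ex: zeiver -t 5 --retry-wait 30 example.com/pub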
--random-wait
Wait a random amount of seconds between each request.
The time between requests will vary from 0.5 * --wait, -w (inclusive) to 1.5 * --wait, -w (exclusive).
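For example, with a base wait of 4 seconds, each delay would fall between 2 seconds (inclusive) and 6 seconds (exclusive); the URL is a placeholder:
Ex: zeiver -w 4 --random-wait example.com/pub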
-T, --timeout
Adds a request timeout for a specified number of seconds.
-r, --redirects
Maximum redirects to follow. Default: 10
--proxy
The proxy to use.
Ex: zeiver --proxy "socks5://192.168.1.1:9000"
--proxy-auth
The basic authentication needed to use the proxy. *Must use the 'username:password' format.
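For example, to route requests through a SOCKS5 proxy with basic authentication (the address, credentials, and URL are placeholders):
Ex: zeiver --proxy "socks5://192.168.1.1:9000" --proxy-auth "myuser:mypass" example.com/pub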
--all-certs
Accepts all certificates (Beware!)
Accepts all certificates even invalid ones. Use this option at your own risk!
Having trouble entering a long URL in the terminal? Place it inside an input file and use -i, --input-file instead.
Running into invalid certificates? Try using the --all-certs option, but be wary when doing so.
Some ODs will send Zeiver HTML documents without any content (files/folders) from the OD. This is because Zeiver retrieves HTML documents without JavaScript, & some ODs will not work without it.
Zeiver is licensed under the MIT and Apache 2.0 licenses.
See the MIT and Apache-2.0 licenses for more details.