dimkouv / massivedl

Download a large list of files concurrently

massivedl

Download a large list of files in parallel.

Install

# Linux (64-bit)
wget https://github.com/dimkouv/massivedl/releases/download/v1.2/massivedl_linux_amd64
chmod +x massivedl_linux_amd64
sudo mv massivedl_linux_amd64 /usr/local/bin/massivedl

Usage

Create a CSV file that lists the downloads, one filename,url pair per line (the first line below is a header):

filename,url
0.png,https://placehold.it/100x100
1.png,https://placehold.it/100x101
2.png,https://placehold.it/100x102
...

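Assuming the file is named data.csv, the following command downloads its contents into the downloads directory, using up to 10 parallel requests and skipping the header line (-s 1):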

massivedl -p 10 -i data.csv -s 1 -o downloads

Command line parameters

-p <int> (default=10)          : Maximum number of parallel requests
-s <int> (default=0)           : Number of skipped lines from input csv
-i <str>                       : Input csv file with the list of urls
-o <str> (default='downloads') : Directory to place the downloads
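A sketch of how these parameters could map onto Go's standard flag package with the documented defaults. The options struct and parseOptions are hypothetical; the load flag comes from the resume instructions in the next section (Go's flag package treats --load and -load alike), and the validation rules are assumptions rather than documented behaviour.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options mirrors the documented parameters; Load corresponds to --load.
type options struct {
	Parallel int
	Skip     int
	Input    string
	Output   string
	Load     string
}

// parseOptions parses args (without the program name) using the documented defaults.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("massivedl", flag.ContinueOnError)
	fs.IntVar(&o.Parallel, "p", 10, "Maximum number of parallel requests")
	fs.IntVar(&o.Skip, "s", 0, "Number of skipped lines from input csv")
	fs.StringVar(&o.Input, "i", "", "Input csv file with the list of urls")
	fs.StringVar(&o.Output, "o", "downloads", "Directory to place the downloads")
	fs.StringVar(&o.Load, "load", "", "Saved progress file to resume from")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumed validation rules, not taken from the tool itself.
	switch {
	case o.Input == "" && o.Load == "":
		return o, errors.New("either -i or --load is required")
	case o.Parallel < 1:
		return o, errors.New("-p must be at least 1")
	case o.Skip < 0:
		return o, errors.New("-s must not be negative")
	}
	return o, nil
}

func main() {
	// The example invocation from the Usage section.
	o, err := parseOptions([]string{"-p", "10", "-i", "data.csv", "-s", "1", "-o", "downloads"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```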

Stop and continue later

You can stop a run and continue downloading later.
Press Ctrl+C and the following dialog appears:

...
Do you want to save progress? [Y/n]: yes

Progress has been saved!
Use the following command to continue downloading

	massivedl --load /path/to/savedfile.save

Use Cases

With this tool I was able to download about 1.5 million images (~60GB) for a machine learning project.

About


License: GNU General Public License v3.0


Languages: Go 96.0%, Makefile 4.0%