Usage:
# Download from EGA
./main.nf --ega --out_dir="/path/to/downloaded/fastqs" --accession="EGAD000XXXXX"
# Download from a plain list of ascp links
# (EBI FTP links in the list are converted to ascp links automatically)
./main.nf --ascp --out_dir="/path/to/downloaded/fastqs" --accession_list="urls.txt"
# Download from SRA
./main.nf --sra --out_dir="results" --accession_list="SRA_Acc_List.txt"
# Download from a plain list of ftp/http links
./main.nf --wget --out_dir="results" --accession_list="urls.txt"
# Download an open-access file from GDC
./main.nf --gdc --out_dir="results" --gdc_file_id 2776a850-d9b4-4c26-8414-528458c9c7c3
# Download multiple open-access files from GDC
./main.nf --gdc --out_dir="results" --gdc_file_id 2776a850-d9b4-4c26-8414-528458c9c7c3,de9105ef-cd6c-4565-8526-568b5f55a47c
# or, with the UUIDs listed in a file (one per line):
./main.nf --gdc --out_dir="results" --gdc_file_id myGDCFileIds.txt
# Download multiple open-access files from GDC using a manifest file
./main.nf --gdc --out_dir="results" --gdc_manifest manifest.txt
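# Download protected files from GDC: add --gdc_token to any of the GDC commands above, e.g.
./main.nf --gdc --out_dir="results" --gdc_manifest manifest.txt --gdc_token myGDCtokenFile.txt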
# Download BAM slices from GDC
./main.nf --gdc --out_dir="results" --gdc_bamslice chr1,chr2:1000000-2000000 --gdc_file_id 82805a58-0e0c-4b29-bfae-e121236203a7 --gdc_token myGDCtokenFile.txt --gdc_bamslice_type region
# Multiple regions, genes or files may be specified; see Options below.
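The ascp mode above converts EBI FTP links to ascp links automatically. The exact rewrite rule lives in main.nf; the shell sketch below only illustrates the idea, assuming EBI's Aspera convention that ftp.sra.ebi.ac.uk/<path> is served as era-fasp@fasp.sra.ebi.ac.uk:/<path>. The function name ftp_to_ascp and the sample SRR path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of main.nf): rewrite an EBI FTP link as an
# Aspera source, assuming EBI's convention that ftp.sra.ebi.ac.uk/<path>
# is served as era-fasp@fasp.sra.ebi.ac.uk:/<path>.
ftp_to_ascp() {
  local url="$1"
  # Drop an optional scheme prefix.
  url="${url#ftp://}"
  url="${url#http://}"
  url="${url#https://}"
  case "$url" in
    ftp.sra.ebi.ac.uk/*)
      # Swap the FTP host for the Aspera user@host prefix.
      printf 'era-fasp@fasp.sra.ebi.ac.uk:/%s\n' "${url#ftp.sra.ebi.ac.uk/}"
      ;;
    *)
      # Anything else (e.g. an existing ascp source) is passed through unchanged.
      printf '%s\n' "$1"
      ;;
  esac
}

# Illustrative path only.
ftp_to_ascp "ftp.sra.ebi.ac.uk/vol1/fastq/SRR000/SRR000001/SRR000001_1.fastq.gz"
# prints era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR000/SRR000001/SRR000001_1.fastq.gz
```

Passing non-EBI lines through untouched means a list can mix FTP links and ready-made ascp sources.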
Options:
--out_dir Directory where the downloaded files will be stored.
--accession_list File listing accession numbers (of files) or download links, one per line.
--accession Accession number (of a dataset) to download.
--parallel_downloads Number of parallel download slots (default: 16).
--gdc_file_id GDC file UUID(s):
- a single UUID or a comma-separated list of UUIDs
or
- a file containing UUIDs, one per line
--gdc_manifest GDC portal data download manifest file obtained
from https://portal.gdc.cancer.gov/
--gdc_bamslice_type Type of BAM slice to download [region|gene] (default: region)
--gdc_bamslice BAM slice to download:
- a single region or a comma-separated list of regions, e.g.:
chr1,chr2:1000000-2000000,[...]
or
- a single gene or a comma-separated list of genes, e.g.:
BRCA1,TP53,[...]
or
- a file containing regions, one per line
or
- a file containing genes, one per line
--gdc_bamslice_fastq Convert BAM slices to FASTQ (default: false)
--gdc_token GDC access token file for protected data
--ascp_private_key_file Path to the Aspera private key file. Defaults
to $(dirname $(readlink -f $(which ascp)))/../etc/asperaweb_id_dsa.openssh
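Several options above (--gdc_file_id, --gdc_bamslice) accept either a comma-separated list or a file with one item per line. A minimal sketch of how such a value could be normalised, assuming that a value naming an existing file is read as a file and anything else is split on commas; expand_list is a hypothetical name, and the pipeline's own parsing may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of main.nf): turn a --gdc_file_id or
# --gdc_bamslice value into one item per line. Assumes a value that names an
# existing file is read line by line, and anything else is split on commas.
expand_list() {
  local value="$1"
  if [ -f "$value" ]; then
    # File form: one item per line; blank lines are skipped.
    grep -v '^[[:space:]]*$' "$value"
  else
    # Inline form: comma-separated list.
    printf '%s\n' "$value" | tr ',' '\n'
  fi
}

expand_list "chr1,chr2:1000000-2000000"
# prints:
# chr1
# chr2:1000000-2000000
```

One consequence of this rule is that an inline value which happens to match a filename in the working directory would be read as a file.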
Download modes:
--ega Download from EGA
--wget Download a plain list of FTP/HTTP links
--sra Download from SRA
--gdc Download from the GDC portal
--ascp Download Aspera (ascp) links
Hint: to get faster, more reliable download links for SRA identifiers, use SRA Explorer.
Store your EGA credentials in ~/.ega.json:
{
"username": "my.email@university.edu",
"password": "SuperSecurePasswordIncludes123"
}
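Because ~/.ega.json holds a plain-text password, it is worth creating it readable by its owner only. A minimal sketch, assuming you write the file yourself; the EGA_JSON variable is a hypothetical override (defaulting to ~/.ega.json) so the snippet can be tried on another path, and the credentials are the placeholders from above.

```shell
#!/usr/bin/env bash
# Sketch: create the EGA credentials file with owner-only permissions.
# EGA_JSON (hypothetical) defaults to ~/.ega.json; set it to try another path.
EGA_JSON="${EGA_JSON:-$HOME/.ega.json}"

if [ -e "$EGA_JSON" ]; then
  # Never clobber existing credentials.
  echo "$EGA_JSON already exists; leaving it unchanged" >&2
else
  # umask 077 in a subshell: the new file is created with mode 600
  # without changing the umask of the calling shell.
  (
    umask 077
    cat > "$EGA_JSON" <<'EOF'
{
  "username": "my.email@university.edu",
  "password": "SuperSecurePasswordIncludes123"
}
EOF
  )
fi
```

Note that the last key/value pair has no trailing comma; JSON does not allow one.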
An authentication token file is required to download protected data from GDC. Users with access to protected data can download a token file from https://portal.gdc.cancer.gov/ after logging in.
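For BAM slices in region mode, the request presumably goes to the GDC API's slicing endpoint with the token attached. The sketch below only builds the URL, assuming the endpoint https://api.gdc.cancer.gov/slicing/view/<file UUID> with one region= query parameter per region; gdc_slice_url is a hypothetical name, not the pipeline's code, and gene mode is not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of main.nf): build a GDC API BAM-slicing URL
# for region mode. Assumes the endpoint
# https://api.gdc.cancer.gov/slicing/view/<file UUID> with one region= query
# parameter per region.
gdc_slice_url() {
  local uuid="$1" regions="$2" query
  # chr1,chr2:1-2 -> region=chr1&region=chr2:1-2
  query=$(printf '%s\n' "$regions" | tr ',' '\n' | sed 's/^/region=/' | paste -s -d '&' -)
  printf 'https://api.gdc.cancer.gov/slicing/view/%s?%s\n' "$uuid" "$query"
}

gdc_slice_url 82805a58-0e0c-4b29-bfae-e121236203a7 chr1,chr2:1000000-2000000
# prints https://api.gdc.cancer.gov/slicing/view/82805a58-0e0c-4b29-bfae-e121236203a7?region=chr1&region=chr2:1000000-2000000
```

Per the GDC API documentation, the token file's contents are sent in the X-Auth-Token header, so the URL could be fetched with e.g. curl --header "X-Auth-Token: $(cat myGDCtokenFile.txt)" --output slice.bam "<url>".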