cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium

Home Page: https://cumc.github.io/xqtl-protocol/

Some reference data download links no longer work, pending investigation.

hsun3163 opened this issue

(py3.11) [sunh14@lc03e22 ~]$ cd /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/working
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_hg_reference --cwd ../input/reference_data &
[1] 189132
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_gene_annotation --cwd ../input/reference_data &
[2] 189133
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_ercc_reference --cwd ../input/reference_data &
[3] 189134
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_dbsnp --cwd ../input/reference_data &
INFO: Running download_hg_reference:
INFO: Running download_ercc_reference:
GRCh38_ful...lus_decoy_hla.fa: <urlopen error [Errno 101] Network is unreachable>:
INFO: Running download_gene_annotation:
ERROR: download_hg_reference (id=88880766584b8229) returns an error.
Homo_sapie...8.103.chr.gtf.gz: 0%| | 0/49087092 [00:00<?, ?it/s]
ERCC92.zip: 0%| | 0/28717 [00:00<?, ?it/s]
INFO: download_ercc_reference is completed.
INFO: download_ercc_reference output: /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/input/reference_data/ERCC92.gtf /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/input/reference_data/ERCC92.fa
Homo_sapie...8.103.chr.gtf.gz: 0%|▏ | 49152/49087092 [00:00<03:33, 229662.93it/s]INFO: Workflow download_ercc_reference (ID=w297010867a7f15c9) is executed successfully with 1 completed step.

[4] 189193
Homo_sapie...8.103.chr.gtf.gz: 0%|▍ | 172032/49087092 [00:00<01:56, 421643.95it/s]
[3]- Done sos run pipeline/reference_data.ipynb download_ercc_reference --cwd ../input/reference_data
Homo_sapie...8.103.chr.gtf.gz: 1%|█ | 360448/49087092 [00:00<01:39, 490429.95it/s]ERROR: [download_hg_reference]: [0]:

RuntimeError Traceback (most recent call last)
script_8878139621259498696 in
----> download('ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa\n\n', dest_dir = cwd)

RuntimeError: Failed to download {urls[0]}
Homo_sapie...8.103.chr.gtf.gz: 1%|█▏ | 425984/49087092 [00:00<01:35, 507455.58it/s]INFO: Running download_dbsnp:
00-All.vcf.gz: <urlopen error [Errno 101] Network is unreachable>:
00-All.vcf.gz.tbi: <urlopen error [Errno 101] Network is unreachable>:
ERROR: download_dbsnp (id=eb7f9a9839feca92) returns an error.
Homo_sapie...8.103.chr.gtf.gz: 2%|██▊ | 1007616/49087092 [00:02<01:30, 532953.87it/s]ERROR: [download_dbsnp]: [0]:

RuntimeError Traceback (most recent call last)
script_8177488568793545762 in
----> download('ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz\nftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz.tbi\n\n', dest_dir = cwd)

RuntimeError: Failed to download ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz (2 out of 2)
Homo_sapie...8.103.chr.gtf.gz: 31%|█████████████████████████████████████████▉ | 15007744/49087092 [00:28<01:05, 518354.50it/s]

These two commands fail:
sos run pipeline/reference_data.ipynb download_hg_reference --cwd ../input/reference_data
sos run pipeline/reference_data.ipynb download_dbsnp --cwd ../input/reference_data

This could be a firewall blocking FTP.
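
One way to test the firewall hypothesis on a given node is to check whether a plain TCP connection to the FTP server can be opened at all. The sketch below is a minimal, hypothetical helper (`can_reach` is not part of sos or the pipeline). It assumes the standard FTP control port 21 and only tests the control connection, not the separate data connection used for the transfer itself, so a positive result does not guarantee that a download will succeed.

```python
import socket

# Hosts taken from the two failing download() calls in the log above.
FTP_HOSTS = ["ftp.1000genomes.ebi.ac.uk", "ftp.ncbi.nlm.nih.gov"]


def can_reach(host, port=21, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; not part of sos or xqtl-protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Network is unreachable" (Errno 101), refused connections,
        # DNS failures and timeouts, which are all OSError subclasses.
        return False


if __name__ == "__main__":
    for host in FTP_HOSTS:
        status = "reachable" if can_reach(host) else "NOT reachable"
        print(f"{host}:21 is {status} from {socket.gethostname()}")
```

Running this on both the compute node and the data transfer node would show whether the two nodes differ in what they are allowed to reach.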

The download_dbsnp failure is probably due to different firewall settings on different nodes. The download_hg_reference failure is stranger: the file can be fetched with wget but not with download() via sos.
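
To separate a problem in sos from a problem in Python's networking on the node, the same URL can be opened directly with `urllib`. The `<urlopen error ...>` text in the log matches the format of Python's `urllib.error.URLError`, but how sos's `download()` is implemented is otherwise an assumption here. `probe_url` is a hypothetical helper that reads only the first few bytes instead of the full file.

```python
import urllib.error
import urllib.request

# URL copied from the failing download_hg_reference step.
HG_URL = (
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/"
    "GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa"
)


def probe_url(url, nbytes=64, timeout=10):
    """Try to read the first nbytes of url and return (ok, detail).

    Hypothetical helper for illustration; not part of sos or xqtl-protocol.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read(nbytes)
        return True, f"read {len(data)} bytes"
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # URLError wraps network failures such as Errno 101;
        # ValueError is raised for malformed URLs.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    ok, detail = probe_url(HG_URL)
    print("OK" if ok else "FAILED", detail)
```

If this probe fails on a node where wget succeeds, the difference is more likely in how Python opens the FTP connection on that node (for example proxy or environment settings) than in the sos workflow itself.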

ERROR: download_hg_reference (id=88880766584b8229) returns an error.
00-All.vcf.gz.tbi: downloaded                                                   :
00-All.vcf.gzERROR: [download_hg_reference]: [0]:                               :
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
script_3183852603783812485 in <module>
----> download('ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa\n\n', dest_dir = cwd)


RuntimeError: Failed to download {urls[0]}
00-All.vcf.gz
(py3.11) [sunh14@dataxfer-10 working]$ ftp ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
-bash: ftp: command not found
(py3.11) [sunh14@dataxfer-10 working]$ wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
--2023-12-05 12:54:58--  ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
           => ‘GRCh38_full_analysis_set_plus_decoy_hla.fa’
Resolving ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)... 193.62.193.167
Connecting to ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)|193.62.193.167|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /vol1/ftp/technical/reference/GRCh38_reference_genome ... done.
==> SIZE GRCh38_full_analysis_set_plus_decoy_hla.fa ... 3263683042
==> PASV ... done.    ==> RETR GRCh38_full_analysis_set_plus_decoy_hla.fa ... done.
Length: 3263683042 (3.0G) (unauthoritative)

14% [==============================>