ENCODE-DCC / rna-seq-pipeline

sample_genome.bam and sample_anno.bam cannot be opened by pipeline

liboxun opened this issue

Describe the bug

I recently downloaded the ENCODE RNA-seq pipeline and tried to run it on my dataset with caper hpc submit, after configuring Caper for the slurm backend following the instructions for the ENCODE ATAC-seq pipeline.

The RNA-seq pipeline failed after successful alignment. More specifically, it failed at the following 4 jobs:

[
    {
        "causedBy": [
            {
                "causedBy": [],
                "message": "Job rna.check_anno:1:2 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            },
            {
                "causedBy": [],
                "message": "Job rna.check_genome:1:2 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            },
            {
                "causedBy": [],
                "message": "Job rna.bam_to_signals:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            },
            {
                "causedBy": [],
                "message": "Job rna.rsem_quant:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            }
        ],
        "message": "Workflow failed"
    }
]
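A minimal sketch for turning a failures object like the one above into a short table of failed jobs. The function name `summarize_failures` is hypothetical, and reading the two numbers after the task name as shard index and attempt is my interpretation (it matches the shard-1/attempt-2 directories in the log), not something taken from Cromwell's documentation.

```python
import json
import re

# Message format as it appears in the failures JSON of this report:
# "Job <workflow>.<task>:<n>:<m> exited with return code <rc> ..."
# <n> and <m> are read here as shard index and attempt (an assumption).
JOB_RE = re.compile(
    r"Job (?P<task>[\w.]+):(?P<shard>\d+):(?P<attempt>\d+) "
    r"exited with return code (?P<rc>\d+)"
)


def summarize_failures(failures):
    """Return one (task, shard, attempt, return_code) tuple per failed job."""
    rows = []
    for failure in failures:
        match = JOB_RE.search(failure.get("message", ""))
        if match:
            rows.append(
                (
                    match["task"],
                    int(match["shard"]),
                    int(match["attempt"]),
                    int(match["rc"]),
                )
            )
        # Nested causes have the same structure, so recurse into them.
        rows.extend(summarize_failures(failure.get("causedBy", [])))
    return rows


# Shortened copy of two entries from the failures object above.
example = json.loads("""
[{"message": "Workflow failed", "causedBy": [
  {"causedBy": [], "message": "Job rna.check_anno:1:2 exited with return code 2 which has not been declared as a valid return code."},
  {"causedBy": [], "message": "Job rna.rsem_quant:1:2 exited with return code 1 which has not been declared as a valid return code."}
]}]
""")

for row in summarize_failures(example):
    print(row)
```

All four failed jobs here belong to the same shard, which points at one shared input rather than four independent problems.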

After some digging, it became clear that the pipeline couldn't read the files sample_genome.bam and sample_anno.bam (here rep2_Ana15_D3_PE_genome.bam and rep2_Ana15_D3_PE_anno.bam), for reasons that I don't understand. For example,

==== NAME=rna.check_anno, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=2, JOB_ID=10984627
START=2022-08-02T14:32:40.383Z, END=2022-08-02T14:33:11.127Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/attempt-2/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam could not be opened for reading.

and

==== NAME=rna.check_genome, STATUS=RetryableFailure, PARENT=
SHARD_IDX=1, RC=2, JOB_ID=10984595
START=2022-08-02T14:32:00.376Z, END=2022-08-02T14:32:34.246Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/execution/stderr
STDERR_CONTENTS=
/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/inputs/-724548390/rep2_Ana15_D3_PE_genome.bam could not be opened for reading.

Yet those files do exist on disk. For example, if I run

ls -rtlh /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam

I get

lrwxrwxrwx. 1 bl265 root 127 Aug  2 10:32 /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam -> /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-align/shard-1/execution/rep2_Ana15_D3_PE_anno.bam

And if I then run ls -rtlh on the path that the symlink points to (/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-align/shard-1/execution/rep2_Ana15_D3_PE_anno.bam), I get

-rw-r--r--. 1 bl265 root 6.8G Aug  2 10:14 /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-align/shard-1/execution/rep2_Ana15_D3_PE_anno.bam

which looks like a valid BAM file to me.
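Note that ls only reads file metadata; it never opens the file. A rough sketch of a stricter check, which resolves the symlink chain and then tries to read the first bytes, is shown below. The helper name `probe_bam` is hypothetical. The result is only informative when the script runs in the same environment as the failing task (same node, same container); whether that environment differs from the login shell here is an assumption I could not verify.

```python
import os
import sys


def probe_bam(path):
    """Resolve symlinks, then try to read the first bytes of the file."""
    real = os.path.realpath(path)  # follows every symlink in the chain
    report = {"path": path, "resolved": real, "exists": os.path.exists(real)}
    try:
        with open(real, "rb") as handle:
            magic = handle.read(4)
        report["readable"] = True
        # BAM files are BGZF-compressed, so they start with the gzip magic
        # bytes 1f 8b; anything else suggests a truncated or non-BAM file.
        report["gzip_magic"] = magic[:2] == b"\x1f\x8b"
    except OSError as err:
        # Covers missing files, permission errors, and unreachable mounts.
        report["readable"] = False
        report["error"] = f"{type(err).__name__}: {err}"
    return report


if __name__ == "__main__":
    # Usage: python probe_bam.py /path/to/file.bam [more paths ...]
    for arg in sys.argv[1:]:
        print(probe_bam(arg))
```

If `exists` is true but `readable` is false, the error string shows whether the open failed with a permission error or something else.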

Any idea what's going on, and how I can troubleshoot this?

More information:

OS/Platform

  • OS/Platform: Linux cluster with slurm
  • Conda version: used Singularity
  • Pipeline version: v2.2.0
  • Caper version: v2.2.0

Caper configuration file

contents of ~/.caper/default.conf

backend=slurm

# SLURM partition. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford Sherlock.
slurm-partition=scavenger

# SLURM account. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford SCG.
slurm-account=

# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
# local-loc-dir=/work/boxun.li/99.Misc/caper_tmp_dir
local-loc-dir=/hpc/group/gersbachlab/boxun.li/99.Misc/caper_tmp_dir

# This parameter defines resource parameters for Caper's leader job only.
slurm-leader-job-resource-param=-t 48:00:00 --mem 64G

# This parameter defines resource parameters for submitting WDL task to job engine.
# It is for HPC backends only (slurm, sge, pbs and lsf).
# It is not recommended to change it unless your cluster has custom resource settings.
# See https://github.com/ENCODE-DCC/caper/blob/master/docs/resource_param.md for details.
# slurm-resource-param=-n 1 --ntasks-per-node=1 --cpus-per-task=${cpu} ${if defined(memory_mb) then "--mem=" else ""}${memory_mb}${if defined(memory_mb) then "M" else ""} ${if defined(time) then "--time=" else ""}${time*60} ${if defined(gpu) then "--gres=gpu:" else ""}${gpu}
slurm-resource-param=-n 1 --ntasks-per-node=1 --cpus-per-task=${cpu} --mem=256G ${if defined(time) then "--time=" else ""}${time*60} ${if defined(gpu) then "--gres=gpu:" else ""}${gpu}

cromwell=/hpc/home/bl265/.caper/cromwell_jar/cromwell-65.jar
womtool=/hpc/home/bl265/.caper/womtool_jar/womtool-65.jar

Input JSON file

{
    "rna.endedness" : "paired",
    "rna.fastqs_R1" : [["/work/boxun.li/Ana15/RNA/data/merged_fastq/D3_XC158_R1.fastq.gz"], ["/work/boxun.li/Ana15/RNA/data/merged_fastq/D3_XC157_R1.fastq.gz"]],
    "rna.fastqs_R2" : [["/work/boxun.li/Ana15/RNA/data/merged_fastq/D3_XC158_R2.fastq.gz"], ["/work/boxun.li/Ana15/RNA/data/merged_fastq/D3_XC157_R2.fastq.gz"]],
    "rna.align_index" : "/hpc/group/gersbachlab/boxun.li/Data/ENCODE-RNA-seq/Gencode_v29_files/GRCh38/ENCFF598IDH.tar.gz",
    "rna.rsem_index" : "/hpc/group/gersbachlab/boxun.li/Data/ENCODE-RNA-seq/Gencode_v29_files/GRCh38/ENCFF285DRD.tar.gz",
    "rna.bamroot" : "_Ana15_D3_PE",
    "rna.strandedness" : "unstranded",
    "rna.strandedness_direction" : "unstranded",
    "rna.chrom_sizes" : "/hpc/group/gersbachlab/boxun.li/Data/ENCODE-RNA-seq/Gencode_v29_files/GRCh38/GRCh38_EBV.chrom.sizes.tsv",
    "rna.align_ncpus" : 8,
    "rna.align_ramGB" : 200,
    "rna.kallisto_index" : "/hpc/group/gersbachlab/boxun.li/Data/ENCODE-RNA-seq/Gencode_v29_files/GRCh38/ENCFF471EAM.idx",
    "rna.kallisto_number_of_threads" : 4,
    "rna.kallisto_ramGB" : 50,
    "rna.rna_qc_tr_id_to_gene_type_tsv" : "/hpc/group/gersbachlab/boxun.li/Software/rna-seq-pipeline/transcript_id_to_gene_type_mappings/gencodeV29pri-UCSC-tRNAs-ERCC-phiX.transcript_id_to_genes.tsv",
    "rna.bam_to_signals_ncpus" : 8,
    "rna.bam_to_signals_ramGB" : 200,
    "rna.rsem_ncpus" : 8,
    "rna.rsem_ramGB" : 200,
    "rna.align_disk" : "local-disk 200 HDD",
    "rna.kallisto_disk" : "local-disk 200 HDD",
    "rna.rna_qc_disk" : "local-disk 200 HDD",
    "rna.mad_qc_disk" : "local-disk 200 HDD",
    "rna.bam_to_signals_disk" : "local-disk 200 HDD",
    "rna.rsem_disk" : "local-disk 200 HDD"
}
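The inputs above mix files under /work and /hpc/group. A small sketch for listing which input keys point at which top-level storage root can make that easier to see at a glance. The function names are hypothetical, and grouping by the first path component is only a crude stand-in for "which file system", since real mount points can sit deeper in the tree.

```python
import json
from collections import defaultdict


def iter_paths(value):
    """Yield every string that looks like an absolute path, at any nesting depth."""
    if isinstance(value, str):
        if value.startswith("/"):
            yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_paths(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_paths(item)


def group_by_root(inputs):
    """Map the first path component (e.g. '/work') to the input keys that use it."""
    groups = defaultdict(set)
    for key, value in inputs.items():
        for path in iter_paths(value):
            root = "/" + path.split("/")[1]
            groups[root].add(key)
    return {root: sorted(keys) for root, keys in groups.items()}


# Three entries copied from the input JSON above; the rest are omitted.
inputs = json.loads("""
{
    "rna.fastqs_R1": [["/work/boxun.li/Ana15/RNA/data/merged_fastq/D3_XC158_R1.fastq.gz"]],
    "rna.align_index": "/hpc/group/gersbachlab/boxun.li/Data/ENCODE-RNA-seq/Gencode_v29_files/GRCh38/ENCFF598IDH.tar.gz",
    "rna.bamroot": "_Ana15_D3_PE"
}
""")

for root, keys in group_by_root(inputs).items():
    print(root, keys)
```

Values that are not absolute paths, such as rna.bamroot, are skipped.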

Slurm output

2022-08-02 09:46:47,010|caper.cli|INFO| Cromwell stdout: /work/boxun.li/Ana15/ENCODE_RNA/cromwell.out.4
2022-08-02 09:46:47,024|caper.caper_base|INFO| Creating a timestamped temporary directory. /hpc/group/gersbachlab/boxun.li/99.Misc/caper_tmp_dir/rna-seq-pipeline/20220802_094647_016129
2022-08-02 09:46:47,024|caper.caper_runner|INFO| Localizing files on work_dir. /hpc/group/gersbachlab/boxun.li/99.Misc/caper_tmp_dir/rna-seq-pipeline/20220802_094647_016129
2022-08-02 09:46:58,092|caper.caper_workflow_opts|INFO| Singularity image found in WDL metadata. wdl=/hpc/group/gersbachlab/boxun.li/Software/rna-seq-pipeline/rna-seq-pipeline.wdl, s=docker://encodedcc/rna-seq-pipeline:1.2.4
2022-08-02 09:47:00,489|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2022-08-02 09:47:10,992|caper.nb_subproc_thread|INFO| Subprocess finished successfully.
2022-08-02 09:47:10,992|caper.cromwell|INFO| Passed Womtool validation.
2022-08-02 09:47:10,993|caper.caper_runner|INFO| launching run: wdl=/hpc/group/gersbachlab/boxun.li/Software/rna-seq-pipeline/rna-seq-pipeline.wdl, inputs=/work/boxun.li/Ana15/ENCODE_RNA/input_jsons/Ana15_D3_RNA_input.json, backend_conf=/hpc/group/gersbachlab/boxun.li/99.Misc/caper_tmp_dir/rna-seq-pipeline/20220802_094647_016129/backend.conf
2022-08-02 09:47:34,966|caper.cromwell_workflow_monitor|INFO| Workflow: id=663354a2-fd23-4198-b1bb-a4d27758b846, status=Submitted
2022-08-02 09:47:35,064|caper.cromwell_workflow_monitor|INFO| Workflow: id=663354a2-fd23-4198-b1bb-a4d27758b846, status=Running
2022-08-02 09:47:49,303|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.align:0, retry=0, status=Started, job_id=10982718
2022-08-02 09:47:49,341|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.align:0, retry=0, status=Running
2022-08-02 09:47:49,352|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.align:1, retry=0, status=Started, job_id=10982719
2022-08-02 09:47:49,361|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.align:1, retry=0, status=Running
2022-08-02 09:47:54,254|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.kallisto:0, retry=0, status=Started, job_id=10982721
2022-08-02 09:47:54,262|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.kallisto:1, retry=0, status=Started, job_id=10982720
2022-08-02 09:47:54,268|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.kallisto:1, retry=0, status=Running
2022-08-02 09:47:54,271|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.kallisto:0, retry=0, status=Running
2022-08-02 09:53:43,497|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.kallisto:1, retry=0, status=Done
2022-08-02 09:54:21,304|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.kallisto:0, retry=0, status=Done
2022-08-02 10:31:55,374|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.align:1, retry=0, status=Done
2022-08-02 10:32:04,244|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_genome:1, retry=0, status=Started, job_id=10984595
2022-08-02 10:32:04,252|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_anno:1, retry=0, status=Started, job_id=10984596
2022-08-02 10:32:04,268|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_anno:1, retry=0, status=Running
2022-08-02 10:32:04,276|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_genome:1, retry=0, status=Running
2022-08-02 10:32:09,246|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.bam_to_signals:1, retry=0, status=Started, job_id=10984598
2022-08-02 10:32:09,246|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.rsem_quant:1, retry=0, status=Started, job_id=10984600
2022-08-02 10:32:09,250|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.bam_to_signals:1, retry=0, status=Running
2022-08-02 10:32:09,250|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.rsem_quant:1, retry=0, status=Running
2022-08-02 10:32:33,215|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_genome:1, retry=0, status=Done
2022-08-02 10:32:35,268|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_anno:1, retry=0, status=Done
2022-08-02 10:32:39,247|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_genome:1, retry=1, status=Started, job_id=10984626
2022-08-02 10:32:39,296|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_genome:1, retry=1, status=Running
2022-08-02 10:32:39,505|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.rsem_quant:1, retry=0, status=Done
2022-08-02 10:32:41,245|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.bam_to_signals:1, retry=0, status=Done
2022-08-02 10:32:44,243|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_anno:1, retry=1, status=Started, job_id=10984627
2022-08-02 10:32:44,254|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_anno:1, retry=1, status=Running
2022-08-02 10:32:49,247|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.bam_to_signals:1, retry=1, status=Started, job_id=10984632
2022-08-02 10:32:49,247|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.rsem_quant:1, retry=1, status=Started, job_id=10984633
2022-08-02 10:32:49,252|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.bam_to_signals:1, retry=1, status=Running
2022-08-02 10:32:49,253|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.rsem_quant:1, retry=1, status=Running
2022-08-02 10:33:11,107|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_anno:1, retry=1, status=Done
2022-08-02 10:33:14,505|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.check_genome:1, retry=1, status=Done
2022-08-02 10:33:18,097|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.bam_to_signals:1, retry=1, status=Done
2022-08-02 10:33:23,227|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.rsem_quant:1, retry=1, status=Done
2022-08-02 10:39:46,254|caper.cromwell_workflow_monitor|INFO| Task: id=663354a2-fd23-4198-b1bb-a4d27758b846, task=rna.align:0, retry=0, status=Done
2022-08-02 10:39:49,470|caper.cromwell_workflow_monitor|INFO| Workflow: id=663354a2-fd23-4198-b1bb-a4d27758b846, status=Failed
2022-08-02 10:40:25,901|caper.cromwell_metadata|INFO| Wrote metadata file. /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/metadata.json
2022-08-02 10:40:25,901|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
2022-08-02 10:40:25,937|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=1
2022-08-02 10:40:25,937|caper.cli|ERROR| Check stdout in /work/boxun.li/Ana15/ENCODE_RNA/cromwell.out.4
* Started troubleshooting workflow: id=663354a2-fd23-4198-b1bb-a4d27758b846, status=Failed
* Found failures JSON object.
[
    {
        "causedBy": [
            {
                "causedBy": [],
                "message": "Job rna.check_anno:1:2 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            },
            {
                "causedBy": [],
                "message": "Job rna.check_genome:1:2 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            },
            {
                "causedBy": [],
                "message": "Job rna.bam_to_signals:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            },
            {
                "causedBy": [],
                "message": "Job rna.rsem_quant:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            }
        ],
        "message": "Workflow failed"
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=rna.check_anno, STATUS=RetryableFailure, PARENT=
SHARD_IDX=1, RC=2, JOB_ID=10984596
START=2022-08-02T14:32:02.395Z, END=2022-08-02T14:32:39.248Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/execution/stderr
STDERR_CONTENTS=
/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam could not be opened for reading.


==== NAME=rna.check_anno, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=2, JOB_ID=10984627
START=2022-08-02T14:32:40.383Z, END=2022-08-02T14:33:11.127Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/attempt-2/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_anno/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam could not be opened for reading.


==== NAME=rna.rsem_quant, STATUS=RetryableFailure, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=10984600
START=2022-08-02T14:32:06.387Z, END=2022-08-02T14:32:44.246Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/execution/stderr
STDERR_CONTENTS=
2022-08-02 10:32:29,363 | INFO | __main__: Running RSEM command rsem-calculate-expression --bam --estimate-rspd --calc-ci --seed 12345 -p 8 --no-bam-output --ci-memory 200000 --forward-prob 0.5 --paired-end /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam rsem_index/rsem rep2_Ana15_D3_PE_anno_rsem
[E::hts_open_format] fail to open file '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam'
Cannot open /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam! It may not exist.
Traceback (most recent call last):
  File "/software/rna-seq-pipeline/src/rsem_quant.py", line 133, in <module>
    main(args)
  File "/software/rna-seq-pipeline/src/rsem_quant.py", line 107, in main
    number_of_genes_detected = calculate_number_of_genes_detected(gene_quant_fn)
  File "/software/rna-seq-pipeline/src/rsem_quant.py", line 83, in calculate_number_of_genes_detected
    quants = pd.read_csv(quant_tsv, sep="\t")
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: 'rep2_Ana15_D3_PE_anno_rsem.genes.results'
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/execution/*.isoforms.results': No such file or directory
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/execution/*.genes.results': No such file or directory
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/execution/*_number_of_genes_detected.json': No such file or directory


==== NAME=rna.rsem_quant, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=10984633
START=2022-08-02T14:32:48.381Z, END=2022-08-02T14:33:23.240Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/attempt-2/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
2022-08-02 10:33:08,226 | INFO | __main__: Running RSEM command rsem-calculate-expression --bam --estimate-rspd --calc-ci --seed 12345 -p 8 --no-bam-output --ci-memory 200000 --forward-prob 0.5 --paired-end /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam rsem_index/rsem rep2_Ana15_D3_PE_anno_rsem
[E::hts_open_format] fail to open file '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam'
Cannot open /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_anno.bam! It may not exist.
Traceback (most recent call last):
  File "/software/rna-seq-pipeline/src/rsem_quant.py", line 133, in <module>
    main(args)
  File "/software/rna-seq-pipeline/src/rsem_quant.py", line 107, in main
    number_of_genes_detected = calculate_number_of_genes_detected(gene_quant_fn)
  File "/software/rna-seq-pipeline/src/rsem_quant.py", line 83, in calculate_number_of_genes_detected
    quants = pd.read_csv(quant_tsv, sep="\t")
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: 'rep2_Ana15_D3_PE_anno_rsem.genes.results'
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/attempt-2/execution/*.isoforms.results': No such file or directory
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/attempt-2/execution/*.genes.results': No such file or directory
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-rsem_quant/shard-1/attempt-2/execution/*_number_of_genes_detected.json': No such file or directory


==== NAME=rna.check_genome, STATUS=RetryableFailure, PARENT=
SHARD_IDX=1, RC=2, JOB_ID=10984595
START=2022-08-02T14:32:00.376Z, END=2022-08-02T14:32:34.246Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/execution/stderr
STDERR_CONTENTS=
/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/inputs/-724548390/rep2_Ana15_D3_PE_genome.bam could not be opened for reading.


==== NAME=rna.check_genome, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=2, JOB_ID=10984626
START=2022-08-02T14:32:36.401Z, END=2022-08-02T14:33:14.515Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/attempt-2/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-check_genome/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_genome.bam could not be opened for reading.


==== NAME=rna.bam_to_signals, STATUS=RetryableFailure, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=10984598
START=2022-08-02T14:32:04.385Z, END=2022-08-02T14:32:44.248Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/execution/stderr
STDERR_CONTENTS=
2022-08-02 10:32:08,859 | INFO | __main__: Running STAR command STAR --runMode inputAlignmentsFromBAM                 --inputBAMfile /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/inputs/-724548390/rep2_Ana15_D3_PE_genome.bam                 --outWigType bedGraph                 --outWigStrand Unstranded                 --outWigReferencesPrefix chr
2022-08-02 10:32:09,055 | ERROR | __main__: Building bedGraph had a problem, most likely out of memory.
Traceback (most recent call last):
  File "/software/rna-seq-pipeline/src/bam_to_signals.py", line 40, in main
    assert star_return_code == 0
AssertionError
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/execution/*_genome_uniq.bw': No such file or directory
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/execution/*_genome_all.bw': No such file or directory


==== NAME=rna.bam_to_signals, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=10984632
START=2022-08-02T14:32:46.386Z, END=2022-08-02T14:33:18.113Z
STDOUT=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/attempt-2/execution/stdout
STDERR=/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
2022-08-02 10:32:51,445 | INFO | __main__: Running STAR command STAR --runMode inputAlignmentsFromBAM                 --inputBAMfile /work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/attempt-2/inputs/-724548390/rep2_Ana15_D3_PE_genome.bam                 --outWigType bedGraph                 --outWigStrand Unstranded                 --outWigReferencesPrefix chr
2022-08-02 10:32:51,503 | ERROR | __main__: Building bedGraph had a problem, most likely out of memory.
Traceback (most recent call last):
  File "/software/rna-seq-pipeline/src/bam_to_signals.py", line 40, in main
    assert star_return_code == 0
AssertionError
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/attempt-2/execution/*_genome_uniq.bw': No such file or directory
ln: failed to access '/work/boxun.li/Ana15/ENCODE_RNA/rna/663354a2-fd23-4198-b1bb-a4d27758b846/call-bam_to_signals/shard-1/attempt-2/execution/*_genome_all.bw': No such file or directory

Unsurprisingly, this turned out to be most likely a file system permissions issue. When I ran the pipeline and stored its files in the storage space shared across the whole institute, I hit the failure above. When I ran it with exactly the same input but stored the files in the storage space assigned to my lab, it worked.

I still haven't been able to pinpoint the exact cause, but the pipeline runs now.
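For anyone who wants to narrow this down further, here is a rough sketch of how the two storage locations could be compared. It walks every component of a path and reports whether the current process can traverse or read it, and it checks whether two paths sit on the same device. Both helper names are hypothetical, and this only tests ordinary POSIX permissions as seen by the process that runs it; it would not reveal a path that is simply not mounted inside a container unless it is run there.

```python
import os
import stat


def explain_access(path):
    """Report, for each component of `path`, whether this process can use it."""
    real = os.path.realpath(path)
    rows = []
    current = "/"
    for part in real.strip("/").split("/"):
        current = os.path.join(current, part)
        try:
            st = os.stat(current)
        except OSError as err:
            rows.append((current, None, f"stat failed: {err.strerror}"))
            break
        if stat.S_ISDIR(st.st_mode):
            # Directories need execute (search) permission to be traversed.
            ok = os.access(current, os.X_OK)
        else:
            ok = os.access(current, os.R_OK)
        rows.append((current, stat.filemode(st.st_mode), "ok" if ok else "DENIED"))
    return rows


def same_filesystem(path_a, path_b):
    """Return True if both paths live on the same device (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    import sys

    # Usage: python explain_access.py /path/to/file.bam
    for target in sys.argv[1:]:
        for row in explain_access(target):
            print(*row)
```

Running it once on a BAM under the shared space and once on one under the lab space should show the first component where the two differ, if plain permissions are in fact the cause.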