cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium

Home Page:https://cumc.github.io/xqtl-protocol/

Error message - BAM file shares no contigs with GTF - Knight eQTL rnaseq call

Chunmingl opened this issue

In the rnaseqc_call step, 8 BAM files failed with the error message "BAM file shares no contigs with GTF". This error prevents these 8 BAM files from moving on to the next step.

Here is the error message for one of the BAM files:
[tb755bc6d14ec7dca]: Executing script in Singularity returns an error (exitcode=11, stderr=/mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/PA00003164.rnaseqc.gene_tpm.gct.stderr). The script has been saved to /home/cl4215/.sos/f09861f58d65b381/singularity_run_20397.sh. To reproduce the error please run: singularity exec /mnt/vast/hpc/csg/snuc_pseudo_bulk/eight_celltypes_analysis/SuSiE/containers/rna_quantification.sif /bin/bash /home/cl4215/.sos/f09861f58d65b381/singularity_run_20397.sh

Below is the submitted command:

nohup sos run ~/githubrepo/xqtl-pipeline/code/molecular_phenotypes/calling/RNA_calling.ipynb rnaseqc_call \
    --cwd /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq \
    --samples /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/xqtl_protocol_data.fastqlist \
    --data-dir /mnt/vast/hpc/csg/cl4215/ROSMAP/knight \
    --gtf /mnt/vast/hpc/csg/cl4215/mrmash/reference_data/Homo_sapiens.GRCh38.103.chr.reformatted.collapse_only.gene.gtf \
    --container /mnt/vast/hpc/csg/snuc_pseudo_bulk/eight_celltypes_analysis/SuSiE/containers/rna_quantification.sif  \
    --reference-fasta /mnt/vast/hpc/csg/cl4215/mrmash/reference_data/GRCh38_full_analysis_set_plus_decoy_hla.noALT_noHLA_noDecoy.fasta \
    --bam_list /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/xqtl_protocol_data_bam_list \
    -c /mnt/vast/hpc/csg/molecular_phenotype_calling/csg.yml -j 10 -q csg2

@Chunmingl I see some tips online that might help. Did you try those? One other obvious thing to check is whether your BAM files are intact, that is, whether the BAM files in question are much smaller than the ones that work.

The BAM files were intact, and the QC files looked OK. However, many more samples returned the same error after rerunning from rnaseq call. @hsun3163

tail /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.rnaseqc.gene_tpm.gct.stderr

BAM file shares no contigs with GTF
BAM file shares no contigs with GTF
BAM file shares no contigs with GTF
BAM file shares no contigs with GTF
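
One way to check this message directly is to compare the contig names in the BAM header with those in the first column of the GTF. The sketch below is only an illustration: `shared_contigs` is a hypothetical helper, not part of the xqtl-protocol pipeline, and it assumes the BAM header has already been dumped to a text file (for example with `samtools view -H sample.bam > header.sam`).

```shell
# Hypothetical helper (not part of the pipeline): print contig names that
# appear both in a SAM/BAM header dump and in column 1 of a GTF file.
# Usage: shared_contigs header.sam annotation.gtf
shared_contigs() (
    header_file="$1"
    gtf_file="$2"
    tmp_bam=$(mktemp)
    tmp_gtf=$(mktemp)
    # Contig names are the SN: fields of the @SQ header lines.
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$header_file" | sort -u > "$tmp_bam"
    # GTF contig names are in the first column; skip comment and blank lines.
    awk -F'\t' '$0 !~ /^#/ && NF > 0 { print $1 }' "$gtf_file" | sort -u > "$tmp_gtf"
    # Lines common to both sorted lists.
    comm -12 "$tmp_bam" "$tmp_gtf"
    rm -f "$tmp_bam" "$tmp_gtf"
)
```

If the function prints nothing, the two files have no contig names in common. A header with no `@SQ` lines at all gives the same empty result.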

As it turns out, the BAM file is empty. Can you point me to the analysis notebook documenting all your analysis, as we discussed?

hs3163@csglogin:/mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test$ ls -lah  /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961*
-rw-r--r-- 1 cl4215 hgrcgrid_statgen    0 May 23 15:12 /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.Aligned.sortedByCoord.out.bam
-rw-r--r-- 1 cl4215 hgrcgrid_statgen 3.6G Jun  3 18:38 /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.Aligned.toTranscriptome.out.bam
-rw-r--r-- 1 cl4215 hgrcgrid_statgen  144 May 28 10:27 /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.rnaseqc.gene_tpm.gct.stderr
-rw-r--r-- 1 cl4215 hgrcgrid_statgen    0 May 24 13:18 /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.rnaseqc.gene_tpm.gct.stdout
-rw-r--r-- 1 cl4215 hgrcgrid_statgen 6.3M Jun  4 02:13 /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.rsem.genes.results
-rw-r--r-- 1 cl4215 hgrcgrid_statgen  15M Jun  4 02:13 /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.rsem.isoforms.results
-rw-r--r-- 1 cl4215 hgrcgrid_statgen   73 Jun  4 01:58 /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.rsem.isoforms.stderr
-rw-r--r-- 1 cl4215 hgrcgrid_statgen 506K Jun  4 02:13 /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/test/PA00000961.rsem.isoforms.stdout
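
The listing above shows a zero-byte `Aligned.sortedByCoord.out.bam`. To look for the same problem across all samples at once, something like the following sketch could be used. `find_empty_bams` is a hypothetical helper name, and the directory argument is whatever output folder holds the BAM files.

```shell
# Hypothetical helper: list every empty (zero-byte) *.bam file under a directory.
# Usage: find_empty_bams /path/to/output/rnaseq
find_empty_bams() {
    find "$1" -type f -name '*.bam' -empty | sort
}
```

Any file it prints is an alignment output that was created but never written, so the corresponding sample most likely needs to be re-aligned before rnaseqc_call is rerun.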

I submitted the jobs as bash scripts. Here are some of the scripts I have previously run:

/home/cl4215/githubrepo/mrmash_ROSMAP/Knight/knight_rnaseq3.sh
/home/cl4215/githubrepo/mrmash_ROSMAP/Knight/knight_rnaseq4.sh

Please do document the analysis in the notebook going forward. It would be hard to keep track of bash scripts.

that being said, how is

/mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/rnaseq/xqtl_protocol_data_bam_list

generated ?

This doesn't seem to be the correct output of the STAR_output step for the Knight data.

For the STAR_output step: I ran it multiple times (including a test run with a smaller sample size). Most of the time it ended with no error message, but the job did not appear to have completed.

tail /mnt/vast/hpc/csg/cl4215/ROSMAP/knight/errout/knight_staroutput3.log

INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks.
INFO: Waiting for the completion of 2 tasks. 

Given the complexity of the issue, again, please prepare a notebook documenting all the commands you have run and the log file associated with each command. Otherwise it is impossible to look into.

Here is the summarized notebook of scripts and log files
/home/cl4215/githubrepo/mrmash_ROSMAP/Knight/knight_contigs_GTF.ipynb

Can you fork the fungi-QTL-analysis repo and send a PR to upload this notebook (preferably along with other notebooks that are relevant to the xQTL project)? I don't really have access to notebooks that are not in my home dir due to the way JupyterLab works.

The PR is sent.

Apparently STAR failed for the two samples because the jobs hit the walltime limit. Can you rerun the failed samples in a new cwd with an increased walltime by setting --walltime in the SoS command?

hs3163@csglogin:/mnt/vast/hpc/csg/cl4215/ROSMAP/knight/output/test$ qacct -j 5473097
==============================================================
qname        csg.q
hostname     node48
group        cl4215
owner        cl4215
project      NONE
department   defaultdepartment
jobname      job_t3c81fbcdec02417a
jobnumber    5473097
taskid       undefined
account      sge
priority     0
qsub_time    Sat May 27 17:00:29 2023
start_time   Sat May 27 17:26:43 2023
end_time     Sat May 27 22:26:44 2023
granted_pe   orte
slots        8
failed       37  : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status  137                  (Killed)
ru_wallclock 18001s
ru_utime     0.175s
ru_stime     0.034s
ru_maxrss    7.410KB
ru_ixrss     0.000B
ru_ismrss    0.000B
ru_idrss     0.000B
ru_isrss     0.000B
ru_minflt    8013
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   16
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     5076
ru_nivcsw    6
cpu          141516.840s
mem          4653.239TBs
io           492.403GB
iow          0.000s
maxvmem      35.705GB
arid         undefined
ar_sub_time  undefined
category     -u cl4215 -q csg.q -l h_rt=18000,h_vmem=40G -pe orte 8
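
In the record above, `ru_wallclock` (18001s) is just past the requested `h_rt=18000`, which is what points to a walltime kill. As a rough sketch, the same comparison can be scripted for a saved `qacct -j` record. `walltime_exceeded` is a hypothetical helper, and it assumes the record has the same field layout as the output shown here.

```shell
# Hypothetical helper: read a saved `qacct -j <jobid>` record and report
# whether the wallclock time reached the h_rt limit requested for the job.
# Prints "yes", "no", or "unknown" (if either value is missing).
# Usage: qacct -j 5473097 > job.txt; walltime_exceeded job.txt
walltime_exceeded() {
    awk '
        # ru_wallclock is reported in seconds with a trailing "s".
        $1 == "ru_wallclock" { wall = $2; sub(/s$/, "", wall) }
        # The requested limit appears as h_rt=<seconds> on the category line.
        $1 == "category" {
            if (match($0, /h_rt=[0-9]+/))
                limit = substr($0, RSTART + 5, RLENGTH - 5)
        }
        END {
            if (wall == "" || limit == "") { print "unknown"; exit }
            if (wall + 0 >= limit + 0) print "yes"; else print "no"
        }
    ' "$1"
}
```

A "yes" here, together with the `failed 37` line and exit status 137, is consistent with the scheduler killing the job for exceeding its runtime limit rather than STAR failing on its own.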

I received an error message
/mnt/vast/hpc/csg/cl4215/ROSMAP/knight/errout/knight_staroutput7.log

ERROR: [picard_qc (picard_qc)]: [picard_qc]: Failed to execute process
"bash(fr"""set -e\ntouch {_output[0]:n}.CollectMultipleMetrics...["cord_bam"].zap()\n\n"
name 'job_size' is not defined
[STAR_output]: Exits with 1 pending step (STAR_output)

the submitted script can be accessed in the recent pull request:
fungen-xqtl-analysis/analysis/Wang_Columbia/knight/eQTL/knight_STAR_Output7.sh

Someone has had this issue before, and it was solved by updating SoS. Can you try again after updating your SoS?