splicebox / PsiCLASS

Simultaneous multi-sample transcript assembler for RNA-seq data

Running into segfault error 11

Akazhiel opened this issue

/Users/jonatan/Micropeptide_project/psiclass/junc /Users/jonatan/Micropeptide_project/EGAF00001593233.sorted.bam -a > ./splice/psiclass_bam_0.raw_splice
/Users/jonatan/Micropeptide_project/psiclass/junc /Users/jonatan/Micropeptide_project/EGAF00001593241.sorted.bam -a > ./splice/psiclass_bam_1.raw_splice
/Users/jonatan/Micropeptide_project/psiclass/junc /Users/jonatan/Micropeptide_project/EGAF00001593275.sorted.bam -a > ./splice/psiclass_bam_2.raw_splice
/Users/jonatan/Micropeptide_project/psiclass/trust-splice ./splice/psiclass_splice.list /Users/jonatan/Micropeptide_project/EGAF00001593233.sorted.bam > ./splice/psiclass_bam.trusted_splice
perl /Users/jonatan/Micropeptide_project/psiclass/FilterSplice.pl ./splice/psiclass_bam_0.raw_splice ./splice/psiclass_bam.trusted_splice > ./splice/psiclass_bam_0.splice
perl /Users/jonatan/Micropeptide_project/psiclass/FilterSplice.pl ./splice/psiclass_bam_1.raw_splice ./splice/psiclass_bam.trusted_splice > ./splice/psiclass_bam_1.splice
perl /Users/jonatan/Micropeptide_project/psiclass/FilterSplice.pl ./splice/psiclass_bam_2.raw_splice ./splice/psiclass_bam.trusted_splice > ./splice/psiclass_bam_2.splice
/Users/jonatan/Micropeptide_project/psiclass/subexon-info /Users/jonatan/Micropeptide_project/EGAF00001593233.sorted.bam ./splice/psiclass_bam_0.splice > ./subexon/psiclass_subexon_0.out
sh: line 1: 41865 Segmentation fault: 11 /Users/jonatan/Micropeptide_project/psiclass/subexon-info /Users/jonatan/Micropeptide_project/EGAF00001593233.sorted.bam ./splice/psiclass_bam_0.splice > ./subexon/psiclass_subexon_0.out
Terminated

I always get this output when I run the command ./psiclass --lb bamlist

Can you show me the first few lines of the ./splice/psiclass_bam_0.splice file? Can you also check whether the genome has very long chromosome names? Thank you.
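One way to check the chromosome names is to read the `@SQ` lines of the BAM header. The following is a minimal sketch, assuming `samtools` and `awk` are available; `longest_ref_name` is a hypothetical helper name and not part of PsiCLASS. What length counts as "very long" for PsiCLASS is not stated in this thread, so the helper only reports the longest name and its length.

```shell
# Hypothetical helper: read a SAM header on stdin and print the length and
# name of the longest reference sequence name found in the @SQ lines.
longest_ref_name() {
    awk -F '\t' '/^@SQ/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) {
                name = substr($i, 4)          # drop the "SN:" prefix
                if (length(name) > max) { max = length(name); best = name }
            }
    } END { if (max > 0) print max "\t" best }'
}

# Usage (the BAM path is a placeholder):
# samtools view -H file.bam | longest_ref_name
```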

For some reason, all of the files that are produced are empty. The only file with any content is psiclass_splice.list. I am trying to run it on an OS X system.

Can you share the first few alignments from the BAM file? It's strange that ./splice/psiclass_bam_1.raw_splice is empty as well. Thanks.

HWI-ST1243:176:D2B86ACXX:5:1310:20374:92711	117	chr1	9995	0	*	=	9995	0	TTCCGATCTGGTTAGGGTTAGGGTTAGGGTAAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGG	BFFBFFFFFFFFFFFFFFFFFFFBBBFBFFFFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFBBB	ZC:i:6	PG:Z:MarkDuplicates	RG:Z:20130806055857685
HWI-ST1243:176:D2B86ACXX:5:1310:20374:92711	153	chr1	9995	37	101M	=	9995	0	TTCCGATCTCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTAACCCTAACC	BFFFFFFFFFFFBB<FFFFBBFFFFBBFFFBB<FFFFFFFFIFFFIIIFFIIIFFFFIIIFFBIIIFFFIIIFFFIIFIFBIIIIFFIFFFFFFFFFFBBB	X0:i:1	X1:i:0	ZC:i:6	MD:Z:0G6A0A70A21	PG:Z:MarkDuplicates	RG:Z:20130806055857685	XG:i:0	AM:i:0	NM:i:4	SM:i:37	XM:i:4	XN:i:6	XO:i:0	XT:A:U
HWI-ST1243:176:D2B86ACXX:6:2310:5351:56225	145	chr1	10000	0	101M	chr5	11685	0	ATCTCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC	FBBBFFFBB<BFFFBBFFBBB<FBBBB<FBBBBBFFFFBBFFIFFFIFFB<<IFFFFFFIIFFFIIFFFFFFFFFFFFFFFBIFFFFIFFFFBBFFFFBBB	X0:i:3	X1:i:360	ZC:i:8	MD:Z:2A0A97	PG:Z:MarkDuplicates	RG:Z:20130806033457992	XG:i:0	AM:i:0	NM:i:2	SM:i:0	XM:i:2	XN:i:1	XO:i:0	XT:A:R
HWI-ST1243:176:D2B86ACXX:6:2204:19014:33360	99	chr1	10024	19	101M	=	10125	202	CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTGACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACC	BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFIIIIIIIIIIIFIIIIIIIIIIFB'<BFBBFFFBFBBBBFFBFFFFFBB7<B<<B'077B7BB	X0:i:1	X1:i:0	ZC:i:8	MD:Z:62A38	PG:Z:MarkDuplicates	RG:Z:20130806033457992	XG:i:0	AM:i:19	NM:i:1	SM:i:19	XM:i:1	XO:i:0	XT:A:U

Here are the first four lines of one of the BAM files.

Because ">" redirects the output, the file ./splice/psiclass_bam_0.raw_splice should be created even if the program "junc" failed. Can you check whether the subdirectory "splice" was created on your system, and whether you have write permission on that path? Thank you.
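These checks can be scripted. The sketch below is only an illustration: `check_outputs` is a hypothetical helper name, not a PsiCLASS tool. It verifies that a directory exists and is writable, then lists the zero-byte files directly inside it.

```shell
# Hypothetical helper: report whether a directory exists and is writable,
# and print every regular file directly inside it that is empty.
check_outputs() {
    dir=$1
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir"; return 1; }
    for f in "$dir"/*; do
        # -s is true for files with size > 0, so ! -s selects empty files
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            echo "empty: $f"
        fi
    done
    return 0
}

# Usage:
# check_outputs ./splice
# check_outputs ./subexon
```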

The .raw_splice files are indeed created, but they are empty. Both the subexon and splice subdirectories are created, and I do have write permission on that path since I'm running this on my own machine.

Can you run this command " samtools view /Users/jonatan/Micropeptide_project/EGAF00001593233.sorted.bam | awk '$6~"N"' | head " to make sure there are spliced reads for the introns? What was the aligner used for the alignment? Thank you.
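The same check can be wrapped in a small helper and applied to every BAM file in the list. This is a minimal sketch, assuming `samtools` and `awk` are on the PATH; the function name `count_spliced` is hypothetical. It reads SAM records on standard input and counts those whose CIGAR string (column 6) contains an `N` operation, which marks a skipped region such as an intron.

```shell
# Hypothetical helper: count alignments whose CIGAR (column 6) contains N.
# Reads SAM records from stdin; header lines (starting with @) are skipped.
count_spliced() {
    awk -F '\t' '!/^@/ && $6 ~ /N/ { n++ } END { print n + 0 }'
}

# Usage (assumes "bamlist" holds one BAM path per line):
# while read -r bam; do
#     printf '%s\t%s\n' "$bam" "$(samtools view "$bam" | count_spliced)"
# done < bamlist
```

A count of 0 for a file means the alignments carry no intron information, as turned out to be the case for the BAM files in this issue.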

Running that command returns nothing. As for the aligner, I'm not sure which one was used: it wasn't specified, and I got the BAM files from a repository. I think it was Bowtie.

For RNA-seq data, you need a splice-aware aligner such as HISAT, STAR, or TopHat, which lets spliced reads span introns. Otherwise, the downstream assembler cannot know the coordinates of the introns.

To be sure of the aligner, you can run the command "samtools view -H XXX.bam". The last few lines of the output usually contain information about the aligner.
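Instead of reading the whole header, the `@PG` (program) lines can be filtered out directly. The following is a rough sketch; `list_programs` is a hypothetical helper name. Not every pipeline records the aligner in `@PG` lines, so empty output does not prove anything about how the reads were aligned.

```shell
# Hypothetical helper: read a SAM header on stdin and print the ID, PN and
# CL fields of each @PG line, separated by tabs.
list_programs() {
    awk -F '\t' '/^@PG/ {
        out = ""
        for (i = 2; i <= NF; i++)
            if ($i ~ /^(ID|PN|CL):/)
                out = out (out == "" ? "" : "\t") $i
        print out
    }'
}

# Usage (XXX.bam is a placeholder, as in the command above):
# samtools view -H XXX.bam | list_programs
```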

There's no information about the aligner in the header. The BAM files were produced by running RSEM on the FASTQ files, and RSEM uses Bowtie as its default aligner. That's why I think Bowtie was the aligner used; I have no other information.

Thank you. Bowtie can't generate intron information in the BAM files. RSEM probably uses the local alignment option in Bowtie (I guess), so for a read that spans an intron it would report only one of the anchor exons. Could you please run RSEM with STAR and then sort the BAM files?

RSEM does have an option for HISAT, but I think it will align the reads to the transcriptome sequence instead of the genome sequence.

I would love to do that, but unfortunately I don't have access to the FASTQ files, nor to the computational resources needed to align so many samples.

Hello,

I am experiencing a very similar problem. I am using only 15 bam files. Out of those 15 bam files, 6 have very few reads (~100K). All the files have some spliced reads (checked using samtools). psiclass fails to generate an assembly and fails on the junc step. This is the error that I am getting:

sh: line 1: 226272 Segmentation fault      (core dumped) /lustre/project/maizegdb/sagnik/FINDER/lib/psiclass_terminal_exon_length_modified/psiclass/trust-splice FINDER_test_ARATH/assemblies_psiclass_modified/combined/splice/psiclass_output_splice_0.list FINDER_test_ARATH/alignments/SRR8422200_for_psiclass.bam > FINDER_test_ARATH/assemblies_psiclass_modified/combined/splice/psiclass_output_bam_0.trusted_splice

I have checked the contents of the file FINDER_test_ARATH/alignments/SRR8422200_for_psiclass.bam and nothing out of the ordinary stood out. I am not sure why the computation fails for this file. I have rerun psiclass several times, and each time the same error occurs for the same file (in fact, it is the first file in the list of BAM files). I reran psiclass without that sample and the same error came up for another file. Could this be caused by the small number of samples?

I moved all my alignment files to another machine and reran the entire process. This time I got a different error.

sh: line 1: 208261 Segmentation fault      (core dumped) /work/LAS/rpwise-lab/sagnik/finder/lib/psiclass_terminal_exon_length_modified/psiclass/classes -p 10 --primaryParalog --lb FINDER_test_ARATH/assemblies_psiclass_modified/fofn -s FINDER_test_ARATH/assemblies_psiclass_modified/combined/subexon/psiclass_output_subexon_combined.out -o FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output > FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_classes.log

Could you please look into this?

Thank you.

Quick update. I used some other samples and this time it worked without a glitch. I have found that PsiCLASS behaves rather erratically with some samples. It might be a good idea to explore this in depth.

Thanks.

I feel like the issue is that some samples might have no intron information and one of the PsiCLASS modules might fail due to this. I'm currently testing it.

I actually checked all the files and all of them had some alignments with intron info.

I think I've found and fixed the bug. It seems the bug can cause a crash even when there are some introns, as you mentioned. Could you please give it a try? Thank you.

Thank you. I tried the updated files and this time it worked without a glitch. Thanks.

Hello,

I tried it again, this time with 3 RNA-Seq samples, and I got the same segmentation fault. Could you please look into it?

Thank you.