clwgg / nQuire

A statistical framework for ploidy estimation using NGS short-read data

nQuire create gives "Segmentation fault"

MikeEHMatson opened this issue.

Hello,
I'm running

nQuire create -b $sample.bam -o $sample

The only output is the error "Segmentation fault".
My BAM files come from bwa mem -> Picard SortSam -> MergeSamFiles -> GATK IndelRealigner. The BAM files appear valid, since the subsequent GATK steps run as expected. The nQuire version is nQuire/af0a7f0.

Thanks,
Mike

Hi Mike,

Do you use 'MergeSamFiles' to combine multiple samples with different read groups into one BAM file? If so, I haven't yet implemented a way to handle multi-sample BAM files. Still, it shouldn't simply crash with a segfault. If you can confirm that it is indeed a multi-sample BAM, I'll investigate this further.
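One way to check whether a BAM file is multi-sample is to look at the SM (sample) tags on its @RG header lines; the header text can be printed with `samtools view -H`. The following is a minimal sketch, assuming the header has already been dumped to a string. The function name `samples_in_header` is hypothetical and not part of nQuire.

```python
from __future__ import annotations


def samples_in_header(header_text: str) -> set[str]:
    """Return the distinct SM (sample) values found on @RG header lines."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # SAM header fields after the record type are tab-separated TAG:VALUE pairs
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


header = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:lane1\tSM:sampleA\n@RG\tID:lane2\tSM:sampleB\n"
print(sorted(samples_in_header(header)))  # ['sampleA', 'sampleB']
```

More than one distinct SM value means the file is multi-sample. Several read groups can also share a single SM value (e.g. one read group per sequencing lane), which still corresponds to one sample.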

Thanks for the report!

Hi Mike,

Unfortunately, I have not been able to reproduce the error with my own data, even when replicating the processing pipeline you describe. Could you make available a test dataset that triggers the error?

Thanks!

It depends on several factors besides coverage. A big one is how repetitive the genome is: misalignments become more frequent in highly complex genomes, which increases noise.
As for the parameters, I wouldn't drop the minimum coverage below 10, and I'd advise using a mapping quality filter of at least 1 (I'm considering making these the defaults).
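To illustrate what those two thresholds do, here is a minimal sketch of per-position base counting with a mapping-quality filter and a minimum-coverage filter. It is not nQuire's implementation: the input format (one `(position, base, mapq)` tuple per aligned read base) and the function name `base_frequencies` are hypothetical, and the actual nQuire command-line flags for these settings are not shown here.

```python
from collections import Counter, defaultdict


def base_frequencies(observations, min_cov=10, min_mapq=1):
    """Per-position base counts after a MAPQ filter and a minimum-coverage filter.

    observations: iterable of (position, base, mapq) tuples (hypothetical format).
    """
    counts = defaultdict(Counter)
    for pos, base, mapq in observations:
        # MAPQ 0 typically marks ambiguously placed (multi-mapping) reads
        if mapq >= min_mapq:
            counts[pos][base] += 1
    # Keep only positions whose depth after MAPQ filtering reaches min_cov
    return {pos: c for pos, c in counts.items() if sum(c.values()) >= min_cov}


obs = [(100, "A", 60)] * 6 + [(100, "T", 60)] * 6 + [(200, "A", 60)] * 7 + [(200, "G", 0)] * 5
print(base_frequencies(obs))  # only position 100 passes; position 200 drops to depth 7
```

Note that the MAPQ filter is applied before the coverage check, so a site can fall below the coverage threshold purely because many of its reads map ambiguously.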
I also implemented a way to create the bin file from selected regions only. So if you have something like mappability estimates for parts of the genome, you can run the model only on the regions you are more confident in, which should give you more reliable base frequencies.
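The idea of restricting the analysis to trusted regions can be sketched as a BED interval lookup. This is an illustration only, assuming the trusted regions (e.g. high-mappability intervals) are given as BED lines and do not overlap. The helper names `load_bed` and `in_regions` are hypothetical, and how nQuire itself accepts the region list is not shown here.

```python
from bisect import bisect_right


def load_bed(lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]} (0-based, end-exclusive)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return regions


def in_regions(regions, chrom, pos):
    """True if 0-based pos lies in an interval on chrom (assumes non-overlapping intervals)."""
    intervals = regions.get(chrom, [])
    # Index of the last interval whose start is <= pos
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


regions = load_bed(["chr1\t100\t200", "chr1\t500\t600", "chr2\t0\t50"])
print(in_regions(regions, "chr1", 150), in_regions(regions, "chr1", 300))  # True False
```

Sorting the intervals once and using binary search keeps each lookup logarithmic in the number of regions, which matters when the region list is genome-wide.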
20x is definitely enough to at least experiment a little. However, more coverage might be needed to reliably distinguish ploidy levels in a complex genome, especially tetraploidy.
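A rough way to see why tetraploids are the hard case is to compare binomial sampling noise in the observed allele frequency with the spacing between the expected frequency peaks. This sketch uses a simplified binomial model that ignores the misalignment noise mentioned above. It is not the model nQuire fits, and the names in it are hypothetical.

```python
import math

# Simplified expectation: allele frequencies at heterozygous sites for each ploidy
EXPECTED_PEAKS = {
    "diploid": [1 / 2],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [1 / 4, 1 / 2, 3 / 4],
}


def allele_freq_sd(p: float, depth: int) -> float:
    """Binomial standard deviation of an observed allele frequency at a given read depth."""
    return math.sqrt(p * (1 - p) / depth)


peaks = EXPECTED_PEAKS["tetraploid"]
spacing = min(b - a for a, b in zip(peaks, peaks[1:]))  # 0.25 between adjacent peaks

for depth in (20, 50, 100):
    # p = 0.5 gives the largest binomial spread, i.e. the worst case
    sd = allele_freq_sd(0.5, depth)
    print(f"{depth}x: sd {sd:.3f} vs. tetraploid peak spacing {spacing:.2f}")
```

This prints:

```
20x: sd 0.112 vs. tetraploid peak spacing 0.25
50x: sd 0.071 vs. tetraploid peak spacing 0.25
100x: sd 0.050 vs. tetraploid peak spacing 0.25
```

At 20x, the spread is almost half the distance between neighbouring tetraploid peaks, so the peaks overlap substantially even before alignment noise is added.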