clwgg / nQuire

A statistical framework for ploidy estimation using NGS short-read data

Interpreting output of nQuire

ammaraziz opened this issue

Hi,

Thanks for making nQuire. I've installed and run the program successfully (and easily, great job!). I'm testing nQuire on a mixed bacterial sample of somewhat known composition. In this case, the ploidy would reflect the number of strains in the sequenced sample. I'm unsure how to interpret the nQuire model output. Here's what I've run:

nQuire create -b X.bam -o OUT -x
nQuire denoise OUT.bin -o OUT_denoised.bin
nQuire lrdmodel -t 2 OUT_denoised.bin

Denoising results:

Before: 10158
After: 2807 (27.6%)

Output is:
file              free         dip         tri          tet          d_dip        d_tri        d_tet
OUT_denoised.bin  4207.568317  -42.968648  1239.932832  3605.480347  4250.536965  2967.635485  602.087970

In the paper, it states that:

The ΔlogL is calculated between the free model and each of the three fixed models (here represented as barplots). The fixed model with the smallest ΔlogL is chosen as the true ploidy level (diploid in this example)

With my results, the smallest ΔlogL is d_tet: 602.087970. Am I understanding the results correctly (do I have tetraploidy)? Or should I be comparing the free model with the ΔlogL?

Any help would be appreciated!

Hi,

I'm glad to hear you had an easy time installing and running nQuire.
However, I'm not sure if your use-case is suited for nQuire, as the model has expectations not only about the presence of different alleles, but also their dosage. I would suspect that in a mixed bacterial community the dosages would be somewhat skewed compared to the dosages expected from genome copies within a polyploid nucleus.

Your denoising result shows that >70% of your data is removed in the process. That already hints at the distribution being fairly flat. You can inspect the distribution before and after denoising using nQuire histo. I suspect that there would be barely any peaks visible.
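
As a quick check of that figure, here is a minimal Python sketch that recomputes the retained and removed fractions from the "Before" and "After" counts reported above. The function name `retained_fraction` is made up for illustration and is not part of nQuire.

```python
def retained_fraction(before: int, after: int) -> float:
    """Return the fraction of sites that survive denoising."""
    if before <= 0:
        raise ValueError("'before' must be a positive count")
    return after / before


# Counts copied from the denoising summary quoted in this thread.
before, after = 10158, 2807

kept = retained_fraction(before, after)
removed = 1 - kept

print(f"kept {kept:.1%}, removed {removed:.1%}")
# prints: kept 27.6%, removed 72.4%
```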

Regarding the interpretation of the lrdmodel output, the last three columns are indeed the ΔlogL - i.e. the distance of the "dip", "tri" and "tet" columns from the "free" column.
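
To make the column relationship concrete, the following Python sketch parses the output line posted above, recomputes each ΔlogL as free minus fixed, and reports which fixed model lies closest to the free model. The helper names (`parse_lrdmodel`, `delta_logl`) are hypothetical, and the parsing assumes the whitespace-separated layout shown in this thread. The smallest ΔlogL only identifies the closest of the three fixed models; it does not tell you whether the data meet the model's assumptions.

```python
# Header and row copied verbatim from the lrdmodel output in this thread.
HEADER = "file free dip tri tet d_dip d_tri d_tet"
ROW = ("OUT_denoised.bin 4207.568317 -42.968648 1239.932832 "
       "3605.480347 4250.536965 2967.635485 602.087970")

FIXED_MODELS = ("dip", "tri", "tet")


def parse_lrdmodel(header: str, row: str) -> dict:
    """Map column names to values; every column except 'file' is a float."""
    names = header.split()
    fields = row.split()
    if len(names) != len(fields):
        raise ValueError("header and row have different column counts")
    record = {"file": fields[0]}
    record.update({n: float(v) for n, v in zip(names[1:], fields[1:])})
    return record


def delta_logl(record: dict) -> dict:
    """Recompute delta logL = logL(free) - logL(fixed) for each fixed model."""
    return {m: record["free"] - record[m] for m in FIXED_MODELS}


rec = parse_lrdmodel(HEADER, ROW)
deltas = delta_logl(rec)

# The recomputed values should match the reported d_* columns.
for model, value in deltas.items():
    assert abs(value - rec[f"d_{model}"]) < 1e-6

best = min(deltas, key=deltas.get)
print(best, round(deltas[best], 6))
```

For the numbers in this thread, the closest fixed model is "tet", although its ΔlogL is still large.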

Hey,

Thanks for the detailed reply. That's a real bummer. Do you have plans to extend nQuire to handle bacterial mixtures? I think there's a need in bacterial genomic pipelines for ploidy detection. nQuire is also really fast for the relatively small bacterial alignments. This all might be outside the scope of nQuire, so understandable if the answer is no.

I've inspected the distribution and you are correct, it's flat (both standard and denoised sets). What's interesting is that the histo plots the frequency for 20-80 but I believe my data goes higher, to around 150.

Thanks for all your help,

Hi clwgg,

I also have a similar issue with interpreting the nQuire results. I have a couple of questions for you. My dataset is RAD-seq data on two different plant genera. I ran the results through ipyrad, bwa, samtools, and nQuire. My results aren't lining up with what the putative ploidies should be for the species based on the literature (though some could have mixed ploidy between individuals within the same species). I need a way to detect the ploidy levels (flow cytometry did not work super well on available samples, and chromosome counts aren't practical for one dataset).

I also tested the results in another dataset where we know the ploidies of the individuals that were run: diploids mostly worked, but tetraploids mostly did not. A colleague used nQuire on Hyb-Seq and it worked quite well for his dataset. I'm wondering if I'm violating the assumptions of nQuire in some way, or if nQuire is not built for RAD-seq? Is it best for full genome datasets or Hyb-Seq and not for RAD-seq? Or am I doing something quite wrong to be getting weird results?

I coded ploidies based on results after denoising (typically 15-33% of the sample was left), and for diploids these matched the "correct" ploidy level, so the results for those seem to be fine. Also, I'm confused about how to see the results in histogram form. Is this something you have to plot in R or something? Is there a way to see the allele ratios and a summary of coverage? I'm not sure where to find this information, but it would be super helpful.

Thank you so much!
Lindsey

Hi Lindsey,
I'm having similar issues. I've used nQuire on a low-coverage WGS dataset and obtained the expected ploidies, but when I run nQuire on RADseq data I get incorrect ploidies. I use nQuire create -q 30 and -c20 and then run the lrdmodel on denoised.bin files. I'm really hoping to use nQuire to determine ploidy on a large number of RADseq libraries in the future, but these current results aren't looking promising. Did you figure out a way to correctly estimate ploidy in your RADseq samples?
Erika