raphael-group / chisel

CHISEL -- Copy-number Haplotype Inference in Single-cell by Evolutionary Links

BAFs don't seem to agree with CNAs

ivanov-v-v opened this issue

Dear Simone,

I am reading the preprint and trying to understand your algorithm. Please correct me if I get anything wrong. I managed to run CHISEL on our data (10x scDNA). I then wrote a simple parser for calls.tsv (extracting columns and splitting "a|b" strings into (a, b) pairs, nothing more) and visualized the results. What I saw surprised me, and I am not sure whether this is a bug or expected behaviour. Here are the figures; please take a look:

image

Let's denote the allele-specific copy-number pair of bin i in cell t as (a, b)_{i, t}.
BAF, A_COUNT, and B_COUNT below refer to the corresponding columns of calls.tsv in the output.
Then the figures show:

Figure 1: a + b (truncated to [0, 6] for more balanced colors in heatmap)
Figure 2: b / (a + b)
Figure 3: B_COUNT / (A_COUNT + B_COUNT)
Figure 4: BAF
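
For reference, the four plotted quantities can be computed roughly as in the minimal sketch below. It assumes that CN_STATE holds the "a|b" string and that each row of calls.tsv has already been read into a dict keyed by column name; the function names `parse_cn_state` and `figure_values` are hypothetical helpers, not part of CHISEL.

```python
def parse_cn_state(cn_state):
    """Split an "a|b" string into a pair of integers (a, b)."""
    a, b = cn_state.split("|")
    return int(a), int(b)


def figure_values(row, max_total=6):
    """Return the values plotted in Figures 1-4 for one (cell, bin) row.

    `row` is a dict keyed by column name (assumed columns: CN_STATE,
    A_COUNT, B_COUNT, BAF). A ratio with a zero denominator is None.
    """
    a, b = parse_cn_state(row["CN_STATE"])
    a_count, b_count = int(row["A_COUNT"]), int(row["B_COUNT"])

    # Figure 1: total copy number, truncated to [0, max_total]
    total_cn = min(a + b, max_total)
    # Figure 2: fraction of copies on the second allele
    cn_fraction = b / (a + b) if a + b > 0 else None
    # Figure 3: fraction of reads counted for the B allele
    depth = a_count + b_count
    count_fraction = b_count / depth if depth > 0 else None
    # Figure 4: the BAF column as reported
    return total_cn, cn_fraction, count_fraction, float(row["BAF"])
```

With these definitions, a row with CN_STATE "1|0" gives 1 for Figure 1 and 0.0 for Figure 2, whatever the read counts are.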

As you can see, we have a huge clonal deletion (the purple stripe on the left) on the 3p arm.
CHISEL identifies it correctly in terms of allele-specific CNAs: figures 1 and 2 show that the results are well aligned across the different blocks in that region.

Nevertheless, the BAFs computed by CHISEL were not as good. It seems that some kind of "switching error" affects them: they go from 0 to 1 several times.

I then processed the outputs in chisel-data and got similar (but less evident) results:

image

But from reading your paper, I got the impression that this shouldn't happen: BAFs should reflect the underlying allele-specific CNAs in DNA-seq. What, then, is the relationship between the A_COUNT, B_COUNT, BAF and CN_STATE columns of calls.tsv?

A small update on this: deviations of the BAF from 0.5 seem to be aligned with allele-specific copy-number aberrations. Is it theoretically possible to align the raw BAFs as well? From your preprint, I had the impression that the BAFs should support the CNAs.

image

The BAF (as well as A_COUNT and B_COUNT; the names may be confusing and we will change them) is unphased and corresponds to the standard mirrored BAF; please read more about this in steps 1-3 of the available pre-print. This is the reason for the reported switches.

CHISEL phases the allele-specific copy numbers in the fourth step, obtaining haplotype-specific copy numbers (here called CN_STATE). This is why the CN_STATE results are consistent and do not have switches. You can read more about this in the corresponding section of the pre-print.

You can correspondingly phase the BAF by using the CN_STATE and choosing BAF = mirrored BAF or BAF = 1 - mirrored BAF according to the chosen phase of CN_STATE, i.e. according to the allelic balance in the pairs (a, b).
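
A minimal sketch of that phasing rule, under stated assumptions: CN_STATE is an "a|b" string, and the phased BAF should lie on the same side of 0.5 as b / (a + b). The function name `phase_baf` is hypothetical and this is one possible reading of the rule above, not code from CHISEL.

```python
def phase_baf(baf, cn_state):
    """Flip an unphased BAF so that it agrees with the phase of CN_STATE.

    Assumption: when b > a the phased BAF should be >= 0.5, when b < a
    it should be <= 0.5, and balanced states (a == b) carry no phase
    information, so the value is returned unchanged.
    """
    a, b = (int(x) for x in cn_state.split("|"))
    if a == b:
        return baf
    low, high = min(baf, 1.0 - baf), max(baf, 1.0 - baf)
    return high if b > a else low
```

For example, a bin with CN_STATE "1|0" and a reported BAF of 0.9 is flipped to about 0.1, while a reported BAF of 0.1 in the same bin is kept as is.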