BAFs don't seem to agree with CNAs
ivanov-v-v opened this issue · comments
Dear Simone,
I am reading the preprint and trying to understand your algorithm. Please correct me if I get something wrong. I managed to run CHISEL on our data (10x scDNA). Then I wrote a simple parser for calls.tsv (extracting columns, splitting "a|b" strings into (a, b) pairs, nothing more) and visualized the results. What I saw surprised me, and I am not sure whether this is a bug or expected behaviour. Here are the figures; please take a look:
Let's denote the allele-specific CNA pair of bin i in cell t as (a, b)_{i, t}. BAF, A_COUNT, and B_COUNT below refer to the corresponding columns of calls.tsv in the output. In this notation:
Figure 1: a + b (truncated to [0, 6] for more balanced colors in heatmap)
Figure 2: b / (a + b)
Figure 3: B_COUNT / (A_COUNT + B_COUNT)
Figure 4: BAF
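For reference, the four plotted quantities can be derived from a single row of calls.tsv roughly as follows. This is only a minimal sketch: it assumes the column names A_COUNT, B_COUNT, BAF, and CN_STATE mentioned in this thread, the helper names (`parse_cn_state`, `fraction`, `figure_values`) are made up for illustration, and the one-row table is toy data, not real CHISEL output.

```python
import csv
import io


def parse_cn_state(state):
    """Split an 'a|b' string into an integer pair (a, b)."""
    a, b = state.split("|")
    return int(a), int(b)


def fraction(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def figure_values(row, max_total=6):
    """Compute the quantities shown in Figures 1-4 for one row (one bin in one cell)."""
    a, b = parse_cn_state(row["CN_STATE"])
    a_count, b_count = int(row["A_COUNT"]), int(row["B_COUNT"])
    return {
        "total_cn": min(a + b, max_total),                       # Figure 1, capped at 6
        "cn_fraction": fraction(b, a + b),                       # Figure 2
        "count_fraction": fraction(b_count, a_count + b_count),  # Figure 3
        "baf": float(row["BAF"]),                                # Figure 4
    }


# Toy one-row table (hypothetical values) standing in for calls.tsv.
example = "A_COUNT\tB_COUNT\tBAF\tCN_STATE\n30\t2\t0.0625\t1|0\n"
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
values = [figure_values(row) for row in rows]
```

Bins with a + b = 0 or with no allele counts yield None instead of a division error, so they can be masked in the heatmap.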
As you can see, we have a huge clonal deletion (the purple stripe on the left) on the 3p arm.
CHISEL identifies it correctly in terms of allele-specific CNAs. As figures 1 and 2 show, the results are well aligned across the different blocks in that region.
Nevertheless, the BAFs computed by CHISEL are not as good. Some kind of "switching error" seems to affect them: they flip between 0 and 1 several times.
I then processed the outputs in chisel-data and got similar, though less pronounced, results:
But from reading your paper, I got the impression that this shouldn't happen: BAFs should reflect the underlying allele-specific CNAs in DNA-seq. What, then, is the relationship between the A_COUNT, B_COUNT, BAF, and CN_STATE columns of calls.tsv?
The BAF (as well as A_COUNT and B_COUNT; the names may be confusing and we will change them) is unphased and corresponds to the standard mirrored BAF; please read more about this in steps 1-3 of the available pre-print. This is the reason for the reported switches.
HATCHet phases the allele-specific copy numbers in the fourth step, obtaining haplotype-specific copy numbers (here called CN_STATE). This is why the CN_STATE results are consistent and do not have switches. You can read more about this in the corresponding section of the pre-print.
You can correspondingly phase the BAF by using the CN_STATE and choosing BAF = mirrored BAF or BAF = 1 - mirrored BAF according to the chosen phase of CN_STATE, i.e. according to the allelic balance in the pairs (a, b).
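One way to apply this rule in a downstream script is sketched below. It is not CHISEL code: the function name `phase_baf` is hypothetical, and the decision rule (keep whichever of BAF and 1 - BAF lies closer to the fraction b / (a + b) implied by CN_STATE) is an assumed reading of "according to the chosen phase".

```python
def phase_baf(mirrored_baf, a, b):
    """Return mirrored_baf or 1 - mirrored_baf, whichever agrees with (a, b).

    Assumption: the phase is chosen so that the result is as close as
    possible to the expected B-allele fraction b / (a + b).
    """
    total = a + b
    if total == 0:
        # No copies of either allele: the expected fraction is undefined,
        # so the value is returned unchanged.
        return mirrored_baf
    expected = b / total
    flipped = 1.0 - mirrored_baf
    if abs(flipped - expected) < abs(mirrored_baf - expected):
        return flipped
    # Ties (e.g. balanced states with a == b) keep the reported value.
    return mirrored_baf


# Toy values: three bins of one cell, all with CN_STATE 1|0,
# whose reported BAF switches between the two ends of the range.
reported = [0.05, 0.95, 0.10]
phased = [phase_baf(value, 1, 0) for value in reported]
```

With this rule, every bin in a region with a constant CN_STATE ends up on the same side of 0.5, which removes the switches seen in Figure 4.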