milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.

Home Page: https://mixcr.com

UMI with Nanopore Data

bshim181 opened this issue

Hello, I see quite a disparity in the number of clonotypes assembled with and without accounting for read UMIs. What might be the reason for this?

This is the assemble report when run without taking read UMIs into account:

Analysis time: 21.1s
Final clonotype count: 1734
Reads used in clonotypes, percent of total: 1565874 (72.29%)
Average number of reads per clonotype: 903.04
Reads dropped due to the lack of a clone sequence, percent of total: 39103 (1.81%)
Reads dropped due to a too short clonal sequence, percent of total: 2 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 0 (0%)
Reads dropped with low quality clones, percent of total: 113001 (5.22%)
Aligned reads processed: 1878052
Reads used in clonotypes before clustering, percent of total: 1765049 (81.48%)
Number of reads used as a core, percent of used: 1765049 (100%)
Mapped low quality reads, percent of used: 0 (0%)
Reads clustered in PCR error correction, percent of used: 199175 (11.28%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 99045
Clonotypes eliminated by PCR error correction: 6848
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
IGH chains: 937 (54.04%)
IGH non-functional: 638 (68.09%)
IGK chains: 787 (45.39%)
IGK non-functional: 522 (66.33%)
IGL chains: 10 (0.58%)
IGL non-functional: 0 (0%)
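Comparing the two assemble reports is easier once the numbers are pulled out of the text. Below is a minimal Python sketch that assumes each report line has the form `label: value`, optionally followed by a percentage in parentheses. `parse_report` is a hypothetical helper, not part of MiXCR. The example recomputes the average number of reads per clonotype as reads used in clonotypes divided by the final clonotype count.

```python
import re

# Matches report lines such as
#   "Final clonotype count: 1734"
#   "Reads used in clonotypes, percent of total: 1565874 (72.29%)"
# capturing the label before the first colon and the first number after it.
# A trailing percentage in parentheses, if present, is ignored.
LINE_RE = re.compile(r"^\s*(?P<label>[^:]+):\s*(?P<value>\d+(?:\.\d+)?)")


def parse_report(text: str) -> dict[str, float]:
    """Map each 'label: value' line of an assemble report to its number."""
    values: dict[str, float] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group("label").strip()] = float(match.group("value"))
    return values


# Excerpt of the first report (assembled without UMIs).
REPORT_NO_UMI = """\
Final clonotype count: 1734
Reads used in clonotypes, percent of total: 1565874 (72.29%)
Average number of reads per clonotype: 903.04
Clonotypes dropped as low quality: 99045
"""

report = parse_report(REPORT_NO_UMI)
reads = report["Reads used in clonotypes, percent of total"]
clones = report["Final clonotype count"]
print(round(reads / clones, 2))  # 903.04, same as the reported average
```

Applied to the UMI run below, the same arithmetic gives 1715080 / 87 ≈ 19713.56, which matches that report's average. Most of the gap between the two runs comes from the clonotype count, not from the number of reads used.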

This is the assemble report when I take into account a 12 bp UMI at the 5' end.
It seems that almost 98% of the UMIs map to a single clonotype. What is a reasonable explanation for this? Not many alignments seem to be discarded, so I was wondering why all the other clonotypes detected previously disappeared.

Analysis time: 12m
Final clonotype count: 87
Reads used in clonotypes, percent of total: 1715080 (79.18%)
Average number of reads per clonotype: 19713.56
Reads dropped due to the lack of a clone sequence, percent of total: 35531 (1.64%)
Reads dropped due to a too short clonal sequence, percent of total: 0 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 0 (0%)
Reads dropped with low quality clones, percent of total: 505 (0.02%)
Aligned reads processed: 1720564
Reads used in clonotypes before clustering, percent of total: 1720059 (79.41%)
Number of reads used as a core, percent of used: 1720059 (100%)
Mapped low quality reads, percent of used: 0 (0%)
Reads clustered in PCR error correction, percent of used: 4979 (0.29%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 234
Clonotypes eliminated by PCR error correction: 236
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 4978 (0.23%)
IGH chains: 47 (54.02%)
IGH non-functional: 23 (48.94%)
IGK chains: 35 (40.23%)
IGK non-functional: 15 (42.86%)
IGL chains: 5 (5.75%)
IGL non-functional: 0 (0%)
Pre-clone assembler report:
Number of input groups: 93301
Number of input groups with no assembling feature: 12
Number of input alignments: 1805673
Number of alignments with assembling feature: 1770142 (98.03%)
Number of output pre-clones: 101943
Number of pre-clonotypes per group:
0: + 744 (0.8%) = 744 (0.8%)
1: + 83175 (89.16%) = 83919 (89.96%)
2: + 9342 (10.01%) = 93261 (99.97%)
3: + 28 (0.03%) = 93289 (100%)
Number of assembling feature sequences in groups with zero pre-clonotypes: 2623
Number of dropped pre-clones by tag suffix conflict: 0
Number of dropped alignments by tag suffix conflict: 0
Number of core alignments: 1702472 (94.28%)
Discarded core alignments: 67670 (3.97%)
Empirically assigned alignments: 18092 (1%)
Empirical assignment conflicts: 0 (0%)
Tag+VJ-gene empirically assigned alignments: 18092 (1%)
VJ-gene empirically assigned alignments: 0 (0%)
Tag empirically assigned alignments: 0 (0%)
Number of ambiguous groups: 9370
Number of ambiguous V-genes: 38
Number of ambiguous J-genes: 19
Number of ambiguous tag+V/J-gene combinations: 57
Ignored non-productive alignments: 0 (0%)
Unassigned alignments: 85043 (4.71%)
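The pre-clonotypes-per-group histogram in the pre-clone assembler report can be cross-checked against the other counts. This sketch assumes that each "group" is the set of alignments sharing one UMI. The percentages appear to be relative to the groups that have an assembling feature. Under that reading, about 89% of UMI groups yield exactly one pre-clonotype. That is a per-UMI statistic, not the share of reads held by any one clonotype.

```python
# Pre-clonotypes-per-group histogram copied from the report above:
# number of pre-clonotypes in a group -> number of groups with that count.
histogram = {0: 744, 1: 83175, 2: 9342, 3: 28}
groups_with_feature = sum(histogram.values())
groups_without_feature = 12  # "Number of input groups with no assembling feature"
pre_clones = sum(k * n for k, n in histogram.items())

print(groups_with_feature)                           # 93289
print(groups_with_feature + groups_without_feature)  # 93301, the "Number of input groups"
print(pre_clones)                                    # 101943, the "Number of output pre-clones"

# Recompute the per-bin and cumulative percentages shown in the report.
running = 0
for k in sorted(histogram):
    running += histogram[k]
    share = 100 * histogram[k] / groups_with_feature
    cumulative = 100 * running / groups_with_feature
    print(f"{k}: {share:.2f}% (cumulative {cumulative:.2f}%)")
```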

Hi, the report looks fine, and most likely the results you get are accurate. UMIs help reduce artificial diversity by correcting PCR and sequencing errors, so it is expected that you see fewer clonotypes. My guess is that the top clonotypes are the same in both runs, and the ones that were corrected away are the singletons.
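A toy Python sketch shows how grouping reads by UMI removes artificial diversity. It is a deliberate simplification, not MiXCR's pre-clone assembly algorithm, and rests on these assumptions:

* Reads are already oriented.
* The UMI is the first 12 bases of each read.
* The rest of the read stands in for the clonal sequence.
* Each UMI group is reduced to its most frequent sequence.

`split_read`, `collapse_by_umi` and the example reads are hypothetical.

```python
from collections import Counter, defaultdict

UMI_LEN = 12  # 12 bp UMI at the 5' end, as in the question (assumed already oriented)


def split_read(read: str, umi_len: int = UMI_LEN) -> tuple[str, str]:
    """Split a read into (UMI, clonal sequence), assuming the UMI is the first umi_len bases."""
    return read[:umi_len], read[umi_len:]


def collapse_by_umi(reads: list[str]) -> dict[str, str]:
    """Return one sequence per UMI: the most common sequence within that UMI group."""
    groups: dict[str, Counter] = defaultdict(Counter)
    for read in reads:
        umi, seq = split_read(read)
        groups[umi][seq] += 1
    return {umi: counts.most_common(1)[0][0] for umi, counts in groups.items()}


# Hypothetical reads: two molecules (UMIs), each with one erroneous copy.
umi_a, umi_b = "A" * 12, "C" * 12
reads = (
    [umi_a + "TGTGCGAGAGA"] * 3
    + [umi_a + "TGTGCGTGAGA"]  # one read with a PCR/sequencing error
    + [umi_b + "TGTGCAAGCTA"] * 2
    + [umi_b + "TGTGCAAGCTT"]  # another erroneous read
)

raw = {split_read(r)[1] for r in reads}
consensus = collapse_by_umi(reads)
print(len(raw))                      # 4 distinct sequences without UMIs
print(len(set(consensus.values())))  # 2 after collapsing by UMI
```

Without UMIs, every erroneous read becomes a candidate clonotype of its own. Within a UMI group, the error is outvoted by the other copies of the same molecule. This sketch does not correct errors inside the UMI itself, which a real pipeline also has to handle.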