Difference between v4.6 and v4.3 for TCR data using TAKARA human v2 kit - 20 vs 16824 clonotype counts
Manishaa17993 opened this issue
Checklist before submitting the issue:
- [x] The issue is strongly related to the MiXCR software
- [x] The issue can be reproduced with the most recent version of MiXCR
- [x] There is no answer to the question in the official documentation and there is no duplicate issue in the bug tracker
- [x] Inspection of raw alignments with exportAlignmentsPretty shows that data has the expected architecture, and sample preparation artefacts are not the reason of the problem (if this is the matter of the issue)
Expected Result
The same or similar TCR clonotypes from both versions, v4.6 and v4.3
Actual Result
From v4.6 - Final clonotype count: 20
From v4.3 - Final clonotype count: 16824
Exact MiXCR commands
For v4.6 -
mixcr -Xmx20g analyze takara-human-rna-tcr-umi-smarter-v2 /L141108_Track-186360_R1.fastq.gz /L141108_Track-186360_R2.fastq.gz /BC001_1_17
For v4.3 -
mixcr -Xmx20g analyze takara-human-tcr-V2-cdr3 /L141108_Track-186360_R1.fastq.gz /L141108_Track-186360_R2.fastq.gz /BC001_1_17
MiXCR report files
From v4.6 -
Align report
Analysis date: Sat Feb 24 15:32:29 CET 2024
Input file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R1.fastq.gz,/home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R2.fastq.gz
Output file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.vdjca
Version: 4.6.0; built=Sat Dec 09 20:48:42 CET 2023; rev=c9fafa41fe; lib=repseqio.v4.0
Command line arguments: align --report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.report.txt --json-report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.report.json --preset takara-human-rna-tcr-umi-smarter-v2 --save-output-file-names /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.list /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R1.fastq.gz /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R2.fastq.gz /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.vdjca
Analysis time: 125.53m
Total sequencing reads: 119708193
Successfully aligned reads: 57232369 (47.81%)
Coverage (percent of successfully aligned):
CDR3: 54730213 (95.63%)
FR3_TO_FR4: 11728 (0.02%)
CDR2_TO_FR4: 2574 (0%)
FR2_TO_FR4: 7031 (0.01%)
CDR1_TO_FR4: 5469 (0.01%)
VDJRegion: 5343 (0.01%)
Alignment failed: no hits (not TCR/IG?): 21550331 (18%)
Alignment failed after alignment-aided overlap: 195661 (0.16%)
Alignment failed: absence of V hits: 11750341 (9.82%)
Alignment failed: absence of J hits: 1005092 (0.84%)
Alignment failed: no target with both V and J alignments: 27974399 (23.37%)
Overlapped: 22267748 (18.6%)
Overlapped and aligned: 3270796 (2.73%)
Overlapped and not aligned: 18996952 (15.87%)
Alignment-aided overlaps, percent of overlapped and aligned: 0 (0%)
No CDR3 parts alignments, percent of successfully aligned: 6260 (0.01%)
Partial aligned reads, percent of successfully aligned: 2495896 (4.36%)
V gene chimeras: 12198896 (10.19%)
J gene chimeras: 2138 (0%)
Paired-end alignment conflicts eliminated: 1264 (0%)
Realigned with forced non-floating bound: 97440445 (81.4%)
Realigned with forced non-floating right bound in left read: 888575 (0.74%)
Realigned with forced non-floating left bound in right read: 888575 (0.74%)
TRA chains: 6182072 (10.8%)
TRA non-functional: 705683 (11.41%)
TRB chains: 51050297 (89.2%)
TRB non-functional: 738950 (1.45%)
Trimming report:
R1 reads trimmed left: 24889 (0.02%)
R1 reads trimmed right: 3 (0%)
Average R1 nucleotides trimmed left: 6.293387120128027E-4
Average R1 nucleotides trimmed right: 2.840240015986207E-7
R2 reads trimmed left: 18 (0%)
R2 reads trimmed right: 4 (0%)
Average R2 nucleotides trimmed left: 1.8962778930260856E-6
Average R2 nucleotides trimmed right: 6.181698858322922E-7
Tag parsing report:
Execution time: 0ns
Total reads: 119708193
Matched reads: 119708193 (100%)
Projection +R1 +R2: 119708193 (100%)
For variant 0:
For projection +R1 +R2:
R1:Left position: 5
R1:Right position: 101
UMI:Left position: 0
UMI:Right position: 12
R2:Left position: 22
Variants: 0
Cost: 0
R1 length: 96
UMI length: 12
R2 length: 79
======================================
Assemble report
Analysis date: Sat Feb 24 18:23:23 CET 2024
Input file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.refined.vdjca
Output file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.clns
Version: 4.6.0; built=Sat Dec 09 20:48:42 CET 2023; rev=c9fafa41fe; lib=repseqio.v4.0
Command line arguments: assemble --report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.assemble.report.txt --json-report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.assemble.report.json /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.refined.vdjca /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.clns
Analysis time: 8.6m
Final clonotype count: 20
Reads used in clonotypes, percent of total: 80962 (0.07%)
Average number of reads per clonotype: 4048.1
Reads dropped due to the lack of a clone sequence, percent of total: 55564486 (46.42%)
Reads dropped due to a too short clonal sequence, percent of total: 0 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 57660 (0.05%)
Reads dropped with low quality clones, percent of total: 0 (0%)
Aligned reads processed: 138622
Reads used in clonotypes before clustering, percent of total: 80962 (0.07%)
Number of reads used as a core, percent of used: 78851 (97.39%)
Mapped low quality reads, percent of used: 2111 (2.61%)
Reads clustered in PCR error correction, percent of used: 0 (0%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 0
Clonotypes eliminated by PCR error correction: 0
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
TRA chains: 1 (5%)
TRA non-functional: 0 (0%)
TRB chains: 19 (95%)
TRB non-functional: 0 (0%)
Pre-clone assembler report:
Number of input groups: 83788
Number of input groups with no assembling feature: 83615
Number of input alignments: 55569657
Number of alignments with assembling feature: 5171 (0.01%)
Number of output pre-clones: 173
Number of pre-clonotypes per group: 1
Number of assembling feature sequences in groups with zero pre-clonotypes: 0
Number of dropped pre-clones by tag suffix conflict: 0
Number of dropped alignments by tag suffix conflict: 0
Number of core alignments: 5165 (0.01%)
Discarded core alignments: 6 (0.12%)
Empirically assigned alignments: 133457 (0.24%)
Empirical assignment conflicts: 0 (0%)
Tag+VJ-gene empirically assigned alignments: 133457 (0.24%)
VJ-gene empirically assigned alignments: 0 (0%)
Tag empirically assigned alignments: 0 (0%)
Number of ambiguous groups: 0
Number of ambiguous tag+V/J-gene combinations: 0
Ignored non-productive alignments: 0 (0%)
Unassigned alignments: 102605 (0.18%)
======================================
From v4.3 -
Align report
Analysis date: Wed Feb 28 10:17:21 CET 2024
Input file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R1.fastq.gz,/home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R2.fastq.gz
Output file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.vdjca
Version: 4.3.0; built=Fri Mar 17 17:26:47 CET 2023; rev=96be4ef48c; lib=repseqio.v2.2
Command line arguments: align --report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.report.txt --json-report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.report.json --preset takara-human-tcr-V2-cdr3 /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R1.fastq.gz /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R2.fastq.gz /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.vdjca
Analysis time: 334.92m
Total sequencing reads: 119708193
Successfully aligned reads: 70906641 (59.23%)
Alignment failed: no hits (not TCR/IG?): 21600723 (18.04%)
Alignment failed: absence of V hits: 9404786 (7.86%)
Alignment failed: absence of J hits: 990776 (0.83%)
Alignment failed: no target with both V and J alignments: 16805157 (14.04%)
Alignment failed: low total score: 110 (0%)
Overlapped: 22335869 (18.66%)
Overlapped and aligned: 3254509 (2.72%)
Overlapped and not aligned: 19081360 (15.94%)
Alignment-aided overlaps, percent of overlapped and aligned: 68124 (2.09%)
No CDR3 parts alignments, percent of successfully aligned: 40310 (0.06%)
Partial aligned reads, percent of successfully aligned: 10176383 (14.35%)
Chimeras: 1 (0%)
V gene chimeras: 117516 (0.1%)
J gene chimeras: 2070 (0%)
Paired-end alignment conflicts eliminated: 235925 (0.2%)
Realigned with forced non-floating bound: 194880896 (162.8%)
Realigned with forced non-floating right bound in left read: 2507915 (2.1%)
Realigned with forced non-floating left bound in right read: 2507915 (2.1%)
TRA chains: 8368386 (11.8%)
TRA non-functional: 880788 (10.53%)
TRB chains: 62538254 (88.2%)
TRB non-functional: 868466 (1.39%)
Trimming report:
R1 reads trimmed left: 24889 (0.02%)
R1 reads trimmed right: 3 (0%)
Average R1 nucleotides trimmed left: 6.293387120128027E-4
Average R1 nucleotides trimmed right: 2.840240015986207E-7
R2 reads trimmed left: 18 (0%)
R2 reads trimmed right: 4 (0%)
Average R2 nucleotides trimmed left: 1.8962778930260856E-6
Average R2 nucleotides trimmed right: 6.181698858322922E-7
Tag parsing report:
Execution time: 0ns
Total reads: 119708193
Matched reads: 119708193 (100%)
Projection +R1 +R2: 119708193 (100%)
For variant 0:
For projection [1, 2]:
R1:Left position: 5
R1:Right position: 101
UMI:Left position: 0
UMI:Right position: 12
R2:Left position: 22
Variants: 0
Cost: 0
R1 length: 96
UMI length: 12
R2 length: 79
======================================
Assemble report
Analysis date: Wed Feb 28 16:46:24 CET 2024
Input file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.refined.vdjca
Output file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.clns
Version: 4.3.0; built=Fri Mar 17 17:26:47 CET 2023; rev=96be4ef48c; lib=repseqio.v2.2
Command line arguments: assemble --report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.assemble.report.txt --json-report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.assemble.report.json /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.refined.vdjca /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.clns
Analysis time: 178.36m
Final clonotype count: 16824
Reads used in clonotypes, percent of total: 56218760 (46.96%)
Average number of reads per clonotype: 3341.58
Reads dropped due to the lack of a clone sequence, percent of total: 10080315 (8.42%)
Reads dropped due to a too short clonal sequence, percent of total: 8812 (0.01%)
Reads dropped due to low quality, percent of total: 47 (0%)
Reads dropped due to failed mapping, percent of total: 9847 (0.01%)
Reads dropped with low quality clones, percent of total: 233227 (0.19%)
Reads used in clonotypes before clustering, percent of total: 56266257 (47%)
Number of reads used as a core, percent of used: 56262726 (99.99%)
Mapped low quality reads, percent of used: 3531 (0.01%)
Reads clustered in PCR error correction, percent of used: 47497 (0.08%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 134738 (0.24%)
Clonotypes dropped as low quality: 47
Clonotypes eliminated by PCR error correction: 64
Clonotypes pre-clustered due to the similar VJC-lists: 361
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
TRA chains: 2432 (14.46%)
TRA non-functional: 369 (15.17%)
TRB chains: 14392 (85.54%)
TRB non-functional: 372 (2.58%)
Pre-clone assembler report:
Number of input groups: 115772
Number of input groups with no assembling feature: 877
Number of input alignments: 68849948
Number of alignments with assembling feature: 58769633 (85.36%)
Number of output pre-clones: 105162
Number of pre-clonotypes per group:
0: + 16441 (14.31%) = 16441 (14.31%)
1: + 92169 (80.22%) = 108610 (94.53%)
2: + 5862 (5.1%) = 114472 (99.63%)
3: + 423 (0.37%) = 114895 (100%)
Number of assembling feature sequences in groups with zero pre-clonotypes: 853657
Number of dropped pre-clones by tag suffix conflict: 0
Number of dropped alignments by tag suffix conflict: 0
Number of core alignments: 55889867 (81.18%)
Discarded core alignments: 2879766 (5.15%)
Empirically assigned alignments: 628276 (0.91%)
Empirical assignment conflicts: 2115 (0%)
Tag+VJ-gene empirically assigned alignments: 630391 (0.92%)
VJ-gene empirically assigned alignments: 0 (0%)
Tag empirically assigned alignments: 0 (0%)
Number of ambiguous groups: 6285
Number of ambiguous V-genes: 621
Number of ambiguous J-genes: 977
Number of ambiguous tag+V/J-gene combinations: 1598
Ignored non-productive alignments: 0 (0%)
Unassigned alignments: 12308386 (17.88%)
======================================
Hi, MiXCR v4.6 by default assembles clones by VDJRegion for this protocol, which requires 300+300 sequencing. In your case, with shorter reads, you should add the following parameter to the mixcr analyze command:
--assemble-clonotypes-by CDR3
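For illustration, here is a minimal sketch of how the v4.6 command from the issue might look with that parameter added. The preset name and the paths are copied from the report above; the shell variable names are only placeholders for this example, and placing the option right after the preset name is an assumption about ordering, not a requirement stated in this thread. The script only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Values copied from the issue; adjust the paths to your own layout.
PRESET="takara-human-rna-tcr-umi-smarter-v2"
R1="/L141108_Track-186360_R1.fastq.gz"
R2="/L141108_Track-186360_R2.fastq.gz"
OUT="/BC001_1_17"

# Same command as in the issue, plus the parameter suggested above.
CMD="mixcr -Xmx20g analyze $PRESET --assemble-clonotypes-by CDR3 $R1 $R2 $OUT"

# Dry run: print the command for inspection instead of executing it.
echo "$CMD"
```

Once the printed command looks right, it can be pasted into the terminal as-is. The v4.6 align report above is consistent with this explanation: 95.63% of aligned reads cover CDR3, while only 0.01% cover the full VDJRegion, so assembling by VDJRegion leaves almost no reads with an assembling feature.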