milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.

Home Page: https://mixcr.com

Difference between v4.6 and v4.3 for TCR data using TAKARA human v2 kit - 20 vs 16824 clonotype counts

Manishaa17993 opened this issue

Checklist before submitting the issue:

  • [x] The issue is strongly related to the MiXCR software
  • [x] The issue can be reproduced with the most recent version of MiXCR
  • [x] There is no answer to the question in the official documentation and there is no duplicate issue in the bug tracker
  • [x] Inspection of raw alignments with exportAlignmentsPretty shows that data has the expected architecture, and sample preparation artefacts are not the reason for the problem (if this is the matter of the issue)

Expected Result

The same or similar TCR clonotypes from both versions, v4.6 and v4.3

Actual Result

From v4.6 - Final clonotype count: 20
From v4.3 - Final clonotype count: 16824

Exact MiXCR commands

For v4.6 -
mixcr -Xmx20g analyze takara-human-rna-tcr-umi-smarter-v2 /L141108_Track-186360_R1.fastq.gz /L141108_Track-186360_R2.fastq.gz /BC001_1_17

For v4.3 -
mixcr -Xmx20g analyze takara-human-tcr-V2-cdr3 /L141108_Track-186360_R1.fastq.gz /L141108_Track-186360_R2.fastq.gz /BC001_1_17

MiXCR report files

From v4.6 -
Align report

Analysis date: Sat Feb 24 15:32:29 CET 2024
Input file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R1.fastq.gz,/home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R2.fastq.gz
Output file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.vdjca
Version: 4.6.0; built=Sat Dec 09 20:48:42 CET 2023; rev=c9fafa41fe; lib=repseqio.v4.0
Command line arguments: align --report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.report.txt --json-report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.report.json --preset takara-human-rna-tcr-umi-smarter-v2 --save-output-file-names /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.list /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R1.fastq.gz /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R2.fastq.gz /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.vdjca
Analysis time: 125.53m
Total sequencing reads: 119708193
Successfully aligned reads: 57232369 (47.81%)
Coverage (percent of successfully aligned):
  CDR3: 54730213 (95.63%)
  FR3_TO_FR4: 11728 (0.02%)
  CDR2_TO_FR4: 2574 (0%)
  FR2_TO_FR4: 7031 (0.01%)
  CDR1_TO_FR4: 5469 (0.01%)
  VDJRegion: 5343 (0.01%)
Alignment failed: no hits (not TCR/IG?): 21550331 (18%)
Alignment failed after alignment-aided overlap: 195661 (0.16%)
Alignment failed: absence of V hits: 11750341 (9.82%)
Alignment failed: absence of J hits: 1005092 (0.84%)
Alignment failed: no target with both V and J alignments: 27974399 (23.37%)
Overlapped: 22267748 (18.6%)
Overlapped and aligned: 3270796 (2.73%)
Overlapped and not aligned: 18996952 (15.87%)
Alignment-aided overlaps, percent of overlapped and aligned: 0 (0%)
No CDR3 parts alignments, percent of successfully aligned: 6260 (0.01%)
Partial aligned reads, percent of successfully aligned: 2495896 (4.36%)
V gene chimeras: 12198896 (10.19%)
J gene chimeras: 2138 (0%)
Paired-end alignment conflicts eliminated: 1264 (0%)
Realigned with forced non-floating bound: 97440445 (81.4%)
Realigned with forced non-floating right bound in left read: 888575 (0.74%)
Realigned with forced non-floating left bound in right read: 888575 (0.74%)
TRA chains: 6182072 (10.8%)
TRA non-functional: 705683 (11.41%)
TRB chains: 51050297 (89.2%)
TRB non-functional: 738950 (1.45%)
Trimming report:
  R1 reads trimmed left: 24889 (0.02%)
  R1 reads trimmed right: 3 (0%)
  Average R1 nucleotides trimmed left: 6.293387120128027E-4
  Average R1 nucleotides trimmed right: 2.840240015986207E-7
  R2 reads trimmed left: 18 (0%)
  R2 reads trimmed right: 4 (0%)
  Average R2 nucleotides trimmed left: 1.8962778930260856E-6
  Average R2 nucleotides trimmed right: 6.181698858322922E-7
Tag parsing report:
  Execution time: 0ns
  Total reads: 119708193
  Matched reads: 119708193 (100%)
  Projection +R1 +R2: 119708193 (100%)
  For variant 0:
    For projection +R1 +R2:
      R1:Left position: 5
      R1:Right position: 101
      UMI:Left position: 0
      UMI:Right position: 12
      R2:Left position: 22
      Variants: 0
      Cost: 0
      R1 length: 96
      UMI length: 12
      R2 length: 79
======================================

Assemble report

Analysis date: Sat Feb 24 18:23:23 CET 2024
Input file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.refined.vdjca
Output file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.clns
Version: 4.6.0; built=Sat Dec 09 20:48:42 CET 2023; rev=c9fafa41fe; lib=repseqio.v4.0
Command line arguments: assemble --report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.assemble.report.txt --json-report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.assemble.report.json /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.refined.vdjca /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.clns
Analysis time: 8.6m
Final clonotype count: 20
Reads used in clonotypes, percent of total: 80962 (0.07%)
Average number of reads per clonotype: 4048.1
Reads dropped due to the lack of a clone sequence, percent of total: 55564486 (46.42%)
Reads dropped due to a too short clonal sequence, percent of total: 0 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 57660 (0.05%)
Reads dropped with low quality clones, percent of total: 0 (0%)
Aligned reads processed: 138622
Reads used in clonotypes before clustering, percent of total: 80962 (0.07%)
Number of reads used as a core, percent of used: 78851 (97.39%)
Mapped low quality reads, percent of used: 2111 (2.61%)
Reads clustered in PCR error correction, percent of used: 0 (0%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 0
Clonotypes eliminated by PCR error correction: 0
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
TRA chains: 1 (5%)
TRA non-functional: 0 (0%)
TRB chains: 19 (95%)
TRB non-functional: 0 (0%)
Pre-clone assembler report:
  Number of input groups: 83788
  Number of input groups with no assembling feature: 83615
  Number of input alignments: 55569657
  Number of alignments with assembling feature: 5171 (0.01%)
  Number of output pre-clones: 173
  Number of pre-clonotypes per group:   1
  Number of assembling feature sequences in groups with zero pre-clonotypes: 0
  Number of dropped pre-clones by tag suffix conflict: 0
  Number of dropped alignments by tag suffix conflict: 0
  Number of core alignments: 5165 (0.01%)
  Discarded core alignments: 6 (0.12%)
  Empirically assigned alignments: 133457 (0.24%)
  Empirical assignment conflicts: 0 (0%)
  Tag+VJ-gene empirically assigned alignments: 133457 (0.24%)
  VJ-gene empirically assigned alignments: 0 (0%)
  Tag empirically assigned alignments: 0 (0%)
  Number of ambiguous groups: 0
  Number of ambiguous tag+V/J-gene combinations: 0
  Ignored non-productive alignments: 0 (0%)
  Unassigned alignments: 102605 (0.18%)
======================================  

From v4.3 -
Align report

Analysis date: Wed Feb 28 10:17:21 CET 2024
Input file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R1.fastq.gz,/home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R2.fastq.gz
Output file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.vdjca
Version: 4.3.0; built=Fri Mar 17 17:26:47 CET 2023; rev=96be4ef48c; lib=repseqio.v2.2
Command line arguments: align --report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.report.txt --json-report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.align.report.json --preset takara-human-tcr-V2-cdr3 /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R1.fastq.gz /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/Decrypted/L141108_Track-186360_R2.fastq.gz /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.vdjca
Analysis time: 334.92m
Total sequencing reads: 119708193
Successfully aligned reads: 70906641 (59.23%)
Alignment failed: no hits (not TCR/IG?): 21600723 (18.04%)
Alignment failed: absence of V hits: 9404786 (7.86%)
Alignment failed: absence of J hits: 990776 (0.83%)
Alignment failed: no target with both V and J alignments: 16805157 (14.04%)
Alignment failed: low total score: 110 (0%)
Overlapped: 22335869 (18.66%)
Overlapped and aligned: 3254509 (2.72%)
Overlapped and not aligned: 19081360 (15.94%)
Alignment-aided overlaps, percent of overlapped and aligned: 68124 (2.09%)
No CDR3 parts alignments, percent of successfully aligned: 40310 (0.06%)
Partial aligned reads, percent of successfully aligned: 10176383 (14.35%)
Chimeras: 1 (0%)
V gene chimeras: 117516 (0.1%)
J gene chimeras: 2070 (0%)
Paired-end alignment conflicts eliminated: 235925 (0.2%)
Realigned with forced non-floating bound: 194880896 (162.8%)
Realigned with forced non-floating right bound in left read: 2507915 (2.1%)
Realigned with forced non-floating left bound in right read: 2507915 (2.1%)
TRA chains: 8368386 (11.8%)
TRA non-functional: 880788 (10.53%)
TRB chains: 62538254 (88.2%)
TRB non-functional: 868466 (1.39%)
Trimming report:
  R1 reads trimmed left: 24889 (0.02%)
  R1 reads trimmed right: 3 (0%)
  Average R1 nucleotides trimmed left: 6.293387120128027E-4
  Average R1 nucleotides trimmed right: 2.840240015986207E-7
  R2 reads trimmed left: 18 (0%)
  R2 reads trimmed right: 4 (0%)
  Average R2 nucleotides trimmed left: 1.8962778930260856E-6
  Average R2 nucleotides trimmed right: 6.181698858322922E-7
Tag parsing report:
  Execution time: 0ns
  Total reads: 119708193
  Matched reads: 119708193 (100%)
  Projection +R1 +R2: 119708193 (100%)
  For variant 0:
    For projection [1, 2]:
      R1:Left position: 5
      R1:Right position: 101
      UMI:Left position: 0
      UMI:Right position: 12
      R2:Left position: 22
      Variants: 0
      Cost: 0
      R1 length: 96
      UMI length: 12
      R2 length: 79
======================================

Assemble report

Analysis date: Wed Feb 28 16:46:24 CET 2024
Input file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.refined.vdjca
Output file(s): /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.clns
Version: 4.3.0; built=Fri Mar 17 17:26:47 CET 2023; rev=96be4ef48c; lib=repseqio.v2.2
Command line arguments: assemble --report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.assemble.report.txt --json-report /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.assemble.report.json /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.refined.vdjca /home/crtd_bonifacio/mago517b/groups/crtd_bonifacio/Manisha/Other_projects/Parse/Part1_Bulk_Takara/MiXCR_all/BC001_1_17.clns
Analysis time: 178.36m
Final clonotype count: 16824
Reads used in clonotypes, percent of total: 56218760 (46.96%)
Average number of reads per clonotype: 3341.58
Reads dropped due to the lack of a clone sequence, percent of total: 10080315 (8.42%)
Reads dropped due to a too short clonal sequence, percent of total: 8812 (0.01%)
Reads dropped due to low quality, percent of total: 47 (0%)
Reads dropped due to failed mapping, percent of total: 9847 (0.01%)
Reads dropped with low quality clones, percent of total: 233227 (0.19%)
Reads used in clonotypes before clustering, percent of total: 56266257 (47%)
Number of reads used as a core, percent of used: 56262726 (99.99%)
Mapped low quality reads, percent of used: 3531 (0.01%)
Reads clustered in PCR error correction, percent of used: 47497 (0.08%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 134738 (0.24%)
Clonotypes dropped as low quality: 47
Clonotypes eliminated by PCR error correction: 64
Clonotypes pre-clustered due to the similar VJC-lists: 361
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
TRA chains: 2432 (14.46%)
TRA non-functional: 369 (15.17%)
TRB chains: 14392 (85.54%)
TRB non-functional: 372 (2.58%)
Pre-clone assembler report:
  Number of input groups: 115772
  Number of input groups with no assembling feature: 877
  Number of input alignments: 68849948
  Number of alignments with assembling feature: 58769633 (85.36%)
  Number of output pre-clones: 105162
  Number of pre-clonotypes per group:  
    0: + 16441 (14.31%) = 16441 (14.31%)
    1: + 92169 (80.22%) = 108610 (94.53%)
    2: + 5862 (5.1%) = 114472 (99.63%)
    3: + 423 (0.37%) = 114895 (100%)
  Number of assembling feature sequences in groups with zero pre-clonotypes: 853657
  Number of dropped pre-clones by tag suffix conflict: 0
  Number of dropped alignments by tag suffix conflict: 0
  Number of core alignments: 55889867 (81.18%)
  Discarded core alignments: 2879766 (5.15%)
  Empirically assigned alignments: 628276 (0.91%)
  Empirical assignment conflicts: 2115 (0%)
  Tag+VJ-gene empirically assigned alignments: 630391 (0.92%)
  VJ-gene empirically assigned alignments: 0 (0%)
  Tag empirically assigned alignments: 0 (0%)
  Number of ambiguous groups: 6285
  Number of ambiguous V-genes: 621
  Number of ambiguous J-genes: 977
  Number of ambiguous tag+V/J-gene combinations: 1598
  Ignored non-productive alignments: 0 (0%)
  Unassigned alignments: 12308386 (17.88%)
======================================

Hi, MiXCR v4.6 by default assembles clones by VDJRegion for this protocol, which requires 300+300 sequencing. In your case, with shorter reads, you should add the following parameter to the mixcr analyze command:

--assemble-clonotypes-by CDR3
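
As a rough sketch of the suggestion above, the snippet below does two things. First, it reads the Coverage section of an align report and prints the percentage for a given feature; for the v4.6 report in this issue that is 95.63% for CDR3 and 0.01% for VDJRegion, which is consistent with almost no reads being usable when clones are assembled by VDJRegion. Second, it builds the v4.6 command from the issue with the parameter added and prints it for review. The function name coverage_pct, the RUN_MIXCR switch and the placement of the option right after the preset name are assumptions made for illustration; they are not defined by MiXCR.

```shell
#!/bin/sh
# Sketch only: coverage_pct and RUN_MIXCR are hypothetical helper names.

# Print the percentage reported for one feature in the "Coverage" section
# of a MiXCR align report read from stdin, e.g. "  CDR3: 54730213 (95.63%)".
coverage_pct() {
  awk -v feat="$1" '$1 == (feat ":") { gsub(/[()%]/, "", $3); print $3 }'
}

# The v4.6 command from the issue, with CDR3 as the assembling feature.
# Input and output paths are copied from the issue as posted.
# Assumption: the option is placed between the preset name and the inputs.
set -- mixcr -Xmx20g analyze takara-human-rna-tcr-umi-smarter-v2 \
  --assemble-clonotypes-by CDR3 \
  /L141108_Track-186360_R1.fastq.gz \
  /L141108_Track-186360_R2.fastq.gz \
  /BC001_1_17

# Print the command for review; execute it only when RUN_MIXCR=1 is set.
printf '%s ' "$@"
printf '\n'
if [ "${RUN_MIXCR:-0}" = "1" ]; then
  "$@"
fi
```

To check a report before rerunning, something like coverage_pct VDJRegion < BC001_1_17.align.report.txt should print the VDJRegion coverage. A value near zero next to a high CDR3 value suggests the reads are too short for assembly by VDJRegion.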