questions about output files (alignment & clonotype)

Question

silvia1234567890 opened this issue 7 months ago

Hello,

I have a few questions about the output files of the alignment (alignments.vdjca) and clonotype (clonotype.clns) steps in MiXCR.
As input for mixcr align I use a FASTA file of preprocessed sequences like this:

>AACCAGCAAATCACC_CONSCOUNT_134
ACAGCACGTCAGATTCAGCACAAA...
>ATACGCTATGCAACC_CONSCOUNT_119
ACAGCACGTCAGATTCAGCACAAA...
>GATAGCACTGGATGG_CONSCOUNT_96
ACAGCAGGTCAGATTCAGCACAAA...
>GGCTGAATTAACGAT_CONSCOUNT_93
GACAGCACGTCAGATTCAGCACAAA...

In the output.vdjca file (output file of mixcr align):

Is it possible to maintain the SeqID of the sequences I give in FASTA format?
How can I calculate the abundance of every sequence in the output.vdjca file?

Thank you in advance.
Best regards,
Silvia

mizraelson · Answer 1 · Fri Nov 03 2023 01:18:26 GMT+0800 (China Standard Time)

Hi,
Can you share the command you run?

silvia1234567890 · Answer 2 · Fri Nov 03 2023 18:49:32 GMT+0800 (China Standard Time)

Yes, of course.

For the alignment:

mixcr align --preset generic-amplicon \
  --library Smaximus-IGH.json.gz \
  --species Scophthalmus_maximus \
  --rna \
  --rigid-left-alignment-boundary \
  --rigid-right-alignment-boundary C \
  seqs.fasta \
  alignments.vdjca

For the clonotype:

mixcr assemble alignments.vdjca clones.clns

And to export the .vdjca and .clns files to tsv format:

mixcr exportAlignments alignments.vdjca alignments.tsv
mixcr exportClones clones.clns clones.tsv

For alignments.tsv, I would like to know:

Is it possible to maintain the SeqID of the seqs.fasta?
How can I calculate the abundance of every sequence in the alignments output file?

Thank you in advance.
Silvia

mizraelson · Answer 3 · Sat Nov 04 2023 03:35:53 GMT+0800 (China Standard Time)

Hi,

If you add the -OsaveOriginalReads=true parameter to the mixcr align command and then add -descrsR1 to mixcr exportAlignments, the exported table will include a column that displays the original read header for each alignment.
Could you please clarify? Alignments correspond to individual reads, and as such, they do not reflect abundance. It is only after the mixcr assemble step, once all corrections have been made, that we can assemble sequences into clones and determine their relative abundances.

Sincerely,
Mark

silvia1234567890 · Answer 4 · Mon Nov 06 2023 18:47:21 GMT+0800 (China Standard Time)

Hi,

For the first issue, I ran mixcr align like this:

mixcr align --preset generic-amplicon \
  --library Scophthalmus_maximus-IGH.json \
  --species Scophthalmus_maximus \
  --rna \
  --rigid-left-alignment-boundary \
  --rigid-right-alignment-boundary C \
  -OsaveOriginalReads=false \
  seqs.fasta \
  output.vdjca

and it gave me this warning:

WARNING: unnecessary override -OsaveOriginalReads=false with the same value.

The rest ran fine. But for mixcr exportAlignments, I ran this code:

mixcr exportAlignments -descrsR1 output.vdjca alignments.tsv

and I'm getting this error:

Exporting alignments: 0%
Please copy the following information along with the stacktrace:
   Version: 4.5.0; built=Fri Sep 22 14:39:05 CEST 2023; rev=cdb24b4fb7; lib=repseqio.v3.0.1
        OS: Mac OS X
      Java: 21
  Cmd args: exportAlignments -descrsR1 output.vdjca alignments.tsv
picocli.CommandLine$ExecutionException: Error while running command exportAlignments java.lang.IllegalArgumentException: Error for option '-descrR1':
No description available for read: either re-run align action with -OsaveOriginalReads option or don't use '-descrR1' in exportAlignments
	at com.milaboratory.mixcr.cli.Main.registerExceptionHandlers$lambda-12(SourceFile:340)
	at picocli.CommandLine.execute(CommandLine.java:2088)
	at com.milaboratory.mixcr.cli.Main.main(SourceFile:98)
Caused by: java.lang.IllegalArgumentException: Error for option '-descrR1':
No description available for read: either re-run align action with -OsaveOriginalReads option or don't use '-descrR1' in exportAlignments
	at com.milaboratory.o.pF.invoke(SourceFile:1098)
	at com.milaboratory.o.kW.a(SourceFile:25)
	at com.milaboratory.o.lv.put(SourceFile:40)
	at cc.redberry.pipe.CUtils.drain(CUtils.java:82)
	at cc.redberry.pipe.util.PipeExtensionsKt.drainToAndClose(PipeExtensions.kt:155)
	at com.milaboratory.mixcr.cli.CommandExportAlignments$Cmd.run1(SourceFile:166)
	at com.milaboratory.mixcr.cli.MiXCRCommandWithOutputs.run0(SourceFile:69)
	at com.milaboratory.mixcr.cli.MiXCRCommand.run(SourceFile:37)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
	at picocli.CommandLine.access$1300(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
	at com.milaboratory.mixcr.cli.Main.registerLogger$lambda-26(SourceFile:447)
	at picocli.CommandLine.execute(CommandLine.java:2078)
	... 1 more

Thank you in advance.
Silvia

mizraelson · Answer 5 · Mon Nov 06 2023 21:50:19 GMT+0800 (China Standard Time)

Sorry about that. Please use '-OsaveOriginalReads=true' instead.
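
Once the headers are preserved, the exported table can be searched by the original sequence ID. The lines below are only a rough sketch: they assume the export names the header column `descrsR1` and that the whole FASTA header is stored in it, and they run on a tiny made-up stand-in for alignments.tsv (the `readId` column is hypothetical). If the CONSCOUNT value in your headers is the per-sequence number you are after, it can be split off the end of the header like this:

```shell
# Made-up stand-in for alignments.tsv; only the header column matters here.
# "readId" is a hypothetical column and "descrsR1" is an assumed column name.
tmp_tsv=$(mktemp)
printf '%s\t%s\n' \
  readId descrsR1 \
  0 AACCAGCAAATCACC_CONSCOUNT_134 \
  1 ATACGCTATGCAACC_CONSCOUNT_119 > "$tmp_tsv"

# Locate the column by name, then print each header with its trailing field.
conscounts=$(awk -F'\t' -v col="descrsR1" '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
  c       { n = split($c, parts, "_"); print $c "\t" parts[n] }
' "$tmp_tsv")

printf '%s\n' "$conscounts"
rm -f "$tmp_tsv"
```

Looking the column up by name rather than by position keeps the sketch working if other export fields are added before it.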

silvia1234567890 · Answer 6 · Mon Nov 06 2023 22:58:27 GMT+0800 (China Standard Time)

It worked, thank you.

Regarding my second question, how can I determine the relative abundances after the mixcr assemble step?

mizraelson · Answer 7 · Tue Nov 07 2023 01:34:45 GMT+0800 (China Standard Time)

For each clone in the output clonotype table you should see its read count and frequency. The frequency shows the fraction of your sample occupied by the clone.
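
As a sanity check, the frequency can be recomputed from the read counts: each clone's count divided by the sum over all clones. The sketch below assumes the table has columns named `cloneId` (first column) and `readCount`, which may differ in your export, and it uses three made-up clones in place of clones.tsv:

```shell
# Made-up stand-in for clones.tsv; "cloneId" and "readCount" are assumed column names.
clones_tsv=$(mktemp)
printf '%s\t%s\n' cloneId readCount 0 60 1 30 2 10 > "$clones_tsv"

# The file is read twice: pass 1 sums the read counts,
# pass 2 prints each clone's share of that total.
fractions=$(awk -F'\t' -v col="readCount" '
  FNR == 1  { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
  NR == FNR { total += $c; next }
            { printf "%s\t%d\t%.4f\n", $1, $c, $c / total }
' "$clones_tsv" "$clones_tsv")

printf '%s\n' "$fractions"
rm -f "$clones_tsv"
```

If the recomputed values match the frequency column in the real table, that column can be used directly as the relative abundance.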