vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.


Inquiry on Distinguishing Alternative Isoforms with DIA-NN

Takwon-Yoo opened this issue

First and foremost, I would like to thank you for developing the fantastic DIA-NN tool.

I am currently working on a project where I aim to distinguish and quantify isoforms produced by alternative splicing. To achieve this, I have included all isoform sequences in the FASTA file (example below). However, upon reviewing the 'report.pg_matrix.tsv' file, I noticed that all isoforms are grouped under a single protein group and share the same quantity value.

I am considering taking the 'Precursor.Normalised' values from the 'report.parquet' file and summing them for each corresponding protein to obtain protein-level quantities. I would appreciate your thoughts on this approach, and I would also like to know whether DIA-NN offers another recommended way to differentiate and quantify isoforms with different amino acid sequences.
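The summing approach described above can be sketched as follows. This is a minimal illustration, not DIA-NN functionality: it assumes the report rows have already been loaded into Python dictionaries, that 'Protein.Ids' lists multiple matching proteins separated by semicolons, and it restricts the sum to precursors that map to exactly one isoform, since shared precursors cannot tell isoforms apart. The isoform IDs, run names, and quantities are hypothetical, and the result is a plain sum, not a reproduction of DIA-NN's own protein-level quantities.

```python
from collections import defaultdict

# Hypothetical rows standing in for report.parquet (values invented for illustration).
rows = [
    {"Run": "S1", "Protein.Ids": "NM_A", "Precursor.Normalised": 100.0},
    {"Run": "S1", "Protein.Ids": "NM_A;NM_B", "Precursor.Normalised": 500.0},
    {"Run": "S1", "Protein.Ids": "NM_B", "Precursor.Normalised": 40.0},
    {"Run": "S1", "Protein.Ids": "NM_B", "Precursor.Normalised": 10.0},
    {"Run": "S2", "Protein.Ids": "NM_A", "Precursor.Normalised": 80.0},
]


def sum_unique_precursors(rows, sep=";"):
    """Sum Precursor.Normalised per (run, isoform) over isoform-unique precursors."""
    totals = defaultdict(float)
    for row in rows:
        # Assumption: Protein.Ids holds a separator-delimited list of protein IDs.
        ids = [p for p in row["Protein.Ids"].split(sep) if p]
        if len(ids) != 1:
            continue  # shared precursor: carries no isoform-specific information
        totals[(row["Run"], ids[0])] += row["Precursor.Normalised"]
    return dict(totals)


isoform_quant = sum_unique_precursors(rows)
for (run, isoform), quantity in sorted(isoform_quant.items()):
    print(run, isoform, quantity)
```

An isoform with no unique precursor in a run simply has no entry for that run, which makes missing isoform-level evidence visible instead of filling it with a shared value.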

Thank you for your time and assistance.

>sp|NM_001347425|A2M GN=NM_001347425
MDENFHPLNELIPLVYIQDPKGNRIAQWQSFQLEGGLKQFSFPLSSEPFQGSYKVVVQKK
SGGRTEHPFTVEEFVLPKFEVQVTVPKIITILEEEMNVSVCGLYTYGKPVPGHVTVSICR
KYSDASDCHGEDSQAFCEKFSGQLNSHGCFYQQVKTKVFQLKRKEYEMKLHTEAQIQEEG...

>sp|NM_001347424|A2M GN=NM_001347424
MFLTVQVKGPTQEFKKRTTVMVKNEDSLVFVQTDKSIYKPGQTVKFRVVSMDENFHPLNE
LIPLVYIQDPKGNRIAQWQSFQLEGGLKQFSFPLSSEPFQGSYKVVVQKKSGGRTEHPFT
VEEFVLPKFEVQVTVPKIITILEEEMNVSVCGLYTYGKPVPGHVTVSICRKYSDASDCHG...

> I noticed that all isoforms are grouped under a single protein group and share the same quantity value.

This means that no peptides unique to a single isoform are confidently detected.

If you check the main report, do you see any peptides uniquely mapped to a particular isoform in the Protein.Ids column, at least for some runs?
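The check suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the main report rows are represented as Python dictionaries, 'Protein.Ids' is taken to be a semicolon-separated list, and the run names and row contents are hypothetical (only the two isoform IDs come from the FASTA example).

```python
from collections import defaultdict

# Hypothetical main-report rows; only the isoform IDs are taken from the FASTA example.
rows = [
    {"Run": "RunA", "Protein.Ids": "NM_001347425;NM_001347424"},
    {"Run": "RunA", "Protein.Ids": "NM_001347424"},
    {"Run": "RunB", "Protein.Ids": "NM_001347425;NM_001347424"},
    {"Run": "RunB", "Protein.Ids": "NM_001347424"},
    {"Run": "RunC", "Protein.Ids": "NM_001347425"},
]


def runs_with_unique_evidence(rows, sep=";"):
    """Map each isoform to the runs where a precursor maps to that isoform alone."""
    evidence = defaultdict(set)
    for row in rows:
        # Assumption: Protein.Ids holds a separator-delimited list of protein IDs.
        ids = [p for p in row["Protein.Ids"].split(sep) if p]
        if len(ids) == 1:
            evidence[ids[0]].add(row["Run"])
    return {isoform: sorted(runs) for isoform, runs in evidence.items()}


unique_evidence = runs_with_unique_evidence(rows)
for isoform, runs in sorted(unique_evidence.items()):
    print(isoform, runs)
```

Isoforms missing from the result have no uniquely mapped precursor in any run of the input, which is the situation in which they cannot be separated from the rest of their group.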

Best,
Vadim

I realized that I had confused Protein.Ids and Protein.Group in the DIA-NN 1.8.1 'report.pg_matrix'.
I apologize for any confusion this may have caused.

However, a few questions arose during my review:

  1. Theoretically, can proteins with the same name (isoforms due to alternative splicing) appear across multiple rows in the 'report.pg_matrix'?
  • Before analysis, please note that I filtered the precursors in the DIA-NN 1.9 report.parquet using three criteria:
    Global.Q.Value < 0.01, Q.Value < 0.01, and Global.PG.Q.Value < 0.01.
  2. In the 'report.parquet', there were rows for XYLT1 as follows:

| Proteotypic | Protein.Ids | Protein.Group | Protein.Names | Precursor.Quantity | Q.Value | Global.Q.Value | PG.Q.Value | Protein.Q.Value | Global.PG.Q.Value | Run |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | NM_022166 | NM_022166 | XYLT1 | 737.3844 | 0.000640615 | 0.001007388 | 0.007518797 | 0.00952381 | 0.009584664 | GD002 |

However, there was no information about XYLT1 protein in the 'report.pg_matrix'. Could you explain why this discrepancy might occur?

  3. I observed that all precursors matching KRT85 also match KRT81, KRT83, and KRT86. Yet, the report.pg_matrix only contains information for KRT85, as shown in the example below:

| Protein.Group | Protein.Names | Genes | Sample1... |
| --- | --- | --- | --- |
| NM_002283 | KRT85 | NM_002283 | 882.988 |

Why might this be the case?
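The three filtering criteria mentioned in the note above can be sketched as a simple row filter. This assumes the intent is to keep only rows that satisfy all three thresholds and that each row is available as a Python dictionary. The XYLT1 values are copied from the row shown above; the second row is hypothetical and exists only to show a rejection.

```python
# Thresholds as stated in the post; a row is kept only if it is below all of them.
FILTERS = {
    "Global.Q.Value": 0.01,
    "Q.Value": 0.01,
    "Global.PG.Q.Value": 0.01,
}


def passes_filters(row, thresholds=FILTERS):
    """Return True when every filtered column is strictly below its cutoff."""
    return all(row[column] < cutoff for column, cutoff in thresholds.items())


# Values taken from the XYLT1 row quoted in the post.
xylt1_row = {
    "Protein.Names": "XYLT1",
    "Q.Value": 0.000640615,
    "Global.Q.Value": 0.001007388,
    "Global.PG.Q.Value": 0.009584664,
}

# Hypothetical row that fails the Global.PG.Q.Value cutoff.
failing_row = {
    "Protein.Names": "EXAMPLE",
    "Q.Value": 0.002,
    "Global.Q.Value": 0.003,
    "Global.PG.Q.Value": 0.02,
}

kept = [row["Protein.Names"] for row in (xylt1_row, failing_row) if passes_filters(row)]
print(kept)
```

Under these thresholds the XYLT1 row is retained, so this filter alone does not explain why XYLT1 is missing from the 'report.pg_matrix'.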

Thank you for your assistance.
Best regards,

Hi,

  1. If heuristic protein inference is selected, no.
    Can you please give some more detail on what you mean by 'before analysis'?
  2. I can take a look at both, as well as the logs, if you wish.
  3. DIA-NN uses all the information it has, including non-confident precursor IDs that are not reported, to narrow down the list of proteins in the group. Also, non-SwissProt proteins or proteins lacking gene annotation can be discarded from the protein group.

Best,
Vadim