Inquiry on Distinguishing Alternative Isoforms with DIA-NN
Takwon-Yoo opened this issue
First and foremost, I would like to thank you for developing the fantastic DIA-NN tool.
I am currently working on a project where I aim to distinguish and quantify isoforms produced by alternative splicing. To achieve this, I have included all isoform sequences in the FASTA file (example below). However, upon reviewing the 'report.pg_matrix.tsv' file, I noticed that all isoforms are grouped under a single protein group and share the same quantity value.
I am considering summing the 'Precursor.Normalised' values from the 'report.parquet' file for each corresponding protein to obtain protein-level quantities. I would appreciate your thoughts on this approach, and I would also like to know whether DIA-NN offers another recommended method to distinguish and quantify isoforms with different amino acid sequences.
Thank you for your time and assistance.
sp|NM_001347425|A2M GN=NM_001347425
MDENFHPLNELIPLVYIQDPKGNRIAQWQSFQLEGGLKQFSFPLSSEPFQGSYKVVVQKK
SGGRTEHPFTVEEFVLPKFEVQVTVPKIITILEEEMNVSVCGLYTYGKPVPGHVTVSICR
KYSDASDCHGEDSQAFCEKFSGQLNSHGCFYQQVKTKVFQLKRKEYEMKLHTEAQIQEEG...
sp|NM_001347424|A2M GN=NM_001347424
MFLTVQVKGPTQEFKKRTTVMVKNEDSLVFVQTDKSIYKPGQTVKFRVVSMDENFHPLNE
LIPLVYIQDPKGNRIAQWQSFQLEGGLKQFSFPLSSEPFQGSYKVVVQKKSGGRTEHPFT
VEEFVLPKFEVQVTVPKIITILEEEMNVSVCGLYTYGKPVPGHVTVSICRKYSDASDCHG...
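The summing approach proposed above can be sketched as follows. This is a minimal illustration, not a DIA-NN feature: it assumes the main report has already been q-value filtered and loaded as a list of row dictionaries keyed by DIA-NN column names (for example with pandas `read_parquet("report.parquet").to_dict("records")`), and that `Protein.Ids` separates accessions with semicolons. To keep isoforms apart, it counts only precursors whose `Protein.Ids` holds a single accession; the function name is hypothetical.

```python
from collections import defaultdict


def sum_unique_precursors(rows):
    """Sum Precursor.Normalised per (Run, accession), using only precursors
    whose Protein.Ids lists exactly one accession (assumed isoform-unique).

    Assumption: accessions in Protein.Ids are separated by ';'.
    """
    totals = defaultdict(float)
    for row in rows:
        ids = row["Protein.Ids"].split(";")
        if len(ids) != 1:
            # Shared precursor: it cannot tell isoforms apart, so skip it.
            continue
        totals[(row["Run"], ids[0])] += row["Precursor.Normalised"]
    return dict(totals)


# Hypothetical example rows for one run.
rows = [
    {"Run": "S1", "Protein.Ids": "NM_001347425", "Precursor.Normalised": 100.0},
    {"Run": "S1", "Protein.Ids": "NM_001347425", "Precursor.Normalised": 50.0},
    {"Run": "S1", "Protein.Ids": "NM_001347424", "Precursor.Normalised": 30.0},
    {"Run": "S1", "Protein.Ids": "NM_001347425;NM_001347424", "Precursor.Normalised": 999.0},
]
print(sum_unique_precursors(rows))
```

A plain sum depends on which precursors happen to be detected in each run, so the totals are only directly comparable across runs when the same unique precursors are quantified in each of them.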
Quoting: "I noticed that all isoforms are grouped under a single protein group and share the same quantity value."
This means that no unique peptides are confidently detected.
If you check the main report, do you see any peptides uniquely mapped to a particular isoform in the Protein.Ids column, at least for some runs?
Best,
Vadim
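The check suggested above can be scripted by scanning the main report for precursors whose `Protein.Ids` field contains a single accession. This sketch assumes the report is loaded as a list of row dictionaries keyed by DIA-NN column names and that accessions in `Protein.Ids` are separated by semicolons; `unique_isoform_evidence` is a hypothetical helper.

```python
def unique_isoform_evidence(rows, isoforms):
    """For each isoform accession, return the sorted list of runs in which
    at least one precursor maps to that accession alone in Protein.Ids.

    Assumption: accessions in Protein.Ids are separated by ';'.
    """
    evidence = {acc: set() for acc in isoforms}
    for row in rows:
        ids = row["Protein.Ids"].split(";")
        # Only precursors mapped to exactly one accession count as unique.
        if len(ids) == 1 and ids[0] in evidence:
            evidence[ids[0]].add(row["Run"])
    return {acc: sorted(runs) for acc, runs in evidence.items()}


# Hypothetical rows: one shared precursor, two unique to NM_001347424.
rows = [
    {"Run": "R1", "Protein.Ids": "NM_001347425;NM_001347424"},
    {"Run": "R2", "Protein.Ids": "NM_001347424"},
    {"Run": "R1", "Protein.Ids": "NM_001347424"},
]
print(unique_isoform_evidence(rows, ["NM_001347425", "NM_001347424"]))
```

An isoform whose list of runs comes back empty has no uniquely mapped precursor anywhere in the report, which is consistent with it being merged into a shared protein group.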
I realized that I had confused Protein.Ids and Protein.Group in the DIA-NN 1.8.1 'report.pg_matrix'.
I apologize for any confusion this may have caused.
However, a few questions arose during my review:
- Theoretically, can proteins with the same name (isoforms arising from alternative splicing) appear across multiple rows in the 'report.pg_matrix'?
- Before analysis, please note that I filtered the precursors in the DIA-NN 1.9 report.parquet using three criteria:
Global.Q.Value < 0.01, Q.Value < 0.01, and Global.PG.Q.Value < 0.01.
- The 'report.parquet' contained rows for XYLT1 such as the following:
| Proteotypic | Protein.Ids | Protein.Group | Protein.Names | Precursor.Quantity | Q.Value | Global.Q.Value | PG.Q.Value | Protein.Q.Value | Global.PG.Q.Value | Run |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NM_022166 | NM_022166 | XYLT1 | 737.3844 | 0.000640615 | 0.001007388 | 0.007518797 | 0.00952381 | 0.009584664 | GD002 |
However, the 'report.pg_matrix' contained no entry for the XYLT1 protein. Could you explain why this discrepancy might occur?
- I observed that all precursors matching KRT85 also match KRT81, KRT83, and KRT86. Yet, the report.pg_matrix only contains information for KRT85, as shown in the example below:
| Protein.Group | Protein.Names | Genes | Sample1 | ... |
|---|---|---|---|---|
| NM_002283 | KRT85 | NM_002283 | 882.988 | ... |
Why might this be the case?
Thank you for your assistance.
Best regards,
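The precursor filtering described in the second question can be sketched as follows, under the interpretation that a precursor is kept only when all three q-values are below 0.01. Rows are assumed to be dictionaries keyed by DIA-NN column names; `passes_filters` and `filter_precursors` are hypothetical helpers. The XYLT1 row quoted in the question passes all three thresholds.

```python
def passes_filters(row, cutoff=0.01):
    """True if the precursor meets all three q-value criteria."""
    return (
        row["Global.Q.Value"] < cutoff
        and row["Q.Value"] < cutoff
        and row["Global.PG.Q.Value"] < cutoff
    )


def filter_precursors(rows, cutoff=0.01):
    """Keep only the rows that satisfy every criterion."""
    return [row for row in rows if passes_filters(row, cutoff)]


# q-values from the XYLT1 row, plus a hypothetical row failing the PG filter.
xylt1 = {"Q.Value": 0.000640615, "Global.Q.Value": 0.001007388, "Global.PG.Q.Value": 0.009584664}
failing = {"Q.Value": 0.001, "Global.Q.Value": 0.001, "Global.PG.Q.Value": 0.02}
print(filter_precursors([xylt1, failing]))
```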
Hi,
- If heuristic protein inference is selected, no.
- Can you please give some more detail on what you mean by 'before analysis'? I can take a look at both, as well as the logs, if you wish.
- DIA-NN uses all the information it has, including non-confident precursor IDs that don't get reported, to narrow down the list of proteins in the group. Also, non-SwissProt proteins or proteins lacking gene annotation can be discarded from the protein group.
Best,
Vadim
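To illustrate the general idea behind the last point, the sketch below implements a generic greedy parsimony step (a set-cover heuristic). This is an assumption chosen for illustration and not DIA-NN's actual inference algorithm; the precursor IDs P1 to P4 are hypothetical. If one precursor that is too low-confidence to be reported maps only to KRT85, then KRT85 explains more evidence than KRT81, KRT83, or KRT86 and absorbs the shared precursors as well. Ties are broken alphabetically here, whereas inference tools typically report indistinguishable proteins together in one group.

```python
def greedy_protein_groups(precursor_to_proteins):
    """Generic greedy parsimony: repeatedly pick the protein that explains
    the most still-unexplained precursors and assign them to it.

    Precursors mapping to no protein are ignored. Ties are broken
    alphabetically so the result is deterministic.
    """
    unexplained = {p for p, prots in precursor_to_proteins.items() if prots}
    assignment = {}
    while unexplained:
        # Count how many unexplained precursors each protein could explain.
        counts = {}
        for prec in unexplained:
            for prot in precursor_to_proteins[prec]:
                counts[prot] = counts.get(prot, 0) + 1
        best = min(counts, key=lambda prot: (-counts[prot], prot))
        covered = {p for p in unexplained if best in precursor_to_proteins[p]}
        assignment[best] = sorted(covered)
        unexplained -= covered
    return assignment


keratins = {"KRT85", "KRT81", "KRT83", "KRT86"}
mapping = {
    "P1": set(keratins),
    "P2": set(keratins),
    "P3": set(keratins),
    "P4": {"KRT85"},  # hypothetical non-confident, unreported precursor
}
print(greedy_protein_groups(mapping))
```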