vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.


Parquet report mzML scan ID matching

ds2268 opened this issue

I would like to match DIA-NN outputs back to mzML scans. I was previously using MS2.Scan, which, to my understanding, gives the consecutive order of the MS2 scans in the mzML (still tedious to use). I noticed that there is no MS2.Scan in DIA-NN 1.9. How can one map DIA-NN outputs back to the mzML input?

Update: MS2.Scan is still available in the TSV reports. Just to confirm: does MS2.Scan correspond to counting the MS2-level spectra in the mzML, giving a 1:1 mapping?

An additional question:

I ran DIA-NN twice, once at 1% precursor FDR (set in the GUI) and once at 100% FDR, and then counted the unique peptides in each report:

```python
# Unique stripped peptide sequences in the report from the 1% FDR run
peps_diann_1pct = set(df_diann_1pct["Stripped.Sequence"].unique().tolist())
len(peps_diann_1pct)
```

This gives X.

I then computed the same from the 100% FDR run, filtering on Q.Value:

```python
# Keep only precursors with Q.Value below 1%, then take unique stripped sequences
peps_100pct = df_diann_100pct[df_diann_100pct["Q.Value"] < 0.01]
peps_100pct = set(peps_100pct["Stripped.Sequence"].unique().tolist())
len(peps_100pct)
```

This gives Y, which is slightly more than X. Why isn't it possible to get the same number?
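To see which peptides account for the gap, the two sets can be compared directly. This is a minimal pandas sketch, assuming the reports are already loaded as DataFrames with the `Stripped.Sequence` and `Q.Value` columns used above; the helper names `peptides_passing` and `compare_sets` are hypothetical.

```python
import pandas as pd


def peptides_passing(df: pd.DataFrame, q_col: str = "Q.Value", threshold: float = 0.01) -> set:
    """Unique stripped sequences among rows whose q_col is below threshold."""
    return set(df.loc[df[q_col] < threshold, "Stripped.Sequence"])


def compare_sets(a: set, b: set) -> dict:
    """Count peptides shared by a and b, and those found in only one of them."""
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}
```

For example, `compare_sets(set(df_diann_1pct["Stripped.Sequence"]), peptides_passing(df_diann_100pct))` summarises the overlap, and the plain set difference `peptides_passing(df_diann_100pct) - set(df_diann_1pct["Stripped.Sequence"])` lists the extra peptides for closer inspection.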

Hi Dejan,

About MS2.Scan: yes, that's correct; it counts all MS2 scans. Another good way to match is by RT.
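Both approaches can be sketched with the Python standard library alone. This sketch assumes that MS2.Scan is a zero-based count of MS2 spectra in file order (the reply confirms the counting but not the starting index, so check a few IDs against your own data) and that DIA-NN's RT column is in minutes. The function names, the tolerance `tol`, and the tiny `SAMPLE_MZML` document are hypothetical; the cvParam accessions MS:1000511 ("ms level") and MS:1000016 ("scan start time") come from the PSI-MS controlled vocabulary used by mzML.

```python
import bisect
import io
import xml.etree.ElementTree as ET

MS_LEVEL = "MS:1000511"         # PSI-MS CV: "ms level"
SCAN_START_TIME = "MS:1000016"  # PSI-MS CV: "scan start time"


def _local(tag):
    """Drop the XML namespace: '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def ms2_spectra(source):
    """List (ms2_index, native_id, rt_minutes) for every MS2 spectrum, in file order.

    ms2_index counts MS2 spectra only and starts at 0 (assumed; DIA-NN's
    MS2.Scan may be offset by one). source is a path or a binary file object.
    """
    out = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "spectrum":
            continue
        level, rt = None, None
        for cv in elem.iter():
            if _local(cv.tag) != "cvParam":
                continue
            if cv.get("accession") == MS_LEVEL:
                level = int(cv.get("value"))
            elif cv.get("accession") == SCAN_START_TIME:
                rt = float(cv.get("value"))
                if cv.get("unitName") == "second":
                    rt /= 60.0  # normalise to minutes
        if level == 2:
            out.append((len(out), elem.get("id"), rt))
        elem.clear()  # free the spectrum's children (e.g. peak arrays) as we go
    return out


def nearest_by_rt(spectra, rt, tol=0.05):
    """Return the MS2 entry whose RT (minutes) is closest to rt, or None if beyond tol.

    Assumes spectra are in ascending RT order, as they normally are in an mzML run.
    """
    rts = [s[2] for s in spectra]
    i = bisect.bisect_left(rts, rt)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(rts)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(rts[j] - rt))
    return spectra[best] if abs(rts[best] - rt) <= tol else None


# Hypothetical three-spectrum mzML fragment (one MS1, two MS2) for illustration.
SAMPLE_MZML = b"""<?xml version="1.0"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml"><run id="r"><spectrumList count="3">
<spectrum index="0" id="scan=1"><cvParam accession="MS:1000511" value="1"/>
<scanList><scan><cvParam accession="MS:1000016" value="60.0" unitName="second"/></scan></scanList></spectrum>
<spectrum index="1" id="scan=2"><cvParam accession="MS:1000511" value="2"/>
<scanList><scan><cvParam accession="MS:1000016" value="1.01" unitName="minute"/></scan></scanList></spectrum>
<spectrum index="2" id="scan=3"><cvParam accession="MS:1000511" value="2"/>
<scanList><scan><cvParam accession="MS:1000016" value="1.02" unitName="minute"/></scan></scanList></spectrum>
</spectrumList></run></mzML>"""

if __name__ == "__main__":
    spectra = ms2_spectra(io.BytesIO(SAMPLE_MZML))
    print(spectra[1])                     # lookup by a (zero-based) MS2.Scan value
    print(nearest_by_rt(spectra, 1.018))  # lookup by a report RT in minutes
```

In DIA data each cycle contains one MS2 spectrum per isolation window, so nearest-RT matching alone finds the right cycle but not necessarily the right window; restricting the candidates to the window that contains the precursor m/z would be a natural refinement, not shown here.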

About different numbers:
What do the logs look like in each case? Was MBR used?

Best,
Vadim

Yes, MBR was used. The difference is small but noticeable: 149,129 for X and 151,444 for Y. I also tried filtering by the other q-value columns, and none of them matched; filtering by the global q-value gave slightly fewer than X.
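Checking several q-value columns at once, as described above, can be sketched with pandas. This assumes the report contains columns such as `Q.Value` and `Global.Q.Value`; column sets differ between DIA-NN versions and report formats, so the sketch simply skips any column that is missing. The function name `peptide_counts_by_q` is hypothetical.

```python
import pandas as pd


def peptide_counts_by_q(df: pd.DataFrame, q_columns, threshold: float = 0.01) -> dict:
    """Number of unique stripped sequences passing threshold, for each q-value column."""
    counts = {}
    for col in q_columns:
        if col not in df.columns:
            continue  # column absent in this report version; skip it
        passing = df[df[col] < threshold]
        counts[col] = passing["Stripped.Sequence"].nunique()
    return counts
```

For example, `peptide_counts_by_q(df_diann_100pct, ["Q.Value", "Global.Q.Value"])` returns one count per column, which can be set side by side with the count from the 1% FDR run.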

I am attaching the log files for the 100% and 1% FDR runs.

report-100pct.log.txt
report-1pct.log.txt

Yes, MBR results depend on the FDR filter set in the GUI; this is expected. DIA-NN actually prints a warning if the FDR is set higher than 5%.