pFindStudio / pDeep3

MS/MS prediction for peptides

help using from command line

kevinkovalchik opened this issue

Hello,

There seems to be an issue using tune_and_predict from the command line. The readme suggests this usage:

python -m pDeep.cmd.tune_and_predict tmp/predict/pDeep-tune.cfg

The program then starts up and makes some predictions, but they aren't coming from the config file. The code at the end of tune_and_predict is this:

    if __name__ == "__main__":
        input_peptides = [('ACDMNLK', '2,Carbamidomethyl[C];4,Oxidation[M]', 3)]
        ion_types = ['b','y', 'c', 'z']
        # prediction = get_prediction(input_peptides, tune_psm=r"e:\DDATools\MaxQuant_1.6.12.0\test_data\combined\txt\evidence.txt", raw=r"e:\DDATools\MaxQuant_1.6.12.0\test_data\20141010_DIA_20x5mz_700to800.raw")
        prediction = get_prediction(input_peptides, model="EThcD")
        ion_indices, used_ion_types = prediction.GetIonTypeIndices(ion_types)
        print(used_ion_types)
        print(prediction.GetIntensitiesByIndices(*input_peptides[0], ion_indices))

So the config file isn't actually being used. Everything is hard-coded in there to make a prediction for a specific peptide.

Is there a better entry point somewhere in the package for the command line? Or at this point is it necessary to write a script which calls the functions directly?
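For reference, a wrapper script of the kind asked about could look roughly like the sketch below. It only reuses the calls visible in the snippet above (`get_prediction`, `GetIonTypeIndices`, `GetIntensitiesByIndices`). The tab-separated input layout, the `parse_peptide_line`, `read_peptides` and `main` helpers, and the `--model` and `--ion-types` options are all hypothetical and not part of pDeep3.

```python
import argparse


def parse_peptide_line(line):
    """Parse 'SEQUENCE<TAB>MODS<TAB>CHARGE' into (sequence, mods, charge).

    This tab-separated layout is an assumption made for the sketch,
    not a documented pDeep3 input format.
    """
    sequence, mods, charge = line.rstrip("\n").split("\t")
    return (sequence, mods, int(charge))


def read_peptides(path):
    """Read one peptide per non-empty line from a text file."""
    with open(path) as handle:
        return [parse_peptide_line(line) for line in handle if line.strip()]


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Predict spectra for peptides listed in a file")
    parser.add_argument("peptide_file")
    parser.add_argument("--model", default="HCD")      # hypothetical option
    parser.add_argument("--ion-types", default="b,y")  # hypothetical option
    args = parser.parse_args(argv)

    # Imported lazily so the parsing helpers work without pDeep3 installed.
    from pDeep.cmd.tune_and_predict import get_prediction

    peptides = read_peptides(args.peptide_file)
    prediction = get_prediction(peptides, model=args.model)
    ion_indices, used_ion_types = prediction.GetIonTypeIndices(
        args.ion_types.split(","))
    print(used_ion_types)
    for peptide in peptides:
        print(peptide, prediction.GetIntensitiesByIndices(*peptide, ion_indices))
```

Calling `main()` from an `if __name__ == "__main__":` guard would turn this into a command-line script, with the peptide file replacing the hard-coded `input_peptides` list.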

Thanks!

Kevin

Thanks! I will try it out from there.

What format is expected for a .peplib input file? I have tried the format in this file: https://github.com/pFindStudio/pDeep/blob/master/pDeep2/peptide.txt, but I am getting this output:

$ python -m pDeep.cmd.generate_predicted_speclib --input /Data/Analysis/pDeep3_prep/test/input_peps.peplib --output /Data/Analysis/pDeep3_prep/test/output/predicted/test_out.tsv --varmod "" --fixmod ""
tensorflow version = 2.2.1
Namespace(RT_input=None, RT_model='', RT_proteins=None, RT_tsv=None, ce=27, decoy='reverse', fixmod='', grid_ins_ce_search=0, input='/Data/Analysis/pDeep3_prep/test/input_peps.peplib', instrument='QE', ion_type='b,y,b-ModLoss,y-ModLoss', least_n_peaks=6, max_miss_cleave=2, max_mz=2000, max_peptide_length=60, max_precursor_charge=4, max_precursor_mz=1200, max_varmod=1, min_intensity=0.1, min_mz=300, min_peptide_length=6, min_precursor_charge=2, min_precursor_mz=400, min_varmod=0, model='HCD', n_tune_psm=1000, output='/Data/Analysis/pDeep3_prep/test/output/predicted/test_out.tsv', psmRT='', psmlabel='', raw=None, spikein=None, spikein_fixmod='Carbamidomethyl[C]', spikein_proteins=None, spikein_varmod='Oxidation[M]', target_mod='', target_mod_max=0, target_mod_min=0, target_proteins=None, tune_psm=None, varmod='')
Generated 0 peptides
Generated 0 precursors (charge: 2 to 4, m/z: 400 to 1200)
[pDeep Info] fix modifications included: ''
[pDeep Info] var modifications included: ''
[pDeep Info] target modifications included: ''
[pDeep Info] model = /Data/Analysis/pDeep3_prep/pDeep3-master/pDeep/tmp/model/pretrain-tf2-200412.ckpt
[pDeep Info] RT model = /Data/Analysis/pDeep3_prep/pDeep3-master/pDeep/tmp/model/RT-model.ckpt
[pDeep Info] encoding peptides ...
[pDeep Warn] 0 precursors are ignored due to invalid AAs (i.e. BJOUXZ)
[pDeep Info] encoding time = 0.009 seconds
[pDeep Info] predicting ...
[pDeep Info] predicted 0 peptide precursors using 0.000 seconds
[pDeep Info] predicting time = 0.000 seconds
[pDeep Info] predicting RT ...
[pDeep Info] predicted 0 peptide precursors (RT) using 0.000 seconds
[pDeep Info] predicting time = 0.000s
[pDeep Info] updating tsv ...
[TSV UPDATE] 100%: /Data/Analysis/pDeep3_prep/test/output/predicted/test_out.tsv
[pDeep Info] updating tsv time = 0.000s

So no peptides seem to be getting loaded and the output file is empty. I see in this file only peptide and protein columns, but I don't see how that could be a useful input without the charge state.

I am also running into an exception if I don't clear the defaults for the modifications (which you see in my command above), but I am thinking this is because of the input file format. If it isn't resolved by fixing my input I'll open a new issue for that.

Thanks!

Hi @kevinkovalchik, https://github.com/pFindStudio/pDeep/blob/master/pDeep2/peptide.txt should be renamed as 'xxx.modseq.txt', not '.peplib'.

For .peplib, pDeep3 will add different charges (2-4 by default) to each peptide.

There is probably a bug in the handling of the .peplib format; I will check it soon.
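To illustrate what adding charges to each peptide means, here is a minimal sketch of the expansion, using the defaults shown in the log above (charge 2 to 4, precursor m/z 400 to 1200). It is not pDeep3's actual code: the function names are hypothetical, modifications are ignored, and the assumption that peptides containing B, J, O, U, X or Z are simply skipped is inferred from the warning in the log.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of one charge-carrying proton


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def expand_precursors(sequence, min_charge=2, max_charge=4,
                      min_mz=400.0, max_mz=1200.0):
    """Return (sequence, charge, m/z) for each charge inside the m/z window."""
    # Assumption: peptides with non-standard letters (e.g. BJOUXZ) are skipped.
    if any(aa not in RESIDUE_MASS for aa in sequence):
        return []
    mass = peptide_mass(sequence)
    precursors = []
    for charge in range(min_charge, max_charge + 1):
        mz = (mass + charge * PROTON) / charge
        if min_mz <= mz <= max_mz:
            precursors.append((sequence, charge, round(mz, 4)))
    return precursors


for precursor in expand_precursors("ACDMNLKACDMNLK"):
    print(precursor)
```

Under these assumptions a short unmodified peptide such as ACDMNLK produces no precursors at all, because its 2+ ion sits at about m/z 397.68, just below the 400 cutoff. A count of zero precursors is therefore possible even when the peptides themselves load correctly.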

Hi @jalew188. Sorry for not responding. Yes, it did work to rename the input file.

If I wanted to try training a new model from scratch, would I make my own script based on train.py? Or is there a CLI for this somewhere in the project?

@kevinkovalchik You have to use train.py in this case.
