File input

Question

mdieser opened this issue 5 years ago · comments

mdieser commented 5 years ago

Hi - Thanks for making this pipeline available! Quick question, are .gbk files the only accepted input file format?

Thanks for your time and have a great weekend,

Markus

Richard Neher · Answer 1 · Wed Feb 13 2019 04:34:39 GMT+0800 (China Standard Time)

Yes, currently we only support gbk. It wouldn't be hard to relax this, but I don't have time right now.

richard

mdieser · Answer 2 · Thu Feb 14 2019 05:19:49 GMT+0800 (China Standard Time)

Thanks for your response. Just an FYI, I'm trying to use my own .faa and .fna files. Currently, I'm trying to adapt the sf_extract_sequences.py script to get the correct .cpk output files for step 3. Also, if I want to make my own strain_list file, should the entries be in a single column or tab-delimited? Or are these attempts completely off track?

Markus

mdieser · Answer 3 · Sat Feb 16 2019 12:41:53 GMT+0800 (China Standard Time)

Hi Richard - is there a way around gbk files that link to contigs rather than listing the genes and CDSs? example: https://www.ncbi.nlm.nih.gov/nuccore/AMWD00000000.1/

Richard Neher · Answer 4 · Sat Feb 16 2019 20:25:14 GMT+0800 (China Standard Time)

Not sure. You can always reannotate whatever sequences you have with Prokka or similar.

Wei Ding · Answer 5 · Tue Feb 19 2019 07:07:59 GMT+0800 (China Standard Time)

The pipeline has a "-ngbk" parameter for running without GenBank files.
With that option, you can use nucleotide/amino acid sequence files (fna/faa) as input when GenBank files are not available.
For example:
./panX.py -ngbk -fn ./data/TestSet -sl TestSet -t 64 > TestSet.log
Note that with the "-ngbk" option the final results will contain no functional annotation of CDSs, because that annotation comes from the GenBank file.
Alternatively, if you have contig sequences, you can use Prokka to generate GenBank files from them.
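
For the Prokka route, a rough sketch of how the per-strain commands could be assembled is below. This is not part of panX: the helper name prokka_commands, the directory names, and the ".fna" extension are assumptions for illustration. It uses only Prokka's --outdir, --prefix and --cpus options and just builds the commands, so you can inspect them before running anything.

```python
from pathlib import Path


def prokka_commands(contig_dir, out_root, cpus=4, ext=".fna"):
    """Build one Prokka command (as an argument list) per contig FASTA file.

    Hypothetical helper, not part of panX. The file stem is reused as the
    strain name for both the output directory and the output file prefix.
    """
    commands = []
    for fasta in sorted(Path(contig_dir).glob(f"*{ext}")):
        strain = fasta.stem  # e.g. "strainA" for "strainA.fna"
        commands.append([
            "prokka",
            "--outdir", str(Path(out_root) / strain),  # one directory per strain
            "--prefix", strain,                        # names the output files
            "--cpus", str(cpus),
            str(fasta),
        ])
    return commands


# Example use (assumed directory names); run each command once it looks right:
#   import subprocess
#   for cmd in prokka_commands("contigs", "prokka_out"):
#       subprocess.run(cmd, check=True)
```

The GenBank file that Prokka writes into each output directory can then be collected into the panX input folder in place of the original contig-only records.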

mdieser · Answer 6 · Thu Feb 21 2019 00:48:56 GMT+0800 (China Standard Time)

Both suggestions worked. Deeply appreciate all the support! Thank you