bvalot / pyMLST

whole genome MLST analysis

MLST import or create documentation

lskatz opened this issue

Hi, I do not know what I'm doing wrong with this command. I have a database in chewBBACA format, with one FASTA file per locus and many alleles in each locus file. How do I import it? I would appreciate more extensive documentation and/or examples on this. Thank you!

(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ wgMLST import ../Salmonella_enterica.chewbbaca
Error: Database alreadly exists, use --force to override it

More info, if this helps

(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ ls -lh ../Salmonella_enterica.chewbbaca | head
total 4.3G
-rwxrwx---. 1 gzu2 users       7.6K May 13 09:00 INNUENDO_cgMLST-00031717.fasta*
-rwxrwx---. 1 gzu2 users        26K May 13 09:57 INNUENDO_cgMLST-00031718.fasta*
-rwxrwx---. 1 gzu2 users       2.9K May 31  2021 INNUENDO_cgMLST-00031719.fasta*
-rwxrwx---. 1 gzu2 users        34K May 13 05:48 INNUENDO_cgMLST-00031720.fasta*
-rwxrwx---. 1 gzu2 users       1.9K May 13 07:02 INNUENDO_cgMLST-00031721.fasta*
-rwxrwx---. 1 gzu2 users        20K May 13 00:14 INNUENDO_cgMLST-00031722.fasta*
-rwxrwx---. 1 gzu2 users       5.7K May 13 12:06 INNUENDO_cgMLST-00031723.fasta*
-rwxrwx---. 1 gzu2 users       5.8K May 12 23:31 INNUENDO_cgMLST-00031724.fasta*
-rwxrwx---. 1 gzu2 users       7.9K May 12 19:36 INNUENDO_cgMLST-00031725.fasta*
(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ tree -d ../Salmonella_enterica.chewbbaca
../Salmonella_enterica.chewbbaca
└── short

1 directory
(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ grep -m 3 ">" ../Salmonella_enterica.chewbbaca/INNUENDO_cgMLST-00031717.fasta
>INNUENDO_cgMLST-00031717_1
>INNUENDO_cgMLST-00031717_2
>INNUENDO_cgMLST-00031717_3

Hello,

pyMLST doesn't use chewBBACA databases; you need to create a new one.
For example, you can create a new cgMLST database for Salmonella enterica with this command:

wgMLST import Salmonella_enterica.pymlstdb Salmonella enterica

This creates the cgMLST database from cgmlst.org. You can then add the strains you want to type with the add command.

Thank you! I will try that! Could you also give example commands for creating a local database?

I still seem to get an error. I don't know whether it's my firewall; how can I troubleshoot it?

(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ wgMLST import Salmonella_enterica.pymlstdb Salmonella enterica
Error: Could not access to the server, please verify your internet connection

Very strange.
It seems to be a problem with your internet connection.
Can you access this website in your browser:
https://www.cgmlst.org/ncs
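The same reachability check can also be run from the terminal on the machine where pyMLST runs, which matters when a firewall treats command-line traffic differently from a browser. The sketch below uses `curl`; `check_url` is a hypothetical helper name, and it assumes the import talks to cgmlst.org over HTTPS (the exact endpoint pyMLST queries is not stated in this thread).

```shell
# Hypothetical helper: report whether a URL is reachable from this shell.
check_url() {
  local url=$1 code
  # -sS: hide the progress bar but still print errors; -L: follow redirects;
  # -o /dev/null: discard the body; -w: print only the final HTTP status code.
  code=$(curl -sSL -o /dev/null -w '%{http_code}' --max-time 15 "$url") || {
    echo "cannot reach $url (curl exit status $?)" >&2
    return 1
  }
  echo "$url -> HTTP $code"
  # Treat 2xx and 3xx responses as reachable.
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}
```

Usage: `check_url https://www.cgmlst.org/ncs`. A timeout or connection error here, while the browser works, would point at network rules applied to the terminal session rather than at pyMLST itself.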

Otherwise, you can create a local database from an existing schema. For this, you need a single FASTA file containing every gene of the schema with only one allele per gene, unlike chewBBACA, which stores all alleles:
wgMLST create Salmonella_enterica.pymlstdb genes.fasta
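Before running `create`, it can help to confirm that the input really has one allele per gene. A minimal sketch, assuming chewBBACA-style headers where a trailing `_<number>` is the allele number (as in `>INNUENDO_cgMLST-00031717_1` above); `check_one_allele_per_gene` is a hypothetical helper name, and a gene whose real name ends in `_<digits>` would be misread by this rule.

```shell
# Hypothetical helper: count loci in a FASTA file and flag loci that appear
# more than once (i.e. more than one allele per gene).
check_one_allele_per_gene() {
  awk '
    /^>/ {
      locus = substr($1, 2)          # header up to the first space, without ">"
      sub(/_[0-9]+$/, "", locus)     # assumed allele suffix: LOCUS_1, LOCUS_2, ...
      if (count[locus]++ == 0) {
        loci++
      } else {
        extra++
        print "extra allele for " locus ": " $1 > "/dev/stderr"
      }
    }
    END {
      printf "%d loci, %d extra alleles\n", loci, extra
      exit (extra > 0)               # non-zero exit status if any locus repeats
    }
  ' "$1"
}
```

Usage: `check_one_allele_per_gene genes.fasta && wgMLST create Salmonella_enterica.pymlstdb genes.fasta`.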

Yes, I'm able to reach that site with lynx (https://www.cgmlst.org/ncs). I think our firewall sometimes behaves oddly, though, and we cannot access FTP sites.

Can I create a local database if I have the full fasta files with all alleles?

No, you need only one FASTA file with one allele per gene. It should be quite easy to write a Python script that builds it from your chewBBACA files.
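The reply suggests a small Python script; the same conversion also fits in a few lines of shell and awk, which keeps it alongside the other commands in this thread. This sketch keeps the first allele of each `LOCUS.fasta` file and renames its header to the locus name. `chewbbaca_to_genes` is a hypothetical helper, and renaming headers assumes pyMLST takes each gene's name from its FASTA header. Subdirectories such as `short/` are ignored because only top-level `*.fasta` files are read.

```shell
# Hypothetical helper: build one FASTA with one allele per locus from a
# chewBBACA-style schema directory (one LOCUS.fasta file per locus).
chewbbaca_to_genes() {
  local schema_dir=$1 out=$2 f locus
  : > "$out"                         # start from an empty output file
  for f in "$schema_dir"/*.fasta; do
    [ -e "$f" ] || continue          # glob matched nothing: skip
    locus=$(basename "$f" .fasta)
    # Print the first record only (it may span several sequence lines),
    # renamed to the locus; stop reading at the second header.
    awk -v name="$locus" '
      /^>/ { n++; if (n == 1) { print ">" name; next } else exit }
      n == 1 { print }
    ' "$f" >> "$out"
  done
}
```

Usage, writing the output outside the schema directory so it is not picked up by the `*.fasta` glob:

    chewbbaca_to_genes ../Salmonella_enterica.chewbbaca genes.fasta
    wgMLST create Salmonella_enterica.pymlstdb genes.fasta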

Ok I think I understand that. But if I import only one allele per locus, then how do I call other alleles with the new database? Wouldn't I need other alleles in the database?

No, you don't need them, because the database is automatically extended with the new alleles found in your strains.

Got it, thanks! I'll try this out next chance I get and so I'll close out this ticket for now.

Thanks! It works now! I was able to query genomes with

(set -e  # stop at the first failing add
  for i in illumina/Salm/validation-dataset/shovill.out/*_1.shovillSpades.fasta; do
    b=$(basename "$i" _1.shovillSpades.fasta)
    wgMLST add --strain "$b" MLST.db/Salmonella_enterica.pyMLST/Salmonella_enterica.pymlstdb "$i"
  done)

In some instances I added a genome twice, which broke my loop. So it might be useful to have a function to check whether a strain name has already been added to the database.

There should already be one: pyMLST normally tells you that the strain is already in the database.

You can also remove strains with the "remove" command.
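If a batch loop like the one above should skip genomes it has already added instead of aborting, one workaround outside pyMLST is to keep a plain-text ledger of successfully added strains. This is a sketch only: `add_new_strains` and the ledger file are hypothetical, it reuses the loop's `_1.shovillSpades.fasta` naming rule, and it only knows about strains added through the ledger, not strains already in the database from earlier runs.

```shell
# Hypothetical helper: add each genome once, recording successful adds in a
# ledger file so that reruns and duplicate inputs are skipped.
add_new_strains() {
  local db=$1 ledger=$2 genome strain
  shift 2
  touch "$ledger"
  for genome in "$@"; do
    # Same naming rule as the loop above: strip the shovill suffix.
    strain=$(basename "$genome" _1.shovillSpades.fasta)
    if grep -qxF -- "$strain" "$ledger"; then
      echo "skip: $strain is already in $ledger" >&2
      continue
    fi
    if wgMLST add --strain "$strain" "$db" "$genome"; then
      printf '%s\n' "$strain" >> "$ledger"
    else
      # Log the failure and keep going instead of aborting the whole batch.
      echo "failed: $strain" >&2
    fi
  done
}
```

Usage: `add_new_strains MLST.db/Salmonella_enterica.pyMLST/Salmonella_enterica.pymlstdb added_strains.txt illumina/Salm/validation-dataset/shovill.out/*_1.shovillSpades.fasta`.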

Okay cool, thanks!