MLST import or create documentation

Question

lskatz opened this issue 2 years ago

Hi, I don't know what I'm doing wrong with this command. I have a database in chewBBACA format, with one locus per FASTA file and many alleles in each locus.fasta file. How do I import it? I would appreciate more extensive documentation and/or examples on this. Thank you!

(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ wgMLST import ../Salmonella_enterica.chewbbaca
Error: Database alreadly exists, use --force to override it

Lee Katz · Answer 1 · Tue Jun 14 2022 04:04:48 GMT+0800 (China Standard Time)

More info, if this helps

(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ ls -lh ../Salmonella_enterica.chewbbaca | head
total 4.3G
-rwxrwx---. 1 gzu2 users       7.6K May 13 09:00 INNUENDO_cgMLST-00031717.fasta*
-rwxrwx---. 1 gzu2 users        26K May 13 09:57 INNUENDO_cgMLST-00031718.fasta*
-rwxrwx---. 1 gzu2 users       2.9K May 31  2021 INNUENDO_cgMLST-00031719.fasta*
-rwxrwx---. 1 gzu2 users        34K May 13 05:48 INNUENDO_cgMLST-00031720.fasta*
-rwxrwx---. 1 gzu2 users       1.9K May 13 07:02 INNUENDO_cgMLST-00031721.fasta*
-rwxrwx---. 1 gzu2 users        20K May 13 00:14 INNUENDO_cgMLST-00031722.fasta*
-rwxrwx---. 1 gzu2 users       5.7K May 13 12:06 INNUENDO_cgMLST-00031723.fasta*
-rwxrwx---. 1 gzu2 users       5.8K May 12 23:31 INNUENDO_cgMLST-00031724.fasta*
-rwxrwx---. 1 gzu2 users       7.9K May 12 19:36 INNUENDO_cgMLST-00031725.fasta*
(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ tree -d ../Salmonella_enterica.chewbbaca
../Salmonella_enterica.chewbbaca
└── short

1 directory
(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ grep -m 3 ">" ../Salmonella_enterica.chewbbaca/INNUENDO_cgMLST-00031717.fasta
>INNUENDO_cgMLST-00031717_1
>INNUENDO_cgMLST-00031717_2
>INNUENDO_cgMLST-00031717_3
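For a directory laid out like this (one locus per .fasta file, one record per allele), a quick way to see how many alleles each locus carries is to count the header lines in each file. The sketch below assumes only that every allele record starts with ">"; count_alleles is a hypothetical helper, not part of pyMLST or chewBBACA.

```shell
#!/usr/bin/env bash
# Summarize a chewBBACA-style schema directory: one locus per .fasta file,
# one FASTA record (a line starting with ">") per allele.
# Prints "<locus><TAB><number of alleles>" for each locus file.
count_alleles() {
  local schema_dir=$1
  local f
  for f in "$schema_dir"/*.fasta; do
    [ -e "$f" ] || continue   # no .fasta files: the glob is left unexpanded
    printf '%s\t%s\n' "$(basename "$f" .fasta)" "$(grep -c '^>' "$f")"
  done
}

# Example (path from this thread):
# count_alleles ../Salmonella_enterica.chewbbaca | head
```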

bvalot · Answer 2 · Tue Jun 14 2022 15:04:14 GMT+0800 (China Standard Time)

Hello,

pyMLST doesn't use chewBBACA databases; you need to create a new one.
For example, you can create a new cgMLST database for Salmonella enterica with this command:

wgMLST import Salmonella_enterica.pymlstdb Salmonella enterica

This creates the cgMLST database from cgmlst.org. You can then add the strains you want to type with the "add" command.
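A minimal sketch of that two-step workflow, wrapped in a hypothetical build_and_type function. The import and add invocations mirror the commands used in this thread; deriving each strain name from the assembly file name (minus its extension) is an assumption, so adjust it to your own naming scheme.

```shell
#!/usr/bin/env bash
# Import the Salmonella enterica cgMLST scheme into a new pyMLST database,
# then type each assembly with "wgMLST add". Stops at the first failure.
# Usage: build_and_type DATABASE ASSEMBLY.fasta [ASSEMBLY.fasta ...]
build_and_type() {
  local db=$1; shift           # remaining arguments: assembly FASTA files
  local genome name
  wgMLST import "$db" Salmonella enterica || return 1
  for genome in "$@"; do
    # Assumption: strain name = file name without directory and extension.
    name=$(basename "$genome")
    name=${name%.*}
    wgMLST add --strain "$name" "$db" "$genome" || return 1
  done
}

# Example:
# build_and_type Salmonella_enterica.pymlstdb assemblies/*.fasta
```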

Lee Katz · Answer 3 · Tue Jun 14 2022 20:51:47 GMT+0800 (China Standard Time)

Thank you, I will try that! Could you also give an example command showing how to create a local database?

Lee Katz · Answer 4 · Wed Jun 15 2022 03:15:05 GMT+0800 (China Standard Time)

I still seem to get an error. It might be my firewall; how can I troubleshoot it?

(pymlst) [gzu2@monolith3 Salmonella_enterica.pyMLST]$ wgMLST import Salmonella_enterica.pymlstdb Salmonella enterica
Error: Could not access to the server, please verify your internet connection

bvalot · Answer 5 · Wed Jun 15 2022 14:47:42 GMT+0800 (China Standard Time)

Very strange.
It seems to be a problem with your internet connection.
Can you access this website using your browser?
https://www.cgmlst.org/ncs
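To run the same check from the server itself (where the firewall applies) rather than from a desktop browser, curl can report the HTTP status code. This is a sketch with a hypothetical check_url helper; a successful response only shows that the site is reachable from that host, not that every request pyMLST makes will get through.

```shell
#!/usr/bin/env bash
# Check whether a URL is reachable from this machine.
# Prints the HTTP status code and returns 0 if it is below 400;
# returns non-zero if curl cannot connect or the status is 400 or above.
check_url() {
  local url=$1 code
  # -sS: no progress bar, but still report errors on stderr
  # -o /dev/null: discard the page body
  # -w '%{http_code}': print only the status code
  # --max-time: give up instead of hanging if a firewall drops packets
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 20 "$url") || return 1
  echo "$code"
  [ "$code" -lt 400 ]
}

# Example:
# check_url https://www.cgmlst.org/ncs
```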

Otherwise, you can try to create a local database from an existing schema. For this, you need one FASTA file containing the genes of the schema with only one allele per gene, unlike chewBBACA, which stores all the alleles.
wgMLST create Salmonella_enterica.pymlstdb genes.fasta
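Because create expects one allele per gene, it can help to reject a file that repeats a gene name before building the database. In the sketch below, create_db is a hypothetical helper and the first word of each header is assumed to be the gene name. It compares header names only, so it cannot tell that headers such as INNUENDO_cgMLST-00031717_1 and INNUENDO_cgMLST-00031717_2 belong to the same locus.

```shell
#!/usr/bin/env bash
# Refuse to build the database if the gene FASTA is empty or lists the same
# gene name twice; otherwise run "wgMLST create DATABASE GENES.fasta".
create_db() {
  local db=$1 genes=$2
  if ! grep -q '^>' "$genes"; then
    echo "no FASTA records in $genes" >&2
    return 1
  fi
  # Gene name = first word of each header line, without the leading ">".
  if ! awk '/^>/ { name = substr($1, 2)
                   if (seen[name]++) { print "duplicate gene: " name > "/dev/stderr"; bad = 1 } }
            END { exit (bad ? 1 : 0) }' "$genes"; then
    echo "expected exactly one allele per gene in $genes" >&2
    return 1
  fi
  wgMLST create "$db" "$genes"
}

# Example:
# create_db Salmonella_enterica.pymlstdb genes.fasta
```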

Lee Katz · Answer 6 · Wed Jun 15 2022 20:06:00 GMT+0800 (China Standard Time)

Yes, I'm able to reach that site with lynx (https://www.cgmlst.org/ncs). I think our firewall sometimes behaves oddly, though, and we cannot access FTP sites.

Can I create a local database if I have the full fasta files with all alleles?

bvalot · Answer 7 · Wed Jun 15 2022 20:12:53 GMT+0800 (China Standard Time)

No, you need a single FASTA file with only one allele per gene. It should be quite easy to write a Python script that builds it from your chewBBACA files.
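bvalot suggests a Python script; to stay with the shell used elsewhere in this thread, here is an equivalent sketch using awk. It assumes the first record in each locus file is an acceptable representative allele and renames it to the locus name (the file name without .fasta); both choices are assumptions, and chewbbaca_to_genes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Build a one-allele-per-gene FASTA from a chewBBACA-style schema directory
# (one <locus>.fasta per locus, many alleles per file).
# Write the output OUTSIDE the schema directory: a .fasta output inside it
# would match the glob below and be read back in.
chewbbaca_to_genes() {
  local schema_dir=$1 out=$2 f
  : > "$out"                  # start from an empty output file
  for f in "$schema_dir"/*.fasta; do
    [ -e "$f" ] || continue
    awk -v locus="$(basename "$f" .fasta)" '
      /^>/ { n++; if (n > 1) exit; print ">" locus; next }  # keep record 1 only
      n == 1 { print }                                       # and its sequence lines
    ' "$f" >> "$out"
  done
}

# Example (paths from this thread):
# chewbbaca_to_genes ../Salmonella_enterica.chewbbaca genes.fasta
# wgMLST create Salmonella_enterica.pymlstdb genes.fasta
```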

Lee Katz · Answer 8 · Fri Jun 17 2022 22:57:01 GMT+0800 (China Standard Time)

OK, I think I understand. But if I import only one allele per locus, how do I call other alleles with the new database? Wouldn't I need the other alleles in the database?

bvalot · Answer 9 · Sat Jun 18 2022 00:08:57 GMT+0800 (China Standard Time)

No, you don't need them, because the database is automatically extended with the alleles found in your strains.

Lee Katz · Answer 10 · Sat Jun 18 2022 08:27:43 GMT+0800 (China Standard Time)

Got it, thanks! I'll try this out the next chance I get, so I'll close this ticket for now.

Lee Katz · Answer 11 · Thu Jun 30 2022 09:43:27 GMT+0800 (China Standard Time)

Thanks, it works now! I was able to query genomes with:

(set -e   # stop at the first "wgMLST add" that fails
  for i in illumina/Salm/validation-dataset/shovill.out/*_1.shovillSpades.fasta; do
    b=$(basename "$i" _1.shovillSpades.fasta)   # strain name from the file name
    wgMLST add --strain "$b" MLST.db/Salmonella_enterica.pyMLST/Salmonella_enterica.pymlstdb "$i"
  done)

Lee Katz · Answer 12 · Thu Jun 30 2022 09:44:43 GMT+0800 (China Standard Time)

In some cases I added a genome twice, which broke my loop. It might be useful to have a function that checks whether a strain name has already been added to the database.
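One user-side workaround is a variant of the loop above that keeps its own list of strains already added, skips them, and reports failures instead of aborting. The log file and the add_assemblies name are hypothetical bookkeeping, not pyMLST features, and the log only knows about strains added through this function.

```shell
#!/usr/bin/env bash
# Add assemblies to a pyMLST database, skipping strains listed in a local
# log file and continuing past failures instead of stopping the whole loop.
# Usage: add_assemblies DATABASE LOGFILE ASSEMBLY.fasta [...]
add_assemblies() {
  local db=$1 log=$2; shift 2
  local asm strain
  touch "$log"
  for asm in "$@"; do
    strain=$(basename "$asm" _1.shovillSpades.fasta)
    # -x: whole-line match, -F: literal string, so "s1" does not match "s10"
    if grep -qxF -- "$strain" "$log"; then
      echo "skip: $strain already added" >&2
      continue
    fi
    if wgMLST add --strain "$strain" "$db" "$asm"; then
      echo "$strain" >> "$log"      # record only successful additions
    else
      echo "failed: $strain" >&2
    fi
  done
}

# Example (paths from this thread):
# add_assemblies MLST.db/Salmonella_enterica.pyMLST/Salmonella_enterica.pymlstdb \
#   added_strains.txt illumina/Salm/validation-dataset/shovill.out/*_1.shovillSpades.fasta
```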

bvalot · Answer 13 · Fri Jul 01 2022 14:56:28 GMT+0800 (China Standard Time)

Normally there is one: it warns you that the strain is already in the database.

You can also remove strains with the "remove" command.

Lee Katz · Answer 14 · Fri Jul 01 2022 19:26:45 GMT+0800 (China Standard Time)

Okay cool, thanks!