Add a batch mode to the database import?

Question

pimarin opened this issue a year ago · comments

Hello !

I'm trying to use the pyMLST tool, which seems to be exactly what I need, but I don't understand how to build a single database covering several species for a specific analysis (from a public database such as cgmlst.org or pubmlst.org).
I am wondering whether it is possible to download a list of all the species schemes available for classical MLST or cgMLST?
I'm working on a project with hundreds of different species. We start with the available database schemes and then update them with local schemes, and your tool allows this by letting us update our own database.

Thank you.

bvalot · Answer 1 · Mon Dec 04 2023 22:02:57 GMT+0800 (China Standard Time)

Hello,

I don't really understand what you need. All species covered by the import command are listed on the dedicated websites:

MLST : https://pubmlst.org/organisms/
cgMLST : https://cgmlst.org/ncs

You can load these databases with the import command.

pimarin · Answer 2 · Tue Dec 19 2023 14:44:43 GMT+0800 (China Standard Time)

Hi @bvalot !

I want to run an analysis like in classical MLST.
I have some assemblies from different species and I want to use your tool with a database built from cgmlst.org.
But if I try to build a database without adding a species name, your tool asks me to choose from a list.

bvalot · Answer 3 · Tue Dec 19 2023 15:58:14 GMT+0800 (China Standard Time)

Yes, it is a two-step analysis. As an example, for Pseudomonas aeruginosa:

First, you import the MLST database for your species:
claMLST import my_database.db Pseudomonas aeruginosa
Then you search your genome assemblies against this database:
claMLST search my_database.db strain1.fasta strain2.fasta ...
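
For a project with many species, the two steps above could be scripted with one database per species. The following is only a minimal sketch: it builds the command lines and prints them rather than running them. The helper name build_commands, the dbs directory, and the one-database-per-species layout are assumptions made for illustration; only the import and search invocations are taken from the commands shown above. The import step may still prompt interactively when a species or scheme name is ambiguous.

```python
import shlex


def build_commands(species_to_assemblies, db_dir="."):
    """Return argv lists: one 'import' and one 'search' per species.

    species_to_assemblies maps a species name (e.g. "Pseudomonas aeruginosa")
    to the list of assembly FASTA files belonging to that species.
    """
    commands = []
    for species, assemblies in species_to_assemblies.items():
        # Hypothetical layout: one database file per species.
        db_path = f"{db_dir}/{species.replace(' ', '_')}.db"
        # Same shape as: claMLST import my_database.db Pseudomonas aeruginosa
        commands.append(["claMLST", "import", db_path, *species.split()])
        # Same shape as: claMLST search my_database.db strain1.fasta strain2.fasta
        commands.append(["claMLST", "search", db_path, *assemblies])
    return commands


if __name__ == "__main__":
    plan = {"Pseudomonas aeruginosa": ["strain1.fasta", "strain2.fasta"]}
    for argv in build_commands(plan, db_dir="dbs"):
        # Print shell-quoted commands; pass argv to subprocess.run to execute.
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review the plan before running anything against the public databases.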

marchoeppner · Answer 4 · Thu Mar 21 2024 14:06:51 GMT+0800 (China Standard Time)

Maybe as an extension to this thread - I am building a pipeline for bacterial isolate analysis. The pipeline automatically provisions software via the usual channels; and I do plan to have a full "unattended" installation routine for the various dependencies, including the pyMLST MLST databases.

I have gotten fairly far with mining the pubmlst REST API to get a list of all available schemas and build a list of commands to use with claMLST import. But it still doesn't run fully unattended since it is largely unable to deal with the rare ambiguity.
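
As a rough illustration of the kind of API mining described here, the sketch below filters sequence-definition databases out of the PubMLST REST API root listing. The payload shape (a list of resource groups, each with a "databases" array of objects carrying a "name") and the "_seqdef" suffix convention are assumptions about the API response, and the function names are hypothetical. Turning a database name into the species argument expected by claMLST import is not covered.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://rest.pubmlst.org/db"  # assumed entry point of the REST API


def seqdef_databases(resource_groups):
    """Return sorted names of databases that look like sequence-definition DBs.

    Assumes each group is a dict with an optional "databases" list whose
    entries have a "name" key, and that seqdef names end in "_seqdef".
    """
    names = []
    for group in resource_groups:
        for db in group.get("databases", []):
            name = db.get("name", "")
            if name.endswith("_seqdef"):
                names.append(name)
    return sorted(names)


def fetch_resource_groups(url=API_ROOT):
    """Download and decode the resource listing (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    for name in seqdef_databases(fetch_resource_groups()):
        print(name)
```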

Say, if I do:

claMLST import -m mlst Mbovis mycoplasma bovis

Even when I specify the -m option, it still stops, because it seems to use string matching to find the correct schema, and in this case there are two schemas with "mlst" in the name: 'mlst' and 'mlst (legacy)'. I haven't found a way to prevent this yet. Truth be told, it would be much easier if there were a pre-built list of indices somewhere that I could just download, instead of going through the semi-interactive download procedure as currently implemented. Something like

claMLST import all

Worst case would be that I have to rebuild your import function with a little bit more logic to choose available schemas (or just download and build all of them....) and then build the schemas using claMLST create.

Cheers
Marc

bvalot · Answer 5 · Thu Mar 21 2024 16:04:02 GMT+0800 (China Standard Time)

Hi Marc,

In fact, the claMLST import function is oriented toward the interactive import of a specific schema using the current pubmlst API.
You can see all the elements in the API by omitting the species, but not all of the species returned have an MLST schema:

claMLST import /path/to/database

If you want to prebuild all possible MLST schemas, I think the better way is to use the claMLST create function with downloaded allele and profile files. All are listed here:
https://pubmlst.org/data/

I just pushed a new release (2.1.6) that bypasses some errors when using claMLST create.

bvalot · Answer 6 · Thu Mar 21 2024 16:31:57 GMT+0800 (China Standard Time)

For your problem of MLST schema ambiguity, you can be more precise in the -m option by using quotes:

claMLST import -m "mlst (legacy)" /tmp/Mbovis.db mycoplasma bovis

But in fact, in this example, you cannot import the base mlst schema automatically, since "mlst" also matches 'mlst (legacy)'.
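
To make the ambiguity concrete, here is a small sketch of how a wrapper script could choose a schema name before calling the import, preferring an exact match and only then falling back to a unique substring match. This is not pyMLST's own logic; the function pick_scheme is hypothetical and the schema names are the two from the example above.

```python
def pick_scheme(requested, available):
    """Choose a scheme name: an exact match wins, else a unique substring match.

    Raises ValueError when nothing matches or when several names match
    and none of them is an exact match.
    """
    wanted = requested.strip().lower()
    exact = [name for name in available if name.lower() == wanted]
    if len(exact) == 1:
        return exact[0]
    partial = [name for name in available if wanted in name.lower()]
    if len(partial) == 1:
        return partial[0]
    raise ValueError(f"cannot resolve {requested!r} among {available!r}")


if __name__ == "__main__":
    schemes = ["mlst", "mlst (legacy)"]
    # Pure substring matching would find both names for "mlst";
    # checking for an exact match first resolves it without a prompt.
    print(pick_scheme("mlst", schemes))
    print(pick_scheme("legacy", schemes))
```

With exact matching tried first, "mlst" resolves to the base schema while "legacy" still resolves to 'mlst (legacy)', so neither case needs an interactive prompt.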