Add a batch mode to the database import?
pimarin opened this issue
Hello!
I'm trying to use the pyMLST tool, which seems to be exactly what I need, but I don't understand how to build one database covering several species for a specific analysis (from a public database such as cgmlst.org or pubmlst.org).
I am wondering whether it is possible to download a list of all the available species schemes, for either classical MLST or cgMLST?
I'm working on a project with hundreds of different species. We start from the available database schemes and then update them with local schemes, and your tool allows this by letting users update their own database.
Thank you.
Hello,
I don't really understand what you need. All species covered by import are listed on the dedicated websites:
- MLST : https://pubmlst.org/organisms/
- cgMLST : https://cgmlst.org/ncs
You can load these databases with the import command.
Hi @bvalot !
I want to run an analysis like in classical MLST.
I have some assemblies from different species and I want to use your tool with a database built from cgmlst.org.
But if I try to build a database without giving a species name, your tool asks me to choose from a list.
Yes, it is a two-step analysis. As an example, for Pseudomonas aeruginosa:
- First you import the MLST database for your species
claMLST import my_database.db Pseudomonas aeruginosa
- Then you search your genome assemblies against this database
claMLST search my_database.db strain1.fasta strain2.fasta ...
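For a project with many species, the two steps above can be scripted. The following is only a sketch: it builds the same two commands shown above for each species and prints them without running anything. The `samples` mapping, the `mlst_db` directory and the helper names are hypothetical, and `claMLST import` may still stop and ask for a choice when a species has more than one matching scheme.

```python
def import_command(database, species):
    """argv for: claMLST import <database> <genus> <species>"""
    return ["claMLST", "import", database, *species.split()]


def search_command(database, assemblies):
    """argv for: claMLST search <database> <assembly> [<assembly> ...]"""
    return ["claMLST", "search", database, *assemblies]


def plan(samples, db_dir="mlst_db"):
    """Return one import and one search command per species.

    `samples` maps a species name to its assembly files; this layout is
    an assumption made for illustration, not something pyMLST defines.
    """
    commands = []
    for species, assemblies in samples.items():
        # One database file per species, e.g. mlst_db/Pseudomonas_aeruginosa.db
        database = f"{db_dir}/{species.replace(' ', '_')}.db"
        commands.append(import_command(database, species))
        commands.append(search_command(database, assemblies))
    return commands


if __name__ == "__main__":
    samples = {"Pseudomonas aeruginosa": ["strain1.fasta", "strain2.fasta"]}
    for argv in plan(samples):
        print(" ".join(argv))
```

Once the printed commands look right, each argv list can be handed to `subprocess.run(argv, check=True)`.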
Maybe as an extension to this thread - I am building a pipeline for bacterial isolate analysis. The pipeline automatically provisions software via the usual channels; and I do plan to have a full "unattended" installation routine for the various dependencies, including the pyMLST MLST databases.
I have gotten fairly far with mining the pubmlst REST API to get a list of all available schemas and build a list of commands to use with claMLST import. But it still doesn't run fully unattended since it is largely unable to deal with the rare ambiguity.
Say, if I do:
claMLST import -m mlst Mbovis mycoplasma bovis
Even with me specifying the -m option, it still stops because it seems to use string matching to find the correct schema - and in this case there are two schemas with "mlst" in the name - 'mlst' and 'mlst (legacy)'. I haven't yet found a way to prevent this from happening. Truth be told, it would be much easier if there was a pre-built list of indices somewhere I could just download instead of going through a semi-interactive download procedure as currently implemented. Something like
claMLST import all
Worst case would be that I have to rebuild your import function with a little bit more logic to choose available schemas (or just download and build all of them....) and then build the schemas using claMLST create.
Cheers
Marc
Hi Marc,
In fact, the claMLST import function is oriented more towards interactive import of a specific scheme using the current PubMLST API.
You can list all entries in the API by omitting the species, but not all of the species returned have an MLST scheme:
claMLST import /path/to/database
If you want to prebuild all possible MLST schemes, I think the better way is to use the claMLST create function with downloaded alleles and profiles. All are listed here:
https://pubmlst.org/data/
I just pushed a new release (2.1.6) that bypasses some errors when using claMLST create.
For your problem of MLST scheme ambiguity, you can be more precise in the -m option by using quotes:
claMLST import -m "mlst (legacy)" /tmp/Mbovis.db mycoplasma bovis
But in fact, in this example, you cannot import the base mlst scheme automatically.
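An unattended pipeline can check for this kind of ambiguity before calling import. The sketch below assumes that the -m value is compared to scheme names by case-insensitive substring matching, which is Marc's guess in this thread and not confirmed behaviour. The helper names are hypothetical and the scheme list is just the two names quoted above. It also shows how a scheme name containing spaces and parentheses ends up quoted on the command line.

```python
import shlex


def matching_schemes(query, scheme_names):
    """Scheme names that contain `query`, ignoring case (assumed matching rule)."""
    needle = query.lower()
    return [name for name in scheme_names if needle in name.lower()]


def import_with_scheme(database, species, scheme):
    """argv for: claMLST import -m <scheme> <database> <genus> <species>"""
    return ["claMLST", "import", "-m", scheme, database, *species.split()]


schemes = ["mlst", "mlst (legacy)"]  # the two names mentioned in this thread

for query in schemes:
    hits = matching_schemes(query, schemes)
    status = "unique" if len(hits) == 1 else "ambiguous"
    print(f"{query!r}: {status}, matches {hits}")

# shlex.join quotes the scheme name so a shell passes it as a single argument.
print(shlex.join(import_with_scheme("/tmp/Mbovis.db", "mycoplasma bovis", "mlst (legacy)")))
```

Under that assumption, "mlst (legacy)" resolves to one scheme while "mlst" matches both, so species flagged as ambiguous here are candidates for the claMLST create route instead.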