Jobs stuck in PENDING status with local mmseqs-web API
reyjul opened this issue
Hello,
I'm trying to make the mmseqs-web API work but I'm encountering several issues.
This is the Dockerfile I used to build the API:
FROM --platform=linux/amd64 golang:latest AS builder
ARG TARGETARCH
WORKDIR /opt/build
ADD backend .
RUN GOOS=linux GOARCH=$TARGETARCH go build -o mmseqs-web
ADD https://mmseqs.com/latest/mmseqs-linux-avx2.tar.gz .
ADD https://mmseqs.com/foldseek/foldseek-linux-avx2.tar.gz .
ADD https://raw.githubusercontent.com/soedinglab/MMseqs2/678c82ac44f1178bf9a3d49bfab9d7eed3f17fbc/util/mmseqs_wrapper.sh binaries/mmseqs
ADD https://raw.githubusercontent.com/steineggerlab/foldseek/0a68e16214a6db745cee783128ccba8546ea5dc9/util/foldseek_wrapper.sh binaries/foldseek
RUN mkdir -p binaries; \
    if [ "$TARGETARCH" = "arm64" ]; then \
      for i in mmseqs foldseek; do \
        if [ -e "${i}-linux-arm64.tar.gz" ]; then \
          cat ${i}-linux-arm64.tar.gz | tar -xzvf- ${i}/bin/${i}; \
          mv ${i}/bin/${i} binaries/${i}; \
        fi; \
      done; \
    else \
      for i in mmseqs foldseek; do \
        for j in sse2 sse41 avx2; do \
          if [ -e "${i}-linux-${j}.tar.gz" ]; then \
            cat ${i}-linux-${j}.tar.gz | tar -xzvf- ${i}/bin/${i}; \
            mv ${i}/bin/${i} binaries/${i}_${j}; \
          fi; \
        done; \
      done; \
    fi;
RUN chmod -R +x binaries
FROM debian:stable-slim
LABEL maintainer="Milot Mirdita <milot@mirdita.de>"
RUN apt-get update && apt-get install -y ca-certificates wget aria2 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/build/mmseqs-web /opt/build/binaries/* /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/mmseqs-web"]
I then installed the databanks and created the indexes the usual way:
mmseqs databases UniRef50 UniRef50 tmp --remove-tmp-files
mmseqs createindex UniRef50 tmp --split 1
and added a .params file next to each database in the same directory (/local/banks), e.g. /local/banks/UniRef50.params:
{
    "name": "UniRef50",
    "path": "UniRef50",
    "version": "",
    "default": true,
    "order": 0,
    "index": "",
    "search": "",
    "status": "COMPLETE"
}
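Writing these files by hand for several databases is easy to get wrong. The following is a minimal sketch of a helper that writes one, assuming the fields shown above are the ones mmseqs-web expects; write_params is a hypothetical name, not part of mmseqs-web.

```python
import json
from pathlib import Path


def write_params(db_dir, name, path=None, default=False, order=0):
    """Write <db_dir>/<name>.params for an MMseqs2 database (hypothetical helper).

    The field set mirrors the UniRef50.params example; status is COMPLETE
    because databases with any other status were not listed by the API.
    """
    params = {
        "name": name,
        "path": path or name,  # database prefix, relative to the databases path
        "version": "",
        "default": default,
        "order": order,
        "index": "",
        "search": "",
        "status": "COMPLETE",
    }
    target = Path(db_dir) / f"{name}.params"
    target.write_text(json.dumps(params, indent=4) + "\n")
    return target
```

For example, write_params("/local/banks", "UniRef50", default=True) reproduces the file above.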
This is how I launch the API:
singularity exec --env MMSEQS_NUM_THREADS=2 --bind /local/banks:/local/banks /shared/software/singularity/images/mmseqs2-app-v7-8e1704f-rpbs.sif /usr/local/bin/mmseqs-web -local -config config.json -app mmseqs
This is the content of the config.json file:
{
    "app": "mmseqs",
    "verbose": true,
    "server" : {
        "address" : "0.0.0.0:3000",
        "dbmanagment": false,
        "cors" : true
    },
    "worker": {
        "gracefulexit" : true
    },
    "paths" : {
        "databases" : "/local/banks/",
        "results" : "/shared/home/rey/colabfold",
        "temporary" : "/tmp",
        "colabfold" : {
            "uniref" : "/local/banks/UniRef50"
        },
        "mmseqs" : "/usr/local/bin/mmseqs",
        "foldseek" : "/usr/local/bin/foldseek"
    },
    "redis" : {
        "network" : "tcp",
        "address" : "mmseqs-web-redis:6379",
        "password" : "",
        "index" : 0
    },
    "mail" : {
        "type" : "null",
        "sender" : "mail@example.org",
        "templates" : {
            "success" : {
                "subject" : "Done -- %s",
                "body" : "Dear User,\nThe results of your submitted job are available now at https://search.mmseqs.com/queue/%s .\n"
            },
            "timeout" : {
                "subject" : "Timeout -- %s",
                "body" : "Dear User,\nYour submitted job timed out. More details are available at https://search.mmseqs.com/queue/%s .\nPlease adjust the job and submit it again.\n"
            },
            "error" : {
                "subject" : "Error -- %s",
                "body" : "Dear User,\nYour submitted job failed. More details are available at https://search.mmseqs.com/queue/%s .\nPlease submit your job later again.\n"
            }
        }
    }
}
I get a response with curl, which seems to indicate that the API is running and listening on the correct port (3000):
curl -X GET http://10.0.1.246:3000/databases
{"databases":[{"name":"UniRef50","version":"","path":"UniRef50","default":true,"order":0,"taxonomy":false,"full_header":false,"index":"","search":"","status":"COMPLETE"},{"name":"UniRef30","version":"2103","path":"UniRef30","default":false,"order":1,"taxonomy":false,"full_header":false,"index":"","search":"","status":"COMPLETE"}]}
On a side note, a database is not listed if the status in its .params file is anything other than COMPLETE.
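To spot databases hidden for this reason, a small scan of the database directory can report every .params file whose status is not COMPLETE. This is a sketch: non_complete_params is a hypothetical helper, and it assumes each .params file is a flat JSON object like the UniRef50 example.

```python
import json
from pathlib import Path


def non_complete_params(db_dir):
    """Map each *.params file name to its status, for statuses other than COMPLETE."""
    problems = {}
    for params_file in sorted(Path(db_dir).glob("*.params")):
        status = json.loads(params_file.read_text()).get("status")
        if status != "COMPLETE":
            problems[params_file.name] = status
    return problems
```

Running non_complete_params("/local/banks") returns an empty dict when every database should appear in GET /databases.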
If I try to submit a sequence with Python:
>>> from requests import get, post
>>> ticket = post('http://10.0.1.246:3000/ticket', {
... 'q' : '>FASTA\nMPKIIEAIYENGVFKPLQKVDLKEGE\n',
... 'database[]' : ["UniRef50"],
... 'mode' : 'all',
... }).json()
>>> ticket
{'id': 'A5n_NyrysSRtH7tNN6uuYdS6LFkv2bhK3Z94IA', 'status': 'PENDING'}
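For reference, this is roughly what the form body of that POST looks like on the wire. The sketch uses only the standard library; encode_ticket_form is a hypothetical helper, and it assumes the server reads one database[] field per selected database, as the original call suggests.

```python
from urllib.parse import urlencode


def encode_ticket_form(query, databases, mode="all"):
    """Return the application/x-www-form-urlencoded body for POST /ticket."""
    form = {"q": query, "database[]": list(databases), "mode": mode}
    # doseq=True turns the list into one "database[]=<name>" pair per database,
    # which is also how requests encodes a list value in form data.
    return urlencode(form, doseq=True)
```

Selecting several databases simply repeats the field, e.g. encode_ticket_form(">FASTA\nMPKIIEAIYENGVFKPLQKVDLKEGE\n", ["UniRef50", "UniRef30"]).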
The job directory is created correctly, but then nothing happens: the job stays in the PENDING state forever.
Querying the job status a few hours later still shows no progress:
>>> status = get('http://10.0.1.246:3000/ticket/' + ticket['id']).json()
>>> status
{'id': 'A5n_NyrysSRtH7tNN6uuYdS6LFkv2bhK3Z94IA', 'status': 'PENDING'}
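Rather than re-checking by hand, the status can be polled with a timeout so that a stuck ticket surfaces as an error. A minimal sketch: wait_for_ticket is a hypothetical helper, fetch_status is whatever function performs GET /ticket/<id>, and treating COMPLETE and ERROR as the only terminal states is an assumption, not something taken from the mmseqs-web documentation.

```python
import time


def wait_for_ticket(fetch_status, ticket_id, timeout=3600.0, interval=5.0,
                    terminal=("COMPLETE", "ERROR")):
    """Poll fetch_status(ticket_id) until its "status" is terminal.

    Returns the last status dict; raises TimeoutError if the ticket is still
    non-terminal (e.g. PENDING) once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(ticket_id)
        if status.get("status") in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"ticket {ticket_id} still {status.get('status')} after {timeout}s")
        time.sleep(interval)
```

With requests this would be wait_for_ticket(lambda t: get('http://10.0.1.246:3000/ticket/' + t).json(), ticket['id']). Passing the fetch function in keeps the loop testable without a running server.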
Any ideas or advice are welcome.
Please try adding -local.workers 1 to the command line call.
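Applied to the launch command from the original post, the call would look roughly like this, sketched as a Python argument list. mmseqs_web_command is a hypothetical helper; the image path, bind mount and thread count are copied from the command above.

```python
import shlex

SIF = "/shared/software/singularity/images/mmseqs2-app-v7-8e1704f-rpbs.sif"


def mmseqs_web_command(workers=1, config="config.json"):
    """Build the singularity call with an explicit local worker count."""
    return [
        "singularity", "exec",
        "--env", "MMSEQS_NUM_THREADS=2",
        "--bind", "/local/banks:/local/banks",
        SIF,
        "/usr/local/bin/mmseqs-web",
        "-local",
        "-local.workers", str(workers),  # without a worker, tickets never leave PENDING
        "-config", config,
        "-app", "mmseqs",
    ]


print(shlex.join(mmseqs_web_command()))
```

To actually start the server from Python, pass the list to subprocess.run(mmseqs_web_command(), check=True).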
Alternatively, there should be a local settings block in the config file:
"local" : {
    "workers" : 1
}
I suspect that when this block is missing, the worker count defaults to 0 and no worker is started.
If that was not the issue, could you post the output generated by the singularity process?
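If you prefer to fix the config file, a sketch like the following adds the block when it is missing. ensure_local_workers is a hypothetical helper; it rewrites config.json with 4-space indentation, so any custom formatting in the file is lost.

```python
import json
from pathlib import Path


def ensure_local_workers(config_path, workers=1):
    """Make sure config.json contains "local": {"workers": >= workers}.

    Returns True if the file was rewritten, False if it already had enough workers.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    local = config.setdefault("local", {})
    if local.get("workers", 0) >= workers:
        return False
    local["workers"] = workers
    path.write_text(json.dumps(config, indent=4) + "\n")
    return True
```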
Great, that works.
I also had to modify this line in the Dockerfile to avoid "Permission denied" errors:
RUN chmod -R +rx binaries
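The likely reason: Docker gives files fetched by ADD from a remote URL mode 600, so the mmseqs and foldseek wrapper scripts end up readable only by root. chmod +x adds execute bits but not read, and a shell script also needs read permission for the non-root user that Singularity runs it as. A sketch of a check for this, meant to be run inside the container against /usr/local/bin (missing_other_rx is a hypothetical helper):

```python
import os
import stat

OTHER_RX = stat.S_IROTH | stat.S_IXOTH


def missing_other_rx(directory):
    """List regular files in `directory` lacking read+execute for "other" users.

    Scripts need r as well as x: the shell must open the file to run it.
    """
    missing = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        if os.stat(path).st_mode & OTHER_RX != OTHER_RX:
            missing.append(name)
    return missing
```

An empty list from missing_other_rx("/usr/local/bin") means every binary and wrapper is usable by any account.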
The .params files also had to be writable by all users; otherwise the API returned this message:
Execution Error: open /local/banks/UniRef50.params: permission denied
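A sketch of the workaround reported here, equivalent to chmod a+w on every .params file (make_params_world_writable is a hypothetical helper). Granting write access only to the account that runs mmseqs-web would likely be enough and is less permissive.

```python
import stat
from pathlib import Path

WRITE_ALL = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_params_world_writable(db_dir):
    """Add user, group and other write bits to each *.params file (like chmod a+w).

    Returns the names of the files whose mode was changed.
    """
    changed = []
    for params_file in sorted(Path(db_dir).glob("*.params")):
        mode = stat.S_IMODE(params_file.stat().st_mode)
        if mode | WRITE_ALL != mode:
            params_file.chmod(mode | WRITE_ALL)
            changed.append(params_file.name)
    return changed
```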
Thanks a lot.