[CRITICAL] WORKER TIMEOUT

moelliDo opened this issue 4 years ago

I'm running the uvicorn-gunicorn-fastapi:python3.7 Docker image on an Azure App Service (B2: 200 ACU, 2 cores, 3.5 GB memory, OS: Linux).

My Dockerfile looks as follows:

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

WORKDIR /app

RUN apt-get update \
    && apt-get install -y tesseract-ocr tesseract-ocr-deu libgl1-mesa-dev poppler-utils \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

COPY /app .

RUN pip install -r /app/requirements.txt

The service accepts POST requests with a file attached and processes the file using Tesseract and OpenCV.
Once processing is finished, the service responds with the result.

Oftentimes, however, the processing stops with the following error:

2020-11-04T13:48:58.000206215Z [2020-11-04 13:48:57 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:8)

2020-11-04T13:48:58.529238062Z [2020-11-04 13:48:58 +0000] [90] [INFO] Booting worker with pid: 90
2020-11-04T13:49:00.743342241Z [2020-11-04 13:49:00 +0000] [90] [INFO] Started server process [90]
2020-11-04T13:49:00.743447942Z [2020-11-04 13:49:00 +0000] [90] [INFO] Waiting for application startup.
2020-11-04T13:49:00.748887110Z [2020-11-04 13:49:00 +0000] [90] [INFO] Application startup complete.

This error does not occur after the default timeout of 120 seconds. Still, I tried to get rid of it by using a custom gunicorn_conf.py and increasing the timeout to 180 seconds. I also tried increasing and decreasing the number of workers per core. The error persists.
I also checked the log files on the App Service, but they contain no further information about the error.
Changing the LOG_LEVEL in the gunicorn_conf.py file didn't help either.
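For reference, a custom gunicorn_conf.py of the kind described above might look roughly like the sketch below. Gunicorn reads module-level names such as `workers`, `worker_class`, `bind`, `loglevel`, `timeout` and `graceful_timeout` from the config file and ignores names it does not recognise. The environment variable names (`WORKERS_PER_CORE`, `LOG_LEVEL`, `TIMEOUT`, `GRACEFUL_TIMEOUT`, `BIND`), the port 80 default and the floor of two workers are assumptions modelled loosely on the image's own config; check the image's README for the tag you use, including where it expects a custom config file to live.

```python
# gunicorn_conf.py: a minimal sketch, not the image's shipped config.
import multiprocessing
import os

# Helper values (Gunicorn ignores these names; only known settings are applied).
workers_per_core = float(os.getenv("WORKERS_PER_CORE", "1"))  # assumed env var
cores = multiprocessing.cpu_count()

# Number of worker processes; the floor of 2 is an arbitrary choice here.
workers = max(int(workers_per_core * cores), 2)
worker_class = "uvicorn.workers.UvicornWorker"
bind = os.getenv("BIND", "0.0.0.0:80")      # assumed default port
loglevel = os.getenv("LOG_LEVEL", "info")

# Workers silent for longer than this many seconds are killed and restarted;
# this is what produces "[CRITICAL] WORKER TIMEOUT" in the master's log.
timeout = int(os.getenv("TIMEOUT", "180"))
# After a restart signal, workers get this many seconds to finish requests.
graceful_timeout = int(os.getenv("GRACEFUL_TIMEOUT", "180"))
```

Note that raising `timeout` only helps if the worker really is busy for that long; it cannot lift limits imposed in front of the container, such as a platform-level request timeout.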

Does anyone know a solution to this problem? Running the Docker container locally works just fine (Windows 10, Docker Engine v19.03.13).

Mateus José · Answer 1 · Tue Feb 02 2021 02:29:38 GMT+0800 (China Standard Time)

What's up @moelliDo, did you find an answer to the problem? I'm experiencing the same issue and I'm not able to find the cause. When running locally it works just fine, but in production (EC2 with Docker) the problem occurs.
My workaround for now is to run the endpoint's work as a background task and show the results for the request later (coordinating with the front end to display live logs).
But I would really like to know what the problem is and why it's happening.
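The workaround described above (start the work in the background, return at once, and let the client fetch the result later) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the poster's code: `process_file` is a hypothetical stand-in for the Tesseract/OpenCV step and `JobStore` is a made-up name. In a web app, one endpoint would call `submit()` and return the job ID, and a second endpoint would return `status()`.

```python
import threading
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict


def process_file(data: bytes) -> str:
    """Hypothetical stand-in for the slow OCR step."""
    time.sleep(0.1)  # simulate slow work
    return f"processed {len(data)} bytes"


class JobStore:
    """In-memory job registry: submit work now, poll for the result later."""

    def __init__(self, max_workers: int = 2) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def submit(self, data: bytes) -> str:
        # Hand the work to a pool thread and return an ID immediately.
        job_id = uuid.uuid4().hex
        future = self._executor.submit(process_file, data)
        with self._lock:
            self._jobs[job_id] = future
        return job_id

    def status(self, job_id: str) -> dict:
        # Report the job's state without blocking the caller.
        with self._lock:
            future = self._jobs.get(job_id)
        if future is None:
            return {"status": "unknown"}
        if not future.done():
            return {"status": "pending"}
        exc = future.exception()
        if exc is not None:
            return {"status": "failed", "error": str(exc)}
        return {"status": "done", "result": future.result()}


if __name__ == "__main__":
    store = JobStore()
    job_id = store.submit(b"example file contents")
    while store.status(job_id)["status"] == "pending":
        time.sleep(0.05)
    print(store.status(job_id))
```

Because the store lives in memory, each Gunicorn worker process has its own copy. With more than one worker, a status request can land on a worker that never saw the job, so a shared store (a database, Redis, or similar) would be needed in practice.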

moelliDo · Answer 2 · Wed Feb 03 2021 22:42:47 GMT+0800 (China Standard Time)

Hi @mateusjs. Azure App Service times out requests after 230 seconds by default and, AFAIK, this timeout can't be configured. So we improved the service's logic so that the timeout was no longer a problem.

Guillermo Etchebarne · Answer 3 · Fri Apr 30 2021 02:31:45 GMT+0800 (China Standard Time)

Hello @moelliDo. I'm currently having a very similar problem. In my case, the workers time out only two seconds after receiving the request. How much time passed before your workers timed out?

[2021-04-29 17:57:10 +0000] [232] [DEBUG] ('172.17.0.1', 43406) - Connected
[2021-04-29 17:57:21 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:232)
[2021-04-29 17:57:21 +0000] [946] [INFO] Booting worker with pid: 946
[2021-04-29 17:57:22 +0000] [946] [INFO] Started server process [946]
[2021-04-29 17:57:22 +0000] [946] [INFO] Waiting for application startup.
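To answer the timing question from the logs themselves, one option is to diff the timestamps of the relevant lines. A small sketch, assuming the log format shown above (timestamps have one-second resolution; `parse_ts` and `seconds_until_timeout` are hypothetical helper names):

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Matches the bracketed timestamp at the start of each log line,
# e.g. "[2021-04-29 17:57:10 +0000]".
TS_PATTERN = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_ts(line: str) -> datetime:
    match = TS_PATTERN.match(line)
    if match is None:
        raise ValueError(f"no timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S %z")


def seconds_until_timeout(lines: Iterable[str], pid: int) -> Optional[float]:
    """Seconds from the worker's 'Connected' line to the master's
    WORKER TIMEOUT line for the same pid, or None if either is missing."""
    connected = None
    for line in lines:
        if connected is None and f"[{pid}]" in line and "Connected" in line:
            connected = parse_ts(line)
        elif connected is not None and f"WORKER TIMEOUT (pid:{pid})" in line:
            return (parse_ts(line) - connected).total_seconds()
    return None


LOG = [
    "[2021-04-29 17:57:10 +0000] [232] [DEBUG] ('172.17.0.1', 43406) - Connected",
    "[2021-04-29 17:57:21 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:232)",
]

if __name__ == "__main__":
    print(seconds_until_timeout(LOG, 232))  # 11.0
```

This measures the gap from the connection being logged to the timeout being logged; the moment the worker actually started its long-running work may be different.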

moelliDo · Answer 4 · Wed May 05 2021 14:17:02 GMT+0800 (China Standard Time)

Hi @guillermoetchebarne, my workers timed out about 120-180 seconds after the request was received by the service. Two seconds is really quick. How did you deploy your service? Did you adjust the config in any way?

Kevin Morris · Answer 5 · Sun Feb 20 2022 04:27:47 GMT+0800 (China Standard Time)

Indeed, this is a real issue, originally brought up in #46. The solution provided there could help in some cases; however, in the cases we've seen, we haven't even come close to reaching the default 120-second graceful timeout period.