awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference


Question: Can I use CUDA 11 for GPU inference on MMS?

kaushal-idx opened this issue

I tried building the MMS Docker image on top of nvidia/cuda:11.6.0-runtime-ubuntu20.04,
but the build failed.
Is this possible?

maaquib commented

It should be possible to run it with CUDA 11. Can you post the error that you ran into?

Hi @maaquib, I thought so too, but I am getting this error:
[screenshot: error output]

maaquib commented

@kaushal-idx This looks like a CUDA version incompatibility between your container and your host, not an MMS issue. I'd suggest checking the driver version on your host machine and using an appropriate base container.
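For reference, a quick way to run that check (a sketch; the sample output line mirrors the driver and CUDA versions reported later in this thread):

# On the host: nvidia-smi reports the driver version and the highest
# CUDA version that the driver supports (illustrative output below).
nvidia-smi
#   NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4

# Sanity-check that a candidate base image can see the GPU; the tag is an
# example, and its CUDA version should not exceed what the host reports.
docker run --rm --gpus all nvidia/cuda:11.4.0-runtime-ubuntu20.04 nvidia-smi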

Hi @maaquib, your suggestion fixed that issue, thank you. But I'm afraid I'm getting another one:

[screenshot: error output]

My driver is 470.103.01 and the CUDA version on the host machine is 11.4:

[screenshot: driver and CUDA version output]

I tried replicating the same setup in the MMS GPU Dockerfile:

FROM nvidia/cuda:11.4.0-cudnn8-runtime-ubuntu20.04

ENV PYTHONUNBUFFERED TRUE

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    fakeroot \
    ca-certificates \
    dpkg-dev \
    g++ \
    python3-dev \
    openjdk-8-jdk-headless \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/* \
    && cd /tmp \
    && curl -O https://bootstrap.pypa.io/pip/3.6/get-pip.py \
    && python3 get-pip.py


RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    ffmpeg libsm6 libxext6

RUN pip install multi-model-server \
    && pip install mxnet-cu92mkl==1.4.0
    
RUN useradd -m model-server \
    && mkdir -p /home/model-server/tmp
COPY --chown=model-server dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
COPY --chown=model-server config.properties /home/model-server
COPY --chown=model-server extract_snapshot_details.py /home/model-server
COPY --chown=model-server get_snapshot.py /home/model-server
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
    && chown -R model-server /home/model-server
EXPOSE 8080 8081
USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENV AWS_PROFILE=textract
COPY --chown=model-server requirements.txt .
ENV PATH="/home/model-server/.local/bin:${PATH}"
RUN pip install -r requirements.txt
RUN mkdir -p /home/model-server/model-store/logs
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
CMD ["serve"]

Not sure what I am doing wrong, but I am getting the following error:

[screenshot: error output]

Can you please help me solve this?
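A likely culprit in the Dockerfile above is the mxnet-cu92mkl==1.4.0 pin: that wheel is MXNet 1.4.0 built against CUDA 9.2, which does not match the CUDA 11.4 base image the thread is targeting. A minimal sketch of the relevant change, assuming a CUDA 11 build of MXNet such as mxnet-cu112 works for the model (the package name and version pin are assumptions to verify against PyPI):

FROM nvidia/cuda:11.4.0-cudnn8-runtime-ubuntu20.04

# ... rest of the Dockerfile as posted above ...

# Install an MXNet wheel whose CUDA build matches the base image's runtime;
# mxnet-cu92mkl targets CUDA 9.2 and is the mismatch here. The exact
# mxnet-cu112 version pin below is an assumption to verify.
RUN pip install multi-model-server \
    && pip install mxnet-cu112==1.9.1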