awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference


Question: Can I use CUDA 11 for GPU inference on MMS?

kaushal-idx opened this issue

I tried building the MMS Docker image on top of nvidia/cuda:11.6.0-runtime-ubuntu20.04,
but the build failed.
Is this possible?

maaquib commented

It should be possible to run it with CUDA 11. Can you post the error that you ran into?

Hi @maaquib, I thought so too, but I am getting this error:
[screenshot: error output]

maaquib commented

@kaushal-idx This looks like a CUDA version incompatibility between your container and your host, not an MMS issue. I'd suggest checking the driver version on your host machine and using an appropriate base container.
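For reference, a quick way to run that check (a sketch; the sample output line mirrors the driver and CUDA versions reported later in this thread):

# On the host: nvidia-smi reports the driver version and the highest
# CUDA version that the driver supports (illustrative output below).
nvidia-smi
#   NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4

# Sanity-check that a candidate base image can see the GPU; the tag is an
# example, and its CUDA version should not exceed what the host reports.
docker run --rm --gpus all nvidia/cuda:11.4.0-runtime-ubuntu20.04 nvidia-smi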

Hi @maaquib, your suggestion fixed that issue, thank you. But I'm afraid I'm getting another one:

[screenshot: error output]

My driver is 470.103.01 and the CUDA version on the host machine is 11.4:

[screenshot: driver and CUDA version output]

I tried replicating the same setup in the MMS GPU Dockerfile:

FROM nvidia/cuda:11.4.0-cudnn8-runtime-ubuntu20.04

ENV PYTHONUNBUFFERED TRUE

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    fakeroot \
    ca-certificates \
    dpkg-dev \
    g++ \
    python3-dev \
    openjdk-8-jdk-headless \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/* \
    && cd /tmp \
    && curl -O https://bootstrap.pypa.io/pip/3.6/get-pip.py \
    && python3 get-pip.py


RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    ffmpeg libsm6 libxext6

RUN pip install multi-model-server \
    && pip install mxnet-cu92mkl==1.4.0
    
RUN useradd -m model-server \
    && mkdir -p /home/model-server/tmp
COPY --chown=model-server dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
COPY --chown=model-server config.properties /home/model-server
COPY --chown=model-server extract_snapshot_details.py /home/model-server
COPY --chown=model-server get_snapshot.py /home/model-server
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
    && chown -R model-server /home/model-server
EXPOSE 8080 8081
USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENV AWS_PROFILE=textract
COPY --chown=model-server requirements.txt .
ENV PATH="/home/model-server/.local/bin:${PATH}"
RUN pip install -r requirements.txt
RUN mkdir -p /home/model-server/model-store/logs
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
CMD ["serve"]

Not sure what I am doing wrong, but I am getting the following error:

[screenshot: error output]

Can you please help me solve this?
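A likely culprit in the Dockerfile above is the mxnet-cu92mkl==1.4.0 pin: that wheel is MXNet 1.4.0 built against CUDA 9.2, which does not match the CUDA 11.4 base image the thread is targeting. A minimal sketch of the relevant change, assuming a CUDA 11 build of MXNet such as mxnet-cu112 works for the model (the package name and version pin are assumptions to verify against PyPI):

FROM nvidia/cuda:11.4.0-cudnn8-runtime-ubuntu20.04

# ... rest of the Dockerfile as posted above ...

# Install an MXNet wheel whose CUDA build matches the base image's runtime;
# mxnet-cu92mkl targets CUDA 9.2 and is the mismatch here. The exact
# mxnet-cu112 version pin below is an assumption to verify.
RUN pip install multi-model-server \
    && pip install mxnet-cu112==1.9.1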