acryldata / datahub-actions

DataHub Actions is a framework for responding to changes to your DataHub Metadata Graph in real time.

Ingestion Executor - Failed to configure the source (mssql): No module named 'pyodbc'

waqassiddiqi opened this issue

I am trying to configure an MSSQL Server source via the UI, but it fails with the error "Failed to configure the source (mssql): No module named 'pyodbc'", which suggests that the required pip dependency is missing. I also tried extending the acryldata/datahub-actions:v0.0.11 Docker image, but that didn't work either, I believe because ingestion runs in a venv created from predefined requirements.

Is there any way to specify additional dependencies that need to be installed (pyodbc in this instance) so that I can ingest data from the UI? Any help or direction is highly appreciated.

For those looking for a solution: I found one thanks to a community member on Slack.

  1. Modify /usr/local/bin/ingestion_common.sh by adding the --system-site-packages flag where the venv is created, i.e. on line 36: python3 -m venv --system-site-packages $venv_dir
  2. Use acryldata/datahub-actions as the base image to build an image with pyodbc and the other required dependencies installed, and copy the modified script into it
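To illustrate what the --system-site-packages flag changes, here is a minimal Python sketch using the standard-library venv module, which is what python3 -m venv invokes. It is not part of ingestion_common.sh; the helper names create_venv and sees_system_packages are hypothetical and exist only for this example. The sketch creates a venv and reads its pyvenv.cfg to confirm whether packages installed in the image's system Python (such as pyodbc) would be visible inside it.

```python
import tempfile
import venv
from pathlib import Path


def create_venv(venv_dir: str, system_site_packages: bool = True) -> Path:
    """Create a venv (hypothetical helper) and return the path to its pyvenv.cfg."""
    # Equivalent in spirit to: python3 -m venv --system-site-packages --without-pip <venv_dir>
    builder = venv.EnvBuilder(system_site_packages=system_site_packages, with_pip=False)
    builder.create(venv_dir)
    return Path(venv_dir) / "pyvenv.cfg"


def sees_system_packages(cfg_path: Path) -> bool:
    """Return True if the venv is configured to include system site-packages."""
    for line in cfg_path.read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_venv(str(Path(tmp) / "venv"))
        print("system packages visible:", sees_system_packages(cfg))
```

Without the flag, the venv is isolated from the system site-packages, which would explain why a pyodbc installed in the image with pip install is not found during ingestion.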

The Dockerfile I used:

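FROM acryldata/datahub-actions:v0.0.11

# Package installation requires root
USER root

# Register the Microsoft package repository (Debian 11) and its signing key
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
# Install the ODBC driver and headers, then the Python binding
RUN ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev tdsodbc
RUN pip install pyodbc

# Replace the stock script with the copy that passes --system-site-packages
COPY ingestion_common.sh /usr/local/bin

# Drop back to the unprivileged user
USER datahub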
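A quick way to check whether the custom image actually exposes pyodbc is a small Python check like the sketch below. The function name missing_modules is hypothetical, and the script is an assumption about how one might verify the setup, not something shipped with datahub-actions. To be meaningful it has to be run with the same Python interpreter that ingestion uses (the venv's python), whose location is not given in this thread.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["pyodbc"])
    if missing:
        print("missing modules:", ", ".join(missing))
        sys.exit(1)
    print("all required modules are importable")
```

If this reports pyodbc as missing when run inside the ingestion venv but not with the system Python, the venv is still being created without --system-site-packages.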

@waqassiddiqi Why was this issue closed? It wasn't fixed at the project level.
I ran into this issue too, and I think others have as well.
Is it possible to reopen the issue?

@gesundes if you set your source type to mssql-odbc, does that work?