aws / aws-mwaa-local-runner

This repository provides a command line interface (CLI) utility that replicates an Amazon Managed Workflows for Apache Airflow (MWAA) environment locally.


lxml import etree ImportError: libxslt.so.1: cannot open shared object file: No such file or directory

psmyth2 opened this issue · comments

I am attempting to import Salesforce from simple-salesforce in my DAG file. I have specified simple_salesforce==1.12.5 in my requirements.txt file, and it installs and runs fine on my mwaa-local-runner Docker instance. However, when I use the same requirements file in my production MWAA environment, the following happens:

  • simple-salesforce and the other packages in requirements.txt install without issue (CloudWatch logs confirm this)
  • I add my DAG that imports simple-salesforce to the S3 dags folder
  • the DAG fails to load due to the following import error:
    Broken DAG: [/usr/local/airflow/dags/example_dag_with_taskflow_api.py] Traceback (most recent call last):
      File "/usr/local/airflow/.local/lib/python3.11/site-packages/zeep/transports.py", line 11, in <module>
        from zeep.utils import get_media_type, get_version
      File "/usr/local/airflow/.local/lib/python3.11/site-packages/zeep/utils.py", line 5, in <module>
        from lxml import etree
    ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
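
The traceback points at a shared library that lxml's compiled extension cannot load. As a rough diagnostic sketch (not an official MWAA tool), something like the following could list unresolved libraries wherever you have a shell with the packages installed, for example inside the local-runner container. The `missing_libs` helper is a hypothetical name, and the site-packages path is an assumption copied from the traceback above.

```shell
#!/bin/sh
# Hypothetical helper: print the shared libraries that a binary or extension
# module links against but the dynamic linker cannot resolve.
missing_libs() {
    ldd "$1" 2>/dev/null | grep "not found"
}

# Assumed install location, taken from the traceback; adjust for your setup.
SITE_PACKAGES="/usr/local/airflow/.local/lib/python3.11/site-packages"

# lxml ships its etree extension as a compiled .so file; check each match.
for so in "$SITE_PACKAGES"/lxml/etree*.so; do
    [ -e "$so" ] || continue   # the glob matched nothing
    echo "checking $so"
    missing_libs "$so" || echo "  no unresolved libraries"
done
```

If libxslt is the problem, a line naming `libxslt.so.1` followed by `not found` would be expected in the output.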

I also attempted the same workflow using apache-airflow-providers-salesforce. Again, this works fine in my aws-mwaa-local-runner but fails with the same requirements.txt and DAG in my production MWAA environment.

My requirements.txt is pretty simple.
requirements.txt using the Airflow provider:
`--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.11.txt"

apache-airflow-providers-snowflake==5.0.1
apache-airflow-providers-mysql==5.3.1
apache-airflow-providers-salesforce==5.4.3`

requirements.txt using just the simple-salesforce pip install:
`--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.11.txt"

apache-airflow-providers-snowflake==5.0.1
apache-airflow-providers-mysql==5.3.1

simple_salesforce==1.12.5`

Any help/insights would be much appreciated.

I have the exact same issue and can add that it started immediately after upgrading from MWAA 2.6.3 to 2.7.2. Our Salesforce DAG now shows this same error.

@psmyth2 Did you find any solution or workaround?

@johnwrf Unfortunately I didn't find a solution. My workaround has been to migrate this particular Salesforce ETL workflow to GitHub Actions for now. My hunch is that it has something to do with Docker container changes in this version, but I didn't have time to troubleshoot.

See this thread:
https://apache-airflow.slack.com/archives/CCRR5EBA7/p1704789712460549?thread_ts=1701338115.944909&cid=CCRR5EBA7

For the “ImportError: libxslt.so.1: cannot open shared object file: No such file or directory” issue on MWAA, you should add this to the startup script:

pip uninstall -y lxml
sudo apt install python3-lxml

@psmyth2 @johnwrf

I was running into the same issue on MWAA 2.7.2 as well.


I looked into the local runner's bootstrap script and saw that it installs the libxml libraries here:

dnf install -y libxml2-devel libxslt-devel

I added similar lines to my startup script and the import error has gone away and my DAG loads.

#!/bin/sh

set -ex

# Install XML libraries for simple-salesforce to avoid
# `ImportError: libxslt.so.1: cannot open shared object file: No such file or directory`
sudo yum -y install libxml2-devel libxslt-devel
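
To confirm that the install took effect, a check along these lines could be appended to the startup script. This is only a sketch: `has_shared_lib` is a hypothetical helper, it assumes `ldconfig` is available on the image, and the `libxml2.so.2` name is my assumption about what lxml links against (only `libxslt.so.1` appears in the error above).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the dynamic linker cache lists the library.
# -F treats the name as a fixed string so the dots are not regex wildcards.
has_shared_lib() {
    ldconfig -p 2>/dev/null | grep -qF "$1"
}

# libxslt.so.1 comes from the ImportError; libxml2.so.2 is an assumed companion.
for lib in libxml2.so.2 libxslt.so.1; do
    if has_shared_lib "$lib"; then
        echo "found $lib"
    else
        echo "MISSING $lib" >&2
    fi
done
```

The check only logs; it does not fail the startup script, so a missing library shows up in the logs without blocking the environment from starting.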

Hope this helps. 👍