damavis / airflow-pentaho-plugin

Pentaho plugin for Apache Airflow - Orchestrate Pentaho transformations and jobs from Airflow

issues with "params"

erwanj opened this issue · comments

We have just upgraded from Airflow v1.10.11 to Airflow v2.2.4 and are now getting the following error. This DAG was working correctly before the migration.

Broken DAG: [/opt/airflow/dags/dag1.py] Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/airflow/serialization/serialized_objects.py", line 578, in serialize_operator
    serialize_op['params'] = cls._serialize_params_dict(op.params)
  File "/usr/local/lib/python3.8/dist-packages/airflow/serialization/serialized_objects.py", line 451, in _serialize_params_dict
    if f'{v.__module__}.{v.__class__.__name__}' == 'airflow.models.param.Param':
AttributeError: 'str' object has no attribute '__module__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/airflow/serialization/serialized_objects.py", line 939, in to_dict
    json_dict = {"__version": cls.SERIALIZER_VERSION, "dag": cls.serialize_dag(var)}
  File "/usr/local/lib/python3.8/dist-packages/airflow/serialization/serialized_objects.py", line 851, in serialize_dag
    raise SerializationError(f'Failed to serialize DAG {dag.dag_id!r}: {e}')
airflow.exceptions.SerializationError: Failed to serialize DAG 'dag1': 'str' object has no attribute '__module__'
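The failing check from the traceback can be reproduced without Airflow: plain strings do not expose a `__module__` attribute, while class instances do. A minimal sketch, simplified from `_serialize_params_dict` in the traceback above:

```python
# Simplified version of the type check in Airflow 2.2's
# serialized_objects._serialize_params_dict (see traceback above).
def check(v):
    # Assumes v has a __module__ attribute; true for class
    # instances, but a plain str raises AttributeError here.
    return f'{v.__module__}.{v.__class__.__name__}' == 'airflow.models.param.Param'

try:
    check('{{ ds }}')  # a plain str, as passed in params
except AttributeError as e:
    print(e)           # 'str' object has no attribute '__module__'
```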

The DAG looks like:

# DAG creation
with DAG(
    dag_id="dag1",
    default_args=DEFAULT_ARGS,
    max_active_runs=1,
    description='dag1',
    schedule_interval=WORKFLOW_SCHEDULE_INTERVAL,
    catchup=False
) as dag:

    extract_tables = KitchenOperator(
        pdi_conn_id='pdi_default',
        task_id="extract_tables",
        directory=PROJECT_DIRECTORY,
        job="jb_01_load_ods_tables",
        file=PROJECT_DIRECTORY + "jb_01_load_ods_tables.kjb",
        params={
            "date": '{{ ds }}'
        }
    )

If we remove "date": '{{ ds }}', the error in Airflow disappears.

Is it possible that the issue is linked to airflow-pentaho-plugin?

Hello @erwanj

I don't think so. It seems that this is an Airflow issue with DAG serialization. I found a similar issue in the Airflow repository:
apache/airflow#20875

I will investigate this further.

Please take a look at DAG serialization; disabling it may be a temporary workaround for you.

Please try running:

airflow dags reserialize

Thank you very much for your feedback.
I have tried pdi_flow.py from the sample_dags folder, removing job2 and trans2 as I don't have a Carte server.
I get the same behaviour.

Broken DAG: [/opt/airflow/dags/pdi_flow.py] Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/airflow/serialization/serialized_objects.py", line 578, in serialize_operator
    serialize_op['params'] = cls._serialize_params_dict(op.params)
  File "/usr/local/lib/python3.8/dist-packages/airflow/serialization/serialized_objects.py", line 451, in _serialize_params_dict
    if f'{v.__module__}.{v.__class__.__name__}' == 'airflow.models.param.Param':
AttributeError: 'str' object has no attribute '__module__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/airflow/serialization/serialized_objects.py", line 939, in to_dict
    json_dict = {"__version": cls.SERIALIZER_VERSION, "dag": cls.serialize_dag(var)}
  File "/usr/local/lib/python3.8/dist-packages/airflow/serialization/serialized_objects.py", line 851, in serialize_dag
    raise SerializationError(f'Failed to serialize DAG {dag.dag_id!r}: {e}')
airflow.exceptions.SerializationError: Failed to serialize DAG 'pdi_flow': 'str' object has no attribute '__module__'

If I remove 'date': '{{ ds }}' from params, I don't get the error.

If you have the chance, can you confirm that the sample DAG pdi_flow.py works on Airflow v2.2.4?

Thank you.

It seems that it works now. I had to import:

from airflow.models.param import Param

and use the following syntax for params:

        params={
            'date': Param('{{ ds }}')
        }
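Wrapping the value in Param sidesteps the crash because Param instances carry a `__module__` attribute that satisfies the serializer's type check. A stand-in illustration, using a dummy class that mimics the real `airflow.models.param.Param` module path so the snippet runs without Airflow installed:

```python
# Stand-in for airflow.models.param.Param so this snippet runs
# without Airflow installed; the real class lives in that module.
class Param:
    def __init__(self, default=None):
        self.value = default

Param.__module__ = 'airflow.models.param'  # mimic the real module path

def is_param(v):
    # The type check from Airflow 2.2's _serialize_params_dict.
    # Reading v.__module__ works on class instances but raises
    # AttributeError on a plain str such as '{{ ds }}'.
    return f'{v.__module__}.{v.__class__.__name__}' == 'airflow.models.param.Param'

print(is_param(Param('{{ ds }}')))  # True: the serializer can proceed
```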

Thank you for your help

It seems that params should not be used, because it is already used by BaseOperator.
I found a conversation about this.

This argument property, params, will be renamed to task_params.

Hi @erwanj, I applied a patch and released v1.0.9. Please update the plugin. Thank you for reporting this.

Thank you very much for your help.