damavis / airflow-pentaho-plugin

Pentaho plugin for Apache Airflow - Orchestrate Pentaho transformations and jobs from Airflow

CarteTransOperator.params does not work with Jinja templates

evgdol opened this issue · comments

commented

The goal is to pass Airflow variables and/or XCom values into a Pentaho transformation, for example `{{ ds }}` or `{{ ti.xcom_pull("previous_task_id") }}`.
My DAG task:

trans = CarteTransOperator(
        dag=dag,
        task_id='test_trans_params',
        pdi_conn_id='pentaho_etl',
        subpath='/pentaho',
        trans='/public/test/tr_tst_input_params',
        params={'PARAM': '{{ ds }}'})
The transformation is simple: it writes the input parameter to the log.
[screenshot]

The result:
[screenshot]

Expected behaviour: input_param = 2023-03-03

If I put some constant or Python function instead of Jinja template, then it works fine.
For example:
params={'PARAM': str(datetime.today())}
Returns:
[screenshot]

Can you help with this issue?
Airflow version: 2.5.1
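For context on why the constant works while the template does not: `str(datetime.today())` is evaluated in plain Python when the DAG file is parsed, whereas `'{{ ds }}'` is only rendered by Airflow at runtime, and only for operator attributes listed in the operator's `template_fields`. The following is a minimal plain-Python sketch of that mechanism (it is not the plugin's actual code; `FakeOperator`, `render`, and the `other` argument are invented for illustration, and the toy `render` function only substitutes `{{ ds }}`):

```python
def render(value, context):
    """Toy stand-in for Jinja rendering: substitute '{{ ds }}' recursively."""
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    if isinstance(value, str):
        return value.replace("{{ ds }}", context["ds"])
    return value

class FakeOperator:
    # Only attributes named here get rendered; anything else keeps
    # the literal template string. The reported bug is consistent with
    # the params field not being treated as templated.
    template_fields = ("params",)

    def __init__(self, params=None, other=None):
        self.params = params or {}
        self.other = other

    def render_template_fields(self, context):
        for field in self.template_fields:
            setattr(self, field, render(getattr(self, field), context))

op = FakeOperator(params={"PARAM": "{{ ds }}"}, other="{{ ds }}")
op.render_template_fields({"ds": "2023-03-03"})
print(op.params["PARAM"])  # → 2023-03-03 (rendered)
print(op.other)            # → {{ ds }} (left as the literal string)
```

If the field is not templated, the Carte server receives the literal string `{{ ds }}`, which matches the behaviour in the screenshot.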

Hi @evgdol

I've just opened a branch to fix this issue. Can you test by installing this version?

pip install git+https://github.com/damavis/airflow-pentaho-plugin@hotfix/fix-params-on-carte-hook

Please let me know if that fixes the issue.

Thanks.

commented

Hi @piffall

I have tried the new version. Nothing changed.

Ok. I've released a new version that may fix it. You can install it running this command:

pip install airflow-pentaho-plugin==1.0.17
commented

Dear @piffall,

I cannot find what has changed since the previous message.
However, I have reviewed the code, and I think the issue is that you fixed only the `run_job` function. In my example, I am launching a transformation, so I believe the same change should be applied to the `run_trans` function.
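This is a classic shape of bug: two near-duplicate code paths (`run_job` and `run_trans`) need the same fix, and a patch applied to only one leaves the other broken. A hypothetical sketch of how a shared helper avoids the duplication (the function bodies and URL layout below are assumptions for illustration, not the plugin's real code; Carte does expose `executeJob` and `executeTrans` endpoints that take parameters as query arguments):

```python
from urllib.parse import urlencode

def _encode_params(params):
    # Shared helper: Carte receives parameters as query-string arguments,
    # so both entry points must encode them the same way.
    return urlencode(params or {})

def run_job(job, params=None):
    # Hypothetical request path for a job execution.
    return f"/kettle/executeJob/?job={job}&{_encode_params(params)}"

def run_trans(trans, params=None):
    # Same encoding reused, so a fix to one path cannot miss the other.
    return f"/kettle/executeTrans/?trans={trans}&{_encode_params(params)}"

print(run_trans("/public/test/tr_tst_input_params", {"PARAM": "2023-03-03"}))
```

With this structure, fixing the parameter handling once in `_encode_params` would have covered both jobs and transformations.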

Hi @evgdol ,

I've applied the same patch to the transformation. Please, check it now after upgrading.

pip install airflow-pentaho-plugin==1.0.18
commented

Unfortunately, this also did not help.

Hi @evgdol ,

I've just tested it and it's working properly:

job = CarteJobOperator(
        dag=dag,                                                                
        task_id='job3',                                                         
        job='/home/bi/test',                                                    
        params={'date': '{{ ds }}'})
[2023-03-03, 14:00:06 UTC] {carte.py:117} INFO - Finished: /home/bi/test, with id 853ff480-93ab-4cfc-899d-968509fba68e
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - test - Start of job execution
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - test - Starting entry [Success]
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - test - Finished job entry [Success] (result=[true])
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - test - Starting entry [Write to log]
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - Value for date - 2023-03-03
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - test - Starting entry [Success]
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - test - Finished job entry [Success] (result=[true])
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - test - Finished job entry [Write to log] (result=[true])
[2023-03-03, 14:00:06 UTC] {carte.py:54} INFO - 2023/03/03 14:00:01 - test - Job execution finished

I'm trying the same for transformation, but I cannot log the value there.

commented

Dear @piffall,

Maybe some additional settings are required?
I am not sure whether it depends on the Pentaho version (I am using 8.0).

Hi @evgdol . I don't think so.

I've checked with transformation, as in your example, and it's working.

[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - test_trans - Dispatching started for transformation [test_trans]
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Get variables.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Write to log.0 - 
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Write to log.0 - ------------> Linenr 1------------------------------
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Write to log.0 - 2023-03-01
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Write to log.0 - 
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Write to log.0 - date = 2023-03-01
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Write to log.0 - 
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Write to log.0 - ====================
[2023-03-03, 14:43:24 UTC] {carte.py:54} INFO - 2023/03/03 14:43:24 - Write to log.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)

Please make sure that the package is upgraded on all of your Airflow machines (the scheduler and every worker).
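A quick way to verify this on each machine is to ask Python which version of the distribution is actually importable there (a minimal sketch using only the standard library, Python 3.8+; after upgrading, every machine should report the same version):

```python
from importlib import metadata

def plugin_version(dist="airflow-pentaho-plugin"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

# Run this on the scheduler and on every worker; a stale worker would
# still execute tasks with the old, unpatched operator.
print(plugin_version())
```

Equivalently, `pip show airflow-pentaho-plugin` on each host shows the same information.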