damavis / airflow-pentaho-plugin

Pentaho plugin for Apache Airflow - Orchestrate Pentaho transformations and jobs from Airflow

Job status is always Success

pkashibh opened this issue

Hi,
I've just started using this plugin. I have created a transformation on the Pentaho Data Integration server and a connection from Airflow to PDI. I'm using PanOperator and KitchenOperator to trigger Pentaho transformations and jobs respectively. There is a dependency in the DAG like this: Transformation >> JOB. Even when the transformation fails, its status is always green in the graph view, and JOB still gets triggered. I expected the failure to be reported and the downstream task not to run. Any suggestions on what I'm missing or doing incorrectly? My DAG is below:

from datetime import timedelta
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow_pentaho.operators.KitchenOperator import KitchenOperator
from airflow_pentaho.operators.PanOperator import PanOperator
from airflow_pentaho.operators.CarteJobOperator import CarteJobOperator
from airflow_pentaho.operators.CarteTransOperator import CarteTransOperator

DAG_NAME = "pdi_example_2"
DEFAULT_ARGS = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['abc@abc.com'],
    'retries': 3,
    'retry_delay': timedelta(minutes=10),
    'email_on_failure': False,
    'email_on_retry': False
}

with DAG(dag_id=DAG_NAME,
         default_args=DEFAULT_ARGS,
         dagrun_timeout=timedelta(hours=2),
         schedule_interval='30 0 * * *') as dag:

    trans = PanOperator(
        queue="pdi_2",
        task_id="pdi_example_2",
        directory={},
        file="/path/sample.ktr",
        trans={},
        params={},
        dag=dag)

    job = KitchenOperator(
        queue="pdi_3",
        task_id="average_spent",
        directory={},
        job={},
        file="/path/sample.kjb",
        params={},  # Date in yyyy-mm-dd format
        dag=dag)

    trans >> job

Hi @pkashibh, could you share the log of PanOperator? Thx.

Hi @piffall, I'm adding a screenshot of my PanOperator log. I see the exit code from Pan is ALWAYS 0 (Success). I've also checked online, and the fix mentioned in https://jira.pentaho.com/browse/PDI-13952 is already part of my spoon.sh.
I use PDI Pan - Kettle version 6.0.1.0-386 on RHEL 6.9.

[screenshot: PanOperator task log showing Pan exit code 0]
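
Before digging into the plugin, it's worth confirming what exit code Pan itself returns for a transformation that is known to fail, bypassing Airflow entirely. A minimal sketch using Python's subprocess; the pan.sh location and the .ktr path are placeholders for your own install, not the plugin's actual invocation:

import subprocess

# Run a transformation that is known to fail and inspect Pan's exit
# code directly, with no Airflow in between.
# Placeholder paths -- adjust to your PDI install and a failing .ktr.
result = subprocess.run(
    ['/opt/pentaho/data-integration/pan.sh', '-file=/path/sample.ktr'])
print('Pan exit code:', result.returncode)  # expect non-zero on failure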

Thanks!

Hi @piffall ,

Adding on to my previous comment: I've now modified the spoon.sh and pan.sh files on the PDI server to return non-zero codes on failure. This is also printed in the log (highlighted) for reference. However, kettle.py still reads status code 0 even though pan.sh returns 1. Not sure where the disconnect is.

[screenshot: log showing pan.sh returning exit code 1 while kettle.py reads 0]
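
One way to localize a disconnect like this: the value an echo inside pan.sh prints and the exit status the calling process receives are two different things, and only the latter is visible to the plugin. A quick check of what the caller actually gets, sketched with a placeholder path:

import subprocess

# What matters to the caller (and hence to kettle.py) is the exit
# status it receives from pan.sh, not what the script prints about
# its own status. Placeholder path -- adjust to your install.
rc = subprocess.run(['/opt/pentaho/data-integration/pan.sh',
                     '-file=/path/sample.ktr']).returncode
print('caller received:', rc)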

Hi @piffall ,

I think I figured out the problem. Here are the takeaways from this:

  • Ensure the spoon.sh file on the PDI server checks the status of the complete execution and not just the last command executed. Reference: https://jira.pentaho.com/browse/PDI-14658
  • Never put echo statements on the last line of pan.sh/kitchen.sh on the PDI server. The plugin's kettle.py can't tell that pan.sh has failed, because the script's exit status becomes that of the trailing echo, so the status is reported as 0. In my case I had added the line echo "Pan Exit Code: " $? to pan.sh to debug the status as it passed through spoon > pan > kettle, so kettle treated the status as 0 even though Pan had failed. I removed this line from pan.sh and it works now (screenshot below; see also the sketch after this list).
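
Both the pitfall and the fix are easy to demonstrate outside PDI, since a shell script's exit status is simply the status of its last command. A minimal sketch, with false standing in for a failing Pan run; the safe variant captures the status first and re-raises it with an explicit exit:

import subprocess

def run(script):
    """Run a bash snippet and return its exit status."""
    return subprocess.run(['bash', '-c', script]).returncode

# Pitfall: the trailing echo becomes the script's last command; echo
# succeeds, so the script exits 0 even though the simulated Pan run failed.
masked = 'false\necho "Pan Exit Code: $?"'
print(run(masked))  # 0

# Fix: capture the status, log it, then exit with it explicitly.
safe = 'false\nrc=$?\necho "Pan Exit Code: $rc"\nexit $rc'
print(run(safe))    # 1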

[screenshot: PanOperator log after removing the trailing echo from pan.sh]

The task now fails and is marked for retry, as expected:
[screenshot: task instance marked failed and up for retry]

Thanks!

OK, @pkashibh. As the issue seems to reside in an old version of PDI, I'm closing it. Thanks.