Job status is always Success
pkashibh opened this issue · comments
Hi,
I've just started using this plugin. I have created a transformation on the Pentaho Data Integration server and have created a connection from Airflow to PDI. I'm using PanOperator and KitchenOperator to trigger Pentaho transformations and jobs respectively. There is a dependency in the DAG like this: Transformation >> JOB. Even when the transformation fails, its status is always green on the DAG graph and the JOB still gets triggered. I was expecting the failure to be reported and the downstream task not to run. Any suggestions on what I am missing or doing incorrectly?

My DAG is given below:
```python
from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago
from airflow_pentaho.operators.KitchenOperator import KitchenOperator
from airflow_pentaho.operators.PanOperator import PanOperator
from airflow_pentaho.operators.CarteJobOperator import CarteJobOperator
from airflow_pentaho.operators.CarteTransOperator import CarteTransOperator

DAG_NAME = "pdi_example_2"

DEFAULT_ARGS = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['abc@abc.com'],
    'retries': 3,
    'retry_delay': timedelta(minutes=10),
    'email_on_failure': False,
    'email_on_retry': False
}

with DAG(dag_id=DAG_NAME,
         default_args=DEFAULT_ARGS,
         dagrun_timeout=timedelta(hours=2),
         schedule_interval='30 0 * * *') as dag:

    trans = PanOperator(
        queue="pdi_2",
        task_id="pdi_example_2",
        directory={},
        file="/path/sample.ktr",
        trans={},
        params={},
        dag=dag)

    job = KitchenOperator(
        queue="pdi_3",
        task_id="average_spent",
        directory={},
        job={},
        file="/path/sample.kjb",
        params={},  # Date in yyyy-mm-dd format
        dag=dag)

    trans >> job
```
Hi @pkashibh, could you share the log of PanOperator? Thx.
Hi @piffall, I'm adding a screenshot of my PanOperator log. I see that the exit code from Pan is ALWAYS 0 (Success). I've also checked this online, and the fix mentioned in https://jira.pentaho.com/browse/PDI-13952 is already part of my spoon.sh.
I use PDI Pan - Kettle version 6.0.1.0-386 on RHEL 6.9.
Thanks!
Hi @piffall ,
Adding on to my previous comment: I've now modified the spoon.sh and pan.sh files on the PDI server to return non-zero codes on failure. This is also printed in the log (highlighted) for reference. However, kettle.py is still reading status code 0 even though pan.sh returns 1. I'm not sure where the disconnect is.
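For context, the only failure signal the caller can act on here is the exit code of the launched process. A minimal sketch of that pattern (a hypothetical `run_pan` helper for illustration, not the plugin's actual kettle.py code):

```python
import subprocess

def run_pan(cmd):
    # Hypothetical helper illustrating the pattern: the caller only sees
    # the process exit code, so if the wrapper script exits 0, the task
    # is treated as a success regardless of what Pan did internally.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Pan failed with exit code {result.returncode}")
    return result.stdout

# Pan fails internally (simulated by `false`), but the wrapper's last
# command (echo) succeeds, so the exit code is 0 and no error is raised:
run_pan(["bash", "-c", "false; echo 'Pan finished'"])
```

This is why fixing the wrapper scripts on the PDI server, so they actually propagate the failure, matters.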
Hi @piffall ,
I think I figured out the problem. Here are the takeaways from this:
- Ensure the spoon.sh file on the PDI server checks the status of the complete execution, not just the last command executed. Reference: https://jira.pentaho.com/browse/PDI-14658
- Never put echo statements on the last line of pan.sh/kitchen.sh on the PDI server. The plugin's kettle.py cannot tell that pan.sh has failed, and the status will be set to 0. In my case I had added the line

  `echo "Pan Exit Code: " $?`

  to pan.sh to debug the status in spoon > pan > kettle. Since the echo itself succeeds, kettle saw status 0 even though Pan had failed. I removed this line from pan.sh and it started to work (image below).
It has already failed and marked for retry as expected:
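The second takeaway can be demonstrated directly: in a shell script, `$?` reflects only the previous command, and a trailing `echo` succeeds, resetting the script's exit status to 0. A small sketch (using `bash -c` to stand in for the wrapper scripts):

```python
import subprocess

# A wrapper whose last line is an echo: the shell reports the echo's
# exit status (0), masking the earlier failure. This is what the debug
# line added to pan.sh did.
masked = subprocess.run(
    ["bash", "-c", 'false; echo "Pan Exit Code: $?"'],
    capture_output=True, text=True)
print(masked.returncode)  # 0 -- the failure is hidden

# Capturing the status first and re-exiting with it preserves it:
preserved = subprocess.run(
    ["bash", "-c", 'false; rc=$?; echo "Pan Exit Code: $rc"; exit $rc'],
    capture_output=True, text=True)
print(preserved.returncode)  # 1 -- the failure is propagated
```

So if a debug echo is really needed, saving `$?` into a variable and ending the script with `exit $rc` keeps the real status visible to the caller.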
Thanks!
Ok, @pkashibh. As the issue seems to reside in an old version of PDI, I'm closing it. Thanks.