Confused about how accuracy_score for LogisticRegression / DecisionTree is displayed in the logs when the code stores it in a file

rituraj17 opened this issue 3 years ago · comments

First of all, awesome tutorial and blog!

I am new to Kubeflow Pipelines, so I wanted to know how the accuracy_score for LogisticRegression / DecisionTree ends up in the logs, even though the code stores it in a file.

File path: decision_tree/decision_tree.py

# Get accuracy
accuracy = accuracy_score(y_test, y_pred)

# Save output into file
with open(args.accuracy, 'w') as accuracy_file:
    accuracy_file.write(str(accuracy))

File path: pipeline.py

show_results(decision_tree_task.output, logistic_regression_task.output)

I know that you are printing the output with the show_results() function.
But before this step, how are you getting the "decision_tree_task.output" value? It should be a file, right?
Shouldn't we read the file and then print the output?

Fernando · Answer 2 · Mon Jun 07 2021 09:28:31 GMT+0800 (China Standard Time)

Hi @rituraj17 ,

When a component has a single output value (here decision_tree_task only has accuracy as its output), that value is exposed directly through the task's "output" attribute as a "string", "float", etc., as the case may be. That is why I do not need to read the file myself and only use the "output" attribute, like this: decision_tree_task.output.

If a component has multiple outputs, use the "outputs" attribute instead: it is a dict where the key is the name of the output and the value is the corresponding output. For example: decision_tree_task.outputs['accuracy'], decision_tree_task.outputs['precision'], etc.
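The difference between one output and several can be sketched without Kubeflow at all. The following is a plain-Python stand-in, not the kfp API: FakeTask is a hypothetical class that only mimics the lookup behaviour described above (a shortcut for a single output, a name-keyed dict for several), and the metric values are made up.

```python
class FakeTask:
    """Hypothetical stand-in for a pipeline task; not part of kfp."""

    def __init__(self, outputs):
        # Mapping from output name to its value, e.g. {"accuracy": 0.93}
        self.outputs = dict(outputs)

    @property
    def output(self):
        # The shortcut is only unambiguous when exactly one output exists.
        if len(self.outputs) != 1:
            raise RuntimeError("task has several outputs; use outputs['name']")
        return next(iter(self.outputs.values()))


# Made-up values, for illustration only.
single = FakeTask({"accuracy": 0.93})
multi = FakeTask({"accuracy": 0.93, "precision": 0.9})

print(single.output)               # single output: no name needed
print(multi.outputs["precision"])  # several outputs: look up by name
```

With a single output the name is redundant, which is why pipeline.py can pass decision_tree_task.output straight to show_results().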

It is important to mention that the output can hold different types of data; the type is specified in the component's YAML manifest. For example, for decision_tree the accuracy is declared as a float, not as a file:
outputs:
- {name: Accuracy, type: Float, description: 'Accuracy metric'}

Also note that the decision_tree.py script does store the accuracy metric in a file; however, the manifest declares that output as a float, so it is handed to the next step as a value rather than as a file.
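To make the file-to-value handoff concrete, here is a rough simulation in plain Python. It is not what Kubeflow runs internally: run_component() mimics the tail of decision_tree.py, collect_output() is a hypothetical helper standing in for the step where the pipeline reads the output file and applies the type declared in the manifest, and both accuracy numbers are placeholders.

```python
import argparse
import tempfile
from pathlib import Path


def run_component(argv):
    """Mimics decision_tree.py: write the metric to the path it is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--accuracy", type=str)
    args = parser.parse_args(argv)

    # Placeholder; the real script calls accuracy_score(y_test, y_pred).
    accuracy = 0.93

    Path(args.accuracy).parent.mkdir(parents=True, exist_ok=True)
    with open(args.accuracy, "w") as accuracy_file:
        accuracy_file.write(str(accuracy))


def collect_output(path, declared_type):
    """Hypothetical runner step: read the file, apply the declared type."""
    casts = {"Float": float, "Integer": int, "String": str}
    return casts[declared_type](Path(path).read_text())


def show_results(decision_tree, logistic_regression):
    print(f"Decision tree (accuracy): {decision_tree}")
    print(f"Logistic regression (accuracy): {logistic_regression}")


with tempfile.TemporaryDirectory() as workdir:
    # The runner, not the script, decides where the output file lives.
    out_path = Path(workdir) / "outputs" / "accuracy"
    run_component(["--accuracy", str(out_path)])
    decision_tree_output = collect_output(out_path, "Float")

# The second value is a placeholder for logistic_regression_task.output.
show_results(decision_tree_output, 0.9)
```

The point of the sketch is that the script only ever sees a file path, while the consumer only ever sees a value; the reading step in between is what the "output" attribute hides.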

Let me know if you have any other questions! 🙂