ploomber / ploomber

The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

Home Page:https://docs.ploomber.io

Can't disable logging using notebooks

rdataforge opened this issue · comments

I tested the logging feature with notebooks that train and test some models by adding this to the pipeline.yaml file:

papermill_params:
    log_output: True

and called ploomber with this command:

ploomber build pipeline.yaml --log info --log-file experiment1.log

So far so good: the log is printed to the console and saved to the file. If I open the notebook, the log is printed in every notebook cell, as expected. Then I disabled logging by removing the YAML entry and running the pipeline this way:

ploomber build pipeline.yaml --log info

But now I am unable to disable the log. All the messages are printed in every notebook cell, which makes the cell output nearly unreadable.

commented

I assume you followed this?
What are you trying to achieve? If you're trying to write the log to a file, you're still missing the --log-file some_file.log argument. The --log info flag sets the log level, which explains why you're seeing the log in the cells' outputs.
Also, it might be faster to join our community Slack and tackle it there!

Yes, I removed the code block from the header of every notebook too.
I want no log at all, but it is still polluting my cells. Prior to this, the cells were OK, but the log was still printed to the console when executing the pipeline.

@rdataforge,

If I understand correctly, you're expecting print statements in your notebook to be logged to the console but not to the notebook's cells, correct? However, what you're seeing is that ploomber build pipeline.yaml --log info logs to your console and to the notebook's cells.

If so, this is expected behavior. But I see your point, I think it'd make sense to have the option to remove the logs from the notebook's cells since you already have them in the console/log file.

Yes @edublancas, in fact, after removing the --log info part, the log is still polluting my cells. I cannot get rid of it once it has been active.
I had to remove every import logging and every attempt to create a custom log to avoid cluttering my cells.

from Slack:

  1. Configure a pipeline.yaml with just one simple notebook that produces output (one cell printing or plotting something will suffice). No log parameter.
  2. Call ploomber build with no log option; just execute the notebook and plot.
  3. If you open the notebook, you will see the plot rendered in the cell.
  4. Now add the log option to the YAML. Add a logger call with a custom message inside the cell (the one that renders the plot or prints a simple message) and execute ploomber with logging to a file (INFO is OK).
  5. As expected, the log is printed to the file and shown in the cell.
  6. Remove everything to get back to the very same state as in step 2 (except the custom log call, since you sometimes need to check messages in the output file).
  7. Open the notebook and you will see all the logging output, as in step 5.
  8. If you want to restore the previous behaviour with no log, you MUST remove the custom log-to-file call from the cell. Otherwise you will end up with a cluttered cell.
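For step 8, one possible way to keep a custom file log in the cell without the clutter is sketched below. It uses only Python's standard logging module and rests on an assumption this thread does not confirm: that the cell output comes from log records propagating to a stream handler on the root logger. The helper name make_file_only_logger and the logger name are hypothetical.

```python
import logging


def make_file_only_logger(name, path, level=logging.INFO):
    """Return a logger that writes to `path` and nowhere else.

    Assumption (not confirmed in this thread): the cell clutter comes from
    records reaching a stream handler attached to the root logger.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Stop records from bubbling up to root handlers (stderr / cell output).
    logger.propagate = False
    # Avoid stacking duplicate file handlers when the cell is re-run.
    if not any(isinstance(h, logging.FileHandler) for h in logger.handlers):
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

In a cell this would be used as log = make_file_only_logger("experiment", "experiment1.log") followed by log.info("model trained"). If the messages still show up in the cells, the assumption above does not hold and the cause lies elsewhere.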

hey @rdataforge. I'm trying to reproduce this, but I'm having difficulty following the steps.

  1. this step is clear, I have a one-task pipeline, and the notebook prints some message (e.g., print("hello"))
  2. clear
  3. clear

  4. I'm stuck here.

"add log option to yaml"

I'm assuming you mean adding papermill_params.log_output: True to the task.

"Add logger call with custom message inside the cell"

Does this imply adding another print statement, or using the logging module?

"and execute ploomber with logging to file (INFO is ok)"

I'm guessing ploomber build --log INFO --log-file some.log --force (forcing to ensure the notebook runs again).


It'd be better if you could provide a sample pipeline and a commented script so there's no ambiguity about which commands I need to run. It seems like the only manual parts are adding/deleting stuff in the pipeline.yaml/notebook.
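A sample along those lines could be scaffolded with a short script like the one below. This is only a sketch: the file names hello.py and output/hello.ipynb are hypothetical, and the task layout and the parameters cell reflect my understanding of the Ploomber spec format rather than anything confirmed in this thread. The papermill_params entry is left commented out so the same files cover both the "no log" and "log" states.

```python
from pathlib import Path

# Hypothetical one-task pipeline; uncomment papermill_params for step 4.
PIPELINE_YAML = "\n".join([
    "tasks:",
    "  - source: hello.py",
    "    product: output/hello.ipynb",
    "    # papermill_params:",
    "    #   log_output: True",
    "",
])

# Hypothetical notebook in percent format with a single printing cell.
NOTEBOOK_PY = "\n".join([
    '# %% tags=["parameters"]',
    "upstream = None",
    "product = None",
    "",
    "# %%",
    'print("hello")',
    "",
])


def write_repro(directory):
    """Write pipeline.yaml and hello.py into `directory` and return its path."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    (root / "pipeline.yaml").write_text(PIPELINE_YAML)
    (root / "hello.py").write_text(NOTEBOOK_PY)
    return root
```

After calling write_repro("repro"), the commands already mentioned in this thread (ploomber build with and without --log INFO --log-file some.log, plus --force) can be run from that directory, toggling only the commented lines between runs.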

closing due to inactivity