PrefectHQ / prefect-dbt

Collection of Prefect integrations for working with dbt with your Prefect flows.

Home Page: https://prefecthq.github.io/prefect-dbt/

Issues with logging and dbt 1.4 (protobuf)

simonrobertsson opened this issue · comments

commented

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

When running dbt tasks through prefect-dbt, we started getting errors seemingly at random (after hundreds of models had run) once we updated dbt to 1.4. The errors appear to come from the logging: either "AssertionError: feed_data after feed_eof" or "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 7998: invalid start byte". Rolling back to dbt 1.3 resolves the issue. We suspect it has something to do with dbt changing its logging to protobuf (https://github.com/dbt-labs/dbt-core/releases/tag/v1.4.0). The logs from dbt-core 1.4 are also printed in Orion in batches instead of one line at a time.

We searched through earlier issues with similar errors, but this one seems to have a different cause than PrefectHQ/prefect#6335.

Reproduction

dbt-core==1.4

Running dbt run via prefect_dbt.cli.commands.trigger_dbt_cli_command on a larger project with 500+ models.
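
For reference, the run is triggered roughly like this (the project and profiles paths are placeholders, not our actual configuration):

from prefect import flow
from prefect_dbt.cli.commands import trigger_dbt_cli_command

@flow
def dbt_run_flow():
    # "dbt run" over the whole project (500+ models)
    return trigger_dbt_cli_command(
        command="dbt run",
        project_dir="path/to/dbt_project",   # placeholder
        profiles_dir="path/to/profiles",     # placeholder
    )

if __name__ == "__main__":
    dbt_run_flow()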

The error happens after a few hundred models have run.

Making the logging batches smaller by changing PREFECT_LOGGING_ORION_BATCH_INTERVAL, PREFECT_LOGGING_ORION_BATCH_SIZE, and PREFECT_LOGGING_ORION_MAX_LOG_SIZE seemed to make the issue appear less frequently (more models ran before the error appeared). The same goes for changes on the dbt side that made the logs smaller, such as removing coloring.
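
For reference, we tuned those settings through environment variables, roughly like this (the values are illustrative, not our exact configuration):

import os

# Smaller / more frequent log batches; set before Prefect is imported
# so the settings are picked up for the flow run. Values are examples only.
os.environ["PREFECT_LOGGING_ORION_BATCH_INTERVAL"] = "1"
os.environ["PREFECT_LOGGING_ORION_BATCH_SIZE"] = "1000000"
os.environ["PREFECT_LOGGING_ORION_MAX_LOG_SIZE"] = "100000"

from prefect import flow  # imported after the environment is configured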

Reverting to dbt-core==1.3 makes the issue go away.

Error

09:52:24.525 | ERROR   | asyncio - Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'\x1b[0m09:5...m in 3.18s]\n')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'\x1b[0m09:5...m in 3.18s]\n')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.10/asyncio/subprocess.py", line 72, in pipe_data_received
    reader.feed_data(data)
  File "/usr/local/lib/python3.10/asyncio/streams.py", line 456, in feed_data
    assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof

--

Encountered exception during execution:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1478, in orchestrate_task_run
    result = await task.fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect_dbt/cli/commands.py", line 158, in trigger_dbt_cli_command
    result = await shell_run_command.fn(command=command, **shell_run_command_kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect_shell/commands.py", line 89, in shell_run_command
    async for text in TextReceiveStream(process.stdout):
  File "/usr/local/lib/python3.10/site-packages/anyio/abc/_streams.py", line 31, in __anext__
    return await self.receive()
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/text.py", line 44, in receive
    decoded = self._decoder.decode(chunk)
  File "/usr/local/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 7998: invalid start byte

Versions

(From a local env; we see the same issue when running dockerized in Azure on Python 3.10 with a Postgres DB.)

Version:             2.7.11
API version:         0.8.4
Python version:      3.8.10
Git commit:          6b27b476
Built:               Thu, Feb 2, 2023 7:22 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         ephemeral
Server:
  Database:          sqlite
  SQLite version:    3.31.1

Additional context

No response

Hi @simonrobertsson — thanks for the issue. I'm going to move this over to prefect-dbt so our integrations team can take a look at it.

commented

Hi @simonrobertsson, thanks for reporting this! I have a new implementation of running dbt Core here.

I am curious if this error still occurs when you use the newer implementation?

commented

Thanks @madkinsz! @ahuang11 we'll try it when we find a couple of hours :)

commented

Hi, I wanted to check in on whether the new implementation works for you?

commented

Hey! Thanks for checking in. We have not had time to test the new implementation yet, but we just created a ticket to refactor our tasks to use DbtCoreOperation, so we can see then whether that works with dbt 1.4.
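
For reference, the refactor we have in mind looks roughly like this (paths are placeholders for our actual project):

from prefect import flow
from prefect_dbt.cli import DbtCoreOperation

@flow
def dbt_run_flow():
    # Same "dbt run" as before, but through the newer DbtCoreOperation block
    return DbtCoreOperation(
        commands=["dbt run"],
        project_dir="path/to/dbt_project",   # placeholder
        profiles_dir="path/to/profiles",     # placeholder
    ).run()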

commented

Hey!

We have tested it now, and using DbtCoreOperation seems to have solved the issue. Thanks :)

Did anyone figure out what specifically fixed this in dbt-core? I see this same error (feed_data after feed_eof) logged when running scrapy (2.8.0). It doesn't stop the task and the process completes as normal; it's just log noise.

commented

I'm also seeing this with dbt==1.4.6, even while using DbtCoreOperation.