Issues with logging and dbt 1.4 (protobuf)
simonrobertsson opened this issue · comments
First check
- I added a descriptive title to this issue.
- I used the GitHub search to find a similar issue and didn't find it.
- I searched the Prefect documentation for this issue.
- I checked that this issue is related to Prefect and not one of its dependencies.
Bug summary
When running dbt tasks using prefect-dbt, we started getting errors seemingly at random (after hundreds of models had run) after updating dbt to version 1.4. The errors seem to come from the logging: either "AssertionError: feed_data after feed_eof" or "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 7998: invalid start byte". Rolling back to dbt 1.3 resolves the issue. We suspect it has something to do with dbt changing its logging to protobuf (https://github.com/dbt-labs/dbt-core/releases/tag/v1.4.0). The logs from dbt-core 1.4 are also printed in batches in Orion instead of one per line.
We searched through earlier issues with similar errors, but this seems to have a different cause than PrefectHQ/prefect#6335.
Reproduction
dbt-core==1.4
Running dbt run via prefect_dbt.cli.commands.trigger_dbt_cli_command, in a bigger run with 500+ models.
The error happens after a few hundred models have run.
Making the logging batches smaller by changing
- PREFECT_LOGGING_ORION_BATCH_INTERVAL
- PREFECT_LOGGING_ORION_BATCH_SIZE
- PREFECT_LOGGING_ORION_MAX_LOG_SIZE
seemed to make the issue appear less frequently (more models ran before the error appeared). The same held for changes on the dbt side that made the logs smaller, such as removing coloring.
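For example, the batching settings above can be overridden via environment variables before starting the flow run. The values below are purely illustrative (the issue does not record the exact values used); the direction is "smaller/more frequent batches":

```shell
# Illustrative values only — smaller, more frequent log batches
# made the error appear later in the run, per the report above.
export PREFECT_LOGGING_ORION_BATCH_INTERVAL=1
export PREFECT_LOGGING_ORION_BATCH_SIZE=1000000
export PREFECT_LOGGING_ORION_MAX_LOG_SIZE=100000
```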
Reverting to dbt-core==1.3 makes the issue go away.
Error
09:52:24.525 | ERROR | asyncio - Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'\x1b[0m09:5...m in 3.18s]\n')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'\x1b[0m09:5...m in 3.18s]\n')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.10/asyncio/subprocess.py", line 72, in pipe_data_received
reader.feed_data(data)
File "/usr/local/lib/python3.10/asyncio/streams.py", line 456, in feed_data
assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
--
Encountered exception during execution:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1478, in orchestrate_task_run
result = await task.fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/prefect_dbt/cli/commands.py", line 158, in trigger_dbt_cli_command
result = await shell_run_command.fn(command=command, **shell_run_command_kwargs)
File "/usr/local/lib/python3.10/site-packages/prefect_shell/commands.py", line 89, in shell_run_command
async for text in TextReceiveStream(process.stdout):
File "/usr/local/lib/python3.10/site-packages/anyio/abc/_streams.py", line 31, in __anext__
return await self.receive()
File "/usr/local/lib/python3.10/site-packages/anyio/streams/text.py", line 44, in receive
decoded = self._decoder.decode(chunk)
File "/usr/local/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 7998: invalid start byte
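For context, both messages in the tracebacks above come straight from CPython's stream machinery. A minimal, stdlib-only sketch (not specific to dbt or Prefect, purely to illustrate the two failure modes):

```python
import asyncio
import codecs

# 1. "feed_data after feed_eof": asyncio's StreamReader asserts if a
#    late pipe_data_received callback feeds data after EOF was signalled.
async def late_feed():
    reader = asyncio.StreamReader()
    reader.feed_data(b"09:52:24 some log line\n")
    reader.feed_eof()
    try:
        reader.feed_data(b"a late chunk\n")  # arrives after EOF
    except AssertionError as exc:
        return str(exc)

print(asyncio.run(late_feed()))  # feed_data after feed_eof

# 2. "invalid start byte": an incremental UTF-8 decoder (as used by
#    anyio's TextReceiveStream in the traceback) tolerates a multibyte
#    character split across chunks, but not bytes that are never valid
#    UTF-8. 0x93 is a bare continuation byte, which is what leaking
#    binary (e.g. protobuf) output into a text stream can look like.
dec = codecs.getincrementaldecoder("utf-8")()
assert dec.decode(b"\xc3") == ""   # first half of 'é' is buffered...
assert dec.decode(b"\xa9") == "é"  # ...and completed on the next chunk
try:
    dec.decode(b"\x93")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte
```

Since the decoding in the traceback is already incremental, split multibyte characters alone should not trigger the UnicodeDecodeError; genuinely non-UTF-8 bytes in the subprocess output would, which is consistent with the protobuf suspicion above.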
Versions
(From a local env; we have the same issue when running dockerized in Azure on Python 3.10 with a Postgres db)
Version: 2.7.11
API version: 0.8.4
Python version: 3.8.10
Git commit: 6b27b476
Built: Thu, Feb 2, 2023 7:22 PM
OS/Arch: linux/x86_64
Profile: default
Server type: ephemeral
Server:
Database: sqlite
SQLite version: 3.31.1
Additional context
No response
Hi @simonrobertsson, thanks for the issue. I'm going to move this over to prefect-dbt so our integrations team can take a look at it.
Hi @simonrobertsson thanks for reporting this! I have a new implementation of running dbt Core here.
I am curious if this error still occurs when you use the newer implementation?
Hi, I wanted to check in on whether the new implementation works for you.
Hey! Thanks for checking in. We have not had time to test the new implementation yet, but we just created a ticket to refactor our tasks to use DbtCoreOperation, so we'll see then whether it works with dbt 1.4.
Hey!
Tested now and using DbtCoreOperation seems to have solved the issue. Thanks :)
Did anyone figure out what specifically fixed this in dbt-core? I see this same error (feed_data after feed_eof) logged when running scrapy (2.8.0). It doesn't stop the task and the process completes as normal; it's just log noise.
I'm also seeing this with dbt==1.4.6, and while using DbtCoreOperation.