raspberrypi / picamera2

New libcamera based python library

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[BUG] picamera2 randomly stops feeding frames while recording (imx519)

zelo opened this issue · comments

Describe the bug
After starting recording camera starts feeding frames but after some random time (sometimes its 5s the other 90s) camera stops feeding the frames. The process continues to work, just no frames are produced. There is also no possibility to do request capture when this happens.

Event loop is running, when executed in debugger and paused. There are 3 threads looping and doing nothing interesting.
V4L2Encoder.thread_poll - events are empty
CameraManager.listen - events are empty
NullPreview.listen - events are empty

I also tried to look in logs but there was nothing interesting even with echo 3 > /sys/module/videobuf2_common/parameters/debug

Update: seems like lowering buffer_count from 3 to 2 makes this working.
Update2: Nope, that still happens, just take more time
Update3: Now it somehow works with buffer_count 3 for 2 hours already.
Update4: Works with buffer_count 3 for 10 hours already.

  • I have replugged camera
  • put it in metal case with thermopads

I still wonder what is the cause of that problem and why there is no trace in logs. If I would blame temps then without case and thermopads temps on cpu/gpu was reported around 60 celcius, which doesn't seem too big. If connection was the issue then after restarting the script everything worked properly without touching the hardware.

To Reproduce
Run this:

import logging
from datetime import datetime
from time import sleep

from picamera2 import Picamera2
from picamera2.encoders import MJPEGEncoder
from picamera2.outputs import Output


class DummyOutput(Output):
    def outputframe(self, frame, keyframe=True, timestamp=None):
        print(f"Got frame: {datetime.now()}")


c = Picamera2()
c.set_logging(level=logging.DEBUG)
cfg = c.create_still_configuration(
    lores={
        "size": (1984, 1500),
        "format": "YUV420",
    },
    raw=None,
    buffer_count=3,
    queue=False,
)
c.configure(cfg)
c.start(show_preview=False)
e = MJPEGEncoder()
o = DummyOutput()
c.start_encoder(
    encoder=e,
    output=o,
    name="lores",
)

try:
    while True:
        sleep(1)
        print(f"Encoder running: {e.running}")
except Exception as e:
    raise e
finally:
    c.stop()

Expected behaviour
Infinite time recording

Console Output, Screenshots

[0:35:26.351156055] [1056]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:35:26.389560072] [1059]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:35:26.391296340] [1059]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx519@1a to Unicam device /dev/media4 and ISP device /dev/media1
[0:35:26.391399339] [1059]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
[0:35:26.393692919] [1056]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:35:26.431342107] [1062]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:35:26.433097263] [1062]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx519@1a to Unicam device /dev/media4 and ISP device /dev/media1
[0:35:26.433166966] [1062]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
picamera2.picamera2 DEBUG: Requesting configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 3, 'queue': False, 'main': {'format': 'BGR888', 'size': (4656, 3496), 'stride': 14016, 'framesize': 48999936}, 'lores': {'format': 'YUV420', 'size': (1984, 1500), 'stride': 1984, 'framesize': 4464000}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4656, 3496)}, 'display': None, 'encode': None}
[0:35:26.440347202] [1056]  INFO Camera camera.cpp:1183 configuring streams: (0) 4656x3496-BGR888 (1) 1984x1500-YUV420
[0:35:26.440779089] [1062]  INFO RPI vc4.cpp:611 Sensor: /base/soc/i2c0mux/i2c@1/imx519@1a - Selected sensor format: 4656x3496-SRGGB10_1X10 - Selected unicam format: 4656x3496-pRAA
picamera2.picamera2 INFO: Configuration successful!
picamera2.picamera2 DEBUG: Final configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 3, 'queue': False, 'main': {'format': 'BGR888', 'size': (4656, 3496), 'stride': 14016, 'framesize': 48999936}, 'lores': {'format': 'YUV420', 'size': (1984, 1500), 'stride': 1984, 'framesize': 4464000}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4656, 3496)}, 'display': None, 'encode': None}
picamera2.picamera2 DEBUG: Streams: {'main': <libcamera._libcamera.Stream object at 0x7fac1c8c30>, 'lores': <libcamera._libcamera.Stream object at 0x7f91f20db0>, 'raw': None}
picamera2 DEBUG: Allocated 3 buffers for stream 0 with fds [26, 29, 32]
picamera2 DEBUG: Allocated 3 buffers for stream 1 with fds [35, 38, 41]
picamera2.picamera2 INFO: Camera started
Got frame: 2024-04-07 17:20:11.886097
Encoder running: True
Got frame: 2024-04-07 17:20:11.995518
Got frame: 2024-04-07 17:20:12.105972
Got frame: 2024-04-07 17:20:12.228983
Got frame: 2024-04-07 17:20:12.347904
Got frame: 2024-04-07 17:20:12.468742
Got frame: 2024-04-07 17:20:12.590001
Got frame: 2024-04-07 17:20:12.711057
Got frame: 2024-04-07 17:20:12.833490
[...]
Encoder running: True
Got frame: 2024-04-07 17:20:55.085381
Got frame: 2024-04-07 17:20:55.207543
Got frame: 2024-04-07 17:20:55.325749
Got frame: 2024-04-07 17:20:55.444800
Got frame: 2024-04-07 17:20:55.565147
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
[...]

Hardware :

Raspberry PI 4
Arducam IMX519

Additional context
in /boot/firmware/config.txt i have added:

[all]
dtoverlay=imx519
dtoverlay=vc4-kms-v3d,cma-512
apt-cache show python3-picamera2
Package: python3-picamera2
Source: picamera2
Version: 0.3.17-1

output of dmesg with debug enabled

[66084.117396] videobuf2_common: [cap-00000000865065ca] __setup_offsets: buffer 0, plane 0 offset 0x00000000
[66084.125943] videobuf2_common: [cap-00000000865065ca] __setup_offsets: buffer 1, plane 0 offset 0x0136b000
[66084.134110] videobuf2_common: [cap-00000000865065ca] __setup_offsets: buffer 2, plane 0 offset 0x026d6000
[66084.142244] videobuf2_common: [cap-00000000865065ca] __setup_offsets: buffer 3, plane 0 offset 0x03a41000
[66084.142257] videobuf2_common: [cap-00000000865065ca] __vb2_queue_alloc: allocated 4 buffers, 1 plane(s) each
[66084.142355] videobuf2_common: [cap-00000000865065ca] vb2_core_expbuf: buffer 0, plane 0 exported as 45 descriptor
[66084.142387] videobuf2_common: [cap-00000000865065ca] vb2_core_expbuf: buffer 1, plane 0 exported as 46 descriptor
[66084.142408] videobuf2_common: [cap-00000000865065ca] vb2_core_expbuf: buffer 2, plane 0 exported as 47 descriptor
[66084.142426] videobuf2_common: [cap-00000000865065ca] vb2_core_expbuf: buffer 3, plane 0 exported as 48 descriptor
[66084.142441] videobuf2_common: [cap-00000000865065ca] vb2_core_reqbufs: memory in use, orphaning buffers
[66084.142447] videobuf2_common: [cap-00000000865065ca] __vb2_buf_mem_free: freed plane 0 of buffer 0
[66084.142450] videobuf2_common: [cap-00000000865065ca] __vb2_buf_mem_free: freed plane 0 of buffer 1
[66084.142453] videobuf2_common: [cap-00000000865065ca] __vb2_buf_mem_free: freed plane 0 of buffer 2
[66084.142456] videobuf2_common: [cap-00000000865065ca] __vb2_buf_mem_free: freed plane 0 of buffer 3
[66084.142501] videobuf2_common: [cap-00000000865065ca] __vb2_queue_alloc: allocated 32 buffers, 1 plane(s) each
[66084.142695] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 0, plane 0 offset 0x00000000
[66084.142753] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 1, plane 0 offset 0x00005000
[66084.142806] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 2, plane 0 offset 0x0000a000
[66084.142857] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 3, plane 0 offset 0x0000f000
[66084.142908] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 4, plane 0 offset 0x00014000
[66084.142958] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 5, plane 0 offset 0x00019000
[66084.143010] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 6, plane 0 offset 0x0001e000
[66084.143062] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 7, plane 0 offset 0x00023000
[66084.143113] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 8, plane 0 offset 0x00028000
[66084.143163] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 9, plane 0 offset 0x0002d000
[66084.143214] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 10, plane 0 offset 0x00032000
[66084.143268] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 11, plane 0 offset 0x00037000
[...]
[66091.780551] videobuf2_common: [out-00000000f55d31ce] __prepare_dmabuf: buffer for plane 0 changed
[66091.781056] videobuf2_common: [out-00000000f55d31ce] vb2_core_qbuf: qbuf of buffer 1 succeeded
[66091.800610] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: returning done buffer
[66091.800626] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: dqbuf of buffer 2, state: dequeued
[66091.800803] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66091.800812] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 6, state: dequeued
[66091.911648] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: returning done buffer
[66091.911666] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: dqbuf of buffer 3, state: dequeued
[66091.911812] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66091.911822] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 7, state: dequeued
[66092.022738] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: returning done buffer
[66092.022757] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: dqbuf of buffer 0, state: dequeued
[66092.022909] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.022918] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 8, state: dequeued
[66092.133809] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: returning done buffer
[66092.133824] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: dqbuf of buffer 1, state: dequeued
[66092.133952] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.133962] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 9, state: dequeued
[66092.244881] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.244895] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 10, state: dequeued
[66092.355953] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.355969] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 11, state: dequeued
[66092.467037] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.467053] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 0, state: dequeued
[66092.578112] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.578127] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 1, state: dequeued
[66092.689190] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.689205] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 2, state: dequeued
[66092.800264] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.800280] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 3, state: dequeued
[66092.911338] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.911354] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 4, state: dequeued
[66093.022420] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66093.022439] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 5, state: dequeued

Hi, thanks for the report, and for including a straightforward script that demonstrates the problem.

I'll try that on a Pi 4 and see if I get the same problems. Could you confirm that you're running the latest OS (kernel should be 6.6) just so that we can be sure we're running the same thing? I also don't have access to an imx519. Can you also confirm that you're running the standard versions of libcamera and Picamera2 distributed by Raspberry Pi (sometimes 3rd party cameras use custom software which we can't run). Finally, do you have an official Raspberry Pi camera that you know shows the same problem, then I can use that instead? Thanks.

Thanks for your time.
yes, I'm running standard version. Deliberately haven't installed arducam software.

My software stack is:

  • 2024-03-15-raspios-bookworm-arm64-lite.img.xz
  • python3-picamera2: 0.3.17-1 (installed trough apt from official sources)
  • python3-libcamera: 0.2.0+rpt20240215-1 (installed trough apt from official sources)

system is running headless, there are no addidtional processes except ssh sessions and python script

zelo@pi:~ $ uname -a
Linux pi 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux
zelo@pi:~ $ vcgencmd bootloader_version
2024/01/22 10:41:21
version 51ed67b03b3dde4e76b345370f312d07aabf45b8 (release)
timestamp 1705920081
update-time 1707078372
capabilities 0x0000007f

I believe that putting it in case with heatsink and replugging camera in the process somehow fixed the problem.
Weird thing is that it just stopped producing frames without any trace I could notice and after restarting camera stack it worked for a while as it should. I also tried restarting encoder only but that did not make any difference.

Another thought i have is that maybe it's caused by some sort of interference. Temps looked OK without case, except hw mpeg encoder and memory there was no load. the case is metalic and may act as a shield.

Finally, do you have an official Raspberry Pi camera that you know shows the same problem, then I can use that instead

I will try to reproduce the issue on one of official ones later this week

I'm running your script now and will leave it for a day or so. I'll let you know if anything happens!

Actually the problem was not fixed. For test i tried with dummy encoder and had this in code and this runned entire night 🤦

# e = MJPEGEncoder()
e = Encoder()
o = DummyOutput()
c.start_encoder(
    encoder=e,
    output=o,
    name="lores",
)

when i went back to mjpeg encoder it occured again

I've left it running overnight but it still seems quite happy. I'll try again with the latest Raspberry Pi OS Lite. Can you say what Raspberry Pi camera you might be able to test with yourself? Then I can use the same one. Thanks!

@davidplowman I connected Pi Camera HD v3 12MPx and will let it run for some time. Will report tomorrow.

Edit: funny it just stopped after 2 minutes
one thing, i changed buffer count to make capture faster since its lower res. It hanged with bc=1/3/7/9 although with 1 (queue=true) it took 10 minutes, 1 (queue=false) 1 hour, with 3/7/9 (queue=true) around 2/3 minutes.

I was able to reproduce that behaviour on IMX519 (Arducam 16Mpx), Arducam Hawkeye, and Rpi Camera V3. I Have used 3 different ribbon cables as well.
I also tried switching power supply and even used lab one but that haven't changed my situation.
Is there possiblity that my pi is somehow damaged?

My setup looks like this:
IMG_20240412_231502_1

zelo@pi:~ $ python perun/isolate.py
[0:03:20.754635954] [1105]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:03:20.862707486] [1108]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:03:20.864842712] [1108]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media4 and ISP device /dev/media0
[0:03:20.864950648] [1108]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
[0:03:20.867302172] [1105]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:03:21.006124476] [1111]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:03:21.008093648] [1111]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media4 and ISP device /dev/media0
[0:03:21.008171730] [1111]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
picamera2.picamera2 DEBUG: Requesting configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 7, 'queue': True, 'main': {'format': 'BGR888', 'size': (4608, 2592), 'stride': 13824, 'framesize': 35831808}, 'lores': {'format': 'YUV420', 'size': (1984, 1506), 'stride': 1984, 'framesize': 4481856}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4608, 2592)}, 'display': None, 'encode': None}
[0:03:21.015135774] [1105]  INFO Camera camera.cpp:1183 configuring streams: (0) 4608x2592-BGR888 (1) 1984x1506-YUV420
[0:03:21.015548203] [1111]  INFO RPI vc4.cpp:611 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
picamera2.picamera2 INFO: Configuration successful!
picamera2.picamera2 DEBUG: Final configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 7, 'queue': True, 'main': {'format': 'BGR888', 'size': (4608, 2592), 'stride': 13824, 'framesize': 35831808}, 'lores': {'format': 'YUV420', 'size': (1984, 1506), 'stride': 1984, 'framesize': 4481856}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4608, 2592)}, 'display': None, 'encode': None}
picamera2.picamera2 DEBUG: Streams: {'main': <libcamera._libcamera.Stream object at 0x7f76d9d270>, 'lores': <libcamera._libcamera.Stream object at 0x7f76d9d2b0>, 'raw': None}
picamera2 DEBUG: Allocated 7 buffers for stream 0 with fds [26, 29, 32, 35, 38, 41, 44]
picamera2 DEBUG: Allocated 7 buffers for stream 1 with fds [47, 50, 53, 56, 59, 62, 65]
picamera2.picamera2 INFO: Camera started
Got frame: 2024-04-12 23:05:51.852856
Got frame: 2024-04-12 23:05:51.922536
Got frame: 2024-04-12 23:05:51.992129
Got frame: 2024-04-12 23:05:52.060676
Got frame: 2024-04-12 23:05:52.131400
Got frame: 2024-04-12 23:05:52.199579
picamera2.picamera2 DEBUG: Execute job: <picamera2.job.Job object at 0x7f76db0b10>
[...]
picamera2.picamera2 DEBUG: Execute job: <picamera2.job.Job object at 0x7f76db12d0>
Got frame: 2024-04-12 23:07:32.051924
Got frame: 2024-04-12 23:07:32.062610
Got frame: 2024-04-12 23:07:32.206700
Got frame: 2024-04-12 23:07:32.211447
Got frame: 2024-04-12 23:07:32.284496
Got frame: 2024-04-12 23:07:32.325323
Got frame: 2024-04-12 23:07:32.391842
Got frame: 2024-04-12 23:07:32.462790
picamera2.picamera2 DEBUG: Execute job: <picamera2.job.Job object at 0x7f76db0c90>

I also tried pi V3 camera on pi zero w and it crashed twice with different errors:

  • first with Unable to handle kernel paging request at virtual address after 2 hours 11 minutes, script was still running after crash.
  • Second with Internal error: Oops - undefined instruction: 0 [#1] ARM run failed after 1 hour 53 minutes.

I have orderered pi 5 hopefully that will work

    buffer_count=3,
    queue=True,
zelo@pi:~ $ uname -a
Linux pi 6.1.21+ #1642 Mon Apr  3 17:19:14 BST 2023 armv6l GNU/Linux

zelo@pi:~ $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 11 (bullseye)"
NAME="Raspbian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

First run error

[ 1021.766654] bcm2835-codec bcm2835-codec: Failed setting ctrl 00990a67, ret -3
[ 1048.067692] bcm2835-codec bcm2835-codec: Failed setting ctrl 00990a67, ret -3
[ 1066.975680] bcm2835-codec bcm2835-codec: Failed setting ctrl 00990a67, ret -3
[ 8942.369300] 8<--- cut here ---
[ 8942.369383] Unable to handle kernel paging request at virtual address bfa89b9c
[ 8942.369422] [bfa89b9c] *pgd=00000000
[ 8942.369463] Internal error: Oops: 80000005 [#1] ARM
[ 8942.369494] Modules linked in: cmac algif_hash aes_arm aes_generic ecb algif_skcipher af_alg bnep imx708 dw9807_vcm hci_uart btbcm bluetooth ecdh_generic ecc libaes brcmfmac vc4 8021q garp stp llc brcmutil snd_soc_hdmi_codec sha256_generic libsha256 drm_display_helper cfg80211 cec drm_dma_helper drm_kms_helper snd_soc_core bcm2835_unicam bcm2835_codec(C) snd_compress v4l2_dv_timings snd_pcm_dmaengine bcm2835_v4l2(C) bcm2835_isp(C) syscopyarea sysfillrect v4l2_fwnode v4l2_mem2mem raspberrypi_hwmon i2c_mux_pinctrl sysimgblt i2c_mux v4l2_async bcm2835_mmal_vchiq(C) fb_sys_fops rfkill videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_bcm2835(C) videodev snd_pcm snd_timer i2c_bcm2835 snd mc vc_sm_cma(C) uio_pdrv_genirq fixed uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[ 8942.369986] CPU: 0 PID: 1906 Comm: python Tainted: G         C         6.1.21+ #1642
[ 8942.370025] Hardware name: BCM2835
[ 8942.370045] PC is at 0xbfa89b9c
[ 8942.370078] LR is at video_usercopy+0x384/0x574 [videodev]
[ 8942.370834] pc : [<bfa89b9c>]    lr : [<bf24db90>]    psr: 60000013
[ 8942.370861] sp : dc905e50  ip : c6fc8000  fp : c0c2f02c
[ 8942.370884] r10: b07fc4ec  r9 : 00000000  r8 : c511e200
[ 8942.370907] r7 : dc905e74  r6 : 00000000  r5 : c044560f  r4 : c050560f
[ 8942.370932] r3 : 0000003c  r2 : bf26f240  r1 : e2bb8a4f  r0 : c3344c78
[ 8942.370956] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 8942.370986] Control: 00c5387d  Table: 0248c008  DAC: 00000055
[ 8942.371008] Register r0 information: slab kmalloc-1k start c3344c00 pointer offset 120 size 1024
[ 8942.371072] Register r1 information: non-paged memory
[ 8942.371103] Register r2 information: 59-page vmalloc region starting at 0xbf246000 allocated at load_module+0x888/0x1d04
[ 8942.371162] Register r3 information: non-paged memory
[ 8942.371191] Register r4 information: non-slab/vmalloc memory
[ 8942.371220] Register r5 information: non-slab/vmalloc memory
[ 8942.371248] Register r6 information: NULL pointer
[ 8942.371273] Register r7 information: 2-page vmalloc region starting at 0xdc904000 allocated at kernel_clone+0xac/0x31c
[ 8942.371337] Register r8 information: slab kmalloc-64 start c511e200 pointer offset 0 size 64
[ 8942.371391] Register r9 information: NULL pointer
[ 8942.371418] Register r10 information: non-paged memory
[ 8942.371445] Register r11 information: non-slab/vmalloc memory
[ 8942.371473] Register r12 information: slab task_struct start c6fc8000 pointer offset 0
[ 8942.371523] Process python (pid: 1906, stack limit = 0x440e1ba1)
[ 8942.371560] Stack: (0xdc905e50 to 0xdc906000)
[ 8942.371590] 5e40:                                     dc905e73 c0052d60 c48fc000 0000003c
[ 8942.371625] 5e60: b5063d40 dc905eb4 00000001 0000003c 00006590 00000002 00000009 00000000
[ 8942.371663] 5e80: 00004003 00000001 00000000 00001ec2 00000000 00032bc8 00000000 00000000
[ 8942.371698] 5ea0: 00000000 00000000 00000000 00000000 00000001 c511e200 00000001 00000000
[ 8942.371736] 5ec0: 00000000 c1ba9040 c4942a80 00000000 dc905f2c c08c6378 00000026 00000000
[ 8942.371772] 5ee0: c0c2f02c dc905f78 0eb14990 00000822 00000000 e2bb8a4f dc905f80 c044560f
[ 8942.371807] 5f00: c48fc001 c48fc000 c0c2f02c b07fc4ec 00000000 00000033 c3347770 c022c198
[ 8942.371844] 5f20: fffffe30 5ac3c35a dc905f3c c08c6a24 00000002 dc905fb0 c0c2f02c c001272c
[ 8942.371879] 5f40: 00000000 00000000 00000000 e2bb8a4f 00000001 c6d6f961 c6d6f960 c0c2f02c
[ 8942.371914] 5f60: 016eb9d0 00000026 c6fc8000 c0216e88 c6d6f961 c0216e88 00000000 e2bb8a4f
[ 8942.371952] 5f80: 001e73e8 00000044 01a60520 00000033 00000036 c0008408 c6fc8000 00000000
[ 8942.371989] 5fa0: 01a60520 c0008260 00000044 01a60520 00000033 c044560f b07fc4ec 00000000
[ 8942.372025] 5fc0: 00000044 01a60520 00000033 00000036 00000001 b3da2188 00000000 01a60520
[ 8942.372062] 5fe0: 00479320 b07fc4ac 000736a0 b6d3056c 60000010 00000033 00000000 00000000
[ 8942.372116]  video_usercopy [videodev] from sys_ioctl+0x10c/0x90c
[ 8942.372732]  sys_ioctl from ret_fast_syscall+0x0/0x1c
[ 8942.372787] Exception stack(0xdc905fa8 to 0xdc905ff0)
[ 8942.372824] 5fa0:                   00000044 01a60520 00000033 c044560f b07fc4ec 00000000
[ 8942.372864] 5fc0: 00000044 01a60520 00000033 00000036 00000001 b3da2188 00000000 01a60520
[ 8942.372898] 5fe0: 00479320 b07fc4ac 000736a0 b6d3056c
[ 8942.372934] Code: bad PC value
[ 8942.372962] ---[ end trace 0000000000000000 ]---
[ 8942.372985] note: python[1906] exited with irqs disabled

Second run error

[ 7195.713400] Internal error: Oops - undefined instruction: 0 [#1] ARM
[ 7195.713519] Modules linked in: cmac algif_hash aes_arm aes_generic ecb algif_skcipher af_alg bnep imx708 dw9807_vcm hci_uart btbcm bluetooth brcmfmac ecdh_generic ecc libaes vc4 brcmutil sha256_generic libsha256 snd_soc_hdmi_codec 8021q garp stp drm_display_helper llc cec cfg80211 drm_dma_helper drm_kms_helper snd_soc_core bcm2835_unicam snd_compress v4l2_dv_timings snd_pcm_dmaengine bcm2835_codec(C) syscopyarea sysfillrect v4l2_fwnode sysimgblt fb_sys_fops i2c_mux_pinctrl v4l2_async bcm2835_isp(C) v4l2_mem2mem bcm2835_v4l2(C) i2c_mux raspberrypi_hwmon bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops rfkill videobuf2_v4l2 videobuf2_common snd_bcm2835(C) videodev snd_pcm snd_timer i2c_bcm2835 snd mc vc_sm_cma(C) uio_pdrv_genirq fixed uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[ 7195.714019] CPU: 0 PID: 753 Comm: python Tainted: G         C         6.1.21+ #1642
[ 7195.714060] Hardware name: BCM2835
[ 7195.714081] PC is at linux_banner+0xc7a78/0x1dd784
[ 7195.714146] LR is at handle_mm_fault+0x67c/0xd94
[ 7195.714204] pc : [<c0a0a1f4>]    lr : [<c01ce4f8>]    psr: a0000113
[ 7195.714228] sp : dc905ec8  ip : c001cf7c  fp : cb96267c
[ 7195.714252] r10: 00000000  r9 : dc905fb0  r8 : c0c2f02c
[ 7195.714273] r7 : 00000000  r6 : 00000000  r5 : ab7d0000  r4 : 00000000
[ 7195.714297] r3 : 022e538c  r2 : c52b2080  r1 : 00000000  r0 : c52c7f40
[ 7195.714321] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 7195.714350] Control: 00c5387d  Table: 027ac008  DAC: 00000055
[ 7195.714373] Register r0 information: non-slab/vmalloc memory
[ 7195.714408] Register r1 information: NULL pointer
[ 7195.714433] Register r2 information: slab task_struct start c52b2080 pointer offset 0
[ 7195.714487] Register r3 information: non-paged memory
[ 7195.714514] Register r4 information: NULL pointer
[ 7195.714539] Register r5 information: non-paged memory
[ 7195.714564] Register r6 information: NULL pointer
[ 7195.714590] Register r7 information: NULL pointer
[ 7195.714615] Register r8 information: non-slab/vmalloc memory
[ 7195.714643] Register r9 information: 2-page vmalloc region starting at 0xdc904000 allocated at kernel_clone+0xac/0x31c
[ 7195.714709] Register r10 information: NULL pointer
[ 7195.714736] Register r11 information: non-slab/vmalloc memory
[ 7195.714765] Register r12 information: non-slab/vmalloc memory
[ 7195.714793] Process python (pid: 753, stack limit = 0x2fa239ec)
[ 7195.714829] Stack: (0xdc905ec8 to 0xdc906000)
[ 7195.714861] 5ec0:                   c6fe5c40 c336f8c0 dc905f2c c08c6378 00000000 0966734f
[ 7195.714900] 5ee0: ab7d0000 cb96267c ffffffff c113f750 00000cc0 000ab7d0 ab7d0000 ab7d0000
[ 7195.714935] 5f00: 00000a55 c27aead8 c27aead8 00000000 00000000 00000000 c52c7740 c6fe5c70
[ 7195.714971] 5f20: 00000000 541df672 00000817 dc905fb0 c6fe5c40 ab7d0000 00000817 00000255
[ 7195.715007] 5f40: c113f750 00000002 c6fe5c70 c08cd358 dc905f74 c01b2f4c 00000022 000ab3a4
[ 7195.715044] 5f60: dc905f70 c0c34bfc 00000817 c08cd220 ab7d0000 dc905fb0 5b59515b b554aa40
[ 7195.715080] 5f80: b1a49fd0 c00189fc ab3a4008 b6f88b44 20000010 ffffffff b6f88b44 20000010
[ 7195.715117] 5fa0: ffffffff c52b2080 00c5387d c0008f94 ab7d0000 ad9fd008 01dfff78 5b58515b
[ 7195.715156] 5fc0: 515b5851 57515b57 5c57515b 515b5851 58515b58 5b59515b b554aa40 b1a49fd0
[ 7195.715194] 5fe0: 515c5951 b18fdbec 00000058 b6f88b44 20000010 ffffffff 00000000 00000000
[ 7195.715243]  handle_mm_fault from do_page_fault+0x138/0x3a8
[ 7195.715332]  do_page_fault from do_DataAbort+0x40/0xb4
[ 7195.715398]  do_DataAbort from __dabt_usr+0x54/0x60
[ 7195.715445] Exception stack(0xdc905fb0 to 0xdc905ff8)
[ 7195.715476] 5fa0:                                     ab7d0000 ad9fd008 01dfff78 5b58515b
[ 7195.715513] 5fc0: 515b5851 57515b57 5c57515b 515b5851 58515b58 5b59515b b554aa40 b1a49fd0
[ 7195.715548] 5fe0: 515c5951 b18fdbec 00000058 b6f88b44 20000010 ffffffff
[ 7195.715583] Code: 73c5cb76 68737213 70f86db3 5fe469a4 (f6c517fa)
[ 7195.715613] ---[ end trace 0000000000000000 ]---
[ 7195.715636] note: python[753] exited with irqs disabled

Can you confirm that a fresh and unchanged install of Raspberry Pi OS Lite (with all updates applied) produces an error from the script posted above? If so, we can try to reproduce that.

@davidplowman
Yes, I just did that

Below are ALL steps i did

  • Raspberry Pi Imager (Pi OS Lite (64 bit)) with following customizations applied: user/password, wifi, ssh
  • boot
  • ssh to pi
  • sudo apt-get update && sudo apt-get install python3-picamera2
  • python isolate.py (copy paste from my first comment here)
zelo@pi:~ $ libcamera-hello --list-cameras
Available cameras
-----------------
0 : imx708 [4608x2592 10-bit RGGB] (/base/soc/i2c0mux/i2c@1/imx708@1a)
    Modes: 'SRGGB10_CSI2P' : 1536x864 [120.13 fps - (768, 432)/3072x1728 crop]
                             2304x1296 [56.03 fps - (0, 0)/4608x2592 crop]
                             4608x2592 [14.35 fps - (0, 0)/4608x2592 crop]
zelo@pi:~ $ rpi-eeprom-update
BOOTLOADER: up to date
   CURRENT: Mon 22 Jan 10:41:21 UTC 2024 (1705920081)
    LATEST: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)
   RELEASE: default (/lib/firmware/raspberrypi/bootloader-2711/default)
            Use raspi-config to change the release.

  VL805_FW: Using bootloader EEPROM
     VL805: version unknown. Try sudo rpi-eeprom-update
   CURRENT:
    LATEST:
zelo@pi:~ $ vcgencmd version
Feb 29 2024 12:24:53
Copyright (c) 2012 Broadcom
version f4e2138c2adc8f3a92a3a65939e458f11d7298ba (clean) (release) (start)
zelo@pi:~ $ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
zelo@pi:~ $ python isolate.py
[0:04:05.064977470] [4510]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:04:05.188209832] [4515]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:04:05.190341572] [4515]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media4 and ISP device /dev/media2
[0:04:05.190458827] [4515]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
[0:04:05.193241396] [4510]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:04:05.316364799] [4518]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:04:05.318255010] [4518]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media4 and ISP device /dev/media2
[0:04:05.318330266] [4518]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
picamera2.picamera2 DEBUG: Requesting configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 3, 'queue': False, 'main': {'format': 'BGR888', 'size': (4608, 2592), 'stride': 13824, 'framesize': 35831808}, 'lores': {'format': 'YUV420', 'size': (1984, 1500), 'stride': 1984, 'framesize': 4464000}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4608, 2592)}, 'display': None, 'encode': None}
[0:04:05.325661008] [4510]  INFO Camera camera.cpp:1183 configuring streams: (0) 4608x2592-BGR888 (1) 1984x1500-YUV420
[0:04:05.326110511] [4518]  INFO RPI vc4.cpp:611 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
picamera2.picamera2 INFO: Configuration successful!
picamera2.picamera2 DEBUG: Final configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 3, 'queue': False, 'main': {'format': 'BGR888', 'size': (4608, 2592), 'stride': 13824, 'framesize': 35831808}, 'lores': {'format': 'YUV420', 'size': (1984, 1500), 'stride': 1984, 'framesize': 4464000}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4608, 2592)}, 'display': None, 'encode': None}
picamera2.picamera2 DEBUG: Streams: {'main': <libcamera._libcamera.Stream object at 0x7f71675130>, 'lores': <libcamera._libcamera.Stream object at 0x7f71674430>, 'raw': None}
picamera2 DEBUG: Allocated 3 buffers for stream 0 with fds [26, 29, 32]
picamera2 DEBUG: Allocated 3 buffers for stream 1 with fds [35, 38, 41]
picamera2.picamera2 INFO: Camera started
Got frame: 2024-04-15 11:16:07.553351
Got frame: 2024-04-15 11:16:07.623207
Got frame: 2024-04-15 11:16:07.690469
Got frame: 2024-04-15 11:16:07.763351
Got frame: 2024-04-15 11:16:07.830098
Got frame: 2024-04-15 11:16:07.899900
[...]
Encoder running: True
Got frame: 2024-04-15 11:16:43.976702
Got frame: 2024-04-15 11:16:44.136477
Got frame: 2024-04-15 11:16:44.151790
Got frame: 2024-04-15 11:16:44.323555
Got frame: 2024-04-15 11:16:44.333777
Got frame: 2024-04-15 11:16:44.500120
Got frame: 2024-04-15 11:16:44.515397
Got frame: 2024-04-15 11:16:44.680871
Got frame: 2024-04-15 11:16:44.696186
Got frame: 2024-04-15 11:16:44.851975
Got frame: 2024-04-15 11:16:44.866871
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
^Cpicamera2.picamera2 DEBUG: Execute job: <picamera2.job.Job object at 0x7f716779d0>
picamera2.picamera2 INFO: Camera stopped
Traceback (most recent call last):
  File "/home/zelo/isolate.py", line 38, in <module>
    sleep(1)
KeyboardInterrupt
picamera2.picamera2 DEBUG: Camera was not started
picamera2.picamera2 INFO: Camera closed successfully.

Thanks. Just so that I'm doing the right thing, is that running on a Pi 4? So it's Pi 4B + latest RPi OS Lite 64-bit + v3 camera + script you initially posted.

@davidplowman
Yes

  • pi 4b
  • latest 64 bit os lite (no apt upgrade, or dist-upgrade done)
  • v3 camera
  • short ribbon cable
  • script from first post
  • nothing connected to any other port
  • occurs in multiple locations (i moved the setup around appartment to check if its not some local interference)
  • I run the script remotely trough ssh by wifi
  • new sdcard SanDisk Extreme Pro microSD 64GB U3 A2
Revision        : b03114
Serial          : 10000000f9a62d1e
Model           : Raspberry Pi 4 Model B Rev 1.4

I'll try that. Note that I will do an apt update and full-upgrade after installing because that's the recommended procedure to get up-to-date software. From the output you've posted, can I assume that it typically fails within a few minutes?

Yes it usually fails pretty fast

Yes I'm seeing it too now. Normally fails in ~20ish seconds. As far as I can tell, we're queuing buffers to the encoder on the GPU but at some point it just stops returning them to us. No idea why.

In order to try and get some more debug, I put start_debug=1 into my /boot/firmware/config.txt (and rebooted) and now it just won't fail. And if I change it back to start_debug=0 it fails really quickly again. So all a bit baffling ATM...

i will try with start_debug=1

Update: For sure it's slower in obtaining frames.
I'm also pretty sure i tried with that flag before.
I see a lot of assert( vc_image_helper_get_sdram_page_size() == 4096 ) failed; ../../../../../helpers/vc_image/vc_image_helper.c::calculate_pitch line 869 in vclog not sure if that's a problem at all

it's great u were able to reproduce the issue

Almost 4 hours without hang (start_debug=1), although slower.

in that time in vclog
thousands of

assert( vc_image_helper_get_sdram_page_size() == 4096 ) failed; ../../../../../helpers/vc_image/vc_image_helper.c::calculate_pitch line 869

and few of

1241852.152: sdram: sdram refresh 1562->3124 (2)

and this in dmesg

[14495.963097] bcm2835-codec bcm2835-codec: Failed setting ctrl 00990a67, ret -3
[14495.963699] cma_alloc: 5 callbacks suppressed
[14495.963703] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.964023] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.964338] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.964651] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.964966] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.965329] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.965664] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.965978] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.966293] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.966609] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12

I left it going all night and it was still OK in the morning. And then, as I sat and watched it for just a moment, it failed. A few of those page size asserts, but I think those are spurious. Absolutely nothing else. I will spend some more time digging today.

I've been going back through older GPU firmware versions. As the failure is somewhat intermittent it's hard to be completely sure, but I believe the 6.6.18 kernel may be the last one that worked properly. You can downgrade yourself back to this kernel with

sudo rpi-update a31776f0abb4bee5b79a3334bf748c99a813c084

NOTE: always back up critical files before using rpi-update in case your system no longer boots afterwards!

If you are able to do so, it would be interesting to see if you agree that 6.6.18 is OK. In the meantime, I'll see if I can discover what might have gone wrong after that release.

Running test after kernel downgrade.

I also noticed that firmware updated to new version during command you provided

CURRENT: Mon 15 Apr 13:12:14 UTC 2024 (1713186734)

@davidplowman 15 hours without issue and keeps running

Mine ran all night too. I think I may have found the change in the GPU firmware that's starting causing this, only it's a system change that doesn't have anything to do with the MJPEG encoder. So there may be some subtle side-effects at play that need to be understood.

I think I've finally understood what's going on. In the end it's mainly down to a bug being introduced and released, and having more widespread effects than people perhaps realised. Whilst the bug has been fixed for a while, our "latest" images for download still don't have the fix.

As I said, kernel 6.6.18 with the associated GPU firmware is OK, the bug was introduced just after that. The first version that has the fix is 6.6.23 with its associated GPU firmware (the bug and fix are actually in the firmware, not the Linux kernel). Unfortunately our "latest" images, even after updating, are still shipping with 6.6.20 at this time.

So updating to any kernel/firmware from 6.6.23 onwards will work, and specifically the latest available version (which you can get with just sudo rpi-update) should be fine. (Same caveats apply as before when using rpi-update.)

I'm told our release images are likely to be updated in the next few days, and we'll make sure the working firmware is included. Hopefully that will resolve this issue for everyone!

One other unrelated thing I noticed about your script. I found the behaviour could be very "choppy", that is, frames weren't being processed steadily and smoothly. This can be caused by the GPU clocks not being increased enough when the Arm side of the processor doesn't think it's very busy. As a workaround, I found that adding isp_freq_min=500 into my /boot/firmware/config.txt make the situation significantly better.

That's great :) Thanks for your time diving in to the problem.

Just to recap, for anyone looking for solution. Either go back to kernel 6.6.18 (sudo rpi-update a31776f0abb4bee5b79a3334bf748c99a813c084) or wait for 6.6.23+ to be shipped and available for update

I guess this can be closed then.

One other unrelated thing I noticed about your script. I found the behaviour could be very "choppy", that is, frames weren't being processed steadily and smoothly. This can be caused by the GPU clocks not being increased enough when the Arm side of the processor doesn't think it's very busy. As a workaround, I found that adding isp_freq_min=500 into my /boot/firmware/config.txt make the situation significantly better.

Nice hint.

I encountered a similar problem which is probably caused by the same bug. After stripping down my code to a minimal example I also noticed the CameraManager and preview event loops only returning empty events after some seconds to minutes of runtime until restarting my python script. Unluckily the more unrelated code I removed from my script, the longer it took until it stopped receiving frames, making debugging slower.

Unfortunately I do not run my code on Raspberry Pi OS but on a custom build yocto system, which makes it impossible for others to reproduce, so I am quite happy to find this issue here, thank you! I am currently in the process of verifying different firmware and kernel versions and the problem seems to be gone (long time testing currently in progress) with a 6.1.x kernel and firmware from 2023-10.

I would now like to update to a 6.6.x kernel, but I am uncertain how to identify since which firmware version the bug is fixed. The firmware versioning is a little confusing for me.

The firmware is integrated in my system using the meta-raspberry yocto layer https://github.com/agherzan/meta-raspberrypi/blob/master/recipes-bsp/bootfiles/rpi-bootfiles.bb, which downloads the firmware from [1]. The rpi-update hashes refer to [2], but the hashes in the Raspberry Pi OS release notes refer to [1]. vcgencmd also lists a hash for the version, but this one seems to belong neither to [1] or [2].

A concrete example:
I integrated firmware 12af703dd07d7721c1f2f58c6f71d9fe66270838 (dated 2023-10-24) from [1]. After booting, I get

:~# dmesg | grep firmware
[    0.052106] raspberrypi-firmware soc:firmware: Attached to firmware from 2023-10-17T15:39:16, variant start
[    0.056116] raspberrypi-firmware soc:firmware: Firmware hash is 30f0c5e4d076da3ab4f341d88e7d505760b93ad7

Date is different, hash does neither belong to [1] or [2]. How can I check if this is really the firmware from [1] I chose?

  1. https://github.com/raspberrypi/firmware
  2. https://github.com/raspberrypi/rpi-firmware

@davidplowman Can you help me here? My main question is:

  • From which commit in [1] on is the bug fixed?

What I would like to know but didn't find documented:

  • Is there a bugtracker and/or changelog for the firmware?
  • What is the connection between [1], [2] and the hashes and dates printed by vcgencmd or dmesg | grep firmware?

Hi, I'm afraid I'm not too clear myself on these questions. @popcornmix Could you perhaps help us here?

As regards this particular bug, I believe it got broken in raspberrypi/rpi-firmware@545545d and was fixed in raspberrypi/rpi-firmware@285539f.

In the other repository, these look to correspond to raspberrypi/firmware@dc94391 and raspberrypi/firmware@45319db respoectively, though I'm not entirely clear how they relate to one another.

rpi-firmware is just a slightly cutdown and reorganised version of firmware repo, used by rpi-update (it reduces update time a little by excluding some files).

Any common files in the two will be binary identical (they are just copied and then committed).
Just look for a matching commits message and you should see identical firmware (the start*.elf/fixup*.dat files).

Some test reports:

The firmware in the affected build matches the range mentioned, the others are before and after the range. This seems like a good indication I am really affected by the same bug. The affected build stops outputting frames after a few minutes, the others ran for ~24h.

What I am still a little concerned about is that reproducing the bug is too unpredictable. The time until fail differs in each run and is greatly affected by the rest of the python code, in my case the time increases from a few minutes to up to 1-2h the more code I remove. Thus I struggle a bit with determining the testing time to be sure the bug is gone.

@davidplowman With your insights on this bug, can you think of any method to reproduce the fail in a faster and reproducible way?