fangfufu / Linux-Fake-Background-Webcam

Faking your webcam background under GNU/Linux, now supports background blurring, animated background, colour map effect, hologram effect and on-demand processing.

Mediapipe 0.8.6 segfaults after running for a couple minutes

darthmooguy opened this issue · comments

While running, I get 20-30 fps depending on the other workload of my CPU, which is pretty good! However, like in #105, I get segfaults after 4-5 minutes.

Here is my setup:
Fedora 33
Kernel 5.12.12-200.fc33.x86_64
CPU Intel i7-6700HQ (8) @ 3.500GHz
RAM 32GB
Latest commit of master branch of this repo (23ffc61)

To date, I have tried with v4l2loopback-dkms version 0.15.2 from this copr, and with the latest commit of the main branch of the v4l2loopback repo.

On the multithreaded branch I get better FPS (45), but it also segfaults.

Here is the output of the command during a Zoom call:

$ python3 fake.py -w /dev/video2 -v /dev/video6 --no-foreground
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
Running...
Please CTRL-C to pause and reload the background / foreground images
Please CTRL-\ to exit
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Consumers: 1
No consumers remaining, paused
Consumers: 1
No consumers remaining, paused
Consumers: 1
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
No consumers remaining, paused
Consumers: 1
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
No consumers remaining, paused
Consumers: 1
Consumers: 2
Consumers: 1
Consumers: 2
Consumers: 1
fish: Job 1, 'python3 fake.py -w /dev/video2…' terminated by signal SIGSEGV (Address boundary error)

It also segfaults using the --no-ondemand param.

How about 0b0321e? That's the commit I have right before adding on-demand processing. Does it segfault?

@DrDynamic's reply suggests a problem with Mediapipe:

#116 (comment)

Could you both try updating Mediapipe to 0.8.6?

I'm already on 0.8.6 🤔

$ pip3 list | grep mediapipe
mediapipe             0.8.6

@darthmooguy , could you downgrade to 0.8.5 to see if it helps? You can always try compiling mediapipe from source, but that sounds extremely painful.

I've also tried 0b0321e, and I still get segfaults. However, I get them pretty much every time I'm doing something CPU-intensive (opening Android Studio, compiling an Android project in Android Studio...), and with 0b0321e it crashed both fake.py and Zoom the first time I opened Zoom!

I'll try later today with mediapipe 0.8.5.

I am running now with mediapipe 0.8.5 which seems notably slower but no crashes so far on master...

@fangfufu With mediapipe 0.8.5 it seems stable on master! I was able to be in a zoom call for 15 minutes without segfaulting, and while compiling!

However, I get 15 fps and it sometimes drops to 7 fps (seemingly at random), which is pretty slow, although not a deal breaker.

Oh! This is very interesting!

@fangfufu Yesterday I tried compiling mediapipe myself, but the segmentation fault kept coming.

Now, running mediapipe 0.8.5, I get a segmentation fault with a different stack trace:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00000000004f5a2a in ?? ()
(gdb) backtrace
#0  0x00000000004f5a2a in ?? ()
#1  0x000000000056df6e in ?? ()
#2  0x000000000060bedb in ?? ()
#3  0x000000000060bc73 in ?? ()
#4  0x000000000060b1e4 in PyPegen_ASTFromStringObject ()
#5  0x00000000006140be in PyRun_StringFlags ()
#6  0x000000000061def2 in ?? ()
#7  0x000000000052de88 in ?? ()
#8  0x000000000051635b in _PyEval_EvalFrameDefault ()
#9  0x0000000000515377 in ?? ()
#10 0x000000000052d302 in _PyFunction_Vectorcall ()
#11 0x000000000051b3ac in _PyEval_EvalFrameDefault ()
#12 0x0000000000514a75 in ?? ()
#13 0x000000000052d302 in _PyFunction_Vectorcall ()
#14 0x000000000054015f in ?? ()
#15 0x00000000005177f3 in _PyEval_EvalFrameDefault ()
#16 0x0000000000514a75 in ?? ()
#17 0x000000000052d302 in _PyFunction_Vectorcall ()
#18 0x0000000000516543 in _PyEval_EvalFrameDefault ()
#19 0x000000000052d163 in _PyFunction_Vectorcall ()
#20 0x0000000000516543 in _PyEval_EvalFrameDefault ()
#21 0x000000000052d163 in _PyFunction_Vectorcall ()
#22 0x0000000000516543 in _PyEval_EvalFrameDefault ()
#23 0x000000000052d163 in _PyFunction_Vectorcall ()
#24 0x000000000051635b in _PyEval_EvalFrameDefault ()
#25 0x0000000000514a75 in ?? ()
#26 0x000000000051480b in _PyEval_EvalCodeWithName ()
#27 0x00000000005fb257 in PyEval_EvalCode ()
#28 0x00000000006205fb in ?? ()
#29 0x000000000061b724 in ?? ()
#30 0x000000000061fb2d in ?? ()
#31 0x000000000061f63a in PyRun_SimpleFileExFlags ()
#32 0x0000000000613527 in Py_RunMain ()
#33 0x00000000005ef7fd in Py_BytesMain ()
--Type <RET> for more, q to quit, c to continue without paging--
#34 0x00007ffff7c1b565 in __libc_start_main (main=0x5ef7c0, argc=6, argv=0x7fffffffde38, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffde28) at ../csu/libc-start.c:332
#35 0x00000000005ef6fe in _start ()

I have the same problem. Running it without the on-demand option helps a bit, but it can still happen. I'm using the v4l2loopback webcam, and after a crash I have to completely unload the module and load it again. Otherwise the application crashes immediately with an OS error.

@majuwa, @DrDynamic, what OS are you on?

@majuwa, have you tried Akvcam?

@DrDynamic, could you try Python 3.7 please?

I'm working around occasional crashes with "if crash, restart" with Python 3.8 and mediapipe 0.8.5 - fortunately, in my case, it's only the script that segfaults:

while : ; do python ./fake.py ; if [ $? -eq 0 ] ; then break; fi ; sleep 1 ; done

(will try mp 0.8.6)
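
For anyone who prefers to keep the "if crash, restart" loop in Python, here is a minimal sketch of the same idea. The function name `run_until_clean_exit` and its `max_restarts` and `delay` parameters are my own and not part of this repo. Unlike the shell loop above, it gives up after a fixed number of restarts, which is a design choice of this sketch.

```python
import subprocess
import sys
import time


def run_until_clean_exit(cmd, max_restarts=5, delay=1.0):
    """Run cmd, restarting it after every non-zero exit; return the restart count."""
    restarts = 0
    while True:
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"gave up after {restarts} restarts (last exit code {returncode})"
            )
        # On POSIX, a negative return code means the child was killed by a
        # signal, e.g. -11 for SIGSEGV.
        print(f"exit code {returncode}, restarting", file=sys.stderr)
        restarts += 1
        time.sleep(delay)


# Hypothetical usage, mirroring the shell loop:
# run_until_clean_exit([sys.executable, "./fake.py"])
```

This only hides the crash, of course; it does nothing to explain it.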

@Piskvor, I really dislike workarounds. If somebody can figure out why it crashes on their particular setup, that would be great.

I'm using Ubuntu 21.04. I haven't tried Akvcam so far, but I will.

I'm also on Ubuntu 21.04, using Python 3.9.5.
Any tips on how to run Python 3.7 on my system? It isn't in the repository anymore.

I am using Python 3.8.10 - are you getting troubles with 3.9.5?

@DrDynamic It works for me on Python 3.9.5 and mediapipe 0.8.5 (on Fedora 33) 🤔

If you want to try with python3.7, you could do so with miniconda.

I'm running Ubuntu 20.04.2 LTS with Python 3.8.10 and mediapipe 0.8.6, and I can confirm that while running it through Discord the other night it was crashing every 15-20 minutes (sometimes sooner, sometimes a while later).

I was just using background replacement (no foreground) with the no-ondemand option (I haven't actually been able to get Discord to pick up the camera without disabling ondemand).

Rolling back to mediapipe 0.8.5 seems at least as stable, if not more so (it has not crashed yet after about 30 minutes).

Looking at Mediapipe's GitHub, it could be related to google-ai-edge/mediapipe#2250, but there is no news on the investigation yet.

Right I can confirm that I experience mediapipe 0.8.6 segfaulting.

I'm currently seeing the same behavior. I tried putting the machine under load (10+), and the issue happens much more frequently: you can see it in just under a minute.
When it happens, the output is the following:

./fake.py -v /dev/video0 -w /dev/video2 -b bg.jpg -i images --no-foreground

Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
False
Running...
Please CTRL-C to pause and reload the background / foreground images
Please CTRL-\ to exit
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
^C
Resuming, reloading background / foreground images...
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
fish: Job 1, './fake.py -v /dev/video0 -w /de…' terminated by signal SIGSEGV (Address boundary error)

mediapipe is 0.8.6, self-compiled, so hopefully I'm able to recompile it with different settings :)
Python 3.9.6
Linux 5.13.5
Gentoo

Let me know if more details or some specific actions can help.

Updated mediapipe to 0.8.6.2. It seems to work fine. Please comment if it doesn't work fine.

I just had this issue with 0.8.6.2.
This is the output from the console (I added a debug line printing mediapipe module version):

./fake.py -v /dev/video0 -w /dev/video2 -b bg.jpg -i images --no-foreground
0.8.6.2
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
False
Running...
Please CTRL-C to pause and reload the background / foreground images
Please CTRL-\ to exit
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
^C
Resuming, reloading background / foreground images...
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
fish: Job 1, './fake.py -v /dev/video0 -w /de…' terminated by signal SIGSEGV (Address boundary error)
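
A debug line like the one mentioned above could look something like this. It is only a sketch: it reads the version from the installed package metadata via the standard library, and the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the version that this interpreter would actually import.
print("mediapipe", installed_version("mediapipe"))
```

This reports the version for the interpreter that runs the script, which can differ from what `pip3 list` shows if more than one Python is installed.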

For me replicating the issue is quite easy: start the app and have the machine installing some gentoo packages :)

@cova-fe , could you try 0.8.5 to see if the problem disappears?

Sure, just few minutes.

Interesting: I was able to reproduce the issue with 0.8.5 as well, although it seems a higher load was needed.

./fake.py -v /dev/video0 -w /dev/video2 -b bg.jpg -i images --no-foreground
0.8.5
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
False
Running...
Please CTRL-C to pause and reload the background / foreground images
Please CTRL-\ to exit
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
^C
Resuming, reloading background / foreground images...
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
fish: Job 1, './fake.py -v /dev/video0 -w /de…' terminated by signal SIGSEGV (Address boundary error)

I guess I can investigate more deeply on where exactly the SIGSEGV happens.

Yes, please do investigate. Also, it seems turning off on-demand mode helps.

I was able to get a backtrace; not sure how useful it is
(Python 3.9, mediapipe 0.8.6.2):
Resuming, reloading background / foreground images...
Real camera original values are set as: 640x480 with 30 FPS and video codec 1448695129
Real camera new values are set as: 1280x720 with 30 FPS and video codec 1196444237
[New Thread 0x7fffb663d640 (LWP 9412)]
[New Thread 0x7fffb5e3c640 (LWP 9413)]
[New Thread 0x7fffb563b640 (LWP 9414)]
[New Thread 0x7fffb4e3a640 (LWP 9415)]
[New Thread 0x7fffaffff640 (LWP 9416)]
[New Thread 0x7fffaeffd640 (LWP 9417)]
[New Thread 0x7fffae7fc640 (LWP 9418)]
FPS: 17.81
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff7cc6cd0 in _PyTrash_begin () from /usr/lib64/libpython3.9.so.1.0
(gdb) bt

#0 0x00007ffff7cc6cd0 in _PyTrash_begin () from /usr/lib64/libpython3.9.so.1.0
#1 0x00007ffff7cd6b49 in ?? () from /usr/lib64/libpython3.9.so.1.0
#2 0x00007ffff7c7ae7d in _PyObject_MakeTpCall () from /usr/lib64/libpython3.9.so.1.0
#3 0x00007ffff7c80822 in ?? () from /usr/lib64/libpython3.9.so.1.0
#4 0x00007ffff7c27c1d in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0
#5 0x00007ffff7d6148c in ?? () from /usr/lib64/libpython3.9.so.1.0
#6 0x00007ffff7c73e81 in _PyFunction_Vectorcall () from /usr/lib64/libpython3.9.so.1.0
#7 0x00007ffff7c807e8 in ?? () from /usr/lib64/libpython3.9.so.1.0
#8 0x00007ffff7c27c1d in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0
#9 0x00007ffff7d6148c in ?? () from /usr/lib64/libpython3.9.so.1.0
#10 0x00007ffff7c73e81 in _PyFunction_Vectorcall () from /usr/lib64/libpython3.9.so.1.0
#11 0x00007ffff7c285da in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0
#12 0x00007ffff7c20551 in ?? () from /usr/lib64/libpython3.9.so.1.0
#13 0x00007ffff7c285da in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0
#14 0x00007ffff7c20551 in ?? () from /usr/lib64/libpython3.9.so.1.0
#15 0x00007ffff7c285da in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0
#16 0x00007ffff7c20551 in ?? () from /usr/lib64/libpython3.9.so.1.0
#17 0x00007ffff7c2788e in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0
#18 0x00007ffff7d6148c in ?? () from /usr/lib64/libpython3.9.so.1.0
#19 0x00007ffff7d6199e in _PyEval_EvalCodeWithName () from /usr/lib64/libpython3.9.so.1.0
#20 0x00007ffff7d619eb in PyEval_EvalCodeEx () from /usr/lib64/libpython3.9.so.1.0
#21 0x00007ffff7d5c7bb in PyEval_EvalCode () from /usr/lib64/libpython3.9.so.1.0
#22 0x00007ffff7da74fe in ?? () from /usr/lib64/libpython3.9.so.1.0
#23 0x00007ffff7da8af1 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython3.9.so.1.0
#24 0x00007ffff7dc361a in Py_RunMain () from /usr/lib64/libpython3.9.so.1.0
#25 0x00007ffff7dc3c45 in Py_BytesMain () from /usr/lib64/libpython3.9.so.1.0
#26 0x00007ffff7a247fd in __libc_start_main () from /lib64/libc.so.6
#27 0x000055555555475a in _start ()

The SIGSEGV happens inside compose_frame(), when the classifier is called on frame.

def compose_frame(self, frame):
    frame.flags.writeable = False
    mask = self.classifier.process(frame).segmentation_mask

Wow, how did you figure that out? Does it help if you get rid of the frame.flags.writeable lines? That is, don't toggle the flag on and off, and keep the frame writeable the whole time.

Failing that, it might just be the problem with Mediapipe.

I used a horrible approach: I sprinkled the code with debug prints showing the line number, and was able to narrow it down to that single line this way. It helped that I can reproduce the crash in a minute or two. Let me try removing the line you mentioned.
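
A less manual alternative to sprinkling prints, for anyone else chasing this: Python's standard `faulthandler` module can dump the Python-level traceback when the process receives a fatal signal such as SIGSEGV. A minimal sketch follows; the wrapper name `enable_crash_traceback` is made up, and I am assuming the crash arrives as a signal that the handler gets a chance to report.

```python
import faulthandler
import sys


def enable_crash_traceback():
    """Ask CPython to dump Python tracebacks to stderr on fatal signals such as SIGSEGV."""
    if not faulthandler.is_enabled():
        # all_threads=True dumps every thread, not only the one that crashed.
        faulthandler.enable(file=sys.stderr, all_threads=True)
    return faulthandler.is_enabled()


# Call this once, near the top of the script, before the main loop starts.
enable_crash_traceback()
```

The same thing can be switched on without touching the code by running `python3 -X faulthandler fake.py ...` or by setting the `PYTHONFAULTHANDLER` environment variable. The dump names Python files and line numbers, which the gdb backtraces above do not.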

You might also want to remove frame.flags.writeable = True. The problem is that it doesn't crash locally for me.

So, latest findings: after removing the line frame.flags.writeable = False, or setting it to True, I can no longer trigger crashes, so it seems your suggestion was right. I'm not sure what this flag means, though; I have no knowledge of mediapipe :)
Let me try removing frame.flags.writeable to see what happens.
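
On the meaning of the flag: `flags.writeable` belongs to NumPy, not to mediapipe. The sketch below shows only what NumPy itself does with it, using a small zero-filled array as a stand-in for a camera frame. It does not say anything about what mediapipe does internally with a read-only frame.

```python
import numpy as np

# Stand-in for a camera frame: 2x2 pixels, 3 colour channels.
frame = np.zeros((2, 2, 3), dtype=np.uint8)

# Mark the array read-only; NumPy now rejects in-place writes.
frame.flags.writeable = False
try:
    frame[0, 0, 0] = 255
    write_rejected = False
except ValueError as exc:
    write_rejected = True
    print("write rejected:", exc)

# Make it writeable again; the same assignment now succeeds.
frame.flags.writeable = True
frame[0, 0, 0] = 255
print(write_rejected, frame[0, 0, 0])
```

So toggling the flag does not copy or move the pixel data; it only changes whether NumPy allows writes through that array object.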

Those flags were there to improve performance, as per https://google.github.io/mediapipe/solutions/selfie_segmentation.html

I removed those lines for now. Hopefully it works better now.

So, commenting out frame.flags.writeable = True does not seem to help, as I was still able to cause a crash. Now I'm running the app with the flag set to True; let's see if I can get crashes (none so far). If it is a timing issue, it could be tricky to pin down.

Unfortunately, it seems to happen again, albeit less frequently. Could this be of interest to the mediapipe developers?

@cova-fe, a ticket has already been raised in Mediapipe's repository; have a look at #127 (comment).

I just upgraded to Debian Bullseye. I get crashes even at 0.8.6.2. So I pinned MediaPipe version back to 0.8.5.

There is a GIL fix in the latest v0.8.7.1 python binaries, which may help resolve this issue. Try pip install mediapipe==0.8.7.1

google-ai-edge/mediapipe#2250 (comment)