jgarff / rpi_ws281x

Userspace Raspberry Pi PWM library for WS281X LEDs

RPi0W: SD card errors while running around 40 LEDs

pinheadmz opened this issue · comments

Running the test (sudo ./test -c) produces SD card errors in dmesg:

[33490.769029] mmcblk0: error -110 transferring data, sector 137216, nr 16, cmd response 0x900, card status 0xc00

More verbose log here: https://pastebin.com/AZUijQmW

This error will pop up every few minutes if I leave the test running just by itself. With my other processes running (full project: https://github.com/pinheadmz/ClockJr) these errors pop up as frequently as every few seconds.

My project displays the "rainbow wheel" effect for a few seconds then stops and goes blank. A few seconds later I will see this mmcblk0 error in dmesg.

I've been through FOUR SD CARDS, all different brands. It's not the card.

Change the DMA channel to 10.

The default DMA channel, 5, now clashes with recent OS versions.

This seems to have abated the problem for now, thanks! Is there a command to figure out which DMA channels are reserved by the OS?

My system info:

Machine model: Raspberry Pi Zero W Rev 1.1
Linux version 4.9.35+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611) ) #1014 Fri Jun 30 14:34:49 BST 2017

Not that I know of.

This snippet of code might be a good start:
https://stackoverflow.com/questions/29628602/how-to-allocate-dma-channel-in-user-space

I had the very same problem as the OP with DMA channel 5. I think that the documentation should be updated and the code should use DMA channel 10 by default.

I do not know which DMA channels are safe to use, so I'm not posting a pull request right now. However this forum post says:

Avoid channels 0, 1, 2, 3, 6, 7. The GPU uses 1, 3, 6, 7. The frame buffer uses 0 and the SD card uses 2.

Going to confirm that after running into the same issue with the latest version of raspbian, switching the DMA to 10 fixed my problem. All this after 15 Pi W's, 15 SD cards, multiple power supply swaps, and much head>desking. Thank you for the solution! I would also recommend that the DMA be switched to 10 in the docs. While 5 "should be acceptable" it clearly isn't. So a solution that works is better than a solution that should work in my opinion. Also, Thanks for the hard work in general. This library is totally awesome!

@totterfree I just submitted a pull request to change the defaults and add a note in the documentation. IMHO it's important for the default settings to be safe.

Do you know how safe DMA 10 is?

5 was safe ...
... on older firmware or kernel (not sure which determines the dma allocation)

We used dma=10 in production on two Raspberry Pi 3 Model B mounted on two drones, controlling both flight and lighting. There were no filesystem problems and the drones kept flying :)

Ok, I found some official documentation.

The Raspberry Pi 3 Model B, in the latest Raspbian, has a file called /proc/device-tree/soc/dma@7e007000/brcm,dma-channel-mask.

According to the Linux kernel documentation, this file contains the "Bit mask representing the channels not used by the firmware in ascending order, i.e. first channel corresponds to LSB."

For my Raspberry Pi 3 Model B, this file contains four bytes: 00 00 7f 34 (you can view it with the xxd utility).
This corresponds, in base 2, to the number 0000 0000 0000 0000 0111 1111 0011 0100.

I guess that the first 2 bytes (the first 16 zeroes) are not significant since the RPI's DMA has only 16 channels.
In my interpretation of the kernel docs, the reserved channels are: 0, 8, 9, 12, 14, 15, i.e., those corresponding to a zero in the bitmask.

Update (2018-01-01): sorry, I started counting from the most significant byte, opposite to the specification... the reserved channels should be: 0, 1, 3, 6, 7, 15. The strange thing is that 5 isn’t marked as reserved.
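
The decoding described above can be sketched in Python. This is only a minimal sketch: it assumes the property is stored as one big-endian 32-bit cell (the usual device-tree encoding) and that only the low 16 bits matter, and the function name `firmware_reserved_channels` is hypothetical. It also reflects only the firmware's view, so a channel that looks free here (such as 5) may still be claimed by something else.

```python
from pathlib import Path
from typing import List

# Path quoted in the comment above (Raspberry Pi 3 Model B, Raspbian).
MASK_PATH = Path("/proc/device-tree/soc/dma@7e007000/brcm,dma-channel-mask")


def firmware_reserved_channels(raw: bytes, n_channels: int = 16) -> List[int]:
    """Return the channels whose bit is 0 in the mask (bit 0 = channel 0)."""
    # Assumption: the property is a big-endian integer, as device-tree cells are.
    mask = int.from_bytes(raw, byteorder="big")
    return [ch for ch in range(n_channels) if not (mask >> ch) & 1]


if __name__ == "__main__":
    # The four bytes reported above: 00 00 7f 34
    print(firmware_reserved_channels(bytes.fromhex("00007f34")))
    # prints [0, 1, 3, 6, 7, 15]

    # On a Pi, the live value could be read instead:
    # print(firmware_reserved_channels(MASK_PATH.read_bytes()))
```

Counting from the least significant bit gives the same list as the corrected update: 0, 1, 3, 6, 7, 15.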

And this is where the Pi is a bit crap.
5 is free according to that and we know this is a bad choice these days...

At least it’s documented now in the readme!

oh man, this was causing a crazy amount of errors on my pi 3b. I was thinking my sdcard was dying, haha.

Is it a problem with all pi's on their newest firmwares now, or just the 3b/zero W? If it's only a subset, it might be worth it to just have no default set, if there's no guaranteed safe default.

I suspect it’s all models on newer firmware (or kernel - I don’t know which is responsible)

I'll try to downgrade my kernel/firmware later and see if I can bisect it to a specific version, if I have time. My suspicion is that it's baked into the kernel rather than done at the firmware level, but they're coupled anyways.

edit: my internet for the holidays is awful; guess I'll have to try again in a few weeks

I can confirm this happens to recently purchased Raspberry Pi Zero Ws on the latest lite image. We switched to DMA 10. No more corrupt SD cards so far. Every so often, we see a blip or two in animations after they've been running for a while, but that could be something going on in our application. Kind of curious to try 0, 8, 9, 12, 14, or 15 to see if one of those smooths things out.

Guys,
I spent all these weeks fighting this issue too, swapping Pi Zeros, SD cards, and my shield.
Since I built everything from scratch, I thought I was up to date. I finally identified that the problem was coming from the WS2812 driver, so I came here and got the answer: even though I had compiled the new version, my script still used the old setup with DMA 5:

# Create NeoPixel object: 2 LEDs, brightness 64, GRB LEDs
strip = Adafruit_NeoPixel(2, gpio_led, 800000, 5, False, 64, 0, ws.WS2811_STRIP_GRB)

I think the code should raise a Python warning or error if the DMA channel used is 5, something like "the LEDs will not work, but this saves your SD card or OS". We can't guarantee that there isn't sample code, or other code with DMA set to 5, somewhere out there, so as a precaution it should at least fire a big warning.

Anyway after 2 days, found my issue ;-)

The problem is that we have no way to query the OS for which DMA channels are free to use.
If we ever get to that, I think the correct behavior when the user tries to write to an unsafe DMA channel would be to abort the program immediately with a meaningful error message.
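
A guard along the lines suggested in the last two comments might look like the sketch below. It is not part of the library: `check_dma_channel` and `SUSPECT_DMA_CHANNELS` are hypothetical names, and the channel set is an assumption built from this thread (channel 5, plus the channels the forum post quoted earlier says to avoid), not an authoritative list.

```python
import warnings

# Assumption: channel 5 (reported in this thread) plus 0, 1, 2, 3, 6, 7 from
# the forum post quoted earlier. Not an authoritative list.
SUSPECT_DMA_CHANNELS = frozenset({0, 1, 2, 3, 5, 6, 7})


def check_dma_channel(dma: int, strict: bool = True) -> int:
    """Return dma unchanged if it is not on the suspect list.

    With strict=True a suspect channel raises ValueError before any LED
    code runs; with strict=False it only emits a RuntimeWarning.
    """
    if dma in SUSPECT_DMA_CHANNELS:
        message = (
            f"DMA channel {dma} may be in use by the OS and could corrupt "
            "the SD card; channel 10 worked for people in this thread."
        )
        if strict:
            raise ValueError(message)
        warnings.warn(message, RuntimeWarning, stacklevel=2)
    return dma
```

In a script like the one above, the DMA argument could then be passed as `check_dma_channel(10)` instead of a bare number, so an old copy-pasted `5` fails loudly before anything is written.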

yeah, some versions use it, some don't; it's pretty much all undocumented, right? The defaults are just kinda shots in the dark so there's no real point in warning at all; it's kinda expected to read the readme and stuff.

phew, so glad I found this thread. was battling to fix this for hours!