raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Pi5 i2c troubles using MCP23017

JinShil opened this issue · comments

commented

Describe the bug

This issue is basically a followup to the following discussions related to i2c on the Pi5:

#5853
#5916
#5802

.. and is likely the same issue affecting #5784. However, I have different hardware and thought it would be best to file a different issue describing my unique experience and reproduction steps in an attempt to provide Raspberry Pi engineers with a more immediate reproduction using common hardware.

In prior discussions I discovered that I had hardware running out of spec (a level translator spec'd for 400kHz running at 1MHz). I have updated hardware now running in spec at 1MHz, but I'm still encountering i2c communication problems.

For this issue report, however, I have made a concerted effort to remove my custom hardware and provide a reproduction using common hardware, specifically the MCP23017.

The symptoms are:

  • On the Pi 5 communication fails and results in many occurrences of "controller timed out" and "i2c_dw_handle_tx_abort: lost arbitration" messages in dmesg output. Sometimes the slave gets stuck holding SDA low requiring a power reset on the slave. Sometimes it is necessary to reboot the Pi 5 to get the i2c port to produce a signal again.
  • On the Pi 4, everything works fine

Steps to reproduce the behaviour

Connect an MCP23017 to 3.3V, GND, GPIO2 and GPIO3 on the Pi. Connect A0, A1, and A2 to GND. No other hardware or connections are required.

Configure the Pi 4 with dtparam=i2c_arm=on,i2c_arm_baudrate=1000000
Configure the Pi 5 with dtoverlay=i2c1,pins_2_3,baudrate=1000000

Run sudo cpufreq-set -g performance to ensure the core clock frequency remains stable.

Verify the device can be found by running i2cdetect -y 1 and seeing the device appear with address 0x20. Occasionally this will fail on the Pi 5, sometimes with "controller timed out" and "i2c_dw_handle_tx_abort: lost arbitration" messages in dmesg and sometimes without. If it succeeds, continue.

Compile the following code using gcc test.c -o test and run ./test

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>

#define MCP23017_ADDR 0x20 // Address of MCP23017
#define IODIRA_REG    0x00 // I/O direction register A
#define IODIRB_REG    0x01 // I/O direction register B
#define GPIOA_REG     0x12 // GPIO port A
#define GPIOB_REG     0x13 // GPIO port B

unsigned long success = 0;
unsigned long fail = 0;

void print() {
    printf("success: %09lu\t\tfail: %09lu\r", success, fail);
    fflush(stdout);
}

int main() {
    int fd, i;

    // Open the I2C interface to the MCP23017
    if ((fd = open("/dev/i2c-1", O_RDWR)) < 0) {
        perror("Failed to open I2C interface");
        return 1;
    }

    // Set the I2C slave address
    if (ioctl(fd, I2C_SLAVE, MCP23017_ADDR) < 0) {
        perror("Failed to set I2C slave address");
        close(fd);
        return 1;
    }

    // Set all pins as outputs
    while (write(fd, (char[]){IODIRA_REG, 0x00}, 2) != 2) {
        fail++;
        print();
    }

    while (write(fd, (char[]){IODIRB_REG, 0x00}, 2) != 2) {
        fail++;
        print();
    }

    
    while (1) {
        // Toggle all outputs
        if (write(fd, (char[]){GPIOA_REG, 0xFF}, 2) != 2) {
            //perror("Failed to write to GPIO port A");
            fail++;
            print();
        }
        else {
            success++;
            print();
        }

        if (write(fd, (char[]){GPIOB_REG, 0xFF}, 2) != 2) {
            //perror("Failed to write to GPIO port B");
            fail++;
            print();
        }
        else {
            success++;
            print();
        }

        if (write(fd, (char[]){GPIOA_REG, 0x00}, 2) != 2) {
            //perror("Failed to write to GPIO port A");
            fail++;
            print();
        }
        else {
            success++;
            print();
        }

        if (write(fd, (char[]){GPIOB_REG, 0x00}, 2) != 2) {
            //perror("Failed to write to GPIO port B");
            fail++;
            print();
        }
        else {
            success++;
            print();
        }
    }

    close(fd);
    return 0;
}

On the Pi 4, it should repeatedly print a large number of successes.

On the Pi 5, communication fails and dmesg output shows "controller timed out" and "i2c_dw_handle_tx_abort: lost arbitration" messages. Occasionally communication will succeed briefly, but then fail thereafter printing a large number of failures.
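The test program above only counts failures. A minimal sketch of how a failed write() could be classified by errno follows. The helper name i2c_classify and the enum are hypothetical, and the mapping follows the kernel's generic I2C fault-code conventions (EAGAIN for lost arbitration, ETIMEDOUT for a timeout, ENXIO for a missing address ACK). Whether the i2c_designware driver reports exactly these codes for the dmesg messages seen here is an assumption, not something confirmed in this thread.

```c
#include <errno.h>

/* Hypothetical classification of a failed I2C write, keyed on errno. */
enum i2c_fail_kind {
    I2C_FAIL_ARBITRATION, /* EAGAIN: adapter lost arbitration */
    I2C_FAIL_TIMEOUT,     /* ETIMEDOUT: transfer took too long */
    I2C_FAIL_NO_ACK,      /* ENXIO: address phase was not ACKed */
    I2C_FAIL_OTHER        /* anything else */
};

/* Map an errno value (captured right after write() fails) to a kind. */
enum i2c_fail_kind i2c_classify(int err)
{
    switch (err) {
    case EAGAIN:
        return I2C_FAIL_ARBITRATION;
    case ETIMEDOUT:
        return I2C_FAIL_TIMEOUT;
    case ENXIO:
        return I2C_FAIL_NO_ACK;
    default:
        return I2C_FAIL_OTHER;
    }
}
```

In the test loop, calling i2c_classify(errno) immediately after a failing write() and keeping one counter per kind would show which failure mode dominates without having to watch dmesg.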

Temperature of the Pi 5 was confirmed with vcgencmd measure_temp to be between 40~60 degrees C.

Powering the MCP23017 with 5V instead of 3.3V, leaving SDA and SCL at 3.3V, usually results in fewer communication failures, and often no messages in dmesg when failures occur. However, communication failures are still inevitable. Under this failure mode, it is interesting to note that the slave is not stuck holding SDA low. However, it is still very difficult to recover without power cycling everything.

Failure modes on the Pi 5 are very inconsistent, making it quite difficult to convey anything other than a lot of "I tried X and it sometimes resulted in A, sometimes B, but also sometimes C". If I am able to reproduce failure modes consistently, I'll report those results.

Device (s)

Raspberry Pi 5

System

$ cat /etc/rpi-issue
Raspberry Pi reference 2023-12-05
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 70cd6f2a1e34d07f5cba7047aea5b92457372e05, stage4
$ vcgencmd version
Feb 29 2024 12:24:53
Copyright (c) 2012 Broadcom
version f4e2138c2adc8f3a92a3a65939e458f11d7298ba (clean) (release) (start)
$ uname -a
Linux raspberrypi 6.6.20-v8+ #1738 SMP PREEMPT Tue Mar  5 14:51:28 GMT 2024 aarch64 GNU/Linux

Logs

Often communication failures result in the following in dmesg, but not always:

[   43.925214] i2c_designware 1f00074000.i2c: i2c_dw_handle_tx_abort: lost arbitration
[   44.930880] i2c_designware 1f00074000.i2c: controller timed out
[   45.945498] i2c_designware 1f00074000.i2c: controller timed out
[   46.960680] i2c_designware 1f00074000.i2c: controller timed out

Sometimes only the "controller timed out" message appears.

Additional context

No response

commented

This is waveform of a successful communication on the Pi 4:
image

This is a waveform of a rare successful communication on the Pi 5:
image

commented

Another observation: Adding usleep(some_delay) function call between each communication can improve the number of successful transactions, but that is not ideal for my use case.
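The delay workaround described above could look roughly like this. It is only a sketch: the helper name write_with_retry, the attempt limit, and the delay value are all hypothetical and not part of the original test program.

```c
#define _DEFAULT_SOURCE /* for usleep() when compiling with strict -std= flags */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: try a write up to max_attempts times, sleeping
 * delay_us microseconds after each failed attempt.
 * Returns the attempt number that succeeded, or -1 if every attempt failed.
 */
int write_with_retry(int fd, const void *buf, size_t len,
                     int max_attempts, unsigned int delay_us)
{
    assert(max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (write(fd, buf, len) == (ssize_t)len)
            return attempt;
        usleep(delay_us); /* give the bus and the device time to settle */
    }
    return -1;
}
```

For example, write_with_retry(fd, (char[]){GPIOA_REG, 0xFF}, 2, 5, 100) would replace one of the bare write() calls in the loop. This only masks the symptom and lowers throughput, which is why it does not suit the no-delay use case.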

commented

Bah! In my attempt to reduce the number of variables I have found I am running the MCP23017 out of spec. At 1MHz, it must be powered by at least 4.5V, which means I will need to add a level translator to create a reproduction of the issues I'm seeing. I will reopen if/when I'm able to do so. Sorry for the noise.

commented

Actually, I may be reading the MCP23017 datasheet wrong. I'm not sure if the MCP23017 must be powered by at least 4.5V when operating at 1MHz. Reopening for feedback.

Edit: Furthermore, on the Pi 4 it works fine. So....

commented

You are running the device out of spec, or at least marginally. VDD is specified in table 1-3 for various operating conditions - in 1.7MHz mode you need a VDD (and bus high-state) voltage between 4.5-5.5V. It follows that the requirement is lower for lower clock rates, but I think you'll find 1MHz requires more than 3.3V.

JinShil - I'm open to the possibility of there being some remaining timing problem, but without a clear example involving a common, trusted device operating within spec it's hard to give it much credence. There are a number of threads suggesting that there is some problem where devices are permanently holding SDA low, but so far we have no clue about the cause of that one.

commented

@pelwell: I understand. I thought I finally had a proper reproduction here given that things worked fine on the Pi 4, but not on the Pi 5. I will close this for now and keep working on some way to get a reproduction with common hardware. However, if you haven't already, please add some long-term (e.g. 30 minutes or more) tests for 1MHz i2c hardware, with repeated, no-delay queries. That's basically what I'm trying to achieve without success on the Pi 5.

please add some long-term (e.g. 30 minutes or more) tests for 1MHz i2c hardware, with repeated, no-delay queries.

I'll have a look around and see what we have.

I've located an ADS1115, which should be good up to 3.4MHz (i.e. more than Pi 5 can handle), so I'll give it a thrashing.

commented

I just tried an ADS1115 also. I didn't perform a thorough test yet, but so far everything looks fine.

This appears to work: Pi5@1MHz <--> TCA9406 <--> ADS1115
This appears to work: Pi5@1MHz <--> TCA9406 <--> PCA9675
This does not: Pi5@1MHz <--> TCA9406 <--> MCP23017

I suspect, unless you have a TCA9406 and MCP23017, you won't have any trouble, but I'll still be interested in your results.

And how is the MCP23017 without the TCA9406 voltage converter? Or is that the first set of results?

commented

And how is the MCP23017 without the TCA9406 voltage converter? Or is that the first set of results?

That's the first set of results. I was trying to keep the TCA9406 out of the equation, but in doing so, it appears I was running the MCP23017 out of spec.

Doing a little more reading of the MCP23017 datasheet it says it supports 100kHz, 400kHz, and 1.7MHz. I'm wondering now if I should be restricting myself to just those frequencies, and not anything in between.

Doing a little more reading of the MCP23017 datasheet it says it supports 100kHz, 400kHz, and 1.7MHz.

That's odd. But the Pi 5/RP1 blocks won't do more than 1MHz. If you try, you'll get:

[ 7292.588020] i2c_designware 1f00074000.i2c: High Speed not supported!

By the way, your test code gives less confusing results if you pass more parameters to printf:

    printf("success: %09lu\t\tfail: %09lu\r", success, fail);
commented

By the way, your test code gives less confusing results if you pass more parameters to printf:

Sorry about that. I had fixed that in my code when I ran my tests, but neglected to update the issue, which was copied from an older file.

I can reproduce with an MCP23017 at 1MHz. The logic analyser agrees - it's failing to decode it as I2C - but not saying why. It's probably something to do with setup time - SCL does seem to be changing hot on the heels of SDA.

I think the difference may be in the time between the falling edge of SCL and where SDA goes low for the ACK. At 1MHz, the point where SCL crosses the 2V line and where SDA falls through the 1V line are approximately the same.

The wires on the MCP23017 I have are very long (15cm - I've ordered an MCP23017 HAT), but by changing the on-board pull-ups to pull-downs (there are still external 1.8K pull-ups) it will run at up to 900kHz. Using one of the other I2C interfaces instead (I picked I2C3 on 6 & 7) it runs at 1MHz, so I think what we're seeing is a failure of the device to ACK in time. Reducing the SCL-high time may improve the timing - watch this space.

commented

Since operating the MCP23017 at greater than 400kHz requires a supply voltage of greater than 4.5V, I've gone back to the following configuration for testing: Pi <--> TCA9406 Voltage Translator <--> MCP23017. Interestingly, this produces the following results for me:

Pi5 <--> TCA9406 <--> ADS1115 @ 1MHz: Works great
Pi5 <--> TCA9406 <--> PCA9675 @ 1MHz: Works great

Pi4 <--> TCA9406 <--> MCP23017 @ 1MHz : Works great
Pi4 <--> TCA9406 <--> MCP23017 @ 400kHz : Works great
Pi4 <--> TCA9406 <--> MCP23017 @ 100kHz : Works great

Pi5 <--> TCA9406 <--> MCP23017 @ 1MHz : Works for a short time, then "controller timed out" in dmesg and SDA is stuck low.
Pi5 <--> TCA9406 <--> MCP23017 @ 400kHz : Works for a short time, then "controller timed out" in dmesg and SDA is stuck low.
Pi5 <--> TCA9406 <--> MCP23017 @ 100kHz : Works for a short time, then "controller timed out" in dmesg and SDA is stuck low.

Since the Pi5 works fine with the ADS1115 and the PCA9675, it seems the MCP23017 is to blame. However, since the MCP23017 works fine on the Pi4, it seems the Pi5 is to blame. This is the dilemma I'm facing.

Tomorrow, I'll try slowing down the rise time of SCL by playing with the pullup resistors and see if I can demonstrate improvement.

I've got it working at 1MHz with no pull-up changes - see #6050.
The change is to shrink tHIGH to the minimum, giving the maximum time for the device to respond.
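To illustrate why shrinking tHIGH helps, here is a rough sketch of the arithmetic. It is not the code from #6050. The struct and function names are hypothetical, the minimum times are the Fast-mode Plus figures from the I2C-bus specification (tHIGH at least 260 ns, tLOW at least 500 ns), and rise and fall times are ignored for simplicity.

```c
#include <assert.h>

/* Minimum SCL high and low times, in nanoseconds, for one bus mode. */
struct i2c_limits {
    unsigned t_high_min_ns;
    unsigned t_low_min_ns;
};

/*
 * Hypothetical helper: with SCL high held at its minimum, return how many
 * nanoseconds of each clock period remain for SCL low, which is the window
 * in which the device has to drive SDA (for example, for the ACK).
 */
unsigned scl_low_ns(unsigned bus_hz, struct i2c_limits lim)
{
    assert(bus_hz > 0);
    unsigned period_ns = 1000000000u / bus_hz;

    /* The requested rate must leave room for both minimum times. */
    assert(period_ns >= lim.t_high_min_ns + lim.t_low_min_ns);
    return period_ns - lim.t_high_min_ns;
}
```

With the Fast-mode Plus limits at 1MHz, the period is 1000 ns, so holding SCL high for only 260 ns leaves 740 ns of SCL low, compared with 500 ns for an even split. That extra time is what gives a slow device longer to respond.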

commented

#6050 appears to have resolved this issue. I ran sudo rpi-update pulls/6050, configured for 1MHz, and ran the code in this issue's original post for more than an hour, and it didn't have a single failure. I also ran similar tests for the ADS1115 and PCA9675 at 1MHz, and they also worked great.

All tests utilized the TCA9406 between the Pi5 and the i2c device.

I tried running the tests on multiple channels simultaneously (i2c-1 - ADS1115, i2c-2 - PCA9675, and i2c-3 - MCP23017) and that caused failures. I don't remember that being the case on the CM4, but I think that is a separate issue, so I will collect more data on it, and if I can rule out my hardware, I will open a new issue separately.

#6050 appears to resolve the issue in this thread, so I'll close it. Thank you for your help and perseverance.