stickbreaker / arduino-esp32

Arduino core for the ESP32

TimeOut recovery

stickbreaker opened this issue · comments

TimeOut, Arbitration, and Bus Busy errors caused by hardware faults are difficult to recover from.

The current Wire.reset() cannot successfully reset the hardware.

This can be tested by initializing the Wire library with Wire.begin() and then grounding SDA. This simulates a START -> STOP, which is considered a void statement by the I2C standard and is not recommended. But reality cannot be ignored; it happens.

The bus will be clear (SCL and SDA high with no activity), but the SM will fall into an unrecoverable state of BUS_BUSY, TIMEOUT, or perhaps ARBITRATION.

The only successful recovery is a HARDWARE reset operation. Or, if a second set of GPIO pins is dedicated, a manual bit-banged START, 9 clocks, STOP can recover the SM.

Does anyone have a successful Software Recovery Method?

Chuck.

@stickbreaker You may be waiting awhile on this one. I got the reset that I provided on line 498 of the code (https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/esp32-hal-i2c.c) directly from the Espressif folks as this is the only known way with the current chip to recover the I2C.

@lonerzzz, does this reset work for you? When I try it with my error case above, it does not clear the error. I have to manually press reset (EN).

The only way I have been able to recover is by wiring two more pins through diodes and bit banging a START, 9 clocks, STOP. This always works, but it uses 2 pins and some diodes.

Chuck.

The reset does work on the ESP32 side, but it does not cover every situation: there is a class of errors where slaves hold the lines down, and those require more than a state machine reset. One thing that could be done is to have the reset disable the HW I2C and its connection to the SDA and SCL pins, bit bang the clock, reattach SDA and SCL to the pins, and restore the HW I2C. I too had the extra connections to the SDA and SCL pins, but they are not needed since the pins can be used as general GPIO.

@lonerzzz yea, I think the Wire.reset() function should have that bit bang ability. The way the pins on the ESP32 are designed, digitalRead() can still function while the pins are attached to the I2C hardware. It might make sense to add a conditional bitbang routine if SCL or SDA is held low while resetting. I'll add it to the ToDo list.
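
The conditional bit-bang idea discussed above could look roughly like the following. This is a minimal sketch, not code from the repository: the I2cLines struct and recoverBus() are hypothetical names, and the pin hooks are injected so the logic can be exercised off-target. On an ESP32 the hooks would wrap the GPIO calls for SDA and SCL after the peripheral has been detached. Giving up immediately when SCL is held low is an assumption of this sketch, since a master cannot clock a bus whose clock line is stretched.

```cpp
#include <cassert>
#include <functional>

// Hypothetical pin-access hooks (not part of the HAL). "true" releases the
// line so the pull-up takes it high; "false" drives it low (open drain).
struct I2cLines {
    std::function<void(bool)> setScl;
    std::function<void(bool)> setSda;
    std::function<bool()>     readScl;
    std::function<bool()>     readSda;
    std::function<void()>     halfBitDelay;  // e.g. about 5 us for a 100 kHz bus
};

// Attempts START, 9 clocks, STOP. Returns true if both lines read high afterwards.
bool recoverBus(const I2cLines& io) {
    assert(io.setScl && io.setSda && io.readScl && io.readSda && io.halfBitDelay);

    io.setSda(true);
    io.setScl(true);
    io.halfBitDelay();
    if (io.readScl() && io.readSda()) return true;  // bus already idle
    if (!io.readScl()) return false;                // SCL stretched: cannot clock

    // START: SDA falls while SCL is high.
    io.setSda(false);
    io.halfBitDelay();
    io.setScl(false);
    io.halfBitDelay();
    io.setSda(true);                                // release SDA for the clocks

    // 9 clocks let a slave finish whatever byte it thinks it is sending.
    for (int i = 0; i < 9; ++i) {
        io.setScl(true);
        io.halfBitDelay();
        io.setScl(false);
        io.halfBitDelay();
    }

    // STOP: SDA rises while SCL is high.
    io.setSda(false);
    io.halfBitDelay();
    io.setScl(true);
    io.halfBitDelay();
    io.setSda(true);
    io.halfBitDelay();

    return io.readScl() && io.readSda();
}
```

A caller would run this only when digitalRead() shows SDA or SCL low during a reset, then reattach the pins to the I2C peripheral.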

I'm thinking about a revision that would reduce the entry points to esp-hal-i2c.h. I would like to get it down to about 4: initHardware(), releaseHardware(), writeTransaction(), readTransaction(). The write() and read() would kick off the queue when they received a STOP. Can you see value in having init() and release() create a 'handle' that is passed to read() and write(), allowing independent per-task queues? How much multi-task support should the HAL-level routines have? Should it be possible to have multiple processes feeding these queues?
I use I2C port expanders to drive 4x4 keypads and 20x4 LCDs. The keypads have interrupt capabilities to reduce the polling scans. But accessing the I2C hardware from an interrupt is chancy. I can envision handing off a polling cycle through an EventGroup or Semaphore to an independent task. This model is conceptually simpler than integrating keyboard polling throughout the main app: just process the buffered keystrokes when it fits in the flow. The background key processing keeps the user satisfied instead of them pounding on a 'dead' keypad.

Chuck.

Sorry, I'm just not getting updates, not even in my spam folder. Here are my thoughts.

The use of a handle at the HAL layer would be useful to allow concurrent I2C channels especially since the ESP32 supports multiple I2C channels. I wouldn't go so far as to have multiple tasks feeding the queues though. That could get complicated for debugging and support. To me having the independent handles provides the best balance. The application author can set up tasks each owning one of the I2C handles fairly easily.

@lonerzzz Thanks, I'll think through that.

Another Timeout Testing question:

I am trying to understand the state machine and how it responds to the timeout. With the results you have seen, there are two possible exit paths from the timeout:
* First, the SM skipped the command[] that encountered the timeout and processed the next command[], STOP.
* Second, after SCL recovered, it issued a STOP.

I wonder if you could change your requestFrom() to this:

Wire.requestFrom(ID,1,false);
Wire.requestFrom(ID,1);

This would create a command[] list of:
START
WRITE 1 (readmode)
READ 1 NAK
START
WRITE 1 (readmode)
READ 1 NAK
STOP

It will create a different Interrupt Pattern if the Timeout Recover continues through the Command[] list.
If it just inserts a STOP and aborts, That is important to know.
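
The command[] list above can be reproduced with a small model. This is only an illustrative sketch: QueuedRead and buildCommandList() are hypothetical names, not the HAL's data structures, and the split of multi-byte reads into an ACKed part plus a final NAKed byte is an assumption based on how I2C reads end in general.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for one queued requestFrom() call.
struct QueuedRead {
    int  count;     // bytes requested
    bool sendStop;  // third argument of requestFrom()
};

// Expands queued reads into the command names used in the thread.
std::vector<std::string> buildCommandList(const std::vector<QueuedRead>& reads) {
    std::vector<std::string> cmds;
    for (const QueuedRead& r : reads) {
        assert(r.count >= 1);
        cmds.push_back("START");
        cmds.push_back("WRITE 1 (readmode)");  // address byte with the R/W bit set
        if (r.count > 1) {
            cmds.push_back("READ " + std::to_string(r.count - 1) + " ACK");
        }
        cmds.push_back("READ 1 NAK");          // the last byte of a read is NAKed
        if (r.sendStop) {
            cmds.push_back("STOP");
        }
    }
    return cmds;
}
```

Calling buildCommandList({{1, false}, {1, true}}) yields the seven entries listed above, with the second START acting as the ReStart.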

Currently the Timeout Recovery never passes I2C_ERROR_TIMEOUT back to the app.

Chuck.

@stickbreaker Here is what the specified sequence of requestFrom calls dumps out. The sequence repeats.

[E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x689ac, ed=0x6917c, =2000, max=2000 error=1
[E][esp32-hal-i2c.c:608] i2cDumpI2c(): i2c=0x3ffc10c0
[E][esp32-hal-i2c.c:609] i2cDumpI2c(): dev=0x60013000 date=0x16042000
[E][esp32-hal-i2c.c:610] i2cDumpI2c(): lock=0x3ffd6e1c
[E][esp32-hal-i2c.c:611] i2cDumpI2c(): num=0
[E][esp32-hal-i2c.c:612] i2cDumpI2c(): mode=1
[E][esp32-hal-i2c.c:613] i2cDumpI2c(): stage=3
[E][esp32-hal-i2c.c:614] i2cDumpI2c(): error=1
[E][esp32-hal-i2c.c:615] i2cDumpI2c(): event=0x3ffe2e40 bits=0
[E][esp32-hal-i2c.c:616] i2cDumpI2c(): intr_handle=0x3ffe2ee8
[E][esp32-hal-i2c.c:617] i2cDumpI2c(): dq=0x3ffe2e70
[E][esp32-hal-i2c.c:618] i2cDumpI2c(): queueCount=1
[E][esp32-hal-i2c.c:619] i2cDumpI2c(): queuePos=0
[E][esp32-hal-i2c.c:620] i2cDumpI2c(): byteCnt=0
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [0] 1e W STOP buf@=0x3ffe3eb4, len=1, pos=1, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: p 70
[E][esp32-hal-i2c.c:940] i2cDumpInts(): row count INTR TX RX
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [01] 0x0001 0x0002 0x0002 0x0000 0x000689ac
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [02] 0x0098 0x0100 0x0000 0x0000 0x00069174

[E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x693d1, ed=0x69ba1, =2000, max=2000 error=1
[E][esp32-hal-i2c.c:608] i2cDumpI2c(): i2c=0x3ffc10c0
[E][esp32-hal-i2c.c:609] i2cDumpI2c(): dev=0x60013000 date=0x16042000
[E][esp32-hal-i2c.c:610] i2cDumpI2c(): lock=0x3ffd6e1c
[E][esp32-hal-i2c.c:611] i2cDumpI2c(): num=0
[E][esp32-hal-i2c.c:612] i2cDumpI2c(): mode=1
[E][esp32-hal-i2c.c:613] i2cDumpI2c(): stage=3
[E][esp32-hal-i2c.c:614] i2cDumpI2c(): error=1
[E][esp32-hal-i2c.c:615] i2cDumpI2c(): event=0x3ffe2e40 bits=0
[E][esp32-hal-i2c.c:616] i2cDumpI2c(): intr_handle=0x3ffe2ee8
[E][esp32-hal-i2c.c:617] i2cDumpI2c(): dq=0x3ffe2e70
[E][esp32-hal-i2c.c:618] i2cDumpI2c(): queueCount=1
[E][esp32-hal-i2c.c:619] i2cDumpI2c(): queuePos=0
[E][esp32-hal-i2c.c:620] i2cDumpI2c(): byteCnt=0
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [0] 1a W STOP buf@=0x3ffe3eb4, len=1, pos=1, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: p 70
[E][esp32-hal-i2c.c:940] i2cDumpInts(): row count INTR TX RX
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [01] 0x0001 0x0002 0x0002 0x0000 0x000693d1
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [02] 0x0098 0x0100 0x0000 0x0000 0x00069b97

[E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6a07f, ed=0x6a84f, =2000, max=2000 error=1
[E][esp32-hal-i2c.c:608] i2cDumpI2c(): i2c=0x3ffc10c0
[E][esp32-hal-i2c.c:609] i2cDumpI2c(): dev=0x60013000 date=0x16042000
[E][esp32-hal-i2c.c:610] i2cDumpI2c(): lock=0x3ffd6e1c
[E][esp32-hal-i2c.c:611] i2cDumpI2c(): num=0
[E][esp32-hal-i2c.c:612] i2cDumpI2c(): mode=1
[E][esp32-hal-i2c.c:613] i2cDumpI2c(): stage=3
[E][esp32-hal-i2c.c:614] i2cDumpI2c(): error=1
[E][esp32-hal-i2c.c:615] i2cDumpI2c(): event=0x3ffe2e40 bits=0
[E][esp32-hal-i2c.c:616] i2cDumpI2c(): intr_handle=0x3ffe2ee8
[E][esp32-hal-i2c.c:617] i2cDumpI2c(): dq=0x3ffe2e70
[E][esp32-hal-i2c.c:618] i2cDumpI2c(): queueCount=2
[E][esp32-hal-i2c.c:619] i2cDumpI2c(): queuePos=1
[E][esp32-hal-i2c.c:620] i2cDumpI2c(): byteCnt=1
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [0] 1b R buf@=0x3ffe4784, len=1, pos=1, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: P 50
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [1] 1b R STOP buf@=0x3ffe4785, len=1, pos=0, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: . 00
[E][esp32-hal-i2c.c:940] i2cDumpInts(): row count INTR TX RX
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [01] 0x0001 0x0002 0x0002 0x0000 0x0006a07f
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [02] 0x0001 0x0200 0x0000 0x0000 0x0006a07f
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [03] 0x0003 0x0040 0x0000 0x0001 0x0006a07f
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [04] 0x0001 0x0002 0x0000 0x0000 0x0006a07f
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [05] 0x0098 0x0100 0x0000 0x0000 0x0006a847

[E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6a8d8, ed=0x6b0a8, =2000, max=2000 error=1
[E][esp32-hal-i2c.c:608] i2cDumpI2c(): i2c=0x3ffc10c0
[E][esp32-hal-i2c.c:609] i2cDumpI2c(): dev=0x60013000 date=0x16042000
[E][esp32-hal-i2c.c:610] i2cDumpI2c(): lock=0x3ffd6e1c
[E][esp32-hal-i2c.c:611] i2cDumpI2c(): num=0
[E][esp32-hal-i2c.c:612] i2cDumpI2c(): mode=1
[E][esp32-hal-i2c.c:613] i2cDumpI2c(): stage=3
[E][esp32-hal-i2c.c:614] i2cDumpI2c(): error=1
[E][esp32-hal-i2c.c:615] i2cDumpI2c(): event=0x3ffe2e40 bits=0
[E][esp32-hal-i2c.c:616] i2cDumpI2c(): intr_handle=0x3ffe2ee8
[E][esp32-hal-i2c.c:617] i2cDumpI2c(): dq=0x3ffe2e70
[E][esp32-hal-i2c.c:618] i2cDumpI2c(): queueCount=1
[E][esp32-hal-i2c.c:619] i2cDumpI2c(): queuePos=0
[E][esp32-hal-i2c.c:620] i2cDumpI2c(): byteCnt=0
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [0] 1c W STOP buf@=0x3ffe3eb4, len=1, pos=1, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: p 70
[E][esp32-hal-i2c.c:940] i2cDumpInts(): row count INTR TX RX
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [01] 0x0001 0x0002 0x0002 0x0000 0x0006a8d8
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [02] 0x0098 0x0100 0x0000 0x0000 0x0006b0a0

[E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6b109, ed=0x6b8d9, =2000, max=2000 error=1
[E][esp32-hal-i2c.c:608] i2cDumpI2c(): i2c=0x3ffc10c0
[E][esp32-hal-i2c.c:609] i2cDumpI2c(): dev=0x60013000 date=0x16042000
[E][esp32-hal-i2c.c:610] i2cDumpI2c(): lock=0x3ffd6e1c
[E][esp32-hal-i2c.c:611] i2cDumpI2c(): num=0
[E][esp32-hal-i2c.c:612] i2cDumpI2c(): mode=1
[E][esp32-hal-i2c.c:613] i2cDumpI2c(): stage=3
[E][esp32-hal-i2c.c:614] i2cDumpI2c(): error=1
[E][esp32-hal-i2c.c:615] i2cDumpI2c(): event=0x3ffe2e40 bits=0
[E][esp32-hal-i2c.c:616] i2cDumpI2c(): intr_handle=0x3ffe2ee8
[E][esp32-hal-i2c.c:617] i2cDumpI2c(): dq=0x3ffe2e70
[E][esp32-hal-i2c.c:618] i2cDumpI2c(): queueCount=1
[E][esp32-hal-i2c.c:619] i2cDumpI2c(): queuePos=0
[E][esp32-hal-i2c.c:620] i2cDumpI2c(): byteCnt=0
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [0] 1e W STOP buf@=0x3ffe3eb4, len=1, pos=1, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: p 70
[E][esp32-hal-i2c.c:940] i2cDumpInts(): row count INTR TX RX
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [01] 0x0001 0x0002 0x0002 0x0000 0x0006b109
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [02] 0x0098 0x0100 0x0000 0x0000 0x0006b8d1

[E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6bb2f, ed=0x6c2ff, =2000, max=2000 error=1
[E][esp32-hal-i2c.c:608] i2cDumpI2c(): i2c=0x3ffc10c0
[E][esp32-hal-i2c.c:609] i2cDumpI2c(): dev=0x60013000 date=0x16042000
[E][esp32-hal-i2c.c:610] i2cDumpI2c(): lock=0x3ffd6e1c
[E][esp32-hal-i2c.c:611] i2cDumpI2c(): num=0
[E][esp32-hal-i2c.c:612] i2cDumpI2c(): mode=1
[E][esp32-hal-i2c.c:613] i2cDumpI2c(): stage=3
[E][esp32-hal-i2c.c:614] i2cDumpI2c(): error=1
[E][esp32-hal-i2c.c:615] i2cDumpI2c(): event=0x3ffe2e40 bits=0
[E][esp32-hal-i2c.c:616] i2cDumpI2c(): intr_handle=0x3ffe2ee8
[E][esp32-hal-i2c.c:617] i2cDumpI2c(): dq=0x3ffe2e70
[E][esp32-hal-i2c.c:618] i2cDumpI2c(): queueCount=1
[E][esp32-hal-i2c.c:619] i2cDumpI2c(): queuePos=0
[E][esp32-hal-i2c.c:620] i2cDumpI2c(): byteCnt=0
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [0] 1a W STOP buf@=0x3ffe3eb4, len=1, pos=1, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: p 70
[E][esp32-hal-i2c.c:940] i2cDumpInts(): row count INTR TX RX
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [01] 0x0001 0x0002 0x0002 0x0000 0x0006bb2f
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [02] 0x0098 0x0100 0x0000 0x0000 0x0006c2f4

[E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6c7dc, ed=0x6cfac, =2000, max=2000 error=1
[E][esp32-hal-i2c.c:608] i2cDumpI2c(): i2c=0x3ffc10c0
[E][esp32-hal-i2c.c:609] i2cDumpI2c(): dev=0x60013000 date=0x16042000
[E][esp32-hal-i2c.c:610] i2cDumpI2c(): lock=0x3ffd6e1c
[E][esp32-hal-i2c.c:611] i2cDumpI2c(): num=0
[E][esp32-hal-i2c.c:612] i2cDumpI2c(): mode=1
[E][esp32-hal-i2c.c:613] i2cDumpI2c(): stage=3
[E][esp32-hal-i2c.c:614] i2cDumpI2c(): error=1
[E][esp32-hal-i2c.c:615] i2cDumpI2c(): event=0x3ffe2e40 bits=0
[E][esp32-hal-i2c.c:616] i2cDumpI2c(): intr_handle=0x3ffe2ee8
[E][esp32-hal-i2c.c:617] i2cDumpI2c(): dq=0x3ffe2e70
[E][esp32-hal-i2c.c:618] i2cDumpI2c(): queueCount=2
[E][esp32-hal-i2c.c:619] i2cDumpI2c(): queuePos=1
[E][esp32-hal-i2c.c:620] i2cDumpI2c(): byteCnt=1
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [0] 1b R buf@=0x3ffe4784, len=1, pos=1, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: P 50
[E][esp32-hal-i2c.c:581] i2cDumpDqData(): [1] 1b R STOP buf@=0x3ffe4785, len=1, pos=0, eventH=0x0 bits=0
[E][esp32-hal-i2c.c:597] i2cDumpDqData(): 0x0000: . 00
[E][esp32-hal-i2c.c:940] i2cDumpInts(): row count INTR TX RX
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [01] 0x0001 0x0002 0x0002 0x0000 0x0006c7dc
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [02] 0x0001 0x0200 0x0000 0x0000 0x0006c7dc
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [03] 0x0003 0x0040 0x0000 0x0001 0x0006c7dc
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [04] 0x0001 0x0002 0x0000 0x0000 0x0006c7dc
[E][esp32-hal-i2c.c:943] i2cDumpInts(): [05] 0x0098 0x0100 0x0000 0x0000 0x0006cfa4

@lonerzzz Thanks for the data.
Well, based on these dumps, that sensor holds SCL for a LONG, LONG time.
The I2C bus was busy from 0x6a07f to 0x6c7dc, just over ten seconds (10.077 s).
Sequence:
Read(1) from 0x0D returned 'P'; this was the requestFrom(false).
Read(1) from 0x0D failed
Write(1) to 0x0E failed
Write(1) to 0x0F failed
Write(1) to 0x0D failed
Read(1) from 0x0D succeeded 'P'
Then it starts over.
procQueue() ignores Bus_Busy, so those 4 I2C calls accomplished nothing except a 10-second delay.
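
The timing figures can be checked directly from the hex values in the dumps. The sketch below assumes the st/ed values and the last column of i2cDumpInts() are millisecond ticks, which is consistent with the "=2000, max=2000" field; elapsedMs() is a hypothetical helper name.

```cpp
#include <cstdint>

// Elapsed ticks between two dump timestamps. Unsigned subtraction stays
// correct across a single 32-bit counter wrap.
constexpr uint32_t elapsedMs(uint32_t start, uint32_t end) {
    return end - start;
}

// One "Gross Timeout" entry: st=0x689ac, ed=0x6917c.
static_assert(elapsedMs(0x689acu, 0x6917cu) == 2000u, "matches max=2000");

// First successful read (0x6a07f) to the next successful read (0x6c7dc).
static_assert(elapsedMs(0x6a07fu, 0x6c7dcu) == 10077u, "10.077 seconds");
```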

It looks like your sensor really hangs the I2C bus for a LONG time. I don't think any recovery attempt would be correct. The delay is unavoidable. Would you want to add some other mechanism to monitor this pause?
digitalRead(SCL) works while the I2C SM is attached; digitalWrite(SCL) does not. So after you initiate that READ you might as well just hang in a loop:

count=Wire.requestFrom(0x0d,1);
if(Wire.lastError()==I2C_ERROR_TIMEOUT){
  while(!digitalRead(SCL));  // hang until the slave releases SCL
  count=Wire.requestFrom(0x0d,1); // re-issue the READ that timed out
}
while(Wire.available()){
  // do something with the data
}
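
The while(!digitalRead(SCL)) loop above never returns if the slave never releases SCL. A bounded variant might look like the sketch below. waitForSclHigh() is a hypothetical helper, and the pin read and clock are injected so it can be run off-target; on the ESP32 they would typically wrap digitalRead(SCL) and millis().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Polls readScl() until it returns true or timeoutMs has elapsed.
// Returns true if SCL went high, false if the wait timed out.
bool waitForSclHigh(const std::function<bool()>& readScl,
                    const std::function<uint32_t()>& nowMs,
                    uint32_t timeoutMs) {
    assert(readScl && nowMs);
    const uint32_t start = nowMs();
    while (!readScl()) {
        // Unsigned subtraction keeps the comparison valid across a counter wrap.
        if (nowMs() - start >= timeoutMs) {
            return false;  // the slave never released SCL
        }
    }
    return true;
}
```

The re-issued requestFrom() would then run only when this returns true, and the caller can report a stuck bus otherwise.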

If you increase the timeout to this 10-second range, it will probably act differently. But every time it times out, it shuts off the I2C SM with ctr.trans_start=0. Increasing the timeout with Wire.setTimeOut(10000); might allow this code to function correctly:

Wire.setTimeOut(10000);
count=Wire.requestFrom(0x0d,1); // the 10second wait 
if(count==1) {
  // do something with the data
}
else {
  // something bad happened
}

These dumps don't show what I was hoping for. There was no timeout recovery; the SM just paused. The bus was no longer busy when the successful READ occurred at 0x6c7dc. This does show that the SM recovered from the slave stretching SCL, but transactions were lost/aborted while the slave controlled SCL.

Should I change the code to return I2C_ERROR_BUSY if status_reg.bus_busy is asserted when the timeout expires? My sample code would change to this:

Wire.setTimeOut(2000);
count=Wire.requestFrom(0x0d,1);
if(Wire.lastError()==I2C_ERROR_BUSY){
  while(!digitalRead(SCL));  // hang until the slave releases SCL
  count=Wire.requestFrom(0x0d,1); // re-issue the READ that timed out
}
while(Wire.available()){
  // do something with the data
}

Chuck.

@stickbreaker Sorry if there is confusion here. The sensor isn't holding for the duration that you see in the messages. It is getting messed up by the test sequence that you wanted to see so this includes reset and recovery times as well after getting messed up. Under normal operation it responds in under 2 seconds and recovers in less than 4 seconds. As well, the sensor has a polling mechanism that is the other method of operation which is what I use normally to avoid monopolizing the bus as there are multiple sensors on the same bus.

As for the response codes, returning busy in preference to the timeout does make sense to me because it can indicate something more relevant than the timeout if the SM is in an unexpected state.

@lonerzzz Ok,
How about this:

  • If status_reg.bus_busy is set on entry and a gross timeout is the termination condition, then return I2C_ERROR_BUSY.
  • If data actually moves through the SM and a gross timeout is the termination condition, then return I2C_ERROR_TIMEOUT.

Chuck.

@stickbreaker That is what I was thinking so that we don't mask a state machine error.

@lonerzzz OK, I'll do that. The test condition will be the movement of data through the state machine. If nothing moved it is Bus_Busy, else just a TimeOut.
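
The rule agreed on here reduces to a single test on whether any data moved. A minimal sketch of that decision follows; GrossTimeoutResult and classifyGrossTimeout() are hypothetical names, and the enum is a local stand-in rather than the HAL's own error type.

```cpp
#include <cstddef>

// Local stand-ins for the two outcomes named in the thread.
enum class GrossTimeoutResult {
    Busy,     // would map to I2C_ERROR_BUSY
    Timeout   // would map to I2C_ERROR_TIMEOUT
};

// bytesMoved: bytes that passed through the state machine before the
// gross timeout fired.
constexpr GrossTimeoutResult classifyGrossTimeout(std::size_t bytesMoved) {
    return bytesMoved == 0 ? GrossTimeoutResult::Busy
                           : GrossTimeoutResult::Timeout;
}
```

Reporting "busy" when nothing moved keeps a stuck state machine from being hidden behind a generic timeout.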

Chuck.

Late to the party here, but I just started running into I2C problems when writing to a 128x32 OLED display. In a simple execution environment all is well, but once I have several FreeRTOS tasks running I occasionally run into problems. It starts with an i2cWrite(): Ack Error! followed by an endless stream of i2cWrite(): Busy Timeouts. So far the only way I have found to recover is with a hardware reset.

My reading of the discussion above is that most of the issues being worked on are focused on reading I2C data, but I thought I'd add a comment and see what thoughts you two may have on this issue. I know nothing about the inner workings of the I2C protocol and I'll likely first try switching to an SPI-based display before digging into the details of I2C. Plus at the moment I've got no way to reliably reproduce the problem except by waiting for something to eventually go wrong.

@tferrin Are you using the main arduino-esp32 or my version? Based on your description (i2cWrite()), you are using the main branch. The main branch uses polling-mode communication with the I2C hardware. It is really timing sensitive; the WiFi subsystem will cause these polling I2C operations to fail. Also, the main branch's handling of ReStart operations is faulty. It will cause TIMEOUT conditions that are non-recoverable.

Try using my Release V0.1.2
The zip file contains the files that need to be replaced to use my interrupt driven replacement.

Master Mode operations (reading/writing) are similar; the differences do not contribute anything to the TimeOut problems. Most I2C operations combine both write and read operations.

Chuck.

I'll be trying out your V0.1.2 release this evening. I had already decided to do so after my earlier post, but had other obligations that needed attention. I do have WiFI running and also a high-priority task that reads data from a network of sensors via a low-power radio link. So there is ample opportunity for task preemption. Will post up my results.

Also, just read this...
"If you are using multiple tasks, you should pin the one using @stickbreaker 's code to CORE1 otherwise you will have collisions with the WiFi interrupt. Just figured this out."
(from espressif#811 (comment))

So I need to review how I've done my task allocation.

Good news to report using the V0.1.2 code. The Busy Timeouts are gone, replaced by TimeoutRecovery's. Now the display never stops responding and output is never garbled. (See attached log output.)

--tom

i2cWrite-8feb18.txt

I've edited my previous comment to remove the reference to exceptions happening with the V0.1.2 code. I caused those by making a stupid programming error.

Things are even better today! It turns out all the Timeout Recoveries noted in my post above were caused by the interaction of my I2C display and the OneWire library, no doubt because of the portENTER/EXIT_CRITICAL calls it uses. By using a mutex semaphore around writes to the display and reads from the OneWire temperature probes, I eliminated all the remaining I2C problems I was having.
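
The shape of that fix is one lock shared by both kinds of bus access. The sketch below uses std::mutex as a host-side stand-in for the FreeRTOS mutex semaphore mentioned above; BusGuard and withBus() are hypothetical names and are not taken from the poster's code.

```cpp
#include <cassert>
#include <mutex>

// One lock shared by every transfer that must not be interrupted by another.
class BusGuard {
public:
    // Runs transfer() while holding the lock, so a display write and a
    // OneWire read can never overlap.
    template <typename Fn>
    void withBus(Fn&& transfer) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(!busy_);  // would fire if two transfers ever overlapped
        busy_ = true;
        transfer();
        busy_ = false;
    }

private:
    std::mutex mutex_;
    bool busy_ = false;
};
```

Each task would wrap its own access, for example guard.withBus([&] { /* write to the display */ }) in one task and guard.withBus([&] { /* read the temperature probes */ }) in another.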

Thanks, @stickbreaker, for the great improvements you made to OneWire and esp32-hal-i2c.c!

@me-no-dev, these changes really need to get incorporated into the main repository. I'm pretty convinced that the minor incompatibility caused by the new API will hardly be noticed. In all the device libraries I keep on my system, grep does not find any that use writeTransaction(...,false) or requestFrom(...,false). I'm not saying they don't exist, but I suspect any incompatibility downside will be far outweighed by the stability gains.

@tferrin I'm glad it's working for you. The timeoutRecovery messages need to be removed. See my comments on issue #21.

Chuck.

Adding a reference to a related thread on the ESP-IDF site: espressif/esp-idf#1503

I think I have a fix for this! Busy-Glitch Merged into main Branch 14MAR2018

Chuck.

stickbreaker, thanks for your work on this I2C issue! I'm not sure if this is the best place to post but the main "trunk" of this repository fixed the I2C issues I was experiencing. I have a custom board (based on Adafruit ESP32 feather) with around 12 I2C devices communicating with various extenders and I2C multiplexers and your code really seemed to fix the stalls in the I2C communication I was experiencing. Good Work!! Hopefully your modifications can get incorporated into the main espressif ESP32 repo.

Thanks again!

@thaanstad Do you have a list of i2c devices you are using?

@thaanstad I'm pleased it is working for you.

Do you use the original Wire.write() and Wire.read(), or are you using my block read/write functions?

I haven't received any feedback from people using these block transmission functions. I'm curious whether anyone uses them, and whether they think they should be included in the main branch when this is merged.

I added them because I wanted to test the hardware capacities. I use 24LC512 as a simple FAT file system. The Arduino 32byte buffers were a pain to work around.

Chuck.

I think I have a fix for this! Busy-Glitch [link]

When I click on that link github returns 404 Page Not Found

@tferrin I merged it into my main branch, it is no longer a separate branch.
It is also release V0.2.0. V0.1.2 is the initial release from January; V0.1.3 includes my first attempt at a hardware reset.

Chuck.

The main code in the repo: https://github.com/espressif/arduino-esp32 as of the morning of 3/15/2018 was having problems with I2C communication with I2C chips on my custom boards (based on Adafruit ESP32 feather). The I2C communications would run fine for 30 secs - 2 min or so and then stop for no reason. The main code would continue to run and I would just see garbage data from I2C sensors or the I2C controlled IO control would stop. I took a look at the clock voltage during one of these hangs and did not see any changes from 3.3V which I assume indicates the clock stopped. I think there was some error in the I2C communication and the ESP32 would stop communication and would not resume.

After updating to stickbreaker's repo the communications have been working without interruption in excess of 1hr (and still going). I have seen 3 I2C miscommunications that showed false data from some of the sensors but this did not stop the I2C communications and the USB Serial data printout continued as normal with good data after the one misread.

When there were hangups before the MCU would continuously spit out garbage data (shown via USB Serial) and I assume was not re-engaging I2C comms after a hangup or miscommunication. I don't remember if the data was the same or not... (all 0x00 or 0xFF).

FYI, there were no hardware changes during the testing of these repos, and I haven't touched the circuitry (or connection wires) during the I2C communications evaluation. After reading on the espressif/ESP32 repo issues about the ESP32 MCU hanging during I2C communications, I assumed this was the problem with my circuit and have only made modifications in software to try to resolve the problem. I am sure the improvement in I2C communications is due to the modifications in stickbreaker's repo and not a hardware change.

chinswain:
I am using the following ICs that utilize I2C for communication:
TCA9548 (I2C multiplexer)
9x MCP23017 (IO expansion)
P82B715D (I2C range extenders)
BME280 (RH, Temp, Pressure)
PCF8523 (RTC)
ADS1115 (ADC)
4x HAFBLF0750C4AX5 (flow meters)

stickbreaker:
I am only using the Wire.write(), Wire.requestFrom() and Wire.read() methods. I have not used block reads for this project. I was using the trunk of Repo revision 584. Is there another version you want me to test out? Perhaps the one you just merged? I am not sure if that is the one I am using or not.

@thaanstad My current main branch is the most up-to-date version. No one has 'yet' started yelling at me that it breaks their stuff. I recommend you use it (main branch is currently at release V0.2.0).

Have you been bitten by my ReStart differences? Wire.endTransmission(false) will return err==7.

I also use MCP23017s to interface to HD44780 LCDs and HD66717 LCDs. I have found that using Non-Sequential mode allows fast updates. I have my LCDs wired with 4 bits of PortA for the CTRL signals and all 8 bits of PortB as data. I can then send a continuous stream of data to fill the 4x20 one line at a time. Each of the 20 display characters takes 4 bytes of I2C data: [data,ctrl(write,EN),data,ctrl(write,!EN)].
So writing a complete line (20 characters) takes two I2C sequences: one to set the cursor position, and a second to send 80 bytes that set all characters on the selected line.
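
The 80-byte line payload described above can be built as follows. This is a sketch under stated assumptions: the control-bit positions (CTRL_RS, CTRL_EN) are hypothetical because the thread does not give the PortA wiring, and buildLinePayload() is an illustrative name. It only builds the byte stream and does not talk to an MCP23017.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumed PortA bit assignments; the real wiring may differ.
constexpr uint8_t CTRL_RS = 0x01;  // register select: 1 = character data
constexpr uint8_t CTRL_EN = 0x04;  // enable strobe (RW stays low, meaning write)

// Builds [data, ctrl(write,EN), data, ctrl(write,!EN)] for each of 20 columns.
std::vector<uint8_t> buildLinePayload(const std::string& text) {
    assert(text.size() <= 20);
    std::string line = text;
    line.resize(20, ' ');  // pad to the full 20-column line

    std::vector<uint8_t> out;
    out.reserve(line.size() * 4);
    for (char c : line) {
        const uint8_t data = static_cast<uint8_t>(c);
        out.push_back(data);
        out.push_back(static_cast<uint8_t>(CTRL_RS | CTRL_EN));  // EN high
        out.push_back(data);
        out.push_back(CTRL_RS);  // EN low: the HD44780 latches on the falling edge
    }
    return out;
}
```

buildLinePayload("Hi") returns 80 bytes: two characters followed by 18 padded spaces, four bytes each.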

I have been able to update my 4x20 LCD at over 2,000 cps. It just grays out the screen.

I like those MCP23008, MCP23017's.

Chuck.

This is great, Chuck. On the latest sync to the Arduino core, using an earlier version of your 4 files, I had started getting the timing issues again. Today I found the discussion between you and ESP32DE that fixed that by setting the SDA and SCL pins high before setting the pin mode. So I put your change into my environment, and it failed with the same timing issues. I thought I would change the pin mode to output_open_drain, and all the issues went away. I can repro this easily in my environment. I then pulled down your latest changes from your master branch, and it all works fine now, even though you are using open_drain for the pin mode. So I am not sure why, but I thought I would let you know. For reference I am attaching the 4 earlier files I have that work with output_open_drain but not with open_drain. Pretty weird, to say the least! But you put so much effort into this that I thought I would let you know.

Wire.h.zip

@ifrew Thanks for the accolades. The OPEN_DRAIN is required because the i2c bus is electrically an open_drain bus. When the peripheral is attached, it re-configures the output drivers on the GPIO's behind the scenes.
I'll probably have a new version up this week that will support MultiMaster operations. Hopefully! subject to gremlins and power failures!

Chuck.

I'm going to close this issue, I believe the timeout cascade has been solved.