adafruit / Adafruit_ILI9341

Library for Adafruit ILI9341 displays

default HWSpi on teensy 3.6 is slightly slower than swspi, and 2.1x slower than ESP32

marcmerlin opened this issue

  • Arduino board: teensy 3.6
  • Arduino IDE version arduino-1.8.9-teensyduino-1.4

I'm not sure how much of this is working as intended (WAI); feel free to close if so. Suggestions though:

  1. The default HWSPI speed on teensy 3.6 is too slow (slower than SWSPI, even). tft.begin(40000000) makes things about 2x faster than SWSPI.
  2. Even then, it's still about 2x slower than the optimized T3 teensy library.
  3. More interestingly, graphicstest on ESP32 with HWSPI, using your exact same git TOT lib, is 2x faster than HWSPI on teensy, and your lib on ESP32 is still a bit faster than the optimized teensy library on teensy 3.6.

Using library SPI at version 1.0 in folder: /var/local/arduino-1.8.9-teensyduino-1.46/hardware/teensy/avr/libraries/SPI
Using library Adafruit-GFX-Library at version 1.5.1 in folder: /home/merlin/Arduino/libraries/Adafruit-GFX-Library
Using library Adafruit_ILI9341 at version 1.5.0 in folder: /home/merlin/Arduino/libraries/Adafruit_ILI9341

I'm using both your libraries from git TOT

#define TFT_MISO 12
#define TFT_CLK 13
#define TFT_MOSI 11
#define TFT_DC 10
#define TFT_RST 23
#define TFT_CS 22
Adafruit_ILI9341 tft = Adafruit_ILI9341(TFT_CS, TFT_DC, TFT_MOSI, TFT_CLK, TFT_RST, TFT_MISO);
//Adafruit_ILI9341 tft = Adafruit_ILI9341(TFT_CS, TFT_DC, TFT_RST);

When using swSPI, screen fill takes 890029 µs, almost 1 sec.
When using hwSPI, screen fill takes 954115 µs, even slower.

When using hwSPI, tft.begin(40000000) brings the screen fill time down to 416309 µs (0.4s). Higher values do not go faster.

For comparison, ILI9341_t3 tft = ILI9341_t3(TFT_CS, TFT_DC) does a screen fill in 224904 µs.

Last but not least, the same Adafruit_ILI9341 with HWSPI on ESP32 does a screen fill in 195277 µs, faster than teensy's optimized t3 library.
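To make the "2x" and "2.1x" claims easy to check, here is a minimal host-side sketch that computes the ratios from the figures quoted above. The constant and function names are made up for illustration and are not part of either library; the only inputs are the screen fill times reported in this issue.

```cpp
#include <cassert>

// Screen fill times copied from the measurements in this issue.
constexpr double kTeensySwSpi      = 890029.0;  // Adafruit_ILI9341, software SPI, teensy 3.6
constexpr double kTeensyHwSpi      = 954115.0;  // Adafruit_ILI9341, hardware SPI, default clock
constexpr double kTeensyHwSpi40MHz = 416309.0;  // Adafruit_ILI9341, tft.begin(40000000)
constexpr double kTeensyT3         = 224904.0;  // ILI9341_t3 on the same teensy
constexpr double kEsp32HwSpi       = 195277.0;  // Adafruit_ILI9341, hardware SPI, ESP32

// Hypothetical helper: how many times faster the `fast` run is than the `slow` run.
double speedup(double slow, double fast) {
    assert(fast > 0.0);  // guard against division by zero
    return slow / fast;
}
```

With these numbers, speedup(kTeensyHwSpi40MHz, kEsp32HwSpi) comes out a little above 2.1, speedup(kTeensySwSpi, kTeensyHwSpi40MHz) is also a little above 2.1, and speedup(kTeensyHwSpi40MHz, kTeensyT3) is a bit under 2, which matches the rounded figures used in the text.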

For ESP32, referencing back to #19, https://cloud.githubusercontent.com/assets/12663778/23399153/ed26ac16-fda7-11e6-9f0a-447f2c4307c9.png does show a maximum speed of 192198 for screen fill on ESP32/HWSPI after me-no-dev's refactor.
That is very close to the 195277 I got today.
Assuming that teensy 3.6's hwSPI is not inferior, it should hopefully also be able to do a screen fill in 0.2s.
And if you wonder why I care: https://github.com/marcmerlin/FastLED_SPITFT_GFX stores a FastLED pixel array and allows rendering on TFTs, now including ILI9341, but matrix->show() relies on a full framebuffer sync (i.e. a fillscreen equivalent), so getting at least 5fps would be good :) (On SSD1331, I get more than enough speed due to the smaller number of pixels.)
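The 5fps target translates into a time budget per full-frame push. The sketch below does that arithmetic. It assumes that one reported screen fill figure (in microseconds) stands in for the cost of one matrix->show(), which is only the rough equivalence suggested above, not something measured here. The helper names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: frames per second if one full-frame push takes `frame_us` microseconds.
double frames_per_second(double frame_us) {
    assert(frame_us > 0.0);
    return 1e6 / frame_us;
}

// Hypothetical helper: longest a full-frame push may take (microseconds) to reach `target_fps`.
double frame_budget_us(double target_fps) {
    assert(target_fps > 0.0);
    return 1e6 / target_fps;
}
```

Under that assumption, frame_budget_us(5.0) is 200000 µs, so the 416309 figure on teensy 3.6 would land at roughly 2.4 fps, while the 195277 figure on ESP32 would just clear 5 fps.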

And before you ask me why I'm not using ESP32 instead of teensy 3.6 given that ESP32 can do screenfill faster, @PaulStoffregen will be happy to know that ESP32 is unable to do the job as it cannot allocate the 230KB of contiguous memory for the 320x240 24bpp array (ESP32 memory is fragmented in weird ways). No such problem on teensy 3.6.
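For reference, the 230KB figure follows directly from the panel size and pixel depth. A minimal sketch of the arithmetic (the function name is made up for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed for one contiguous framebuffer.
std::size_t framebuffer_bytes(std::size_t width, std::size_t height,
                              std::size_t bytes_per_pixel) {
    assert(bytes_per_pixel > 0);  // a pixel must occupy at least one byte
    return width * height * bytes_per_pixel;
}
```

framebuffer_bytes(320, 240, 3) gives 230400 bytes for the 320x240 24bpp array, all of which has to be allocated as a single contiguous block.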

we haven't done much testing and tuning w/Teensy 3.6 using this library & @PaulStoffregen has put most of the effort into the 't3' library. If you have updates and improvements, please submit them as PRs. @me-no-dev submitted the PRs that got ESP32 support as good as it is :)

Agreed and understood. Let's see what Paul says, whether it's a low hanging fruit that he can find/is interested in looking for.
And yes, I remember when @me-no-dev was working on the ESP32 version, I was testing it daily back when he was writing it, and you also merged a couple of improvements from me back in those days. I mostly mentioned it to confirm that your underlying code is capable of going slightly faster than the T3 library on teensy, so hopefully whatever makes it slower on teensy can be improved.
That being said, one piece of low-hanging fruit seems to be this change:
#54
Without this, hwSPI is actually slower than swSPI.
After the patch, we're still off by a factor of 2, but it's a start.

thank u! if you test on both a 3.2 and a 3.6, that's the coverage we'd want before merging - there's no travis CI coverage.

the obvious patch was merged in, but given that even with it, it's 2x slower than it can be, you can decide whether it's worth keeping open for @PaulStoffregen or someone else to see if it can be further improved.

For reference

  • When using hwSPI on teensy 3.6, tft.begin(40000000) brings the screen fill time down to 416309 µs (0.4s). Higher values do not go faster.
  • For comparison on the same teensy, ILI9341_t3 tft = ILI9341_t3(TFT_CS, TFT_DC) does a screen fill in 224904 µs.
  • The same Adafruit_ILI9341 with HWSPI on ESP32 does a screen fill in 195277 µs, faster than teensy's optimized t3 library and 2.1x faster than the same library code running on teensy with tft.begin(40000000).

2x is not so bad, given that the t3 library uses DMA/FIFO. when paul has time, we have a plan to try to make a generalized dma/fifo library.