STMicroelectronics / x-cube-azrtos-h7

X-CUBE-AZRTOS-H7 (Azure RTOS Software Expansion for STM32Cube) provides a full integration of Microsoft Azure RTOS in the STM32Cube environment for the STM32H7 series of microcontrollers.

Home Page: https://www.st.com/en/embedded-software/x-cube-azrtos-h7.html

Stack Corruption: tx_queue_receive() with TX_WAIT_FOREVER Option

ajitbasarur opened this issue · comments

Describe the set-up

  • x-cube-azrtos-h7/Projects/STM32H735G-DK/Applications/USBX/Ux_Host_MSC/
  • USB Mass storage with 2GB capacity is tested.

Problem -- Stack gets corrupted if msc_process_thread_entry() is exited

  1. In ux_host_msc.c, the thread waits for a message by calling "tx_queue_receive(&ux_app_MsgQueue_msc, &fx_media, TX_WAIT_FOREVER);" and blocks there until one arrives. Note that fx_media is the destination address to which the data from the queue is copied.
  2. Once the USB device is inserted, execution reaches "tx_queue_send(&ux_app_MsgQueue_msc, media, TX_NO_WAIT);" in the file app_usbx_host.c.
  3. Single-stepping leads to the function "_tx_queue_send()" in tx_queue_send.c. Inside this function, execution reaches the macro "TX_QUEUE_MESSAGE_COPY(source, destination, size)". This is where we believe the problem occurs.
  4. In TX_QUEUE_MESSAGE_COPY(), the destination pointer refers to fx_media. It seems that the macro writes past the end of fx_media, and that leads to stack corruption.
  5. In ux_host_msc.c, inside the thread function msc_process_thread_entry(), fx_media is allocated on the stack next to the location where the return address of msc_process_thread_entry() is stored.
  6. Once TX_QUEUE_MESSAGE_COPY() writes past fx_media, it corrupts the stack, and the return address of msc_process_thread_entry() is lost.
  7. When msc_process_thread_entry() then tries to exit, a corrupted value is popped from the stack and used as the return address. Execution continues from that address, which results in an invalid instruction execution. On the STM32H7, this raises a Hard Fault.
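
The overwrite described in the steps above can be reproduced on a host machine with a small model. This is only a sketch under stated assumptions: copy_message_words() is a hypothetical stand-in for the word-wise copy performed by TX_QUEUE_MESSAGE_COPY, and the array stack_model is an illustrative substitute for the real thread stack, with slot 0 playing the role of fx_media and slots 1 to 3 playing the role of whatever is stored next to it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the word-wise copy done by TX_QUEUE_MESSAGE_COPY:
   copies `words` 32-bit words from source to destination. */
void copy_message_words(const uint32_t *source, uint32_t *destination, size_t words)
{
    while (words--) {
        *destination++ = *source++;
    }
}

/* Models a small stack region as an array (illustrative only):
   slot 0 stands for fx_media (one pointer-sized word on the STM32H7),
   slots 1..3 stand for the neighbouring stack words.
   Returns how many neighbouring slots were overwritten.
   message_size_words must be between 1 and 4 in this model. */
size_t count_clobbered_words(size_t message_size_words)
{
    const uint32_t sentinel = 0xA5A5A5A5u;
    /* Made-up queue contents: word 0 is the "pointer", the rest is stale data. */
    uint32_t queue_storage[4] = { 0x24001000u, 0x11111111u, 0x22222222u, 0x33333333u };
    uint32_t stack_model[4]   = { 0u, sentinel, sentinel, sentinel };
    size_t clobbered = 0;

    copy_message_words(queue_storage, stack_model, message_size_words);

    for (size_t i = 1; i < 4; ++i) {
        if (stack_model[i] != sentinel) {
            ++clobbered;
        }
    }
    return clobbered;
}
```

In this model, count_clobbered_words(1) leaves the neighbouring slots intact, while count_clobbered_words(4) overwrites all three of them, which is the kind of damage that can destroy a saved return address.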

This Problem is present in most of the STM32H7 example projects

  1. It has to do with the usage of the tx_queue_create() function, specifically the message size argument.
  2. The message size must be specified in 32-bit words, not bytes. If the message size is 4, the queue transfers 4 words (16 bytes) per message.
  3. In the example project, a pointer is passed as the queue message. A pointer is 4 bytes in size, which is 1 word in terms of message size.
  4. The STM32H7 example sets the message size to 4 instead of 1. This is the root cause of the problem.
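
To make the bytes-versus-words distinction concrete, here is a minimal sketch of the arithmetic. It assumes a 32-bit queue word, as on the STM32H7; message_size_in_words() and queue_storage_bytes() are hypothetical helper names and not part of ThreadX. In tx_queue_create(), the message size argument is given in words (TX_1_ULONG for a single pointer on this target), while the size of the queue storage area is given in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: a ThreadX queue word is 32 bits, as on the STM32H7 (Cortex-M7). */
typedef uint32_t queue_word_t;

/* Hypothetical helper: converts a payload size in bytes into the
   message size in words, rounding up to a whole word. */
size_t message_size_in_words(size_t payload_bytes)
{
    return (payload_bytes + sizeof(queue_word_t) - 1u) / sizeof(queue_word_t);
}

/* Hypothetical helper: number of bytes of queue storage needed to hold
   `depth` messages of `payload_bytes` each. */
size_t queue_storage_bytes(size_t payload_bytes, size_t depth)
{
    return message_size_in_words(payload_bytes) * sizeof(queue_word_t) * depth;
}
```

For a 4-byte pointer, message_size_in_words(4) yields 1, so the message size should be 1 word. Passing 4 instead tells the queue that every message is 16 bytes long.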

Problem Is Solved
Check the link eclipse-threadx/threadx#137

Please correct the example projects and leave your response here. Then, I will close the issue in this Repo as well as in Azure RTOS Repo.

Best Regards,
Ajit Basarur

Hi @ajitbasarur,

Thank you very much for this detailed and very clear report. Thank you also for the fix proposal.

Our development teams have already been notified. A fix will be made available in a future release.

With regards,

ST Internal Reference: 114470

Hi @ALABSTM,

Thank you for your prompt response. Please notify me once the new software containing this fix is released. I will then close this issue on GitHub.
In the meantime, I suggest the following improvement to your code.
Whenever you call a ThreadX API, please check the return value, for example by comparing it with TX_SUCCESS before proceeding. This will help catch unknown bugs in the future.
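
As an illustration of that pattern, here is a host-side sketch. It is not the example project's code: fake_queue_receive() is a hypothetical stand-in for tx_queue_receive() so that the snippet compiles without ThreadX, and the UINT and TX_SUCCESS definitions are local copies assumed to match tx_api.h.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the ThreadX type and status code
   (assumption: TX_SUCCESS is 0, as in tx_api.h). */
typedef unsigned int UINT;
#define TX_SUCCESS ((UINT) 0x00)

/* Hypothetical function standing in for tx_queue_receive(): returns the
   status it is told to return and fills the destination only on success. */
UINT fake_queue_receive(UINT forced_status, uint32_t *destination)
{
    if (forced_status == TX_SUCCESS) {
        *destination = 0x24001000u; /* made-up value playing the media pointer */
    }
    return forced_status;
}

/* Returns 1 if a message was received and may be used, 0 otherwise. */
int receive_and_check(UINT forced_status, uint32_t *message)
{
    UINT status = fake_queue_receive(forced_status, message);

    if (status != TX_SUCCESS) {
        /* Do not touch *message here: its content is not valid. */
        fprintf(stderr, "queue receive failed, status 0x%02X\n", status);
        return 0;
    }
    return 1;
}
```

In the real thread, the same check would wrap the tx_queue_receive() call, so that the message is only used when the returned status equals TX_SUCCESS.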

Thanks again for the software.

Best Regards,
Ajit B

Hi @ajitbasarur,

Thank you for your proposal.

With regards,

Hi @ajitbasarur,

Thank you for your contribution. The fix you requested has been implemented and is now available in the latest x-cube-azrtos-h7 package release, V2.0.1.

This issue can be closed now. Thank you again for your contribution.

With regards,

Hi @RKOUSTM,

Thanks for the update. I will verify the code and then close the issue.
An update: I did a quick code check. My comments have been incorporated only in the USB MSC related files. Have you addressed them in the other projects as well? The issue appears whenever the tx_queue message size is not set properly, which is common across many projects. Could you please comment on that as well?

Thanks and Best Regards,
Ajit B

Hi @ajitbasarur,

First, allow me to thank you for this report, once again. According to our development teams, the point you have reported wasn't replicated in other projects.

Thank you again for your contribution. We look forward to hearing from you again.

With regards,