Incorrect identification of interface blocks in regex frontend

Question

awnawab opened this issue a year ago

In the presence of interface blocks, the regex frontend incorrectly identifies the end subroutine statement inside the interface block as the end of the parent subroutine. This happens even when the end statement inside the interface block includes the subroutine name, i.e. end subroutine name. Strangely, this only seems to happen once the raw source has been sanitized by FortranReader: the regex frontend parses the unsanitized string correctly and the sanitized string incorrectly (the sanitization itself appears to be correct, though). I've attached an example.
regex_frontend_bug.txt

Ahmad Nawab · Answer 1 · Sat Jan 28 2023 05:20:19 GMT+0800 (China Standard Time)

The problem arises because the regex search pattern in class SubroutineFunctionPattern looks for the phrase end subroutine|function, while the routine name that follows it is only optional (as it should be). The pattern can therefore match the end subroutine statement inside the interface block instead of the one that closes the parent subroutine.
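The mismatch can be reproduced in isolation with a minimal sketch. The Fortran sample and the NAIVE pattern below are hypothetical stand-ins written for illustration; they are not the attached file or the actual SubroutineFunctionPattern, and only mirror the two properties described above (a lazy spec group and an optional trailing name).

```python
import re

# Hypothetical sample routine containing an interface block (not the attached file).
SOURCE = """\
subroutine driver(n)
  integer, intent(in) :: n
  interface
    subroutine kernel(n)
      integer, intent(in) :: n
    end subroutine kernel
  end interface
  call kernel(n)
end subroutine driver
"""

FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE

# Simplified stand-in pattern: lazy spec group, optional name after "end subroutine".
NAIVE = re.compile(
    r'^[ \t]*subroutine[ \t]+(?P<name>\w+)\b.*?$'
    r'(?P<spec>.*?)'
    r'^[ \t]*end[ \t]*subroutine\b(?:[ \t]+\w+)?',
    FLAGS,
)

match = NAIVE.search(SOURCE)
last_line = match.group(0).splitlines()[-1].strip()

print(match.group('name'))  # driver
print(last_line)            # end subroutine kernel
```

The match starts at subroutine driver but stops at the first end subroutine it meets, which belongs to kernel inside the interface block.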

Adding a nested group for interface blocks inside the named group spec fixes the problem: in the constructor of SubroutineFunctionPattern, the search pattern r'(?P<spec>.*?)' should be replaced by
r'(?P<spec>.*(^[ \t\w()]*interface.+^[ \t\w()]*end interface\b.*?$)*.*?)'.
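As a rough check of the proposed spec group, the sketch below embeds it verbatim in a simplified routine pattern. Everything around the spec group (the sample source, the header and end-statement parts of the pattern, the flags) is an assumption made for illustration and is not the real SubroutineFunctionPattern. The sample contains a single routine only, so the sketch says nothing about files with several routines.

```python
import re

# Hypothetical sample routine containing an interface block (not the attached file).
SOURCE = """\
subroutine driver(n)
  integer, intent(in) :: n
  interface
    subroutine kernel(n)
      integer, intent(in) :: n
    end subroutine kernel
  end interface
  call kernel(n)
end subroutine driver
"""

FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE

# Spec group copied from the proposal; the surrounding parts are simplified stand-ins.
SPEC = r'(?P<spec>.*(^[ \t\w()]*interface.+^[ \t\w()]*end interface\b.*?$)*.*?)'

FIXED = re.compile(
    r'^[ \t]*subroutine[ \t]+(?P<name>\w+)\b.*?$'
    + SPEC +
    r'^[ \t]*end[ \t]*subroutine\b(?:[ \t]+\w+)?',
    FLAGS,
)

match = FIXED.search(SOURCE)
last_line = match.group(0).splitlines()[-1].strip()

print(last_line)                               # end subroutine driver
print('end interface' in match.group('spec'))  # True
```

With this spec group the match runs through to end subroutine driver, and the whole interface block ends up inside the spec group.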

The interface blocks are then stored in the named group 'spec' and can be retrieved later if need be, in a similar fashion to imports, variable declarations, calls, etc. (although I cannot foresee a scenario where we would need to retrieve interface blocks, given that we already have enriched calls once we complete the scheduler parsing).
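Should interface blocks ever need to be pulled out of the captured spec text, one possible approach is a second, narrower regex pass. The sketch below is purely illustrative: SPEC_TEXT is a hypothetical captured string and INTERFACE_BLOCK is a pattern invented for this example, not something taken from the code base.

```python
import re

# Hypothetical text as it might be captured by the 'spec' group.
SPEC_TEXT = """\
integer, intent(in) :: n
interface
  subroutine kernel(n)
    integer, intent(in) :: n
  end subroutine kernel
end interface
real :: work(n)
"""

# Illustrative pattern: one match per interface ... end interface block.
INTERFACE_BLOCK = re.compile(
    r'^[ \t]*interface\b.*?^[ \t]*end[ \t]*interface\b[^\n]*',
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)

blocks = INTERFACE_BLOCK.findall(SPEC_TEXT)

print(len(blocks))                         # 1
print(blocks[0].splitlines()[-1].strip())  # end interface
```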