New check: re.sub[n]/re.split with regex flag as third argument w/o keyword
jakkdl opened this issue · comments
Context
Example code
>>> re.sub('a', 'b', 'aaa')
'bbb'
>>> re.sub('a', 'b', 'aaa', re.IGNORECASE)
'bba'
>>> re.split(' ', 'a a a a')
['a', 'a', 'a', 'a']
>>> re.split(' ', 'a a a a', re.I)
['a', 'a', 'a a']
>>> re.split(' ', 'a a a a', flags=re.I)
['a', 'a', 'a', 'a']
The fourth parameter to re.sub and re.subn is in fact count, and the third parameter to re.split is maxsplit. As the issues linked above show, this is a very common mistake.
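The mistake goes unnoticed because flag members are integer-valued, so they are accepted wherever count or maxsplit is expected. A minimal sketch of that equivalence follows; the variable names are only for illustration, and the warnings filter is there on the assumption that newer Python versions may warn about positional use.

```python
import re
import warnings

# Flag members are IntFlag values, so they are valid wherever an int is expected.
assert int(re.IGNORECASE) == 2

with warnings.catch_warnings():
    # Newer Python versions may warn about positional count/maxsplit; silence that here.
    warnings.simplefilter("ignore")
    positional_sub = re.sub('a', 'b', 'aaa', re.IGNORECASE)  # flag lands in count
    positional_split = re.split(' ', 'a a a a', re.I)        # flag lands in maxsplit

# The same calls with the integer spelled out as a keyword argument.
explicit_sub = re.sub('a', 'b', 'aaa', count=2)
explicit_split = re.split(' ', 'a a a a', maxsplit=2)
```

Both positional calls behave exactly like their keyword counterparts, which is why no error is raised and the flag is never applied.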
Implementation
The check would inspect calls to re.split, re.sub and re.subn in which the count or maxsplit slot (the fourth positional argument of re.sub and re.subn, the third of re.split) is filled without an explicit keyword.
To make it more lenient, it could require that argument to match re.[uppercase], or match any all-caps variable to cover from re import IGNORECASE, or any variable at all to cover myregexflags = re.I | re.X.
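A rough sketch of the lenient variant, using the standard ast module, might look like the following. The names FIRST_AMBIGUOUS_INDEX, looks_like_flag and find_suspicious_calls are hypothetical and not taken from any existing plugin. The sketch only recognizes calls spelled re.<name>(...), so aliases and from re import sub are assumed to be out of scope.

```python
import ast

# Hypothetical helper table: index of the positional slot that is really
# count (sub/subn) or maxsplit (split).
FIRST_AMBIGUOUS_INDEX = {"sub": 3, "subn": 3, "split": 2}


def looks_like_flag(node):
    """Lenient heuristic: re.UPPERCASE, a bare ALL_CAPS name, or an OR of those."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return looks_like_flag(node.left) and looks_like_flag(node.right)
    if isinstance(node, ast.Attribute):
        return (
            isinstance(node.value, ast.Name)
            and node.value.id == "re"
            and node.attr.isupper()
        )
    if isinstance(node, ast.Name):
        return node.id.isupper()
    return False


def find_suspicious_calls(source):
    """Return sorted (line, function) pairs where a flag sits in the count/maxsplit slot."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
            and func.attr in FIRST_AMBIGUOUS_INDEX
        ):
            continue
        index = FIRST_AMBIGUOUS_INDEX[func.attr]
        if len(node.args) > index and looks_like_flag(node.args[index]):
            hits.append((node.lineno, func.attr))
    return sorted(hits)
```

With this heuristic, re.split(' ', s, 2) is left alone because the third argument is a literal, while re.split(' ', s, re.I | re.X) is reported.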
The WIP CPython PRs seem to be on track to make the 3rd and 4th parameters keyword-only, though. If positional use actually becomes deprecated by 3.12, the maximally strict variant (simply enforcing keywords when running on <3.12) seems clearly appropriate.
One could even go so far as to require keywords on all re functions for parameters other than pattern, repl or string, so that e.g. re.match('a', 'A', re.I) would raise an error. I don't think that's the direction CPython is going, and it would lead to a lot of false alarms, but that version could possibly be offered as an opinionated warning.
I'm inclined to keep this simple: only apply the warning to re.sub, re.subn, and re.split, but require that count and maxsplit (and therefore also flags) always be passed as keyword arguments to these functions, irrespective of value.
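Under that rule the check reduces to counting positional arguments. The following is a minimal sketch, assuming only the re.<name>(...) spelling is matched; MAX_POSITIONAL and too_many_positional are hypothetical names used for illustration.

```python
import ast

# Positional arguments that stay unambiguous:
# sub/subn take (pattern, repl, string), split takes (pattern, string).
MAX_POSITIONAL = {"sub": 3, "subn": 3, "split": 2}


def too_many_positional(source):
    """Return sorted line numbers of calls passing count/maxsplit/flags positionally."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "re"
            and node.func.attr in MAX_POSITIONAL
            and len(node.args) > MAX_POSITIONAL[node.func.attr]
        ):
            lines.append(node.lineno)
    return sorted(lines)
```

Because the value of the extra argument is never inspected, a call such as re.split(' ', s, 2) is reported as well, which matches the "irrespective of value" wording.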
Requiring flags= for all other functions seems like an out-of-scope style issue, since they don't share the possibility of argument confusion. Since I dislike making users think about configuration, I'd leave this out.