editor.Tag[] contains only part of expected match groups
CoruNethron opened this issue
Hello.
I have an issue with the following scenario:
Running this code:
editor:findtext("(/[a-z])\\.((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))", SCFIND_REGEXP + SCFIND_MATCHCASE, 0, editor.TextLength)
for C=0,25 do
    print(editor.Tag[C])
end
Over this fragment of text:
/y.fyx
We receive the following output:
nil
/y
f
nil
nil
nil
nil
f
y
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
It seems that no more than 10 elements of editor.Tag[] are actually filled with the appropriate data.
If I perform the same operation using Notepad++'s Find-and-replace, with
Find:
(/[a-z])\.((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))
Replace with:
${0}\n${1}\n${2}\n${3}\n${4}\n${5}\n${6}\n${7}\n${8}\n${9}\n${10}\n${11}\n${12}\n${13}\n${14}\n${15}\n${16}\n${17}\n${18}\n${19}\n${20}
Then we receive the following output:
/y.fyx
/y
f
f
y
y
x
x
All the elements with indexes above 10 are in their places, as expected.
Please let me know if it's possible to somehow get at least all the elements up to index 100 from editor.Tag[] after a regex search with editor:findtext().
Thank you for the great plugin, btw.
It seems my question has nothing to do with LuaScript, because it looks like a Scintilla limitation, according to this line:
https://sourceforge.net/p/scintilla/code/ci/default/tree/src/Editor.cxx#l5614
Anyway, I would appreciate any ideas.
Hi @CoruNethron,
Glad you are finding this useful. Yes, as you have noted, this is a limitation of Scintilla. By default, Scintilla has a built-in implementation of a regex engine that is 'good enough' in most situations. Notepad++ implemented its own regex engine that is much more powerful. When you use editor.Tag[], the lookup doesn't go through Notepad++'s implementation.
Depending on what exactly you need to do, there are a few options. You could use the regex to pull out the entire string and process it further in Lua, and you could even use Lua patterns (similar to regex) for that.
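As a rough illustration of the "process it more in Lua" option, here is a minimal sketch that splits an already-matched string into its parts using only Lua patterns. It assumes the matched text has already been read into a Lua string. The name `split_match` is hypothetical, not part of LuaScript. Lua patterns have no alternation or recursion, so the three alternatives are tried one after another, and `%b<>` stands in for the recursive `<...>` group in the regex.

```lua
-- Split a matched string such as "/y.fyx" into its prefix and three items.
-- Each item is a balanced <...> group, a single S or K, or a lowercase letter.
-- `split_match` is a hypothetical helper name used only for this sketch.
local function split_match(s)
    -- "()" is a position capture: it yields the index just after the dot.
    local prefix, pos = s:match("^(/[a-z])%.()")
    if not prefix then return nil end

    local items = {}
    for i = 1, 3 do
        -- "^" anchors each attempt at `pos`; %b<> matches balanced brackets.
        local item = s:match("^%b<>", pos)
                  or s:match("^[SK]", pos)
                  or s:match("^[a-z]", pos)
        if not item then return nil end
        items[i] = item
        pos = pos + #item
    end
    return prefix, items
end

local prefix, items = split_match("/y.fyx")
print(prefix, items[1], items[2], items[3])
```

For the sample text `/y.fyx` this gives the prefix `/y` and the items `f`, `y` and `x`, which can then be passed to whatever Lua callback needs them.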
LuaScript also has a built-in helper method called editor:match(), but after looking at the code, it looks like it doesn't support regex replacement.
So my final suggestion is to use editor:ReplaceTargetRE(text) (ignore the verbiage about groups 1 through 9), which I think gives the results you want:
start, endd = editor:findtext([[(/[a-z])\.((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))((<(((?>[^><]+)|(?-3))*)>)|([SK])|([a-z]))]], SCFIND_REGEXP + SCFIND_MATCHCASE, 0, editor.TextLength)
editor:SetTargetRange(start, endd)
editor:ReplaceTargetRE([[${0}\n${1}\n${2}\n${3}\n${4}\n${5}\n${6}\n${7}\n${8}\n${9}\n${10}\n${11}\n${12}\n${13}\n${14}\n${15}\n${16}\n${17}\n${18}\n${19}\n${20}]])
This code specifically calls ReplaceTargetRE(), which tells the replacement operation to use the regex engine to do the replacement. Keep in mind that if you need to do this in a loop, you'll have to keep track of your position in the document and restart the search from there.
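A minimal sketch of such a loop, under a few assumptions: `replace_all` is a hypothetical helper name, `ed` is expected to be LuaScript's `editor` object, and the search resumes from `TargetEnd`, relying on Scintilla's documented behaviour that the target range refers to the replacement text after a replace. Treat it as a starting point rather than a tested recipe.

```lua
-- Replace every regex match in the document, one target at a time.
-- `replace_all` is a hypothetical helper; pass LuaScript's `editor` as `ed`.
local function replace_all(ed, pattern, flags, replacement)
    local pos, count = 0, 0
    while pos <= ed.TextLength do
        -- Re-read TextLength each time: earlier replacements may change it.
        local s, e = ed:findtext(pattern, flags, pos, ed.TextLength)
        if not s then break end              -- no further matches
        ed:SetTargetRange(s, e)
        ed:ReplaceTargetRE(replacement)
        count = count + 1
        -- The target now covers the replacement text, so resume after it.
        pos = ed.TargetEnd
        -- Guard: an empty match replaced by empty text would never advance.
        if e == s and pos == s then pos = pos + 1 end
    end
    return count
end

-- Possible usage inside Notepad++ (not executed here):
-- replace_all(editor, [[(/[a-z])\.([a-z])]],
--             SCFIND_REGEXP + SCFIND_MATCHCASE, [[$1:$2]])
```

Resuming from the end of the replacement, rather than from the old match end, avoids re-matching text that the replacement itself just inserted.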
Thank you, @dail8859
I do use editor:ReplaceTargetRE() after all the other processing, but I need to pass matched subgroups to the Lua callback in between. So it seems that I'll stick with your recommendation to re-process the entire string one more time within Lua. It's sad that there is no chance to reuse the powerful PCRE results.
I'm closing the ticket, because the solution is out of LuaScript's scope.
Best regards.
"I need to pass matched subgroups to the Lua callback in between"
Yeah, if this is the exact use case then, as far as I know, neither Scintilla nor Notepad++ exposes anything low-level enough to get this kind of information out of the PCRE implementation.