Boost regex engine
zufuliu opened this issue
See PR #722: @atauzki 👍 is working on integrating Boost regex. After the changes are merged, most if not all regex issues should be fixed.
In the end, our code will have three regex engine configurations:
| Defined preprocessor macro | Regex engine |
|---|---|
| `BOOST_REGEX_STANDALONE` | Scintilla's simple POSIX regex plus Boost regex |
| `NO_CXX11_REGEX` | Scintilla's simple POSIX regex (current build configuration) |
| none | Scintilla's simple POSIX regex plus C++ STL `std::regex` |
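As a rough illustration of how the macros in the table pick an engine at compile time, here is a small sketch. `regex_engine_name()` is a hypothetical helper, not part of the codebase, and the precedence it gives `BOOST_REGEX_STANDALONE` when both macros are defined is an assumption.

```cpp
#include <string_view>

// Hypothetical helper (illustration only): reports which regex engine
// combination the preprocessor macros from the table select.
// Assumption: BOOST_REGEX_STANDALONE wins if both macros are defined.
constexpr std::string_view regex_engine_name() {
#if defined(BOOST_REGEX_STANDALONE)
    return "Scintilla POSIX regex plus Boost regex";
#elif defined(NO_CXX11_REGEX)
    return "Scintilla POSIX regex only";
#else
    return "Scintilla POSIX regex plus std::regex";
#endif
}
```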
Some TODOs:
- Ensure all existing uses of regex still work after the change to Boost.
- Update the regex help string: add common syntax such as `a|b`, `{n,m}`, etc., based on https://www.boost.org/doc/libs/1_83_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions and https://docs.python.org/3/library/re.html.
- Add an option to select the regex engine (Boost Unicode, Boost ASCII, Scintilla ASCII); it could default to Boost Unicode.
- Add a "Dot matches all characters (including newline)" or "Multiline mode" option to the Find and Replace dialogs, issue #53.
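On the "dot matches all characters" item: in the ECMAScript grammar used by `std::regex`, `.` does not match line terminators and there is no flag to change that (Boost's Perl syntax has the `(?s)` modifier instead). A common workaround for a `std::regex` build is to rewrite the pattern before compiling it. The sketch below assumes an ECMAScript pattern; `make_dot_match_all()` and `search_dot_all()` are hypothetical helpers.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: emulates a "dot matches newline" option for an
// ECMAScript pattern by rewriting every unescaped '.' outside a bracket
// expression into [\s\S], a class that also matches '\n'.
std::string make_dot_match_all(const std::string& pattern) {
    std::string out;
    bool in_class = false;  // true while inside [...]
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size()) {
            // Copy an escape sequence such as \. or \] unchanged.
            out += c;
            out += pattern[++i];
        } else if (in_class) {
            if (c == ']') in_class = false;
            out += c;
        } else if (c == '[') {
            in_class = true;
            out += c;
        } else if (c == '.') {
            out += R"([\s\S])";
        } else {
            out += c;
        }
    }
    return out;
}

// Searches `text` for `pattern`, letting '.' match line breaks as well.
bool search_dot_all(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(make_dot_match_all(pattern)));
}
```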
The following are some performance test results (match count and time in milliseconds) for the attached JSON file (produced by expand.py in the zip from the Visual Studio 2022 installation catalog.json) with commit 38be0ce. As such, I'm going to remove the SCI_OWNREGEX build configuration (improving the speed still needs time).
re-test-1015.zip
| Regex | RESearch | std::wregex | std::regex | boost::wregex | boost::regex |
|---|---|---|---|---|---|
| `\w+` | 1434315, 315 | 1436523, 7636 | 1423835, 4372 | 1436523, 2035 | 1501396, 800 |
| `[a-zA-Z0-9_]+` | 1423835, 331 | 1423835, 7654 | 1423835, 4386 | 1423835, 2855 | 1423835, 777 |
| `\d+` | 1028016, 280 | 1028016, 6470 | 1028016, 6475 | 1028016, 2050 | 1028016, 739 |
| `[0-9]+` | 1028016, 286 | 1028016, 6475 | 1028016, 6218 | 1028016, 2044 | 1028016, 725 |
| `\s+` | 895401, 252 | 895945, 6151 | 895403, 5972 | 895917, 1883 | 911355, 662 |
| `[ \t]+` | 895401, 254 | 895401, 6375 | 895401, 6200 | 895401, 2935 | 895401, 678 |
| `^[ \t]+` | 440216, 92 | 440216, 846 | 440216, 724 | 440216, 465 | 440216, 234 |
| `[ \t]+$` | 0, 154 | 0, 6492 | 0, 6324 | 0, 575 | 0, 84 |
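To take a similar "match count, milliseconds" measurement for the standard-library engines, one could do something like the sketch below. It is not the harness from re-test-1015.zip, its timings are not comparable to the table above, and `count_matches()` and `BenchResult` are hypothetical names.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Result of one run: number of matches and elapsed wall-clock time.
struct BenchResult {
    std::size_t count;
    double millis;
};

// Counts all successive matches of `pattern` in `text` with the standard
// library engine for CharT (std::regex for char, std::wregex for wchar_t)
// and times the scan. Regex compilation happens before timing starts.
template <class CharT>
BenchResult count_matches(const std::basic_string<CharT>& text,
                          const std::basic_string<CharT>& pattern) {
    using Iter = std::regex_iterator<typename std::basic_string<CharT>::const_iterator>;
    const std::basic_regex<CharT> re(pattern);
    const auto start = std::chrono::steady_clock::now();
    const auto count = std::distance(Iter(text.cbegin(), text.cend(), re), Iter());
    const auto stop = std::chrono::steady_clock::now();
    return {static_cast<std::size_t>(count),
            std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

Calling it with a `std::string` exercises `std::regex`; calling it with a `std::wstring` exercises `std::wregex`, matching the two standard-library columns.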
Does today's release include Boost regex?? I don't see any change in the Replace dialog.

> Does today's release include Boost regex

Just download the latest builds from the boost regex branch, e.g. https://github.com/zufuliu/notepad2/actions/runs/7517811166
The Win32 build with `boost::regex` (which depends on `SleepConditionVariableSRW()` and `WakeAllConditionVariable()`) or `std::regex` (which depends on `InitializeCriticalSectionEx()`) doesn't run on XP.
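Does `boost::regex` not support property matching like `\pP`?
More properties to test: Regular Expression - Matching Punctuation

Also, matched groups `(pattern)` are currently referenced with `\1`, `\2`. Will you consider replacing them with the more common `$1`, `$2` in the future?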
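> Also, matched groups `(pattern)` are currently referenced with `\1`, `\2`. Will you consider replacing them with the more common `$1`, `$2` in the future?

Boost itself supports it, but the current code doesn't use it yet; only a TODO comment was added.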
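For reference, both reference syntaxes are usually a format-flag choice rather than separate code. Boost's `regex_replace` has `format_perl` and `format_sed` flags; the sketch below uses only the C++ standard library (not the project's code), where `std::regex_replace` reads `$1` under the default ECMAScript format rules and `\1` when `std::regex_constants::format_sed` is passed. The function names and the date pattern are hypothetical.

```cpp
#include <regex>
#include <string>

// Shared pattern: captures year, month and day of an ISO date.
const std::regex kIsoDate(R"((\d{4})-(\d{2})-(\d{2}))");

// Default (ECMAScript) format rules: $1, $2, ... refer to capture groups.
std::string reorder_dollar(const std::string& text) {
    return std::regex_replace(text, kIsoDate, "$3/$2/$1");
}

// format_sed (POSIX sed) rules: \1, \2, ... refer to capture groups.
std::string reorder_backslash(const std::string& text) {
    return std::regex_replace(text, kIsoDate, R"(\3/\2/\1)",
                              std::regex_constants::format_sed);
}
```

Both functions turn `released 2024-01-15` into `released 15/01/2024`, so supporting `$1` alongside `\1` could amount to choosing the flag from a user option.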
> Win32 build with `boost::regex` (depends on `SleepConditionVariableSRW()` and `WakeAllConditionVariable()`) or `std::regex` (depends on `InitializeCriticalSectionEx()`) doesn't run on XP.

This can be "fixed" by disabling thread-safe local static initialization with `/Zc:threadSafeInit-`: https://learn.microsoft.com/en-us/cpp/build/reference/zc-threadsafeinit-thread-safe-local-static-initialization?view=msvc-170
The documentation notes: "The implementation of this feature relies on Windows operating system support functions in Windows Vista and later operating systems."
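The kind of code this flag affects is a function-local static ("magic static"), for example a lazily compiled regex. The sketch is hypothetical (`word_regex()` and the counter are not from the project): since C++11 the compiler must guard the first-call initialization against concurrent callers, and `/Zc:threadSafeInit-` tells MSVC to omit that guard, so the first call must then not race.

```cpp
#include <regex>

// Counts how many times the regex below is compiled (illustration only).
static int g_compile_count = 0;

static std::regex compile_word_regex() {
    ++g_compile_count;
    return std::regex(R"(\w+)");
}

// Function-local static: initialized on the first call only. With the
// default thread-safe initialization the compiler inserts a guard so
// concurrent first calls are safe; /Zc:threadSafeInit- removes it.
const std::regex& word_regex() {
    static const std::regex re = compile_word_regex();
    return re;
}
```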
Another bug related to Boost regex search: when searching next/previous multiple times for a zero-width match (e.g. `^`, `$`, `\b`), the search gets stuck at its original position from the second search on.
EmEditor also has this bug but Notepad3 doesn't; I don't have a good idea of how to fix it.
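One way to avoid getting stuck is the rule `std::regex_iterator` uses after an empty match: first look for a non-empty match anchored at the same position, and only if there is none, resume one character later. The sketch below applies that rule to "find next" with `std::regex` standing in for Boost (Boost.Regex has flags with the same names: `match_not_null`, `match_continuous`, `match_prev_avail`); `Match`, `search_at()` and `find_next()` are hypothetical names, and only the forward direction is shown.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

struct Match {
    std::size_t pos;  // offset of the match in the text
    std::size_t len;  // length of the match (0 for ^, $, \b, ...)
};

// Searches `text` from offset `start`; match_prev_avail lets ^, \b, etc.
// see the character before `start` instead of treating it as text start.
std::optional<Match> search_at(const std::string& text, const std::regex& re,
                               std::size_t start,
                               std::regex_constants::match_flag_type flags) {
    if (start > 0) flags |= std::regex_constants::match_prev_avail;
    std::smatch m;
    if (!std::regex_search(text.cbegin() + start, text.cend(), m, re, flags))
        return std::nullopt;
    return Match{start + static_cast<std::size_t>(m.position(0)),
                 static_cast<std::size_t>(m.length(0))};
}

// "Find next" from offset `from`, the end of the previous match. If that
// match was empty, searching again at `from` would return it again, so
// first ask for a non-empty match anchored at `from`, then move on by one.
std::optional<Match> find_next(const std::string& text, const std::regex& re,
                               std::size_t from, bool prev_was_empty) {
    namespace rc = std::regex_constants;
    if (!prev_was_empty) return search_at(text, re, from, rc::match_default);
    if (auto m = search_at(text, re, from, rc::match_not_null | rc::match_continuous))
        return m;
    if (from >= text.size()) return std::nullopt;
    return search_at(text, re, from + 1, rc::match_default);
}
```

A caller passes the end of the previous match as `from` and `len == 0` as `prev_was_empty`; "find previous" would need the mirror-image rule.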
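> Does `boost::regex` not support property matching like `\pP`?
> More properties to test: Regular Expression - Matching Punctuation

A libICU build is at least 20-30 MB; the cost is too high.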
Maybe load ICU dynamically (it is available on Windows 10+): https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-
> A libICU build is at least 20-30 MB; the cost is too high.
>
> Maybe load ICU dynamically (it is available on Windows 10+): https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-

The Windows `icu.h` has no `icu` namespace, but Boost uses ICU's C++ API, and icu.dll exports no C++ symbols.