C++ interface to PCRE2 library compatible with <regex>
See:
This header-only library implements std::basic_regex, std::sub_match, std::match_results, std::regex_match, std::regex_search, std::regex_replace, std::regex_iterator, std::regex_token_iterator, and std::regex_error interfaces for the PCRE2 library.
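As a rough illustration of the interface shape listed above, here is a minimal sketch written against the standard `std::` types only. The assumption, not verified here, is that the corresponding `pcre2::` types can be substituted once this library's header is included. The function name `parse_pairs` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Collect every "key=value" pair (lowercase key, numeric value) found in `text`.
// Uses std:: types; swapping in the pcre2:: counterparts is an assumption.
std::vector<std::pair<std::string, std::string>> parse_pairs(const std::string& text) {
    static const std::regex re("([a-z]+)=([0-9]+)");
    std::vector<std::pair<std::string, std::string>> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        assert(m.size() == 3);  // whole match plus two capture groups
        out.emplace_back(m[1].str(), m[2].str());
    }
    return out;
}
```

Calling `parse_pairs` on a string such as "a=1, bb=22" yields two pairs, one per match visited by the iterator.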
Unsupported features:
- regex_traits: collation is performed by the PCRE library itself, and there is no way to affect its behavior
- regex_constants::syntax_option_type::ECMAScript, regex_constants::syntax_option_type::basic, regex_constants::syntax_option_type::extended, regex_constants::syntax_option_type::awk, regex_constants::syntax_option_type::grep, regex_constants::syntax_option_type::egrep are not supported for obvious reasons: if you need another matching engine, there is no need to use PCRE :-)
- regex_constants::match_flag_type::match_not_bow, regex_constants::match_flag_type::match_not_eow: these features are not supported by PCRE
- regex_constants::match_flag_type::match_any is always on (its description says: "If more than one match is possible, then any match is an acceptable result"; I believe this is always true for PCRE)
- regex_constants::match_flag_type::match_prev_avail: if I get the description right, you can just unset the regex_constants::match_flag_type::match_not_bol flag
- pcre2::regex_constants::error_type and std::regex_constants::error_type constants are different: PCRE2 does not officially provide constants for compilation errors (only for match errors), and therefore there is no portable way to map PCRE2 errors to std::regex_constants::error_type; in addition, PCRE2 can return many more errors than stdc++
- std::wregex, std::wcsub_match, std::wssub_match, std::wcmatch, std::wsmatch, std::wcregex_iterator, std::wsregex_iterator, std::wcregex_token_iterator, and std::wsregex_token_iterator are not supported: the size of wchar_t differs across platforms, which makes it unsuitable for PCRE (however, the library does provide pcre2::regex16, pcre2::regex32, pcre2::c16sub_match, pcre2::c32sub_match, pcre2::c16match, pcre2::c32match, pcre2::s16match, pcre2::s32match, pcre2::c16regex_iterator, pcre2::c32regex_iterator, pcre2::s16regex_iterator, pcre2::s32regex_iterator, pcre2::c16regex_token_iterator, pcre2::c32regex_token_iterator, pcre2::s16regex_token_iterator, pcre2::s32regex_token_iterator for char16_t/std::u16string and char32_t/std::u32string types)
NB: char16_t/char32_t are supported only if sizeof(char16_t) == 2 and sizeof(char32_t) == 4 (the standard says that
char16_t (char32_t) has the same size as std::uint_least16_t (std::uint_least32_t) type, and their size may be more
than 2 (4) bytes).
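The size requirement above can be checked at compile time in user code. The following is a minimal sketch; whether the library performs an equivalent check itself is not stated here, and the helper name `fixed_width_char_types_ok` is hypothetical.

```cpp
// True when char16_t and char32_t have exactly the sizes required above.
constexpr bool fixed_width_char_types_ok() {
    return sizeof(char16_t) == 2 && sizeof(char32_t) == 4;
}

// Stop the build early on platforms where the 16/32-bit types are wider.
static_assert(fixed_width_char_types_ok(),
              "char16_t/char32_t must be exactly 2 and 4 bytes wide");
```

Placing such a check next to the first use of the 16-bit or 32-bit types turns a platform mismatch into a build error.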
Differences from stdc++:
- regex_constants::match_flag_type::format_sed behaves differently than in libstdc++: as far as I can tell, libstdc++ does not handle sed rules properly: there is no way to escape & or \<digit> in the format pattern. PCRE2++, however, follows the rules more strictly
- new option: regex_constants::utf: causes PCRE2 to regard both the pattern and the subject strings that are subsequently processed as strings of UTF characters instead of single-code-unit strings
- new option: regex_constants::ucp: changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By default, only ASCII characters are recognized, but when this option is set, Unicode properties are used instead to classify characters
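For readers unfamiliar with the sed-style format rules mentioned in the first item, here is a minimal sketch using the standard `std::regex_replace` with `format_sed`. It covers only the two basic substitutions (`&` for the whole match, `\<digit>` for a capture group) and deliberately does not exercise escaping, which is where the implementations are said to differ. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Swap two space-separated lowercase words using sed-style back-references:
// "\2" and "\1" in the format refer to the second and first capture groups.
std::string swap_words(const std::string& s) {
    static const std::regex re("([a-z]+) ([a-z]+)");
    return std::regex_replace(s, re, "\\2 \\1", std::regex_constants::format_sed);
}

// Wrap every lowercase word in brackets: in sed syntax "&" is the whole match.
std::string bracket_words(const std::string& s) {
    static const std::regex re("[a-z]+");
    return std::regex_replace(s, re, "[&]", std::regex_constants::format_sed);
}
```

With these definitions, `swap_words("john smith")` produces "smith john" and `bracket_words("ab cd")` produces "[ab] [cd]".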
The library depends on the pcre2.h header file (provided by the libpcre2-dev package on Ubuntu).
8-bit features require linking against the pcre2-8 library (libpcre2-8-0 package on Ubuntu).
16-bit features require linking against the pcre2-16 library (libpcre2-16-0 package on Ubuntu).
32-bit features require linking against the pcre2-32 library (libpcre2-32-0 package on Ubuntu).