[4.0.2.3][master] Cyrillic is not allowed in SEO keyword
nmpetkov opened this issue
What version of OpenCart are you reporting this for?
[4.0.2.3][master]
Describe the bug
In 7 places in the code, both in the master branch and in version 4.0.2.3, the following check is used:
if (preg_match('/[^a-zA-Z0-9\/_-]|[\p{Cyrillic}]+/u', $keyword)) {
    $json['error']['keyword_' . $store_id . '_' . $language_id] = $this->language->get('error_keyword_character');
}
In my case it rejects a keyword such as:
добър-ден
This is in Bulgarian, the origin of the Cyrillic alphabet. I tested with different versions of PHP, from 5.6 to 8.2, and the result is the same.
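A minimal standalone reproduction of the check above, outside OpenCart. The sample keywords and variable names are only illustrative and are not taken from the OpenCart source; the file is assumed to be saved as UTF-8.

```php
<?php
// Pattern from the issue: the first alternative flags anything outside
// a-zA-Z0-9/_- and the second alternative flags Cyrillic explicitly.
$pattern = '/[^a-zA-Z0-9\/_-]|[\p{Cyrillic}]+/u';

foreach (['dobar-den', 'добър-ден'] as $keyword) {
    // preg_match() returns 1 when an offending character is found.
    $rejected = preg_match($pattern, $keyword) === 1;
    echo $keyword, ' => ', $rejected ? 'rejected' : 'accepted', PHP_EOL;
}
```

The ASCII keyword passes and the Cyrillic one is rejected. Dropping the second alternative alone would not help, because Cyrillic letters are already outside the a-zA-Z0-9 range in the first one.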
Server / Test environment (please complete the following information):
- Deployed to a web server? yes
- Operating system Linux
- PHP version 8.2
This one seems to work correctly. I would also add Greek characters as OC is popular in Greece.
if (preg_match('/[^\p{Latin}\p{Cyrillic}\p{Greek}0-9\/\.\-\_]+/u', $keyword)) {
It also allows the characters /.-_ for URLs like my-url/my_favorite_page.html
IMHO this regex must be configurable in store settings to allow adding any language.
Yes, making it configurable makes sense. Eventually there will be a need to support other character groups too, e.g. Arabic, Mandarin Chinese, etc.
I've done tests with the above expression and it works very well, with one exception: {Latin} also allows letters like -ä-ü. So I suggest a small addition to the expression:
if (preg_match('/[^\p{Latin}\p{Cyrillic}\p{Greek}a-zA-Z0-9\/\.\-\_]+/u', $keyword)) {
With this version, removing the {Latin} part when configuring the expression limits Latin letters to a-zA-Z only.
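To illustrate the difference, here is a small standalone sketch that compares the expression with and without {Latin}. The variable names and sample keywords are made up for the example.

```php
<?php
// Expression suggested above, and the same expression with \p{Latin} removed.
$withLatin    = '/[^\p{Latin}\p{Cyrillic}\p{Greek}a-zA-Z0-9\/\.\-\_]+/u';
$withoutLatin = '/[^\p{Cyrillic}\p{Greek}a-zA-Z0-9\/\.\-\_]+/u';

$keywords = ['добър-ден', 'my-url/my_favorite_page.html', 'für-dich'];

foreach ($keywords as $keyword) {
    // preg_match() === 0 means no disallowed character was found.
    $a = preg_match($withLatin, $keyword) === 0 ? 'ok' : 'rejected';
    $b = preg_match($withoutLatin, $keyword) === 0 ? 'ok' : 'rejected';
    echo $keyword, ': with {Latin} = ', $a, ', without {Latin} = ', $b, PHP_EOL;
}
```

Only the keyword containing ü should differ between the two columns: it passes with {Latin} and is rejected without it, while plain a-zA-Z and Cyrillic keywords pass in both cases.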
> {Latin} also allows letters like -ä-ü.
I don't see any problem. If we allow UTF-8 characters like Cyrillic or Greek, why should we disallow umlauts? What if a customer with a German website wants umlauts in URLs?
As I mentioned before, it would be great to have this regex configurable for URLs and download filenames. That would remove the multi-language URL issues, since web developers could configure these rules themselves.
@stalker780 That's exactly why I suggest adding a-zA-Z, in case there is a setting for which languages to allow. Those who want to allow letters like -ä-ü will add {Latin} to the setting, and those who don't will remove it. In that case the a-zA-Z rule still applies, so both options remain possible. On my site, for example, I would like to allow a-zA-Z and Cyrillic, but no letters like -ä-ü.
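A rough sketch of how such a setting could work, assuming the store setting is simply a list of Unicode script names. The functions buildKeywordPattern() and isValidKeyword() and the list of known scripts are hypothetical and are not part of OpenCart. Requires PHP 7 or later for the type declarations.

```php
<?php
/**
 * Hypothetical helper: builds the "disallowed character" pattern from a
 * list of script names. a-zA-Z, digits and / . _ - are always allowed.
 */
function buildKeywordPattern(array $scripts): string
{
    // Only accept script names we know PCRE understands (assumed list).
    $known = ['Latin', 'Cyrillic', 'Greek', 'Arabic', 'Han'];

    $classes = '';
    foreach (array_intersect($known, $scripts) as $script) {
        $classes .= '\p{' . $script . '}';
    }

    return '/[^' . $classes . 'a-zA-Z0-9\/._-]+/u';
}

/** Hypothetical helper: true when the keyword has no disallowed character. */
function isValidKeyword(string $keyword, array $scripts): bool
{
    return preg_match(buildKeywordPattern($scripts), $keyword) === 0;
}

// A site that wants a-zA-Z and Cyrillic, but no letters like ä or ü:
var_dump(isValidKeyword('добър-ден', ['Cyrillic']));
var_dump(isValidKeyword('für-dich', ['Cyrillic']));

// A site that also enables {Latin}:
var_dump(isValidKeyword('für-dich', ['Latin', 'Cyrillic']));
```

Filtering the names against a fixed list keeps an unknown or mistyped script name from producing a pattern that fails to compile.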
Maybe [^\d\p{L}\/._-] until it is made configurable.
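For comparison, a quick sketch of what this shorter class accepts. The /u modifier, the delimiters, and the keywordIsValid() helper are assumptions added for the example. Requires PHP 7 or later.

```php
<?php
// Character class from the comment above, wrapped in delimiters with /u:
// any Unicode letter (\p{L}), digits, and / . _ - are allowed.
$pattern = '/[^\d\p{L}\/._-]/u';

// Hypothetical helper for the example only.
function keywordIsValid(string $pattern, string $keyword): bool
{
    return preg_match($pattern, $keyword) === 0;
}

var_dump(keywordIsValid($pattern, 'добър-ден'));   // Cyrillic letters
var_dump(keywordIsValid($pattern, 'für-dich'));    // Latin with umlaut
var_dump(keywordIsValid($pattern, '日本語'));       // Han ideographs
var_dump(keywordIsValid($pattern, 'hello world')); // contains a space
```

The first three should be accepted and the last rejected. Note that \p{L} covers letters only, so combining marks (category M) are still flagged; scripts that rely on them would probably need \p{M} added as well.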
@stalker780 I have added a validation helper file. If you want to help, please fill it in for me, since you suggested doing something like this.