opencart / opencart

A free shopping cart system. OpenCart is an open source PHP-based online e-commerce solution.

Home Page: https://www.opencart.com/

[4.0.2.3][master] Cyrillic is not allowed in SEO keyword

nmpetkov opened this issue

What version of OpenCart are you reporting this for?
[4.0.2.3][master]

Describe the bug
In seven places in the code, in both the master branch and version 4.0.2.3, the following check is used:

if (preg_match('/[^a-zA-Z0-9\/_-]|[\p{Cyrillic}]+/u', $keyword)) {
    $json['error']['keyword_' . $store_id . '_' . $language_id] = $this->language->get('error_keyword_character');
}

In my case it rejects a keyword such as:
добър-ден
This is in Bulgarian, where the Cyrillic alphabet originated. I tested on PHP versions from 5.6 to 8.2, and the result is the same.
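
For illustration, a minimal sketch of why the check fails. The first alternative already matches any character outside a-z, A-Z, 0-9, /, _ and -, and the second alternative matches Cyrillic explicitly, so a Cyrillic keyword is reported as an error either way. The helper name isRejectedByCurrentCheck is hypothetical and not part of OpenCart; only the pattern is taken from the code above.

```php
<?php
// Hypothetical wrapper around the pattern quoted in this issue.
// Returns true when the keyword would trigger error_keyword_character.
function isRejectedByCurrentCheck(string $keyword): bool {
    return preg_match('/[^a-zA-Z0-9\/_-]|[\p{Cyrillic}]+/u', $keyword) === 1;
}

var_dump(isRejectedByCurrentCheck('dobar-den')); // bool(false): ASCII only, accepted
var_dump(isRejectedByCurrentCheck('добър-ден')); // bool(true): Cyrillic, rejected
```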

Server / Test environment (please complete the following information):

  • Deployed to a web server? yes
  • Operating system: Linux
  • PHP version: 8.2

if (preg_match('/[^a-zA-Z0-9\/_-]|[\p{Cyrillic}]+/u', $keyword)) {

This one seems to work correctly. I would also add Greek characters as OC is popular in Greece.
if (preg_match('/[^\p{Latin}\p{Cyrillic}\p{Greek}0-9\/\.\-\_]+/u', $keyword)) {

It also allows the characters / . - _ so that keywords such as my-url/my_favorite_page.html are accepted.
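
A quick sketch of how the proposed pattern behaves on a few sample keywords. The helper name hasDisallowedChars and the sample keywords other than the two from this thread are made up for the example.

```php
<?php
// Hypothetical wrapper around the proposed pattern.
// Returns true when the keyword contains a character outside the allowed set.
function hasDisallowedChars(string $keyword): bool {
    return preg_match('/[^\p{Latin}\p{Cyrillic}\p{Greek}0-9\/\.\-\_]+/u', $keyword) === 1;
}

var_dump(hasDisallowedChars('добър-ден'));                    // bool(false): Cyrillic allowed
var_dump(hasDisallowedChars('καλημερα'));                     // bool(false): Greek allowed
var_dump(hasDisallowedChars('my-url/my_favorite_page.html')); // bool(false)
var_dump(hasDisallowedChars('hello world'));                  // bool(true): space is not allowed
```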

IMHO this regex must be configurable in store settings to allow adding any language.

Yes, making it configurable makes sense. Eventually there will be a need to support other character groups too, e.g. Arabic, Mandarin Chinese, etc.

I've tested the above expression and it works very well, with one exception: \p{Latin} also allows letters like ä and ü. So I suggest a small addition to the expression:
if (preg_match('/[^\p{Latin}\p{Cyrillic}\p{Greek}a-zA-Z0-9\/\.\-\_]+/u', $keyword)) {
With this form, once the expression is configurable, removing the \p{Latin} part limits Latin letters to a-zA-Z only.

\p{Latin} also allows letters like ä and ü.

I don't see any problem. If we allow UTF-8 characters such as Cyrillic or Greek, why should we disallow umlauts? What if a customer with a German website wants to see umlauts in URLs?

As I mentioned before, it would be great to have this regex configurable for URLs and download filenames. This would take away all issues with multi-language URLs, as web developers could configure these rules themselves.

@stalker780 That's exactly what I mean by adding a-zA-Z in case there is a setting for which languages to allow. Those who want to allow letters like ä and ü will add \p{Latin} to the settings, and those who don't will remove it. In that case the a-zA-Z rule would still apply, so both options are possible. On my site, for example, I would like to allow a-zA-Z and Cyrillic, but not letters like ä and ü.
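
One possible shape for such a setting, as a rough sketch only: the function buildKeywordPattern and the idea of storing a list of script names are assumptions for illustration, not existing OpenCart code. The script classes are placed before the fixed characters so that the hyphen stays last in the character class and is read literally.

```php
<?php
// Hypothetical helper: builds the "disallowed character" pattern from a
// configurable list of Unicode script names (e.g. from a store setting).
function buildKeywordPattern(array $scripts): string {
    $scriptClasses = '';
    foreach ($scripts as $script) {
        // Accept only plain names so no regex syntax can be injected.
        if (preg_match('/^[A-Za-z_]+$/', $script)) {
            $scriptClasses .= '\p{' . $script . '}';
        }
    }
    // a-zA-Z is always allowed, so dropping Latin still leaves plain ASCII letters.
    return '/[^' . $scriptClasses . 'a-zA-Z0-9\/._-]/u';
}

echo buildKeywordPattern(['Cyrillic']), PHP_EOL; // /[^\p{Cyrillic}a-zA-Z0-9\/._-]/u

$cyrillicOnly = buildKeywordPattern(['Cyrillic']);
var_dump(preg_match($cyrillicOnly, 'добър-ден')); // int(0): accepted
var_dump(preg_match($cyrillicOnly, 'grüße'));     // int(1): ü and ß rejected

$withLatin = buildKeywordPattern(['Latin', 'Cyrillic']);
var_dump(preg_match($withLatin, 'grüße'));        // int(0): accepted once Latin is enabled
```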


Maybe use [^\d\p{L}\/._-] until it is made configurable.

@stalker780 I have added a validation helper file. If you want to help, please fill it in for me, since you suggested doing something like this.