alphp / strftime

This provides a cross-platform alternative to strftime() for when it is removed from PHP.

Behaves differently from stock `strftime` for the Italian default year format

nu111 opened this issue · comments

commented

Hi, it seems that \PHP81_BC\strftime behaves differently from stock strftime for the Italian default date format (%x): the year is printed with two digits instead of four.

<?php
require_once "vendor/autoload.php";

$dt = new \DateTime("2022-01-01 00:00:00");

setlocale(LC_TIME, 'it_IT.utf8');
var_dump(strftime("%x %X", $dt->getTimestamp()));
var_dump(\PHP81_BC\strftime("%x %X", $dt->getTimestamp()));

outputs

string(19) "01/01/2022 00:00:00"
string(17) "01/01/22 00:00:00"

Date localization is done using IntlDateFormatter which is the ICU library's interface for dates:

<?php
var_dump((new IntlDateFormatter('it_IT', IntlDateFormatter::SHORT, IntlDateFormatter::MEDIUM, 'Europe/Madrid'))->format(strtotime('20220306 13:02:03')));
var_dump((new IntlDateFormatter('it_IT', IntlDateFormatter::MEDIUM, IntlDateFormatter::MEDIUM, 'Europe/Madrid'))->format(strtotime('20220306 13:02:03')));

Output:

string(18) "06/03/22, 13:02:03"
string(20) "6 mar 2022, 13:02:03"
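
To see where the two-digit year comes from, it can help to print the ICU pattern behind each date style rather than a formatted date. The following is a minimal sketch, assuming the intl extension is loaded; datePattern() is a hypothetical helper and not part of this library, and the patterns it prints depend on the ICU/CLDR version installed.

```php
<?php
// Hypothetical helper (not part of this library): returns the ICU pattern
// that IntlDateFormatter uses for a locale and date style, or null when the
// intl extension is not available.
function datePattern(string $locale, string $style = 'short'): ?string
{
    if (!class_exists('IntlDateFormatter')) {
        return null; // intl extension not loaded
    }
    $dateType = $style === 'medium'
        ? IntlDateFormatter::MEDIUM
        : IntlDateFormatter::SHORT;
    // Date part only: the time type is NONE so the pattern stays short.
    $formatter = new IntlDateFormatter($locale, $dateType, IntlDateFormatter::NONE);
    $pattern = $formatter->getPattern();

    return $pattern === false ? null : $pattern;
}

foreach (['short', 'medium'] as $style) {
    var_dump(datePattern('it_IT', $style));
}
```

A "yy" in the short pattern is what produces the two-digit year, so the difference is in the locale data and not in the placeholder handling.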

The discrepancy between strftime (implemented by libc on Linux) and IntlDateFormatter (ICU) for the Italian short date may be a product of the many deficiencies in strftime that led to its deprecation.

The wiki says:

Years can be written with two or four digits; day and month are traditionally written without zero padding

The format returned by IntlDateFormatter is not the most common one, but it is still correct.
According to the wiki, the correct output for the short date format would be 6/3/2022.
If you think four-digit years are the correct format, then open an issue against IntlDateFormatter.

I've added a locale test for Italian; that way the test will reveal any change in IntlDateFormatter's output.

commented

Thanks for explaining.

Yes, ICU gives the dd/MM/yy format: https://github.com/unicode-org/cldr/blob/main/common/main/it.xml

So the appropriate step is to open a pull request or issue with ICU.

commented

For the sake of backward compatibility, what is your opinion about making it behave the same way as stock strftime? I'm thinking of hardcoding d/m/Y for the Italian locale and the %x format.

It is easy to apply a specific solution for the specific case of the "%x" format and Italian localization:

        case '%x':
          // Line 75
          if (preg_match('~^it~i', $locale)) {
            return $timestamp->format('d/m/Y');
          }

However, I don't think the solution should come from a patch.
If the short date format in ICU is incorrect, it should be reported and corrected there. If, on the other hand, the strftime format is incorrect, then strftime is the one that must be corrected.

Do we vote?

  • 👍 => The patch is applied
  • 👎 => Patch not applied

@alphp it depends if you are looking for 100% backwards compatibility with strftime or if you consider that strftime was buggy in the first place :)

If you go down the first path, then there might be a lot of other exceptions to implement :)

Also, you cannot match all Italian locales to this format. For example, the format for it_CH uses . as a separator instead of / (and it also uses 4-digit years in strftime, but 2-digit years in ICU).
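
If an override were applied anyway, it would have to be keyed on the full locale and not on the language prefix. A minimal sketch of that idea follows; shortDateOverride() and its one-entry table are hypothetical, not part of the library, and the only entry comes from the it_IT output reported in this thread.

```php
<?php
// Hypothetical helper: returns a DateTime::format() override for "%x",
// or null when the locale should fall through to IntlDateFormatter.
function shortDateOverride(string $locale): ?string
{
    // Strip codeset and modifier: "it_IT.utf8" -> "it_IT", "it_IT@euro" -> "it_IT".
    $base = preg_replace('~[.@].*$~', '', $locale);
    // Accept both "it_IT" and "it-IT", case-insensitively.
    $base = strtolower(str_replace('-', '_', $base));

    // Only it_IT is listed; it_CH and a bare "it" get no override.
    $overrides = ['it_it' => 'd/m/Y'];

    return $overrides[$base] ?? null;
}

$date = new DateTime('2022-03-06 13:02:03');
foreach (['it_IT.utf8', 'it_CH', 'it'] as $locale) {
    $format = shortDateOverride($locale);
    echo $locale, ' => ', $format === null ? '(no override)' : $date->format($format), PHP_EOL;
}
// it_IT.utf8 => 06/03/2022
// it_CH => (no override)
// it => (no override)
```

Every additional locale that differs from ICU would need its own entry, which is the maintenance cost mentioned above.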

Edit: link to the strftime date formats per locale: https://lh.2xlibre.net/locale/it_CH/
The glibc source shows each locale's format here: https://github.com/lattera/glibc/blob/master/localedata/locales/aa_DJ
We could write a script that compares the locale-specific formats (%c, %x, %X, %r) between strftime and IntlDateFormatter to see which locales are affected.
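
A rough sketch of such a comparison, limited to %x, could look like the following. compareShortDate() is a hypothetical helper, and the mapping of %x to IntlDateFormatter::SHORT is an assumption based on the examples in this thread. Results depend on which locales are installed on the system and on the ICU version, and locale names differ on Windows.

```php
<?php
// Hypothetical helper: compares strftime('%x') with IntlDateFormatter SHORT
// for one locale. Returns null when the comparison cannot be run.
function compareShortDate(string $locale, int $timestamp): ?array
{
    if (!function_exists('strftime') || !class_exists('IntlDateFormatter')) {
        return null; // strftime removed or intl extension missing
    }

    $previous = setlocale(LC_TIME, '0'); // "0" only reads the current setting
    if (setlocale(LC_TIME, $locale . '.utf8', $locale . '.UTF-8', $locale) === false) {
        return null; // locale not installed on this system
    }
    $libc = @strftime('%x', $timestamp); // @ hides the PHP 8.1+ deprecation notice
    setlocale(LC_TIME, $previous);       // restore the previous locale

    // Use the same timezone that strftime() uses, so only the format differs.
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::SHORT,
        IntlDateFormatter::NONE,
        date_default_timezone_get()
    );
    $icu = $formatter->format($timestamp);

    return ['strftime' => $libc, 'icu' => $icu, 'same' => $libc === $icu];
}

$timestamp = mktime(13, 2, 3, 3, 6, 2022);
foreach (['it_IT', 'it_CH'] as $locale) {
    $row = compareShortDate($locale, $timestamp);
    echo $locale, ': ', $row === null ? 'skipped' : json_encode($row, JSON_UNESCAPED_SLASHES), PHP_EOL;
}
```

Extending this to %c, %X and %r would need a decision on which IntlDateFormatter styles correspond to each placeholder, which is left open here.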

One of the problems is the discrepancies that strftime itself had between implementations (mainly Windows vs. Linux), which led to its deprecation.
What I am afraid of is that the discrepancies between ICU and strftime are going to be many and it would force us to create our own localization implementation.
That's why I like the solution of implementing localization through a standard library like ICU.

Is the year also 4 digits in it_CH?

setlocale(LC_TIME, 'it_IT');
var_dump(strftime('%x', strtotime('20220306 13:02:03')));
setlocale(LC_TIME, 'it_CH');
var_dump(strftime('%x', strtotime('20220306 13:02:03')));
var_dump((new IntlDateFormatter('it_IT', IntlDateFormatter::SHORT, IntlDateFormatter::MEDIUM, 'Europe/Madrid'))->format(strtotime('20220306 13:02:03')));
var_dump((new IntlDateFormatter('it_CH', IntlDateFormatter::SHORT, IntlDateFormatter::MEDIUM, 'Europe/Madrid'))->format(strtotime('20220306 13:02:03')));

Output (Windows):

string(10) "06/03/2022"
string(10) "06.03.2022"
string(18) "06/03/22, 13:02:03"
string(18) "06.03.22, 13:02:03"

Output (CentOS 6):

string(10) "06/03/2022"
string(10) "06. 03. 22"
string(18) "06/03/22, 14:02:03"
string(18) "06.03.22, 14:02:03"

strftime is broken. We can either argue over the correct definition for each locale or leave it to a specialized library.

Yes, I believe that keeping just IntlDateFormatter and accepting that there might be some changes in behaviour is the best way to avoid re-implementing all the strftime specifics and to keep the code simple to maintain.

Or you could also ship an array of all locales and %x/%X/%c/%r formats but well… that's a pain. And it won't fix discrepancies with Windows.

It is very striking that on Windows strftime has been consistent, showing the same pattern for it_IT as for it_CH and changing only the delimiters.
On Linux, however, strftime uses a period followed by a space as the delimiter for it_CH, which is more strange than wrong.

The strftime function makes sense only if it is consistent across all of its implementations.
It's true that date and DateTime::format are very handy, but they can't replace strftime. In this sense, the argument for its deprecation on the wiki seemed very poor to me. Specifically, the following statement is false:

Once again date() or DateTime::format() provide portable alternatives, and IntlDateFormatter::format() provides a more sophisticated, localization-aware alternative.

It is false because, while it is true that date, DateTime::format, and IntlDateFormatter::format() provide a consistent interface and platform-independent localization, they cannot do what strftime does.
For example, with strftime it is very easy to localize text strings that contain dates and times. It is also very easy and readable to create filenames with timestamps for logs or dumps.

$file = 'some_data-' . date('Ymd');
$file = strftime('some_data-%Y%m%d');

$message = strftime(gettext('Me levanto todos los días a las %H:%M.')); // "I get up every day at %H:%M."

Thinking about message localization using the alternate calls, I see so many localization problems and horrible, unmaintainable code that it makes me gag.

The main purpose of this implementation is backwards compatibility.
But a no less important goal is to keep the function alive. This means fixing whatever is broken and, if necessary, adding new features.

Pull request #9 is interesting.
I had not thought about the possibility of making ICU optional, so that it is used only when it is available and wanted.

The more I look into this (I wanted to work on #2), the more I think it is futile to try to achieve parity with the original strftime output. Since IntlDateFormatter uses its own locale bundles and strftime uses the locales available on the system, they will never match 100%. I doubt strftime output even matches between different systems.

So I agree, it does not make sense to try to "fix" the "problem" @nu111 reported.

However, I also wonder how much sense the locale-dependent tests make. An it_CH test cannot check that the output is the same as strftime(), for the reasons mentioned above. It can only check that the output of IntlDateFormatter is the output we expect from IntlDateFormatter. Testing that once, to make sure our strftime-compatible placeholders are transformed correctly, makes sense. But testing it multiple times with different locales does not test anything meaningful. Placeholder transformation is always the same, and we can assume IntlDateFormatter is tested upstream and working as intended.
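
To illustrate what a single locale-independent placeholder test could look like, here is a minimal sketch. formatNumericPlaceholders() is a hypothetical, deliberately tiny translator and not the library's implementation; it handles only a few numeric placeholders whose output does not depend on any locale data.

```php
<?php
// Hypothetical translator: maps a few numeric strftime placeholders to
// DateTime::format() characters. Unknown placeholders are left untouched.
function formatNumericPlaceholders(string $format, DateTimeInterface $date): string
{
    $map = ['Y' => 'Y', 'm' => 'm', 'd' => 'd', 'H' => 'H', 'M' => 'i', 'S' => 's'];

    return preg_replace_callback(
        '~%([YmdHMS%])~',
        static function (array $match) use ($map, $date): string {
            // "%%" is the escape for a literal percent sign.
            return $match[1] === '%' ? '%' : $date->format($map[$match[1]]);
        },
        $format
    );
}

// One fixed date, no setlocale() call: the result is the same on every system.
$date = new DateTimeImmutable('2022-03-06 13:02:03');
assert(formatNumericPlaceholders('some_data-%Y%m%d', $date) === 'some_data-20220306');
assert(formatNumericPlaceholders('%H:%M:%S 100%%', $date) === '13:02:03 100%');
echo 'placeholder tests passed', PHP_EOL;
```

A test like this checks only the transformation step, so it keeps passing when ICU or the system locales change.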