xbmc / xbmc

Kodi is an award-winning free and open source home theater/media center software and entertainment hub for digital media. With its beautiful interface and powerful skinning engine, it's available for Android, BSD, Linux, macOS, iOS, tvOS and Windows.

Home Page: https://kodi.tv/

Broken collation order for Latin characters with diacritics

9p6 opened this issue

Bug report

Describe the bug

Here is a clear and concise description of what the problem is:

It seems that the sort order for movie titles containing Latin characters with diacritics is wrong. I'll describe the problem as I found it with Polish, but it very likely applies to other Latin-based scripts.

Expected Behavior

Here is a clear and concise description of what was expected to happen:

Expected character order for Polish language:

a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż

Actual Behavior

Actual order in the movie title list (similar results in the file manager):

ą a b ć c d e ę f g h i j k l m n ń ó o p q r ś s t u v w x y ż ź z ł

Most letters with diacritics sort before their base letter instead of after it (ą, ć, ó, ś, ż, ź), and in addition the letter ł is moved to the very end.

Possible Fix

Locale-aware sorting for Unicode is a long-solved problem, so I assume Kodi uses an in-house solution for whatever reason, and that this is what needs fixing.

To Reproduce

Steps to reproduce the behavior:

To generate a test case I used the code:

for x in a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż; do touch "$x.mkv"; echo "<movie><title>$x</title></movie>" > "$x.nfo"; done

Scan the files above with the local NFO scraper, then check the order in the movie title list.

Debuglog

The debuglog can be found here:
https://paste.kodi.tv/welogokofi.kodi

Screenshots

Here are some links or screenshots to help explain the problem:

(screenshot: movies)

Additional context or screenshots (if appropriate)

Here is some additional context or explanation that might help:

Your Environment

Used Operating system:

  • Android

  • iOS

  • tvOS

  • Linux

  • macOS

  • Windows

  • Windows UWP

  • Operating system version/name: Ubuntu 22.04, LibreELEC 12

  • Kodi version: 20.2, 21

  • Locale: C.utf8, en_US.utf8, pl_PL.utf8 (same results with different locales)

Thank you for using Kodi and our issue tracker. This is your friendly Kodi GitHub bot 😉

It seems that you have not followed the template we provide and require for all bug reports (or have opened a roadmap item by accident). Please understand that following the template is mandatory and required for the team to be able to handle the volume of open issues efficiently.

Please edit your issue message to follow our template and make sure to fill in all fields appropriately. The issue will be closed after one week has passed without satisfactory follow-up from your side.

This is an automatically generated message. If you believe it was sent in error, please say so and a team member will remove the "Ignored rules" label.

A log captured with debug logging enabled is required for this.

commented

I've attached the log from the library scan process.

From your log:

info <general>: CLangInfo: loading resource.language.en_gb language information...
debug <general>: trying to set locale to en_DE.UTF-8
info <general>: global locale set to C

So you should be getting the default "C" collation.

commented

The default C collation would put all letters with diacritics at the end:

a b c d e f g h i j k l m n o p q r s t u v w x y z ó ą ć ę ł ń ś ź ż

so it's definitely something else. I don't know what to make of en_DE.UTF-8 in the log; it seems random.
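
For reference, the C-collation order quoted above can be reproduced from a shell. This is only a minimal sketch, assuming a POSIX shell with `sort` and `paste` available; the variable names are illustrative. `LC_ALL=C` makes `sort` compare raw bytes, which for UTF-8 text is the same as code point order.

```shell
# Letters from the report, listed in the expected Polish order.
letters='a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż'

# Sort one letter per line byte-wise (C collation), then join the
# lines back into a single space-separated string.
c_sorted=$(printf '%s\n' $letters | LC_ALL=C sort | paste -sd ' ' -)

echo "$c_sorted"
```

The output has a to z first and the nine accented letters at the end, exactly as in the list above, which differs from the order Kodi shows in the movie title list.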

I forgot to mention that I did try starting Kodi with different locales (en_US.utf8, pl_PL.utf8), but it makes no difference to the reported issue:

LANG=en_US.utf8 kodi --debug