The Problem

The following query returns different results on different operating systems. From some searching, this appears to be related to locale/collation settings.

SELECT ARRAY(SELECT UNNEST(ARRAY['id', 'organization_id', 'locked_by', 'lock_reasons', 'lock_note', 'lock_source', 'sent_notification_emails', 'created_at']) ORDER BY 1);

With collation "C", the result is:

{created_at,id,lock_note,lock_reasons,lock_source,locked_by,organization_id,sent_notification_emails}

With collation "UTF-8", the result is:

{created_at,id,locked_by,lock_note,lock_reasons,lock_source,organization_id,sent_notification_emails}
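The "C" ordering can be reproduced outside PostgreSQL with a plain byte-wise sort. The sketch below is only an illustration; `sort_bytewise` and `compare_bytes` are hypothetical helper names, not part of the POC in this repository.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: each element is a `const char *`, so qsort hands us
   pointers to those pointers. strcmp compares raw byte values. */
static int compare_bytes(const void *lhs, const void *rhs)
{
    const char *const *a = lhs;
    const char *const *b = rhs;
    return strcmp(*a, *b);
}

/* Sorts an array of C strings in place by byte value, which is the
   ordering that collation "C" is expected to produce for ASCII input. */
void sort_bytewise(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    qsort(names, count, sizeof names[0], compare_bytes);
}
```

Calling `sort_bytewise` on the eight column names from the query places `lock_note`, `lock_reasons` and `lock_source` ahead of `locked_by`, matching the "C" result.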

Troubleshooting

The difference comes down to whether _ sorts before e. Going by ASCII it should, since their code points are 95 and 101 respectively.
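To confirm which characters decide the comparison, a small helper can locate the first position where two strings differ. This is a sketch for illustration only; `first_difference` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first byte at which a and b differ.
   If one string is a prefix of the other (or they are equal),
   the index of the shorter string's terminating NUL is returned. */
size_t first_difference(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i]) {
        i++;
    }
    return i;
}
```

For `"lock_note"` and `"locked_by"` the helper returns index 4, where the bytes are `_` and `e`, so a byte-wise comparison is decided by exactly those two characters.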

However, the following queries do not consistently return true: the result is true on some operating systems and false on others.

SELECT '_' COLLATE "en_US" < 'e' COLLATE "en_US";

SELECT 'lock_note' COLLATE "en_US" < 'locked_by' COLLATE "en_US";

From what I found by searching, when a locale is in effect PostgreSQL delegates sorting to the operating system's locale rules, and those rules can vary between operating systems. That would explain why the queries above give inconsistent results across operating systems.
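The operating system's own ordering can be checked directly with the standard C `strcoll` function. The sketch below is a minimal illustration; `collate_sign` is a hypothetical helper, and which locale names are installed (for example `en_US.UTF-8`) depends on the machine.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Compares a and b under the named locale's collation rules.
   On success returns 0 and stores -1, 0 or 1 in *sign.
   Returns -1 if the locale is unavailable or memory runs out. */
int collate_sign(const char *locale_name, const char *a, const char *b, int *sign)
{
    assert(locale_name != NULL && a != NULL && b != NULL && sign != NULL);

    /* Copy the current setting: the buffer returned by setlocale
       may be overwritten by the next setlocale call. */
    const char *current = setlocale(LC_COLLATE, NULL);
    char *saved = NULL;
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved == NULL) {
            return -1;
        }
        strcpy(saved, current);
    }

    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        free(saved);
        return -1; /* locale not installed on this system */
    }

    int result = strcoll(a, b);
    *sign = (result > 0) - (result < 0);

    /* Restore the previous collation locale. */
    if (saved != NULL) {
        setlocale(LC_COLLATE, saved);
        free(saved);
    }
    return 0;
}
```

With the `"C"` locale, `strcoll` behaves like `strcmp`, so `lock_note` sorts first. Passing a locale such as `"en_US.UTF-8"` is where the sign may differ from one operating system to another. Note that `setlocale` changes process-wide state, so this helper is not safe to call from multiple threads.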

What I actually need is for PostgreSQL and my application code (both running on the same OS) to sort identically. A quick skim of the PostgreSQL source code gave me the impression that PostgreSQL uses ucol_strcollUTF8 for UTF-8 sorting, so I assumed that calling ucol_strcollUTF8 from my application code would give the same result as PostgreSQL. I wrote a proof of concept (see icu_string_comparison.c for details) and got the following result, which does not match PostgreSQL:

"lock_note" is smaller than "locked_by"

Quick recap

| Code | Result |
| --- | --- |
| `SELECT 'lock_note' COLLATE "en_US" < 'locked_by' COLLATE "en_US"` | False |
| `icu_string_comparison.c`: `'lock_note'` < `'locked_by'` | True |

So I am probably missing something: sorting in PostgreSQL is apparently not as simple as calling ucol_strcollUTF8, and something else is involved.
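One possible explanation, stated here as an assumption rather than a confirmed finding: as far as I know, PostgreSQL only calls ucol_strcollUTF8 for collations from its ICU provider (typically named like "en-US-x-icu"), while a collation named "en_US" usually comes from the libc provider and goes through the operating system's strcoll family. Some libc locales are commonly described as ignoring punctuation on the first comparison pass, whereas ICU's default settings sort punctuation before letters. The sketch below simulates only that "ignore underscores first" idea; `compare_ignoring_underscores` is a hypothetical helper and not how any real collation is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst while dropping every '_' character.
   dst must have room for strlen(src) + 1 bytes. */
static void strip_underscores(const char *src, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] != '_') {
            dst[out++] = src[i];
        }
    }
    dst[out] = '\0';
}

/* Simulated two-pass comparison: first compare with underscores
   removed, then fall back to a byte-wise comparison as a tie-break.
   Inputs too long for the fixed buffers are compared byte-wise only. */
int compare_ignoring_underscores(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    char key_a[256];
    char key_b[256];
    if (strlen(a) >= sizeof key_a || strlen(b) >= sizeof key_b) {
        return strcmp(a, b);
    }
    strip_underscores(a, key_a);
    strip_underscores(b, key_b);
    int first_pass = strcmp(key_a, key_b);
    return first_pass != 0 ? first_pass : strcmp(a, b);
}
```

Under this simulation `lock_note` is compared as `locknote` against `lockedby`, so it sorts after `locked_by`, which is the opposite of the byte-wise result and is consistent with the `False` row in the recap. Whether this is what actually happens on a given system would need to be checked against that system's locale definition and the provider of the PostgreSQL collation in use.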

hw202207 / poc_icu_comparison
