Could Sqitch tables use the C collation rather than the default collation?

datafoo opened this issue a year ago

The PostgreSQL documentation on locale support warns:

The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.

Could Sqitch create its tables using the C collation rather than the default collation?
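To make the LIKE point concrete, here is a toy Python model (not PostgreSQL code) of answering `LIKE 'a%'` with a single ordered range scan over an index. `c_collation_key` compares UTF-8 bytes the way the C collation does; `linguistic_key` is a hypothetical, crude stand-in for a locale-aware collation such as en_US (letters compared case-insensitively first, case only breaking ties). Both helpers are illustrative assumptions. The scan is only correct when every match sits in one contiguous run of the index.

```python
import bisect
from typing import Callable, List


def c_collation_key(s: str) -> bytes:
    # C collation: compare the raw UTF-8 bytes, as memcmp() would.
    return s.encode("utf-8")


def linguistic_key(s: str) -> tuple:
    # Hypothetical stand-in for a locale-aware collation (e.g. en_US):
    # compare case-insensitively first; case only breaks ties.
    return (s.casefold(), s)


def prefix_scan(index: List[str], prefix: str, key: Callable) -> List[str]:
    """Emulate `col LIKE 'prefix%'` as one index range scan.

    `index` must already be sorted by `key`. The scan seeks to the first
    entry >= prefix and stops at the first entry that does not start with
    it, which is only correct if all matches are contiguous in index order.
    """
    keys = [key(v) for v in index]
    start = bisect.bisect_left(keys, key(prefix))
    matches = []
    for value in index[start:]:
        if not value.startswith(prefix):  # LIKE is case-sensitive
            break
        matches.append(value)
    return matches


words = ["ad", "Ac", "ab", "b"]

c_index = sorted(words, key=c_collation_key)    # ['Ac', 'ab', 'ad', 'b']
ling_index = sorted(words, key=linguistic_key)  # ['ab', 'Ac', 'ad', 'b']

print(prefix_scan(c_index, "a", c_collation_key))    # ['ab', 'ad']
print(prefix_scan(ling_index, "a", linguistic_key))  # ['ab'] ('ad' is missed)
```

Under the linguistic ordering, `Ac` lands between `ab` and `ad`, so the range scan stops early. That is a simplified picture of why an ordinary index cannot serve a LIKE prefix search in a non-C locale, while byte order keeps all matches adjacent.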

datafoo · Answer 1 · Wed Jun 14 2023 21:43:48 GMT+0800 (China Standard Time)

That would remove the need for re-indexing after a collation version mismatch (see https://www.postgresql.org/docs/current/sql-altercollation.html#SQL-ALTERCOLLATION-NOTES), since the C collation does not change.

David E. Wheeler · Answer 2 · Tue Jun 20 2023 01:28:04 GMT+0800 (China Standard Time)

Yeah, probably a good idea, although even the C and POSIX collations can change when you upgrade glibc. There was a ton of discussion of the issue at PGCon this year, including this session. Maybe the C locale would be the least bad option, tho, considering that Sqitch mainly relies on indexes for uniqueness rather than sorting (it mainly sorts by dates).
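Even an index that exists only to enforce uniqueness still depends on its sort order, because lookups navigate it by comparison. The toy Python model below (a sorted list plus binary search, not PostgreSQL's B-tree code) sketches what can happen if the ordering changes after the index was built, as with a collation version change. `old_locale_key`, `new_locale_key`, and `insert_unique` are hypothetical names standing in for two versions of a locale and a unique-index insert.

```python
import bisect
from typing import Callable, List


def old_locale_key(s: str) -> tuple:
    # Hypothetical ordering in effect when the index was built:
    # case-insensitive first, case only breaks ties.
    return (s.casefold(), s)


def new_locale_key(s: str) -> str:
    # Hypothetical newer version of the same locale: plain code point order,
    # so every uppercase ASCII letter now sorts before every lowercase one.
    return s


def insert_unique(index: List[str], value: str, key: Callable) -> bool:
    """Toy unique-index insert: binary-search `index` (assumed sorted by
    `key`) and insert `value` unless an equal entry sits at that position.
    Returns True if inserted, False if rejected as a duplicate."""
    keys = [key(v) for v in index]
    pos = bisect.bisect_left(keys, key(value))
    if pos < len(index) and index[pos] == value:
        return False
    index.insert(pos, value)
    return True


# Index built while the old ordering was in effect.
index = sorted(["c", "a", "B"], key=old_locale_key)  # ['a', 'B', 'c']

# Same ordering at lookup time: the duplicate is caught.
print(insert_unique(list(index), "B", old_locale_key))  # False

# Ordering changed underneath the index: the search lands in the wrong
# place and the duplicate slips in.
stale = list(index)
print(insert_unique(stale, "B", new_locale_key))  # True
print(stale)                                      # ['B', 'a', 'B', 'c']
```

An ordering that never changes between building and querying the index cannot get into this state, which is the appeal of the C collation if it is indeed stable.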

datafoo · Answer 3 · Tue Jun 20 2023 14:14:59 GMT+0800 (China Standard Time)

even the C and POSIX collations can change when you upgrade glibc

I was under the impression that the C collation never changes, after reading a comment from @laurenz on a blog article he wrote at https://www.cybertec-postgresql.com/en/icu-collations-against-postgresql-data-corruption/#comment-6208123551

[the C collation] doesn't ever change. it is just the order defined by memcmp() of the strings, which only depends on the encoding.

Look here:

The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.
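A minimal Python sketch of the rule quoted above: order strings by their encoded bytes, which is what memcmp() compares. `c_collation_sort` is a hypothetical helper, not part of PostgreSQL or Sqitch, and it models only ordering, not the character classification the quote also mentions.

```python
def c_collation_sort(strings, encoding="utf-8"):
    # Emulate the C collation: compare the encoded bytes, as memcmp() does.
    return sorted(strings, key=lambda s: s.encode(encoding))


words = ["banana", "Apple", "apple", "Banana", "zebra", "éclair"]

print(c_collation_sort(words))
# ['Apple', 'Banana', 'apple', 'banana', 'zebra', 'éclair']

# For UTF-8, byte order coincides with Unicode code point order, which is
# also how Python compares str values by default.
assert c_collation_sort(words) == sorted(words)
```

Nothing in this ordering consults locale data, so it depends only on the encoding, which matches the point that the C collation's sort order does not drift with library updates.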

David E. Wheeler · Answer 4 · Tue Jun 20 2023 22:54:31 GMT+0800 (China Standard Time)

Oh, perhaps you're right; I forgot about that. The examples from the presentation use en_US.UTF-8.