sqitchers / sqitch

Sensible database change management

Home Page:https://sqitch.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Could sqitch tables use C collation rather than the default collation?

datafoo opened this issue · comments

PostgreSQL indicates:

The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.

Could Sqitch create its tables using the C collation rather than the default collation?

That would remove the need from re-indexing due to collation version mismatch (see https://www.postgresql.org/docs/current/sql-altercollation.html#SQL-ALTERCOLLATION-NOTES) since the C collation does not change.

Yeah, probably a good idea, although even the C and POSIX collations can change when you upgrade glibc. There was a ton of discussion of the issue at PGCon this year, including this session. Maybe C locale would be the least bad, tho, considering that Sqitch mainly relies on indexes for uniqueness rather than sorting (it mainly sorts by dates).

even the C and POSIX collations can change when you upgrade glibc

I was under the impression that C collation doesn't ever change after reading a comment from @laurenz on a blog article he wrote at https://www.cybertec-postgresql.com/en/icu-collations-against-postgresql-data-corruption/#comment-6208123551

[the C collation] doesn't ever change. it is just the order defined by memcmp() of the strings, which only depends on the encoding.

Look here:

The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.

Oh, perhaps you're right, I forgot about that. The examples from the presentation use en-US.UTF-8.