sqitchers / sqitch

Sensible database change management

Home Page:https://sqitch.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Can't decode ill-formed UTF-8 octet sequence when using --set argument

joaostorrer opened this issue · comments

Sqitch is failing whenever I set a variable with a value that contains an accent:

root@d975b89f1c63:/repo# sqitch deploy flipr_test --set var="ãçê"
Deploying changes to flipr_test
  + bla_heg ......... not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.

Deploy failed

The same happens with revert and rebase commands.

I tried setting LC_ALL & LANG envs to pt_BR.UTF-8, but the error continues.

root@d975b89f1c63:/repo# sqitch --version
sqitch (App::Sqitch) v1.4.0
root@d975b89f1c63:/repo# perl -v

This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi
(with 47 registered patches, see perl -V for more detail)

Copyright 1987-2021, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Hrm. That's coming from the spool method, which spools output from the database CLI. Which RDBMS are you using? I wonder if it's not properly emitting UTF-8 for some reason.

I'm using PostgreSQL.

Is your database using UTF-8 encoding? Check client_encoding.

Sorry, actually Postgres works fine, the problem is with Oracle database.
My deploy consists in deploying to Postgres and Oracle, so i got confused.

My Oracle database is configured with WE8MSWIN1252 charset:

SQL> select *
  2  from   nls_database_parameters
  3  where  parameter in ('NLS_RDBMS_VERSION', 'NLS_DATE_LANGUAGE', 'NLS_NCHAR_CHARACTERSET', 'NLS_CHARACTERSET', 'NLS_ISO_CURRENCY', 'NLS_TERRITORY', 'NLS_LANGUAGE')
  4  order  by 1;
 
PARAMETER                                                                        VALUE
-------------------------------------------------------------------------------- ----------------------------------------------------------------
NLS_CHARACTERSET                                                                 WE8MSWIN1252
NLS_DATE_LANGUAGE                                                                AMERICAN
NLS_ISO_CURRENCY                                                                 AMERICA
NLS_LANGUAGE                                                                     AMERICAN
NLS_NCHAR_CHARACTERSET                                                           AL16UTF16
NLS_RDBMS_VERSION                                                                19.0.0.0.0
NLS_TERRITORY                                                                    AMERICA
 
7 rows selected

But when I define the Sql*Plus variable with the same value it works fine:

SQL> define test = "ãçê"
SQL> define
DEFINE _SQLPLUS_RELEASE = "000000000" (CHAR)
DEFINE _EDITOR          = "PLSQLDev" (CHAR)
DEFINE _DATE            = "17/01/2024" (CHAR)
DEFINE _PRIVILEGE       = "" (CHAR)
DEFINE _O_VERSION       = "Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 " (CHAR)
DEFINE _O_RELEASE       = "000000000" (CHAR)
DEFINE test             = "ãçê" (CHAR)
 
SQL> select '&test' from dual;
 
'ÃÇÊ'
-----
ãçê

Sqitch requires everything to be UTF-8, and it sets up the Oracle connection for that:

BEGIN {
# We tell the Oracle connector which encoding to use. The last part of the
# environment variable NLS_LANG is relevant concerning data encoding.
$ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8';
# Disable SQLPATH so that no start scripts run.
$ENV{SQLPATH} = '';
}

So be sure that the contents of the variable are UTF-8 and not CP1252.

I tried a few things to guarantee the variable is in UTF-8:

root@18d3f3eb022a:/repo# sqitch deploy flipr_test --set var="$(echo "ãçê")"
Deploying changes to flipr_test
  + appschema ................. not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.

Deploy failed
root@18d3f3eb022a:/repo# sqitch deploy flipr_test --set var="$(echo "ãçê" | iconv -t UTF-8)"
Deploying changes to flipr_test
  + appschema ................. not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.

Deploy failed
root@18d3f3eb022a:/repo# echo "ãçê" | file -
/dev/stdin: UTF-8 Unicode text
root@18d3f3eb022a:/repo# echo -n "ãçê" | file -
/dev/stdin: UTF-8 Unicode text, with no line terminators

I also tested deploying in another Oracle Database that is configured with NLS_CHARACTERSET AL32UTF8, but I got the same error message.

I think the problem is not the text read from the command-line, but text returned from SQLPlus. If it doesn't emit UTF-8 then Sqitch will have trouble reading it. Could SQLPlus be ignoring OUTPUT_CHARSET?

Try this patch:

--- a/lib/App/Sqitch.pm
+++ b/lib/App/Sqitch.pm
@@ -472,6 +472,7 @@ sub spool {
     }
 
     local $SIG{PIPE} = sub { die 'spooler pipe broke' };
+    binmode $fh, ':raw';
     if (ref $fh eq 'ARRAY') {
         for my $h (@{ $fh }) {
             print $pipe $_ while <$h>;

It should treat the data read from SQL*Plus as raw bytes, rather than try to decode them as UTF-8. Will be interesting to see what it emits.

Yes, the patch works because it's not decoding the output from SQLPlus into UTF-8. Which means SQLPlus is not emitting UTF-8. Clearly there's some additional configuration we need to make to get it to do so.

Sure seems like setting NLS_LANG should work.

I tried setting NLS_LANG to different values like BRAZILIAN PORTUGUESE_BRAZIL.UTF8, PORTUGUESE_BRAZIL.UTF8, AMERICAN_AMERICA.UTF8 and AMERICAN_AMERICA.AL32UTF8, but they all failed with Can't decode ill-formed...

Yeah, Sqitch sets NLS_LANG, so anything you set it to doesn't work. Unless you tried setting it in the source?

BEGIN {
# We tell the Oracle connector which encoding to use. The last part of the
# environment variable NLS_LANG is relevant concerning data encoding.
$ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8';
# Disable SQLPATH so that no start scripts run.
$ENV{SQLPATH} = '';
}

If I set to BRAZILIAN PORTUGUESE_BRAZIL.UTF8 or PORTUGUESE_BRAZIL.UTF8 in the source I get the same error.

For some reason SQL*Plus is returning non-UTF-8 bytes. I do not understand why and don't have the resources to investigate. Maybe it's worth asking on Stack Overflow or an Oracle forum?

The key thing is to get SQL*Plus to emit only UTF-8 bytes. Though honestly I have no idea what it thinks it's emitting. Your change name is ASCII, so should be fine. When you tried the patch a few weeks ago, you said it worked; what, exactly, did it emit? Very curious to know what it might be choking on.

I inserted the content of var into a table to get what it emit with the patch applied.
The column was inserted with que upside down question mark (¿).
As far as I know, this only indicates that the database tried to inserted a character that is not recognized by SQL*Plus.

Is the deploy script you wrote UTF-8 encoded?

Yes, all the .sql scripts inside my project are UTF-8 (without BOM) encoded and Windows (CR LF).