Can't decode ill-formed UTF-8 octet sequence when using --set argument
joaostorrer opened this issue · comments
Sqitch is failing whenever I set a variable with a value that contains an accent:
root@d975b89f1c63:/repo# sqitch deploy flipr_test --set var="ãçê"
Deploying changes to flipr_test
+ bla_heg ......... not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.
Deploy failed
The same happens with revert
and rebase
commands.
I tried setting LC_ALL & LANG envs to pt_BR.UTF-8, but the error continues.
root@d975b89f1c63:/repo# sqitch --version
sqitch (App::Sqitch) v1.4.0
root@d975b89f1c63:/repo# perl -v
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi
(with 47 registered patches, see perl -V for more detail)
Copyright 1987-2021, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
Hrm. That's coming from the spool method, which spools output from the database CLI. Which RDBMS are you using? I wonder if it's not properly emitting UTF-8 for some reason.
I'm using PostgreSQL.
Is your database using UTF-8 encoding? Check client_encoding.
Sorry, actually Postgres works fine, the problem is with Oracle database.
My deploy consists in deploying to Postgres and Oracle, so i got confused.
My Oracle database is configured with WE8MSWIN1252 charset:
SQL> select *
2 from nls_database_parameters
3 where parameter in ('NLS_RDBMS_VERSION', 'NLS_DATE_LANGUAGE', 'NLS_NCHAR_CHARACTERSET', 'NLS_CHARACTERSET', 'NLS_ISO_CURRENCY', 'NLS_TERRITORY', 'NLS_LANGUAGE')
4 order by 1;
PARAMETER VALUE
-------------------------------------------------------------------------------- ----------------------------------------------------------------
NLS_CHARACTERSET WE8MSWIN1252
NLS_DATE_LANGUAGE AMERICAN
NLS_ISO_CURRENCY AMERICA
NLS_LANGUAGE AMERICAN
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 19.0.0.0.0
NLS_TERRITORY AMERICA
7 rows selected
But when I define the Sql*Plus variable with the same value it works fine:
SQL> define test = "ãçê"
SQL> define
DEFINE _SQLPLUS_RELEASE = "000000000" (CHAR)
DEFINE _EDITOR = "PLSQLDev" (CHAR)
DEFINE _DATE = "17/01/2024" (CHAR)
DEFINE _PRIVILEGE = "" (CHAR)
DEFINE _O_VERSION = "Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 " (CHAR)
DEFINE _O_RELEASE = "000000000" (CHAR)
DEFINE test = "ãçê" (CHAR)
SQL> select '&test' from dual;
'ÃÇÊ'
-----
ãçê
Sqitch requires everything to be UTF-8, and it sets up the Oracle connection for that:
sqitch/lib/App/Sqitch/Engine/oracle.pm
Lines 20 to 27 in 7edc536
So be sure that the contents of the variable are UTF-8 and not CP1252.
I tried a few things to guarantee the variable is in UTF-8:
root@18d3f3eb022a:/repo# sqitch deploy flipr_test --set var="$(echo "ãçê")"
Deploying changes to flipr_test
+ appschema ................. not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.
Deploy failed
root@18d3f3eb022a:/repo# sqitch deploy flipr_test --set var="$(echo "ãçê" | iconv -t UTF-8)"
Deploying changes to flipr_test
+ appschema ................. not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.
Deploy failed
root@18d3f3eb022a:/repo# echo "ãçê" | file -
/dev/stdin: UTF-8 Unicode text
root@18d3f3eb022a:/repo# echo -n "ãçê" | file -
/dev/stdin: UTF-8 Unicode text, with no line terminators
I also tested deploying in another Oracle Database that is configured with NLS_CHARACTERSET AL32UTF8, but I got the same error message.
I think the problem is not the text read from the command-line, but text returned from SQLPlus. If it doesn't emit UTF-8 then Sqitch will have trouble reading it. Could SQLPlus be ignoring OUTPUT_CHARSET
?
Try this patch:
--- a/lib/App/Sqitch.pm
+++ b/lib/App/Sqitch.pm
@@ -472,6 +472,7 @@ sub spool {
}
local $SIG{PIPE} = sub { die 'spooler pipe broke' };
+ binmode $fh, ':raw';
if (ref $fh eq 'ARRAY') {
for my $h (@{ $fh }) {
print $pipe $_ while <$h>;
It should treat the data read from SQL*Plus as raw bytes, rather than try to decode them as UTF-8. Will be interesting to see what it emits.
The patch works!
Yes, the patch works because it's not decoding the output from SQLPlus into UTF-8. Which means SQLPlus is not emitting UTF-8. Clearly there's some additional configuration we need to make to get it to do so.
Sure seems like setting NLS_LANG
should work.
I tried setting NLS_LANG
to different values like BRAZILIAN PORTUGUESE_BRAZIL.UTF8
, PORTUGUESE_BRAZIL.UTF8
, AMERICAN_AMERICA.UTF8
and AMERICAN_AMERICA.AL32UTF8
, but they all failed with Can't decode ill-formed...
Yeah, Sqitch sets NLS_LANG
, so anything you set it to doesn't work. Unless you tried setting it in the source?
sqitch/lib/App/Sqitch/Engine/oracle.pm
Lines 20 to 27 in 7edc536
If I set to BRAZILIAN PORTUGUESE_BRAZIL.UTF8
or PORTUGUESE_BRAZIL.UTF8
in the source I get the same error.
For some reason SQL*Plus is returning non-UTF-8 bytes. I do not understand why and don't have the resources to investigate. Maybe it's worth asking on Stack Overflow or an Oracle forum?
The key thing is to get SQL*Plus to emit only UTF-8 bytes. Though honestly I have no idea what it thinks it's emitting. Your change name is ASCII, so should be fine. When you tried the patch a few weeks ago, you said it worked; what, exactly, did it emit? Very curious to know what it might be choking on.
I inserted the content of var
into a table to get what it emit with the patch applied.
The column was inserted with que upside down question mark (¿
).
As far as I know, this only indicates that the database tried to inserted a character that is not recognized by SQL*Plus.
Is the deploy script you wrote UTF-8 encoded?
Yes, all the .sql scripts inside my project are UTF-8 (without BOM) encoded and Windows (CR LF).