makamaka / Text-CSV

comma-separated values manipulator

Encoding Issue: Arabic Characters Conflict with Another Language in the Same Row

SheikhThingsUp opened this issue

I've run into a problem with a script that queries a database and writes the results to a CSV file. When a row contains characters from a language other than Arabic or English, the Arabic text in that row is not encoded correctly. Either the combine() or the string() method of Text::CSV seems to be causing this.

For example, in a row that contains the name "Pelé", the Arabic word "نسيج" comes out as garbled characters such as "Ù�سÙ�ج,Ù�سÙ�ج". When I open the file in vi, the same word appears with a different encoding in two different places.
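A possible mechanism, offered as an assumption since the report does not show how @row is filled: if the database driver returns the Arabic column as undecoded UTF-8 bytes while another field in the row is already a decoded character string, any code that concatenates the fields (which combine() ultimately does) treats each of those bytes as a separate Latin-1 character. The core-Perl sketch below uses a plain join as a stand-in for combine() and shows the four Arabic letters becoming eight characters, the first being "Ù" (0xD9, the lead byte of "ن" in UTF-8), which matches the start of the garbled text above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "نسيج" as Perl characters: U+0646 U+0633 U+064A U+062C.
my $word_chars = "\x{0646}\x{0633}\x{064A}\x{062C}";

# Assumption: the driver returns this column as undecoded UTF-8 bytes
# (8 bytes for 4 letters) ...
my $arabic_bytes = encode('UTF-8', $word_chars);

# ... while the name column is already a decoded character string.
my $name = "Pel\x{E9}";    # "Pelé"

# A plain join stands in for combine(): each byte of $arabic_bytes becomes
# its own Latin-1 character, so "ن" now starts with "\x{D9}" ("Ù").
my $mixed = join ',', $name, $arabic_bytes;

# Decoding the bytes first keeps every field as characters.
my $fixed = join ',', $name, decode('UTF-8', $arabic_bytes);

printf "mixed: %d chars, fixed: %d chars\n", length $mixed, length $fixed;
# prints "mixed: 13 chars, fixed: 9 chars"
```

Written through the :encoding(UTF-8) layer, each of the eight stray characters in $mixed is encoded again as its own two-byte UTF-8 sequence, which is what double-encoded text looks like in the output file.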

I've tried it both with and without the binary => 1 option, but the problem persists.

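use strict;
use warnings;
use Text::CSV;

my @row;    # one row of query results (the query itself is not shown)
my $csv = Text::CSV->new( { binary => 1 } );
open my $fh, ">:encoding(UTF-8)", "new.csv" or die "new.csv: $!";
print $fh "\x{feff}";    # UTF-8 byte order mark

my $status = $csv->combine(@row);    # combine columns into a string
my $line   = $csv->string();
print $fh $line, "\n";    # eol is not set, so add the line ending here
close $fh or die "new.csv: $!";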
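If the driver really does hand back raw UTF-8 bytes (an assumption; the report does not say which database or DBD driver is used), one way to keep a row consistent is to decode every text field to characters before it reaches Text::CSV and let the :encoding(UTF-8) layer do the only encoding step. The sketch below fakes a query result (@raw_row and its values are hypothetical stand-ins), decodes it with Encode::decode, and writes it with Text::CSV's print method into an in-memory file. Depending on the driver, a connection option such as mysql_enable_utf8mb4 (DBD::mysql) or pg_enable_utf8 (DBD::Pg) may make DBI return decoded strings directly, making the manual decode unnecessary.

```perl
use strict;
use warnings;
use IO::Handle;
use Encode qw(encode decode);
use Text::CSV;

# Hypothetical query result: every text field arrives as raw UTF-8 bytes.
my $word    = "\x{0646}\x{0633}\x{064A}\x{062C}";    # "نسيج" as characters
my @raw_row = ( encode('UTF-8', "Pel\x{E9}"), encode('UTF-8', $word) );

# Decode every field once, so the whole row holds characters, not bytes.
my @row = map { decode('UTF-8', $_) } @raw_row;

my $csv = Text::CSV->new( { binary => 1, eol => "\n" } )
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag;

# Write to an in-memory file through the same UTF-8 layer used for new.csv.
open my $fh, '>:encoding(UTF-8)', \my $buf or die "open: $!";
print $fh "\x{feff}";    # byte order mark
$csv->print( $fh, \@row ) or die "print failed: " . $csv->error_diag;
close $fh or die "close: $!";

# The layer encoded each character exactly once, so decoding restores the word.
print index( decode('UTF-8', $buf), $word ) >= 0
    ? "Arabic intact\n"
    : "Arabic garbled\n";
```

Decoding at the point where data enters the program, and encoding only at the output layer, means combine() and print() never see a mix of byte strings and character strings.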

When I take Text::CSV out of the equation and write directly to the file with minimal transformation (just adding commas and quotes), it works fine.

The issue also occurs with Text::CSV_XS.