xlstools.c: unicode_decode_wcstombs() fails for Unicode (UTF-16) strings
shashi4u opened this issue · comments
I created an .xls file called hello.xls with a single sheet and a single cell containing "こんにちは".
I passed the file to xls2csv and nothing was printed to the console.
When I started debugging, I found that the wcstombs() call at line 294 of xlstools.c failed.
OS: Windows
LIBICONV: NO
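One plausible explanation, assuming the process is still in the default "C" locale when the conversion runs, is that wcstombs() cannot encode any non-ASCII wide character in that locale. The sketch below is not libxls code; mb_length_in_locale and kSample are hypothetical names used only to show the behaviour in isolation.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of libxls): switches LC_CTYPE to the named
 * locale and returns the byte count wcstombs() reports for w, or (size_t)-1
 * if the locale is unavailable or a character cannot be converted. */
static size_t mb_length_in_locale(const char *locale_name, const wchar_t *w) {
    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        return (size_t)-1;
    }
    /* With a NULL destination, wcstombs() only measures the output. */
    return wcstombs(NULL, w, 0);
}

/* U+3053 U+3093, the first two characters of the test cell. */
static const wchar_t kSample[] = { 0x3053, 0x3093, 0 };
```

Calling mb_length_in_locale("C", kSample) is expected to report failure, while a plain ASCII string converts fine. If that matches what the debugger shows, the fix has to either select a UTF-8 capable locale or avoid the locale-dependent functions altogether.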
How to run libxls on Windows
This is the hack I followed.
I installed Cygwin64 with gcc
I ran ./configure in Cygwin64
Then I copied only the "include" and "src" directories and config.h from the source tree into my Visual Studio project to build it as a static library.
I modified config.h to ignore iconv
I removed xls2csv from the project
I had to change ssize_t to size_t in some of the source files (I don't know what might be affected by this. Your input on it is appreciated.)
Visual Studio 2015 doesn't recognize ssize_t (signed size_t).
One solution is to define ssize_t in a header file as follows:
/* ssize_t is not defined on Windows */
#ifndef ssize_t
# if defined(_WIN64)
typedef signed __int64 ssize_t;
# else
typedef signed long ssize_t;
# endif
#endif /* !ssize_t */
/* On MSVC, ssize_t is SSIZE_T */
#ifdef _MSC_VER
#include <BaseTsd.h>
#define ssize_t SSIZE_T
#endif
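Note that `#ifndef ssize_t` only tests for a macro, so it cannot detect an existing typedef. A minimal sketch of a typedef-based alternative is shown below, assuming it lives in a small compatibility header of your own; the compile-time check name ssize_t_matches_size_t is hypothetical.

```c
#include <stddef.h>

#ifdef _MSC_VER
/* MSVC has no ssize_t; SSIZE_T from <BaseTsd.h> is the pointer-sized
 * signed integer type on both 32-bit and 64-bit Windows. */
#include <BaseTsd.h>
typedef SSIZE_T ssize_t;
#else
/* POSIX systems declare ssize_t here. */
#include <sys/types.h>
#endif

/* Compile-time check: the array size is negative (a compile error) if
 * ssize_t and size_t differ in width. */
typedef char ssize_t_matches_size_t[(sizeof(ssize_t) == sizeof(size_t)) ? 1 : -1];
```

With a header like this, the original ssize_t declarations in the libxls sources could stay as they are, so error returns such as -1 keep their meaning instead of wrapping around as they would with size_t.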
Reference
https://gitlab.freedesktop.org/libnice/libnice/commit/3735b73d54f05facd36e49f3b5ee7c6fa82de9cf
I created a separate Visual Studio C++ console program for xls2csv by removing all Linux-specific code, and added the libxls project as its dependency.
I had to change a lot of code (sprintf -> sprintf_s, wcstombs -> wcstombs_s, etc.) to resolve most of the build errors.
While debugging the code, I encountered the bug mentioned above.
This is my modified version of the unicode_decode_wcstombs() function in xlstools.c that resolves the issue.
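static char *unicode_decode_wcstombs(const char *s, size_t len, size_t *newlen) {
    /* Windows-only: convert a UTF-16LE buffer to UTF-8 using the CRT. */
    char *converted = NULL;
    errno_t err;
    size_t count = 0, count2 = 0;
    size_t i;
    wchar_t *w = NULL;
    /* ".65001" selects code page 65001 (UTF-8). _locale_t and _create_locale()
       are MSVC CRT extensions, so no non-Windows branch is provided here. */
    _locale_t loc = _create_locale(LC_CTYPE, ".65001");

    if (newlen) *newlen = 0;
    if (loc == NULL) {
        printf("_create_locale failed: %d\n", errno);
        return NULL;
    }
    w = (wchar_t *)malloc((len / 2 + 1) * sizeof(wchar_t));
    if (w == NULL) {
        _free_locale(loc);
        return NULL;
    }
    /* Assemble little-endian 16-bit code units (wchar_t is 16 bits on Windows). */
    for (i = 0; i < len / 2; i++) {
        w[i] = (wchar_t)((BYTE)s[2 * i] | ((BYTE)s[2 * i + 1] << 8));
    }
    w[len / 2] = L'\0';
    /* First pass: count receives the required size in bytes, including the NUL. */
    err = _wcstombs_s_l(&count, NULL, 0, w, INT_MAX, loc);
    if (err != 0) {
        free(w);
        _free_locale(loc);
        return NULL;
    }
    converted = calloc(count + 1, sizeof(char));
    if (converted == NULL) {
        free(w);
        _free_locale(loc);
        return NULL;
    }
    /* Second pass: perform the conversion into the allocated buffer. */
    err = _wcstombs_s_l(&count2, converted, count, w, _TRUNCATE, loc);
    free(w);
    _free_locale(loc);
    if (err != 0) {
        printf("_wcstombs_s_l failed (%lu)\n", (unsigned long)len / 2);
        return converted; /* *newlen stays 0; the caller still frees the buffer */
    }
    /* count2 includes the terminating NUL; report the length without it. */
    if (newlen) *newlen = count2 - 1;
    return converted;
}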
With this change, the UTF-16 string was successfully converted to UTF-8.
The ".65001" argument to _create_locale() on Windows selects code page 65001, which is UTF-8.
Is this the correct way to do it?
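For comparison, the conversion can also be done by hand, which avoids depending on the locale or on CRT-specific functions. This is only a sketch under stated assumptions, not libxls code: utf16le_to_utf8 is a hypothetical name, the input is assumed to be UTF-16LE, and unpaired surrogates are replaced with U+FFFD as an arbitrary policy choice.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libxls): converts len bytes of UTF-16LE
 * into a newly allocated, NUL-terminated UTF-8 string. Returns NULL on
 * allocation failure. The caller frees the result. */
static char *utf16le_to_utf8(const unsigned char *s, size_t len, size_t *newlen) {
    size_t units = len / 2, i = 0, o = 0;
    /* One code unit yields at most 3 UTF-8 bytes; a surrogate pair
     * (2 units) yields 4, so units * 3 + 1 is always enough. */
    char *out = malloc(units * 3 + 1);
    if (out == NULL) return NULL;

    while (i < units) {
        unsigned long cp = s[2 * i] | ((unsigned long)s[2 * i + 1] << 8);
        i++;
        /* Combine a high surrogate with a following low surrogate. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i < units) {
            unsigned long lo = s[2 * i] | ((unsigned long)s[2 * i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            }
        }
        /* Anything still in the surrogate range was unpaired. */
        if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;

        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    if (newlen) *newlen = o; /* length in bytes, excluding the NUL */
    return out;
}
```

Feeding it the 10 UTF-16LE bytes of "こんにちは" should give 15 bytes of UTF-8. The trade-off is that the output is always UTF-8, whereas the wcstombs-based path follows whatever the current locale dictates.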
I have attached my visual studio 2015 libxls solution.
Note: I have changed ssize_t to size_t in my code. You might want to change it back to the original. (Copy the original source files, then resolve any build errors as described above or define the _CRT_SECURE_NO_WARNINGS preprocessor macro.)
libxls.zip
Hi, thank you very much for the contribution. To make better sense of it, it would help me if you structured your work as a pull request (or multiple pull requests). I don't have access to a Windows machine at the moment so I will be reliant on other volunteers testing your work.
Just so you know, I have a related pull request here where I am reworking aspects of charset conversion:
You may want to use that as the basis of your patch.
I believe this issue is now fixed in the dev branch. If it's not fixed, please open a new issue or a pull request.