libxls / libxls

Read binary Excel files from C/C++


xlstools.c -> "unicode_decode_wcstombs" method fails for unicode (utf-16) string

shashi4u opened this issue.

I created an .xls file, hello.xls, with a single sheet containing one cell with the value "こんにちは".
I passed the file to xls2csv and nothing was printed to the console.
While debugging, I found that the wcstombs() call at line 294 of xlstools.c failed.

OS: Windows
LIBICONV: NO
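The symptom fits how wcstombs() is specified: it converts according to the current LC_CTYPE locale and returns (size_t)-1 as soon as it meets a character that locale cannot encode. In a locale whose code page has no Japanese characters (Windows-1252, for instance), converting "こんにちは" therefore fails. A minimal sketch of the usual measure-then-convert pattern with that (size_t)-1 check made explicit; wide_to_current_locale is a hypothetical helper, not part of libxls:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not libxls code): converts a wide string to a
   malloc'd multibyte string in the encoding of the current LC_CTYPE locale.
   Returns NULL, with *newlen set to 0, if some character cannot be
   represented in that locale (wcstombs returns (size_t)-1). */
static char *wide_to_current_locale(const wchar_t *w, size_t *newlen)
{
    /* Measuring pass: with a NULL destination, wcstombs returns the number
       of bytes needed, not counting the terminating NUL */
    size_t n = wcstombs(NULL, w, 0);
    char *out;

    if (newlen) *newlen = 0;
    if (n == (size_t)-1)
        return NULL;           /* unconvertible character in this locale */

    out = malloc(n + 1);
    if (out == NULL)
        return NULL;

    wcstombs(out, w, n + 1);   /* writes n bytes plus the NUL */
    if (newlen) *newlen = n;
    return out;
}
```

In the default "C" locale an ASCII string such as L"hello" converts cleanly; whether L"こんにちは" converts depends entirely on which locale is active at the time of the call.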

How to run libxls on Windows
This is the workaround I used:
I installed Cygwin64 with gcc.
I ran ./configure in Cygwin64.
Then I copied only the "include" and "src" source directories, plus config.h, into my Visual Studio project to build them as a static library.
I modified config.h to disable iconv.
I removed xls2csv from the project.
I had to change ssize_t to size_t in some of the source files, because Visual Studio 2015 doesn't recognize ssize_t (signed size_t). I don't know what might be affected by this; your input on it would be appreciated.
One solution is to define ssize_t in a shared header. MSVC's <BaseTsd.h> already provides a pointer-sized signed type, SSIZE_T, so an alias is enough:

/* ssize_t is not defined by MSVC. SSIZE_T (from <BaseTsd.h>) is LONG_PTR:
   64 bits on Win64, 32 bits on Win32. An "#ifndef ssize_t" guard cannot
   detect a typedef, so the definition is keyed on the compiler instead. */
#ifdef _MSC_VER
#include <BaseTsd.h>
typedef SSIZE_T ssize_t;
#endif

Reference
https://gitlab.freedesktop.org/libnice/libnice/commit/3735b73d54f05facd36e49f3b5ee7c6fa82de9cf
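On the open question of replacing ssize_t with size_t: the two types differ in signedness, so code that returns -1 for an error and then tests the result with "< 0" stops detecting errors once the type becomes unsigned. The sketch below illustrates this with a hypothetical reader function (not taken from libxls); each modified call site would need to be checked for this pattern.

```c
#include <stddef.h>

/* Hypothetical reader returning a byte count or -1 on error. ptrdiff_t
   stands in for ssize_t so the sketch also compiles with MSVC. */
static ptrdiff_t read_chunk_signed(int fail)
{
    return fail ? -1 : 512;
}

/* The same function after a mechanical ssize_t -> size_t substitution:
   the -1 is converted to SIZE_MAX. */
static size_t read_chunk_unsigned(int fail)
{
    return fail ? (size_t)-1 : 512;
}

/* The usual "< 0" error check works on the signed version only; on the
   unsigned version the comparison is always false. */
static int error_detected_signed(void)   { return read_chunk_signed(1) < 0; }
static int error_detected_unsigned(void) { return read_chunk_unsigned(1) < 0; }
```

GCC reports such always-false comparisons with -Wtype-limits (part of -Wextra), which can help locate affected call sites.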

I created a separate Visual Studio C++ console program for xls2csv, removing all Linux-specific code and adding the libxls project as a dependency.
I had to change a lot of code (sprintf -> sprintf_s, wcstombs -> wcstombs_s, etc.) to resolve most of the build errors.
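Instead of rewriting every call site, the MSVC-specific functions can also be confined to one place. A minimal sketch, assuming a hypothetical wrapper named convert_wide that uses wcstombs_s under MSVC and plain wcstombs elsewhere; both branches report failure the same way, as (size_t)-1:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper: converts w into buf (size bytes) so call sites
   stay the same on every compiler. Returns the number of bytes written,
   excluding the NUL, or (size_t)-1 if a character cannot be converted
   in the current locale or buf is too small. */
static size_t convert_wide(char *buf, size_t size, const wchar_t *w)
{
    if (buf == NULL || size == 0)
        return (size_t)-1;
#ifdef _MSC_VER
    {
        size_t written = 0;   /* wcstombs_s counts the terminating NUL */
        /* _TRUNCATE: fill what fits; truncation is reported as STRUNCATE */
        if (wcstombs_s(&written, buf, size, w, _TRUNCATE) != 0 || written == 0)
            return (size_t)-1;
        return written - 1;
    }
#else
    {
        size_t n = wcstombs(buf, w, size);
        /* n == size means the output filled buf with no room for the NUL */
        if (n == (size_t)-1 || n >= size)
            return (size_t)-1;
        return n;
    }
#endif
}
```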

When I was debugging the code I encountered the bug mentioned above.

This is my modified version of the unicode_decode_wcstombs() function in xlstools.c, which resolves the issue:

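#include <errno.h>
#include <limits.h>   /* INT_MAX */
#include <locale.h>   /* LC_CTYPE, _create_locale, _free_locale */
#include <stdio.h>
#include <stdlib.h>   /* _wcstombs_s_l, malloc, calloc, free */
#include <wchar.h>

/* Windows (MSVC) only: _locale_t, _create_locale, _free_locale and
   _wcstombs_s_l are Microsoft CRT extensions. */
static char *unicode_decode_wcstombs(const char *s, size_t len, size_t *newlen) {
	// Convert little-endian UTF-16 to UTF-8 using a dedicated UTF-8 locale object
	char *converted = NULL;
	errno_t err;
	size_t count, count2;
	size_t i;
	wchar_t *w;

	/* ".65001" selects code page 65001 (UTF-8) for this locale object only;
	   the global locale set by setlocale() is not changed. */
	_locale_t loc = _create_locale(LC_CTYPE, ".65001");

	if (newlen) *newlen = 0;
	if (loc == NULL)
	{
		printf("_create_locale failed: %d\n", errno);
		return NULL;
	}

	w = (wchar_t*)malloc((len / 2 + 1) * sizeof(wchar_t));
	if (w == NULL)
	{
		_free_locale(loc);
		return NULL;
	}

	/* Each UTF-16 code unit is stored low byte first */
	for (i = 0; i < len / 2; i++)
	{
		w[i] = (wchar_t)((unsigned char)s[2 * i] | ((unsigned char)s[2 * i + 1] << 8));
	}
	w[len / 2] = L'\0';

	/* With a NULL destination, count receives the required buffer size
	   in bytes, including the terminating NUL */
	err = _wcstombs_s_l(&count, NULL, 0, w, INT_MAX, loc);
	if (err != 0)
	{
		free(w);
		_free_locale(loc);
		return NULL;
	}

	converted = (char*)calloc(count, sizeof(char));
	if (converted == NULL)
	{
		free(w);
		_free_locale(loc);
		return NULL;
	}
	err = _wcstombs_s_l(&count2, converted, count, w, _TRUNCATE, loc);
	free(w);
	_free_locale(loc);
	if (err != 0)
	{
		printf("_wcstombs_s_l failed (%lu)\n", (unsigned long)len / 2);
		return converted;
	}
	/* count2 includes the NUL; report the string length without it */
	if (newlen) *newlen = count2 - 1;
	return converted;
}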

It successfully converted the UTF-16 string to UTF-8.
The ".65001" argument to _create_locale() on Windows refers to code page 65001, i.e. UTF-8.

Is this the correct way?
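Two facts bear on that question: _locale_t, _create_locale() and _wcstombs_s_l() are Microsoft CRT extensions rather than standard C, and the Universal CRT accepts a UTF-8 code page in locale names only starting with Windows 10 version 1803. Since the input is already UTF-16LE and the desired output is UTF-8, another option is to decode it by hand, independent of any locale. A minimal sketch, assuming cell text arrives as little-endian UTF-16 bytes as in the loop of the function above; utf16le_to_utf8 and read_unit are hypothetical names, and mapping unpaired surrogates to U+FFFD is a design choice of this sketch, not libxls behavior:

```c
#include <stdlib.h>

/* Reads one little-endian UTF-16 code unit starting at p */
static unsigned long read_unit(const char *p)
{
    return (unsigned long)(unsigned char)p[0]
         | ((unsigned long)(unsigned char)p[1] << 8);
}

/* Hypothetical locale-independent conversion: decodes len bytes of
   UTF-16LE into a malloc'd, NUL-terminated UTF-8 string.
   Unpaired surrogates become U+FFFD; a trailing odd byte is ignored. */
static char *utf16le_to_utf8(const char *s, size_t len, size_t *newlen)
{
    size_t units = len / 2, i = 0, o = 0;
    /* One code unit never needs more than 3 UTF-8 bytes
       (a surrogate pair: 2 units -> 4 bytes) */
    unsigned char *out = malloc(units * 3 + 1);

    if (newlen) *newlen = 0;
    if (out == NULL)
        return NULL;

    while (i < units) {
        unsigned long cp = read_unit(s + 2 * i++);

        if (cp >= 0xD800 && cp <= 0xDBFF && i < units
            && read_unit(s + 2 * i) >= 0xDC00 && read_unit(s + 2 * i) <= 0xDFFF) {
            /* High + low surrogate: combine into one supplementary code point */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (read_unit(s + 2 * i++) - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;   /* unpaired surrogate */
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    if (newlen) *newlen = o;
    return (char *)out;
}
```

Because the output is always UTF-8, this approach does not depend on the Windows version or on the process locale, which also makes it testable on any platform.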

I have attached my Visual Studio 2015 libxls solution.
Note: I changed ssize_t to size_t in my code; you might want to change it back to the original. (Copy the original source files, then resolve any build errors as described above, or define the _CRT_SECURE_NO_WARNINGS preprocessor macro.)
libxls.zip
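The _CRT_SECURE_NO_WARNINGS route in the note above keeps the original, portable calls instead of porting them to the _s variants. The macro only silences MSVC's deprecation diagnostic C4996; projects built with SDL checks (/sdl) treat C4996 as an error, which may be where some of the build errors came from. A minimal sketch; format_cell_ref is a hypothetical function used only to show an unchanged sprintf call:

```c
/* Must be defined before the first CRT header is included; alternatively add
   it under Project Properties > C/C++ > Preprocessor > Preprocessor Definitions. */
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>

/* Hypothetical example of an unchanged call: with the macro defined, MSVC no
   longer reports C4996 for sprintf, so it builds as on other platforms.
   buf must hold at least 25 bytes (two ints, "R", "C" and the NUL). */
static int format_cell_ref(char *buf, int row, int col)
{
    return sprintf(buf, "R%dC%d", row, col);
}
```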

Hi, thank you very much for the contribution. To make better sense of it, it would help me if you structured your work as a pull request (or multiple pull requests). I don't have access to a Windows machine at the moment so I will be reliant on other volunteers testing your work.

Just so you know, I have a related pull request here where I am reworking aspects of charset conversion:

#77

You may want to use that as the basis of your patch.

I believe this issue is now fixed in the dev branch. If it's not fixed, please open a new issue or a pull request.