to_json(std::filesystem::path) can create invalid UTF-8 chars on windows
MHebes opened this issue
Description
This conversion function:
template<typename BasicJsonType>
inline void to_json(BasicJsonType& j, const std_fs::path& p)
{
j = p.string();
}
uses p.string(), which does not give a UTF-8-encoded string on Windows (in some cases, maybe?). Trying to dump() the resultant JSON throws an "invalid UTF-8 byte" exception.
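To illustrate why a lone 0x92 byte is rejected, here is a minimal sketch of a structural UTF-8 check. The function name looks_like_utf8 is hypothetical and this is not the library's validator: it only checks lead and continuation byte patterns and does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not part of nlohmann/json).
// Checks that every sequence starts with a valid lead byte and is followed
// by the right number of continuation bytes (10xxxxxx).
inline bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80) {
            extra = 0;  // 0xxxxxxx: ASCII
        } else if ((b & 0xE0) == 0xC0) {
            extra = 1;  // 110xxxxx
        } else if ((b & 0xF0) == 0xE0) {
            extra = 2;  // 1110xxxx
        } else if ((b & 0xF8) == 0xF0) {
            extra = 3;  // 11110xxx
        } else {
            return false;  // 10xxxxxx (such as 0x92) cannot start a sequence
        }
        if (s.size() - i <= extra) {
            return false;  // sequence is truncated
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;  // expected a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

The byte 0x92 has the bit pattern 10010010, which marks a continuation byte, so it can never appear at index 0 of a valid UTF-8 string.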
Reproduction steps
Convert a std::filesystem::path that contains a Unicode "Right Single Quotation Mark" character (U+2019) to a json, implicitly or with to_json.
Inspect the bytes of the new json (string_t), either by dump()ing it or by converting it to BSON.
Expected vs. actual results
Expected: "Strings are stored in UTF-8 encoding." per https://json.nlohmann.me/api/basic_json/string_t/
Actual: The string gets converted by std::filesystem::path::string(), which appears to convert it to Windows-1252 encoding. Its bytes end up as \x92 rather than \xe2\x80\x99.
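As a sanity check on the expected bytes, the following sketch encodes a single code point to UTF-8 by hand. The function name encode_utf8 is hypothetical and the code assumes a valid Unicode scalar value as input; it is only meant to show where \xe2\x80\x99 comes from.

```cpp
#include <string>

// Hypothetical helper for illustration: encodes one code point as UTF-8.
// Assumes cp is a valid Unicode scalar value (no range or surrogate checks).
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 1 byte
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+2019 this takes the three-byte branch and yields the bytes E2 80 99, whereas Windows-1252 represents the same character as the single byte 92.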
Minimal code example
#include <filesystem>
#include <iostream>
#include <nlohmann/json.hpp>
int main() {
    try {
        wchar_t wide_unicode_right_quote[2] = {0x2019, 0}; // came from a directory_iterator in reality
        nlohmann::json apost = std::filesystem::path(wide_unicode_right_quote);
        std::cout << apost << std::endl;
        return 0;
    } catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
        return 1;
    }
}
The workaround I'm using is WideCharToMultiByte + .native() to get the string in UTF-8 before passing it to nlohmann:
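#include <string>
#include <string_view>
#include <windows.h>
// Converts a UTF-16 wide string to a UTF-8 encoded std::string.
inline std::string Narrow(std::wstring_view wstr) {
    if (wstr.empty()) return {};
    const int size = static_cast<int>(wstr.size());  // the API takes int lengths
    const int len = ::WideCharToMultiByte(CP_UTF8, 0, wstr.data(), size, nullptr, 0, nullptr, nullptr);
    std::string out(len, '\0');
    ::WideCharToMultiByte(CP_UTF8, 0, wstr.data(), size, out.data(), len, nullptr, nullptr);
    return out;
}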
int main() {
    try {
        wchar_t wide_unicode_right_quote[2] = {0x2019, 0}; // came from a directory_iterator in reality
        nlohmann::json apost = Narrow(std::filesystem::path(wide_unicode_right_quote).native());
        std::cout << apost << std::endl;
        return 0;
    } catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
        return 1;
    }
}
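A possible portable alternative to the Windows-specific call is std::filesystem::path::u8string(), which the standard specifies as returning the path in UTF-8. The sketch below is untested against this exact reproduction, and the helper name PathToUtf8 is hypothetical. It copies the result into a std::string so that it compiles under both C++17 (where u8string() returns std::string) and C++20 (where it returns std::u8string).

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper, not part of nlohmann/json.
// Returns the path's native-format text as UTF-8 bytes in a std::string.
inline std::string PathToUtf8(const std::filesystem::path& p) {
    const auto u8 = p.u8string();  // std::string in C++17, std::u8string in C++20
    // The iterator-range constructor converts char8_t to char element by element.
    return std::string(u8.begin(), u8.end());
}
```

With this helper, the reproduction above would pass PathToUtf8(std::filesystem::path(wide_unicode_right_quote)) to nlohmann::json instead of the path itself.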
Error messages
[json.exception.type_error.316] invalid UTF-8 byte at index 0: 0x92
Compiler and operating system
MSVC 2022 Professional, C++ 20
Library version
develop - a259ecc
Validation
- The bug also occurs if the latest version from the develop branch is used.
- I can successfully compile and run the unit tests.
I can also work around this problem by adding a manifest XML that sets my app's code page to CP_UTF8 on supported versions of Windows.
In CMake I wrapped this in a function:
# target_add_manifest(<target> <manifest file>)
#
# You probably want to use ${MANIFEST_FILE_UTF8} defined below this function
#
# Adds a manifest file (https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests)
# to an EXE
function(target_add_manifest TARGET_NAME MANIFEST_FILE)
  if(NOT TARGET_NAME)
    message(FATAL_ERROR "You must provide a target")
  endif()
  if(NOT MANIFEST_FILE)
    message(FATAL_ERROR "You must provide a manifest file")
  endif()
  add_custom_command(
    TARGET ${TARGET_NAME}
    POST_BUILD
    COMMAND "mt.exe" -manifest \"${MANIFEST_FILE}\" \"-updateresource:$<TARGET_FILE:${TARGET_NAME}>\"
  )
endfunction()
which is used like this (probably want to wrap in a platform check):
add_executable(myapp main.cpp)
target_add_manifest(myapp "${CMAKE_CURRENT_SOURCE_DIR}/cmake/utf8.manifest")
with utf8.manifest being:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
<application>
<windowsSettings>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
</assembly>
This solves the problem if the app is running on Windows 10 version 1903 or later. It's still a bug, but I wanted to share this workaround because it's useful for many libraries that have the same issue.
Proposed diff to do the conversion to UTF-8 when targeting Windows:
diff --git a/include/nlohmann/detail/conversions/to_json.hpp b/include/nlohmann/detail/conversions/to_json.hpp
index 562089c3..a8b74688 100644
--- a/include/nlohmann/detail/conversions/to_json.hpp
+++ b/include/nlohmann/detail/conversions/to_json.hpp
@@ -413,10 +413,20 @@ inline void to_json(BasicJsonType& j, const T& t)
}
#if JSON_HAS_FILESYSTEM || JSON_HAS_EXPERIMENTAL_FILESYSTEM
+#if defined(_WIN32)
+#include <windows.h>
+#endif
template<typename BasicJsonType>
inline void to_json(BasicJsonType& j, const std_fs::path& p)
{
+#if defined(_WIN32)
+ int len = ::WideCharToMultiByte(CP_UTF8, 0, &p.native()[0], p.native().size(), nullptr, 0, nullptr, nullptr);
+ std::string as_utf8(len, 0);
+ ::WideCharToMultiByte(CP_UTF8, 0, &p.native()[0], p.native().size(), &as_utf8[0], len, nullptr, nullptr);
+ j = std::move(as_utf8);
+#else
j = p.string();
+#endif
}
#endif