In work owneranalyzer, Python 2 mbcs conversion can garble file names.
wangyixiang opened this issue
The files on the file server's file system come from different regions.
For example:
Take the Japanese middle dot, which is called "katakana middle dot"
(http://graphemica.com/%E3%83%BB).
When this dot appears in a file name on Windows, the name is UTF-16 encoded, so os.listdir(u".") lists
the file name as u"\u30fb".
u"\u30fb".encode("mbcs") -> "\x3f" (that is, "?") because my code page is 936. The conversion cannot be reversed: u"\u30fb".encode("mbcs").decode("mbcs") does not give back u"\u30fb".
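The lossy round trip can be checked with a small sketch. It is only an approximation of the situation above: "mbcs" exists only on Windows, so the "gbk" codec is used here as an assumed stand-in for code page 936, and survives_round_trip is a hypothetical helper name, not part of owneranalyzer.

```python
# -*- coding: utf-8 -*-

def survives_round_trip(name, codec):
    """Return True if `name` is unchanged after encoding and decoding with `codec`."""
    # "replace" substitutes "?" for characters the codec cannot represent.
    encoded = name.encode(codec, "replace")
    return encoded.decode(codec) == name


KATAKANA_MIDDLE_DOT = u"\u30fb"

# Assumption: "gbk" approximates "mbcs" on a machine whose code page is 936.
print(survives_round_trip(u"report.txt", "gbk"))
print(survives_round_trip(KATAKANA_MIDDLE_DOT, "gbk"))
```

Any file name for which this check fails cannot safely be passed around as an mbcs byte string.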
The Python 2 subprocess module does not support Unicode, so the file name has to go through this conversion before it is passed to another program. If I use check_output to invoke "getowner.exe" on such a file, "getowner.exe" cannot find the file under the name it gets from Python. ("getowner.exe" is Unicode-based.)
There are several ways to solve this problem. I think the simplest is to use Python 3, which supports Unicode in its core.
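A minimal sketch of what the Python 3 approach might look like is shown here. It assumes "getowner.exe" is on the PATH and takes the file path as its only argument, which this issue does not confirm. owner_commands and query_owners are hypothetical names. In Python 3, os.listdir returns str and subprocess on Windows passes str arguments as Unicode, so no mbcs step is involved.

```python
import os
import subprocess

# Assumption: the tool is reachable by this name and takes one path argument.
GETOWNER = "getowner.exe"


def owner_commands(directory):
    """Build one getowner.exe command line per entry in `directory`."""
    # os.listdir(str) returns str names in Python 3, so nothing is encoded here.
    return [[GETOWNER, os.path.join(directory, name)]
            for name in sorted(os.listdir(directory))]


def query_owners(directory):
    """Run the tool for each entry and return {path: raw output bytes}."""
    results = {}
    for command in owner_commands(directory):
        # The argument list is passed as str; it is never converted to mbcs.
        results[command[1]] = subprocess.check_output(command)
    return results
```

Calling query_owners(".") would then invoke the tool once per file. The output is left as bytes because the encoding that "getowner.exe" writes is not stated here.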