wangyixiang / SomeWorks

In the owneranalyzer work, Python 2's mbcs conversion can garble file names.

wangyixiang opened this issue.

The file server's file system holds files that came from different regions.
For example, consider the Japanese character "middle dot", officially called "katakana middle dot"
(http://graphemica.com/%E3%83%BB).

When this dot appears in a file name on Windows, the name is stored as UTF-16, so os.listdir(u".") lists
the character as u"\u30fb".
My code page is 936, and u"\u30fb".encode("mbcs") gives "\x3f" (a literal "?"), so the original character cannot be recovered: u"\u30fb".encode("mbcs").decode("mbcs") returns u"?", not u"\u30fb".
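The lossy round trip can be reproduced in Python 3 with a minimal sketch. Python's "mbcs" codec only exists on Windows, so this assumes that Python's "cp936" codec with errors="replace" is a fair stand-in for Python 2's mbcs conversion on a code page 936 system, which the issue observes turning the character into "?". The helper name mbcs_roundtrip is hypothetical.

```python
def mbcs_roundtrip(name: str, codepage: str = "cp936") -> str:
    """Hypothetical helper approximating Python 2's
    u"...".encode("mbcs").decode("mbcs") on a code page 936 system.

    Assumption: characters missing from the code page are replaced
    with "?", matching the "\\x3f" reported in the issue.
    """
    # Lossy step: unmappable characters become b"?".
    encoded = name.encode(codepage, errors="replace")
    # Decoding cannot bring the lost character back.
    return encoded.decode(codepage)


dot = "\u30fb"  # KATAKANA MIDDLE DOT
restored = mbcs_roundtrip(dot)
print(repr(restored), restored == dot)
```

Names built only from characters that exist in the code page (ASCII or common Chinese characters) survive the round trip; the katakana middle dot does not.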

If I use Python 2's subprocess module, which does not support Unicode, to call check_output and run "getowner.exe" on such a file, "getowner.exe" cannot find the file whose name Python passed to it, because the name arrives with "?" in place of u"\u30fb". ("getowner.exe" itself is Unicode-based.)
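Why the Unicode-based tool fails to find the file can be shown with a small sketch. It assumes a file system that accepts the katakana middle dot in names and reuses cp936 with errors="replace" as a stand-in for mbcs. It creates a file whose name contains the dot, then checks whether the converted name still points to it. The file name and the helper lossy_path_exists are hypothetical.

```python
import os
import tempfile


def lossy_path_exists(directory: str, name: str, codepage: str = "cp936") -> bool:
    """Return True if the code-page-converted name still refers to an existing file."""
    lossy = name.encode(codepage, errors="replace").decode(codepage)
    return os.path.exists(os.path.join(directory, lossy))


with tempfile.TemporaryDirectory() as tmp:
    real_name = "report\u30fb2014.txt"  # hypothetical file name
    open(os.path.join(tmp, real_name), "w").close()
    # The real name exists; the converted "report?2014.txt" does not.
    print(os.path.exists(os.path.join(tmp, real_name)),
          lossy_path_exists(tmp, real_name))
```

A Unicode program handed the converted name is in the same position as the second check: it looks for a file that was never created.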

There are several ways to solve this problem; I think the simplest is to use Python 3, which supports Unicode at its core.
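The Python 3 approach might look like the minimal sketch below. In Python 3, os.listdir(".") returns str names, and on Windows subprocess passes str arguments to CreateProcessW, so the name should reach a Unicode program unchanged. The helper run_tool is hypothetical, and so is the assumption that getowner.exe takes the path as its last argument; the issue does not describe the tool's command line.

```python
import subprocess
from typing import Sequence


def run_tool(tool_cmd: Sequence[str], path: str) -> bytes:
    """Run tool_cmd with path appended as the last argument; return its stdout.

    The path stays a str end to end, so Python performs no code page
    conversion and characters such as the katakana middle dot are
    passed to the child process as-is.
    """
    return subprocess.check_output([*tool_cmd, path])
```

For example, assuming getowner.exe accepts a single path argument, run_tool(["getowner.exe"], name) could be called for each name in os.listdir("."). How to decode the returned bytes depends on the tool's own output encoding.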