microsoft / p4vfs

Microsoft Virtual File System for Perforce

Incomplete support for filename characters on non-unicode servers

GarThor opened this issue

Experienced an error syncing today on a file with an accent character [specifically é] in its name. Virtual syncs result in a file with a � character in its place. Opening the file results in errors (probably because p4vfs has trouble figuring out the actual file name to force sync).

As a workaround, force syncing using the normal p4 method worked fine.

Looking through my depot, I found a few other files with this same issue, though with various other characters... Áόϋ... etc...

Looks like this might be because you are using string(char) instead of wstring(wchar) for your depot path names?

Hi GarThor,

You are certainly correct about P4VFS missing proper support for unicode characters in perforce depot file paths, and essentially for unicode mode perforce servers too. While the P4VFS core service, driver, and command line tool all use 16-bit wide character strings throughout, the P4API related code (i.e. DepotString) uses multibyte char strings and generally assumes UTF-8. I suspect there are bugs in the DepotString usage where it's assuming a Windows-1252 encoding instead of UTF-8.

There was some recent work on the test UnitTestServer.CreateDuplicatePerforceServer with the goal to also run a suite of tests on a Unicode enabled server. I think that would be the next step for flushing out issues to provide proper unicode mode server support.

Regards
jessk-msft

Does this mean the error exists in the p4 C# API and not in the ms-p4vfs tool? I haven't used the p4 C# API before, so I'm not familiar with where the documentation is for that, or how it's used.

Hi GarThor, the problem is in our C++ P4VFS code. Specifically the assumption we are making with a Windows-1252 encoded DepotString type. We need to properly support that type as UTF-8 for unicode mode servers. Not doing this results in bugs with p4vfs.exe not handling filenames with some unicode characters.

Note that there are no P4VFS unit tests for filenames with unicode characters. That's the first thing I'd add before trying to fix this.
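
For anyone following along, here is a minimal Python sketch of the kind of encoding mismatch being described. It is not P4VFS code, and the thread does not establish which direction of the mismatch actually occurred inside P4VFS. It only shows what happens when the bytes of `fooé.txt` are written in one encoding and read back in the other.

```python
# Illustration only: the same filename encoded two ways, then misread.
name = "fooé.txt"

# Windows-1252 stores é as the single byte 0xE9.
cp1252_bytes = name.encode("cp1252")

# UTF-8 stores é as the two bytes 0xC3 0xA9.
utf8_bytes = name.encode("utf-8")

# Misreading Windows-1252 bytes as UTF-8: 0xE9 starts a multi-byte
# sequence that never completes, so the decoder substitutes U+FFFD.
as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# Misreading UTF-8 bytes as Windows-1252: each byte becomes its own character.
as_cp1252 = utf8_bytes.decode("cp1252")

# ascii() escapes non-ASCII characters so the code points are visible
# regardless of the terminal's own encoding.
print(ascii(as_utf8))    # U+FFFD is the replacement character
print(ascii(as_cp1252))  # U+00C3 U+00A9 in place of the original é
```

The first case produces the same replacement character reported at the top of this issue, which is why a unit test with non-ASCII filenames would be a useful first step.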

Is there a timeline for fixing this issue? It's been a while since the issue was originally posted...

Hi GarThor,
Can you confirm if your Perforce server is running in Unicode mode?

To check if a Perforce server is running in Unicode mode, you can use the p4 info command. If the server is running in Unicode mode, the output of this command will include the following line: Server charset: utf8

It looks like we don't have that configuration on our servers, however I know that we do have files in perforce with Unicode characters in their filenames.

I've reproduced this problem on a non-unicode server as well.
For example, added file:
//depot/Misc/fooé.txt

Then sync with p4vfs results in incorrect character translation:

c:\P4DemoWorkspaces\bruno_ws\Misc>p4vfs sync //depot/Misc/...
P4VFS version 1.25.1.0
Virtual Sync: ["//depot/Misc/..."] Normal Atomic @12106
Started at [07/06/2023-12:05:14] version [1.25.1.0]
8 Modification messages to act on.
//depot/Misc/foo�.txt#1 - installed as C:\P4DemoWorkspaces\bruno_ws\Misc\foo�.txt

I agree that P4VFS should be made to support this, and properly support UTF-8 throughout the codebase.
I can't give an exact ETA on when it'll be fixed, but I'll try to make time soon.

Thanks Jessk, I appreciate it... :D

Fixed in release 1.26.1.0

Thanks so much Jessk-msft!
I will go ahead and update asap! :D

Hey @jessk-msft ,

I gave the latest version a try, however I still see weird output, which is prompting a little confusion. I thought it wasn't working at first, but when I look in the directory the file exists with the right characters...

image

File names after this message also include the ≤ character instead of the ó character, and then it gives a summary that looks accurate.

Original issue seems to be solved, so I can file a new issue if needed, but this seems like it could be related to the original issue.

I see the same behavior with p4.exe.
My terminal encoding doesn't seem to translate the characters properly from STDOUT.

//depot/www/dev/débug.xml
//depot/www/dev/fóobar.xml

cmd.exe
image

powershell.exe running in Windows Terminal
image

Good to see that the filenames are actually correct though.

I was wondering about that, but pasted characters seem to work fine in my terminal, which seems to indicate the problem isn't with the terminal (powershell/ms terminal) itself, but rather something weird going on with the application...

I hadn't tried the regular p4 service, but I get similar results to your screenshots for p4vfs.

I'm seeing the same behavior in a powershell prompt, side-by-side with p4.exe
image

Seeing as the dir command is properly displaying the filenames, we should be able to fix P4VFS to display them correctly as well. Sounds like a separate issue though.

Sounds good, lmk if you want me to create a separate issue for it.

Sure, I think that this console output is worth fixing. If p4vfs.exe is running with a console attached, then it should be required to display output using the console's codepage. @GarThor, go ahead and create a P4VFS issue for it :)

Here's another example showing how my default console (OEM) codepage for cmd.exe is CP-437.
Both p4.exe and p4vfs.exe expect Windows-1252 (CP-1252).

We can use chcp.com to get/set the current console codepage:

image
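
The codepage mismatch above can be reproduced without a console. This is a rough Python sketch, not how p4.exe or p4vfs.exe actually write output. The helper `shown_in_console` is a hypothetical name, and it models only the byte reinterpretation: text encoded in one codepage and displayed by a console using another.

```python
def shown_in_console(text, output_codepage, console_codepage):
    """Hypothetical helper: what a console would display if a program wrote
    `text` encoded as `output_codepage` while the console decodes bytes
    using `console_codepage`."""
    raw = text.encode(output_codepage)
    return raw.decode(console_codepage, errors="replace")


# Filenames taken from the thread.
names = ["débug.xml", "fóobar.xml"]

for name in names:
    mismatched = shown_in_console(name, "cp1252", "cp437")
    matched = shown_in_console(name, "cp1252", "cp1252")
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(name), "->", ascii(mismatched), "vs", ascii(matched))
```

Under CP-437 the byte 0xF3, which is ó in Windows-1252, displays as ≤. That matches the character reported earlier in this thread, and when the two codepages agree the names round-trip unchanged.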

Ok, posted new thread: #27

Closed. Follow up with #27