kyamagu / faiss-wheels

Unofficial faiss wheel builder

Handling file paths that include non-ASCII characters such as Japanese characters

sisdanghoang opened this issue

Describe the bug
When attempting to save or load indexes using FAISS's save_local and load_local functions with Japanese folder names, the operations fail. This issue occurs despite the filesystem and Python script being properly configured to handle UTF-8 encoding.

To Reproduce
Steps to reproduce the behavior:

  1. Set up a Japanese-named folder to store FAISS indexes.
  2. Install FAISS with the following command, then run a Python script that uses save_local and load_local:
       pip install faiss-cpu
  3. In the Python script, attempt to save an index to the Japanese-named folder using save_local.
  4. Attempt to load the index from the same folder using load_local.
  5. Observe that the operations fail with the following error:

faiss::FileIOReader::FileIOReader(const char *) at D:\a\faiss-wheels\faiss-wheels\faiss\faiss\impl\io.cpp:68: Error: 'f' failed: could not open C:\Users\Test\db\ベクターストア\index.faiss for reading: No such file or directory

Expected behavior
The expected behavior is that FAISS should be able to save to and load from folders with Japanese names without any issues, given that the system and script are correctly set up to support UTF-8.

Desktop:

  • OS: Windows
  • Architecture: x64
  • Python: 3.12.3
  • Version: v1.8.0

Additional context
This issue may be related to how FAISS handles non-ASCII characters in file paths. It is crucial for users working with non-English file systems to have full functionality with FAISS operations.
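One way to probe the encoding hypothesis is to check whether the UTF-8 bytes of a path still spell the same path when read in a legacy Windows code page. This is only a sketch under an assumption that the thread does not confirm: that the Python binding passes the path to C++ as UTF-8 bytes, while the C runtime interprets narrow strings in the active code page. The helper name `survives_narrow_api` is hypothetical and not part of FAISS.

```python
def survives_narrow_api(path: str, code_page: str) -> bool:
    """Return True if the UTF-8 bytes of `path` decode to the same path in `code_page`.

    Assumption: the path reaches a byte-oriented C API (such as fopen) as
    UTF-8 bytes, and the C runtime reads those bytes in `code_page`.
    """
    utf8_bytes = path.encode("utf-8")
    try:
        return utf8_bytes.decode(code_page) == path
    except UnicodeDecodeError:
        # Some byte sequences are not valid in the code page at all.
        return False


japanese_path = r"C:\Users\Test\db\ベクターストア\index.faiss"
ascii_path = "C:/Users/Test/db/vector_store/index.faiss"

for code_page in ("cp1252", "cp932", "utf-8"):
    print(code_page,
          survives_narrow_api(ascii_path, code_page),
          survives_narrow_api(japanese_path, code_page))
```

Under this assumption, ASCII-only paths are unaffected because their bytes are identical in all three encodings, which would be consistent with the problem appearing only for folders with Japanese names.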

@sisdanghoang Hi, could you test if the same bug happens in the official faiss conda package? This repository is merely a thin packaging workflow and cannot handle a bugfix to the underlying faiss functionality.

@kyamagu
Hi,
Thank you for your suggestion. Unfortunately, I’m running a Windows system, and the official conda package for FAISS is not compatible with Windows. As a result, I can only install the faiss-cpu version. This repository serves as a packaging workflow, and I understand that it cannot directly address underlying FAISS functionality issues.

I’m currently facing an issue with FAISS’s save_local and load_local functions when using Japanese folder names. The operations fail to execute properly, which seems to be an encoding-related problem. If you or anyone else in the community has encountered this issue and found a solution, I would greatly appreciate any advice or suggestions.

The error is likely raised in the I/O code of the upstream faiss library:
https://github.com/facebookresearch/faiss/blob/6e7d9e040f9be9734277c3f27b2cb364a67f442d/faiss/impl/io.cpp#L66

I'm not familiar with how Windows handles Unicode filenames, but you'd need to fix the upstream implementation.
https://learn.microsoft.com/cpp/c-runtime-library/reference/fopen-wfopen
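Until the upstream I/O code is changed, a Python-side workaround might be to let the library write into a temporary directory and then move the files with Python's own file APIs, which accept Unicode paths. The sketch below is not from the thread: `save_via_staging` and its `save_fn` parameter are hypothetical names, and it assumes the temporary directory path itself contains only ASCII characters, which may not hold if the Windows user name is non-ASCII.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def save_via_staging(save_fn: Callable[[str], None], target_dir: str) -> None:
    """Call `save_fn` on a temporary directory, then move its files to `target_dir`.

    `save_fn` is any callable that writes files into the directory it is given,
    for example a wrapper around a save_local call (hypothetical usage).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as staging:
        # The library only ever sees the staging path.
        save_fn(staging)
        # Python moves the results into the Unicode-named folder.
        for item in Path(staging).iterdir():
            shutil.move(str(item), str(target / item.name))
```

Loading could follow the same pattern in reverse: copy the files from the Japanese-named folder into a temporary directory and point load_local at that directory.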

@kyamagu, thank you so much for your answer. I have found a solution.
After adding the following code, it works correctly:
import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
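Because locale.setlocale raises locale.Error when the requested locale is not available, a slightly more defensive variant of this workaround could try several UTF-8 locale names in turn. This is a minimal sketch: the function name `enable_utf8_locale` is hypothetical, and the candidate names other than 'en_US.UTF-8' are assumptions that may not exist on every system.

```python
import locale


def enable_utf8_locale(candidates=("en_US.UTF-8", ".UTF-8", "C.UTF-8")):
    """Try each locale name in order; return the first one applied, or None."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            # This locale is not available here; try the next candidate.
            continue
        return name
    return None


applied = enable_utf8_locale()
if applied is None:
    print("No UTF-8 locale could be set; non-ASCII paths may still fail.")
else:
    print("Using locale:", applied)
```

The call presumably has to run before any index is saved or loaded, since it changes how the process treats narrow-character strings from that point on.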