yara-python cannot scan chinese filename

Question

qux-bbb opened this issue a year ago

As the issue linked below describes for YARA, yara-python also cannot scan files with Chinese filenames:
VirusTotal/yara#1487

The script:

import yara

yara_rule_path = 'hello.yar'
rules = yara.compile(filepath=yara_rule_path)

sample_path = '你好.txt'
matches = rules.match(sample_path)
print(sample_path, matches)

The error:

Traceback (most recent call last):
  File "d:/recent/tmp/test.py", line 7, in <module>
    matches = rules.match(sample_path)
yara.Error: could not open file "你好.txt"

Victor M. Alvarez · Answer 1 · Thu Nov 30 2023 19:11:45 GMT+0800 (China Standard Time)

This issue is similar to #245.

This is a well-known issue that won't be solved anytime soon. There is an alternative, though: instead of passing the file path to rules.match, read the file from Python and pass its contents to rules.match as data. This way Python handles the file reading, and it should handle Unicode paths correctly. The problem with passing the file path directly to YARA is that YARA's API doesn't offer Unicode support.
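
A minimal sketch of that workaround, for illustration only. The helper name match_file_by_data is hypothetical and not part of yara-python, and rules is assumed to be the object returned by yara.compile, as in the script from the question:

```python
from pathlib import Path


def match_file_by_data(rules, sample_path):
    """Scan a file by reading it in Python and handing its bytes to YARA.

    Assumptions: `rules` is a compiled rules object from yara.compile();
    the helper name is made up for this example.
    """
    # Python opens the file itself, so the Unicode path never reaches YARA.
    data = Path(sample_path).read_bytes()
    # Pass the contents through the `data` keyword instead of a file path.
    return rules.match(data=data)


# Usage, assuming yara-python is installed and hello.yar exists:
#   import yara
#   rules = yara.compile(filepath='hello.yar')
#   print(match_file_by_data(rules, '你好.txt'))
```

Note that this reads the whole file into memory before scanning, which may matter for very large samples.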