CFFI can't hash unicode strings
bra-fsn opened this issue
This works in the normal version:
xxh64(u'test')
<xxhash.xxh64 object at 0x80073e880>
and fails in the CFFI one:
xxh64(u'test')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/pypy-5.6/site-packages/xxhash/cffi.py", line 59, in __init__
self.update(input)
File "/usr/local/pypy-5.6/site-packages/xxhash/cffi.py", line 65, in update
lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a str or list or tuple, not unicode
Python 2.7.13 CFFI
>>> __import__('xxhash').xxh64(u'test').hexdigest()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "xxhash/cffi.py", line 59, in __init__
self.update(input)
File "xxhash/cffi.py", line 65, in update
lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a cdata pointer, not unicode
Python 3.6.0 CFFI
>>> __import__('xxhash').xxh64(u'test').hexdigest()
b'4fdcca5ddb678139'
PyPy 5.6.0 CFFI
>>>> __import__('xxhash').xxh64(u'test').hexdigest()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "xxhash/cffi.py", line 59, in __init__
self.update(input)
File "xxhash/cffi.py", line 65, in update
lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a str or list or tuple, not unicode
Python 2.7.13 CPython
>>> __import__('xxhash').xxh64(u'test').hexdigest()
'4fdcca5ddb678139'
Python 3.6.0 CPython
>>> __import__('xxhash').xxh64(u'test').hexdigest()
'4fdcca5ddb678139'
PyPy 5.6.0 CPython
>>>> __import__('xxhash').xxh64(u'test').hexdigest()
'4fdcca5ddb678139'
And let's try a unicode string that cannot be encoded into a byte string using the ASCII codec:
Python 2.7.13 CFFI
>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "xxhash/cffi.py", line 59, in __init__
self.update(input)
File "xxhash/cffi.py", line 65, in update
lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a cdata pointer, not unicode
Python 3.6.0 CFFI
>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
b'240b38071b1a85de'
PyPy 5.6.0 CFFI
>>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "xxhash/cffi.py", line 59, in __init__
self.update(input)
File "xxhash/cffi.py", line 65, in update
lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a str or list or tuple, not unicode
Python 2.7.13 CPython
>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
Python 3.6.0 CPython
>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
'90ccdd496c644f03'
PyPy 5.6.0 CPython
>>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
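One way to read the results above: a hash function consumes bytes, so a unicode string has to be converted first, and the outcome depends on how (or whether) that conversion happens. The following is a minimal sketch for Python 3 using only built-in string methods, not xxhash itself. The choice of UTF-8 and UTF-16 as the encodings to compare is an assumption made for illustration.

```python
# -*- coding: utf-8 -*-
# Illustration only: shows why "hash this unicode string" is ambiguous.
text = u'你好,世界'

# An implicit ASCII conversion cannot represent these characters,
# which matches the UnicodeEncodeError in the traces above.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Explicit encodings succeed, but each yields a different byte sequence,
# and the byte length is not the same as the character count.
utf8_bytes = text.encode('utf-8')
utf16_bytes = text.encode('utf-16-le')

print(ascii_ok)
print(len(text), len(utf8_bytes), len(utf16_bytes))
print(utf8_bytes == utf16_bytes)
```

Because each encoding produces different bytes, any implicit choice made by the library would silently change the digest, which may explain why the backends disagree.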
It's a little complicated, and making every case consistent will take some effort.
The simplest way is to reject unicode in all cases.
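A rough sketch of what rejecting unicode could look like, assuming a guard that runs before the input reaches the C library. The helper name `require_bytes` and the error message are hypothetical and are not taken from the xxhash source. The behaviour is similar to `hashlib` on Python 3, which raises `TypeError` when given a `str`.

```python
import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
# The right-hand branch is only evaluated on Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def require_bytes(data):
    """Hypothetical guard: refuse text input, pass anything else through."""
    if isinstance(data, text_type):
        raise TypeError('unicode input must be encoded to bytes before hashing')
    return data


# Callers are then expected to pick an encoding explicitly.
payload = require_bytes(u'test'.encode('utf-8'))
print(payload)
```

With a guard like this, callers would write `xxh64(u'test'.encode('utf-8'))`, and both backends would receive the same bytes on every interpreter.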