ifduyue / python-xxhash

Python Binding for xxHash

Home Page: https://pypi.org/project/xxhash/

CFFI can't hash unicode strings

bra-fsn opened this issue

This works in the normal version:

xxh64(u'test')
<xxhash.xxh64 object at 0x80073e880>

and fails in the CFFI one:

xxh64(u'test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/pypy-5.6/site-packages/xxhash/cffi.py", line 59, in __init__
    self.update(input)
  File "/usr/local/pypy-5.6/site-packages/xxhash/cffi.py", line 65, in update
    lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a str or list or tuple, not unicode

Python 2.7.13 CFFI:

>>> __import__('xxhash').xxh64(u'test').hexdigest()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xxhash/cffi.py", line 59, in __init__
    self.update(input)
  File "xxhash/cffi.py", line 65, in update
    lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a cdata pointer, not unicode

Python 3.6.0 CFFI:

>>> __import__('xxhash').xxh64(u'test').hexdigest()
b'4fdcca5ddb678139'

PyPy 5.6.0 CFFI

>>>> __import__('xxhash').xxh64(u'test').hexdigest()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xxhash/cffi.py", line 59, in __init__
    self.update(input)
  File "xxhash/cffi.py", line 65, in update
    lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a str or list or tuple, not unicode

Python 2.7.13 CPython

>>> __import__('xxhash').xxh64(u'test').hexdigest()
'4fdcca5ddb678139'

Python 3.6.0 CPython

>>> __import__('xxhash').xxh64(u'test').hexdigest()
'4fdcca5ddb678139'

PyPy 5.6.0 CPython

>>>> __import__('xxhash').xxh64(u'test').hexdigest()
'4fdcca5ddb678139'

Now let's try a unicode string that cannot be encoded into a byte string with the ASCII codec:

Python 2.7.13 CFFI

>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xxhash/cffi.py", line 59, in __init__
    self.update(input)
  File "xxhash/cffi.py", line 65, in update
    lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a cdata pointer, not unicode

Python 3.6.0 CFFI

>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
b'240b38071b1a85de'

PyPy 5.6.0 CFFI

>>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xxhash/cffi.py", line 59, in __init__
    self.update(input)
  File "xxhash/cffi.py", line 65, in update
    lib.XXH64_update(self.xxhash_state, input, len(input))
TypeError: initializer for ctype 'void *' must be a str or list or tuple, not unicode

Python 2.7.13 CPython

>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

Python 3.6.0 CPython

>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
'90ccdd496c644f03'

PyPy 5.6.0 CPython

>>>> __import__('xxhash').xxh64(u'你好,世界').hexdigest()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
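
A hash function consumes bytes, so the digest of a text string depends on which encoding turns it into bytes. The sketch below is only an illustration of that point, not a diagnosis of what each build does internally: it shows that the same five-character string has different lengths and different byte contents under UTF-8 and UTF-16, which is one plausible reason implicit conversion gives inconsistent results across builds. The string is written with escapes so the snippet behaves the same on Python 2 and 3.

```python
# -*- coding: utf-8 -*-
# The test string from the transcripts above, written with escapes:
# U+4F60 U+597D U+FF0C (fullwidth comma) U+4E16 U+754C
text = u'\u4f60\u597d\uff0c\u4e16\u754c'

utf8 = text.encode('utf-8')        # 3 bytes per character here
utf16 = text.encode('utf-16-le')   # 2 bytes per character here

# len(text) counts characters, not bytes, so passing it as a buffer
# length for the encoded data would be wrong for non-ASCII input.
print(len(text), len(utf8), len(utf16))

# Different encodings produce different byte sequences, so any
# byte-oriented hash will produce different digests for them.
print(utf8 != utf16)
```

Encoding explicitly at the call site, for example `xxh64(text.encode('utf-8'))`, removes the ambiguity regardless of which build is installed.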

Making this consistent is a little complicated and will take some effort.
The simplest approach is to reject unicode input in all cases.
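
A minimal sketch of what "rejecting unicode" could look like, under stated assumptions: `require_bytes` and `checksum` are hypothetical names, not part of xxhash, and `zlib.crc32` stands in for the real hash so the snippet runs without the extension installed. The idea is to raise `TypeError` for text input up front, so every build fails the same way and callers must encode explicitly.

```python
import zlib

TEXT_TYPE = type(u'')  # `unicode` on Python 2, `str` on Python 3


def require_bytes(data):
    """Hypothetical guard: pass bytes through, reject text strings."""
    if isinstance(data, TEXT_TYPE):
        raise TypeError('Unicode-objects must be encoded before hashing')
    return data


def checksum(data):
    # zlib.crc32 is only a stand-in for the hash update call;
    # the guard is the part being illustrated.
    return zlib.crc32(require_bytes(data)) & 0xffffffff


# Bytes are accepted.
print(checksum(b'test'))

# Text is rejected until the caller picks an encoding.
try:
    checksum(u'test')
except TypeError as exc:
    print('rejected:', exc)

print(checksum(u'test'.encode('utf-8')) == checksum(b'test'))
```

This mirrors how `hashlib` behaves on Python 3, where passing a `str` raises `TypeError` instead of guessing an encoding.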