Python 3.7: Chinese UTF-8 character error
atyu30 opened this issue
code:
chinese_test.py
#!/usr/bin/env python
# encoding: utf-8
# author: Atyu30 <ipostfix (at) gmail.com>
# filename: chinese_test.py
# version: 2020-06-25 17:38
# copyright: http://wiki.naershuo.com/
# description:
#
from redisearch.client import Client, Query
from redisearch import TextField
client = Client('idx2', host='192.168.56.105', port=6379)
try:
    # Drop any index left over from a previous run
    client.drop_index()
except Exception:
    # The index may not exist yet
    pass
client.create_index([TextField('txt')])
# Add a document
client.add_document('docCn1',
                    txt='Redis支持主从同步。数据可以从主服务器向任意数量的从服务器上同步从服务器可以是关联其他从服务器的主服务器。这使得Redis可执行单层树复制。从盘可以有意无意的对数据进行写操作。由于完全实现了发布/订阅机制,使得从数据库在任何地方同步树时,可订阅一个频道并接收主服务器完整的消息发布记录。同步对读取操作的可扩展性和数据冗余很有帮助。[8]',
                    language='chinese')
title = "主从"
a = client.search(Query(title).summarize().highlight().language('chinese')).docs[0].txt
print(a)
Error:
python chinese_test.py
Traceback (most recent call last):
  File "base3.py", line 29, in <module>
    a = client.search(Query(title).summarize().highlight().language('chinese')).docs[0].txt
  File "/Users/atyu30/.pyenv/versions/3.7.0/envs/myenv/lib/python3.7/site-packages/redisearch/client.py", line 365, in search
    with_scores=query._with_scores)
  File "/Users/atyu30/.pyenv/versions/3.7.0/envs/myenv/lib/python3.7/site-packages/redisearch/result.py", line 42, in __init__
    ) if hascontent else {}
  File "/Users/atyu30/.pyenv/versions/3.7.0/envs/myenv/lib/python3.7/site-packages/redisearch/_util.py", line 7, in to_string
    return s.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 13: invalid continuation byte
atyu30@ns198:/opt/work/redis/redisearch-py/test $vi /Users/atyu30/.pyenv/versions/3.7.0/envs/myenv/lib/python3.7/site-packages/redisearch/_util.py
Repair:
_util.py
import six

def to_string(s):
    if isinstance(s, six.string_types):
        return s
    elif isinstance(s, bytes):
        return s.decode('utf-8', 'ignore')
    else:
        return s  # Not a string we care about
run:
python chinese_test.py
<b>主从</b>...
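To illustrate what the repair changes, here is a minimal sketch of how the UTF-8 error handlers behave on a byte string that ends in the middle of a multi-byte character. One plausible cause of the traceback, not confirmed in this thread, is that the summarized fragment is cut mid-character. The sample bytes in `truncated` are made up for illustration, and this `to_string` is a simplified stand-in for the patched function (no `six`, with a hypothetical `errors` parameter).

```python
# Hypothetical sample: UTF-8 bytes for "主从" with the last byte removed,
# followed by "...", imitating a fragment truncated mid-character.
full = "主从".encode("utf-8")      # 6 bytes, 3 per character
truncated = full[:5] + b"..."       # the second character is now incomplete


def to_string(s, errors="strict"):
    """Decode bytes as UTF-8; return any other value unchanged."""
    if isinstance(s, str):
        return s
    if isinstance(s, bytes):
        return s.decode("utf-8", errors)
    return s  # Not a string we care about


# Strict decoding (the original behaviour) raises on the incomplete character.
strict_error = None
try:
    to_string(truncated)
except UnicodeDecodeError as exc:
    strict_error = exc.reason

# 'ignore' silently drops the incomplete bytes, as in the repair above.
ignored = to_string(truncated, "ignore")

# 'replace' keeps a visible U+FFFD marker where bytes could not be decoded.
replaced = to_string(truncated, "replace")

print(strict_error)
print(ignored)
print(replaced)
```

Note that `'ignore'` hides the fact that data was lost. `'replace'` is an alternative when it is useful to see where the damaged bytes were.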
@atyu30 Would you like to contribute a PR?
If so, please add a test :)
@atyu30 Once the PR is merged, you will be able to pass 'decode_responses=False' to the client's constructor, and it will work :)