RediSearch / redisearch-py

RediSearch python client

Home Page: https://redisearch.io

Python 3.7: Chinese UTF-8 character error

atyu30 opened this issue

Code:
chinese_test.py

#!/usr/bin/env python
# encoding: utf-8

# author: Atyu30 <ipostfix (at) gmail.com>
# filename: chinese_test.py
# version: 2020-06-25 17:38
# copyright:  http://wiki.naershuo.com/
# description: 
# 

from redisearch.client import Client, Query
from redisearch import TextField

client = Client('idx2',host='192.168.56.105', port=6379)
try:
    client.drop_index()
except Exception:
    # The index may not exist yet on a fresh server; ignore that case.
    pass

client.create_index([TextField('txt')])

# Add a document
client.add_document('docCn1',
                    txt='Redis支持主从同步。数据可以从主服务器向任意数量的从服务器上同步从服务器可以是关联其他从服务器的主服务器。这使得Redis可执行单层树复制。从盘可以有意无意的对数据进行写操作。由于完全实现了发布/订阅机制,使得从数据库在任何地方同步树时,可订阅一个频道并接收主服务器完整的消息发布记录。同步对读取操作的可扩展性和数据冗余很有帮助。[8]',
                    language='chinese')
title = "主从"
a = client.search(Query(title).summarize().highlight().language('chinese')).docs[0].txt
print(a)

Error:

python chinese_test.py
Traceback (most recent call last):
  File "base3.py", line 29, in <module>
    a = client.search(Query(title).summarize().highlight().language('chinese')).docs[0].txt
  File "/Users/atyu30/.pyenv/versions/3.7.0/envs/myenv/lib/python3.7/site-packages/redisearch/client.py", line 365, in search
    with_scores=query._with_scores)
  File "/Users/atyu30/.pyenv/versions/3.7.0/envs/myenv/lib/python3.7/site-packages/redisearch/result.py", line 42, in __init__
    ) if hascontent else {}
  File "/Users/atyu30/.pyenv/versions/3.7.0/envs/myenv/lib/python3.7/site-packages/redisearch/_util.py", line 7, in to_string
    return s.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 13: invalid continuation byte
atyu30@ns198:/opt/work/redis/redisearch-py/test $vi /Users/atyu30/.pyenv/versions/3.7.0/envs/myenv/lib/python3.7/site-packages/redisearch/_util.py
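
For anyone who wants to see this class of error without a Redis server, here is a minimal sketch. It assumes the failure comes from a fragment of UTF-8 bytes that was cut in the middle of a multi-byte character; the traceback alone does not confirm where the bytes were cut, so treat the truncation below as hypothetical. The helper name try_strict_decode is made up for this illustration and is not part of redisearch-py.

```python
# Each of these Chinese characters takes 3 bytes in UTF-8.
text = "主从同步"
raw = text.encode("utf-8")

# Hypothetical truncation: drop the last byte so the final character
# is left incomplete, roughly what a byte-offset cut could produce.
fragment = raw[:-1]


def try_strict_decode(data):
    """Return the decoded string, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode("utf-8")  # strict mode, as in the original _util.py
    except UnicodeDecodeError as exc:
        return exc


result = try_strict_decode(fragment)
print(type(result).__name__, result)
```

The exact error message depends on where the cut falls, so it will not necessarily match the "invalid continuation byte" text in the traceback above.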

Repair:

_util.py

import six

def to_string(s):
    if isinstance(s, six.string_types):
        return s
    elif isinstance(s, bytes):
        return s.decode('utf-8','ignore')
    else:
        return s  # Not a string we care about
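
A quick sketch of what the 'ignore' error handler in this repair does, compared with 'replace'. This is a standalone Python 3 variant written only for illustration (no six, and the name to_string_lenient is hypothetical), not the code that ships in redisearch-py. The truncated fragment is an assumed input.

```python
def to_string_lenient(s, errors="ignore"):
    """Decode bytes as UTF-8 with a lenient error handler; pass other values through."""
    if isinstance(s, str):
        return s
    if isinstance(s, bytes):
        return s.decode("utf-8", errors)
    return s  # Not a string we care about


# Assumed input: UTF-8 bytes with the last character cut short by one byte.
fragment = "主从同步".encode("utf-8")[:-1]

ignored = to_string_lenient(fragment)                     # incomplete bytes are dropped
replaced = to_string_lenient(fragment, errors="replace")  # incomplete bytes become U+FFFD

print(ignored)
print(replaced)
```

With 'ignore' the broken character disappears silently, which keeps the highlighted output clean but hides the fact that data was lost. 'replace' leaves a visible U+FFFD marker instead, which may be preferable when debugging.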

Run:

python chinese_test.py 
<b>主从</b>... 

@atyu30 Would you like to contribute a PR?
If so, please add a test :)

@atyu30 Once the PR is merged, you will be able to pass 'decode_responses=False' to the client's constructor, and it will work :)