main_bot struggles if you have non-ascii characters in your name
loeiten opened this issue · comments
If your name in Telegram contains any non-ASCII characters, the bot will crash:
Ready to talk!
An update received.
Traceback (most recent call last):
  File "main_bot.py", line 111, in <module>
    main()
  File "main_bot.py", line 103, in main
    print("Update content: {}".format(update))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 153: ordinal not in range(128)
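The failure can be reproduced without Telegram. The following is a minimal sketch, assuming standard output uses the ASCII codec; that terminal is simulated here with an in-memory stream, and the sample update and the name in it are hypothetical:

```python
import io

# Simulate a terminal whose encoding is ASCII (hypothetical stand-in for sys.stdout).
ascii_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

# Hypothetical update containing a non-ASCII first name.
update = {"message": {"from": {"first_name": "Bjørn"}, "text": "hi"}}

try:
    print("Update content: {}".format(update), file=ascii_stdout)
    ascii_stdout.flush()
    error = None
except UnicodeEncodeError as exc:
    # Keep a reference, since `exc` is cleared when the except block ends.
    error = exc

print(type(error).__name__)
```

The character 'ø' is '\xf8', the same code point that appears in the traceback above.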
At the cost of some extra computation, adding the following function
def cast_to_utf_8(old_dict):
    """
    Encodes the string content of a dict to utf-8

    Parameters
    ----------
    old_dict : dict
        The dict to encode (modified in place)

    Returns
    -------
    new_dict : dict
        The encoded dict
    """

    def walk(node):
        """
        Recursively traverses a node and encodes all strings to utf-8

        Parameters
        ----------
        node : dict
            The node to traverse

        Returns
        -------
        node : dict
            The node where the strings are encoded to utf-8
        """
        for key, item in node.items():
            if isinstance(item, dict):
                walk(item)
            elif isinstance(item, list):
                for i, elem in enumerate(item):
                    if isinstance(elem, str):
                        node[key][i] = elem.encode('utf-8')
                    elif isinstance(elem, dict):
                        # Lists may also hold nested dicts
                        walk(elem)
            elif isinstance(item, str):
                node[key] = item.encode('utf-8')
        return node

    new_dict = walk(old_dict)
    return new_dict
and calling it like this in main()
if is_unicode(text):
    update = cast_to_utf_8(update)
    print("Update content: {}".format(update))
    bot.send_message(chat_id, bot.get_answer(update["message"]["text"]))
else:
    bot.send_message(chat_id, "Hmm, you are sending some weird characters to me...")
was a remedy for me
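A possible alternative that leaves the update untouched is to escape only the text that gets printed. This is a rough sketch, not what main_bot.py does; the helper name to_ascii_safe and the sample update are made up for illustration:

```python
def to_ascii_safe(text):
    """Return text with every non-ASCII character replaced by a backslash escape."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


# Hypothetical update containing a non-ASCII first name.
update = {"message": {"from": {"first_name": "Bjørn"}, "text": "hi"}}

# Only the log line is escaped; the update itself still holds ordinary strings.
line = to_ascii_safe("Update content: {}".format(update))
print(line)
```

Because the strings in the update are never turned into bytes, the message text can still be passed on as a regular string. The built-in ascii(update) should give a similar escaped representation.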
Hello loeiten,
It looks like your terminal does not support outputting Unicode characters. Have you tried setting a locale that supports Unicode? E.g.:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
See this article for more details.
We can probably detect such situations in the code, though it feels a bit opinionated - some users might want to use Unicode characters.
Whatever the final solution, we will reflect this in the docs for the assignment.
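If the locale cannot be changed, one option in code might be to make the output stream escape characters it cannot encode instead of raising. This is a sketch that assumes Python 3.7 or newer, where text streams have a reconfigure() method; the helper name is hypothetical and the ASCII terminal is simulated with an in-memory stream:

```python
import io


def make_stream_unicode_tolerant(stream):
    """Make a text stream escape unencodable characters; return True on success."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Not an io.TextIOWrapper (or Python older than 3.7): leave it alone.
        return False
    reconfigure(errors="backslashreplace")
    return True


# Demonstration on a simulated ASCII-only stream (stand-in for sys.stdout).
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii")
changed = make_stream_unicode_tolerant(ascii_stream)

print("Bjørn", file=ascii_stream)
ascii_stream.flush()
written = raw.getvalue()
```

In the bot, the same helper could be called once with sys.stdout at the start of main(). Users whose terminals already support UTF-8 would see no difference, since nothing needs escaping there.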
Thanks @akashin, I was not aware of this. Your suggestion worked :).
Indeed, maybe writing it in the docs would be best if it affects several people.
If not, a simple
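import sys
# sys.stdout.encoding can be None and may be reported in lowercase
if 'UTF-8' not in (sys.stdout.encoding or '').upper():
    # Suggest exporting UTF-8
    print('Consider exporting LC_ALL=en_US.UTF-8', file=sys.stderr)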
would also do