Passing raw bytes to SQL Server
VFLashM opened this issue · comments
A legacy part of my project pipes raw bytes into SQL varchars.
My code is written in Python 2 and I need to ensure interoperability with that code.
pytds has a bytes_to_unicode option, but setting it to False does not work as expected.
Example:
import pytds
con = pytds.connect(<credentials here>, bytes_to_unicode=False)
c = con.cursor()
arg = 'weird chars \xee\xb4\xba'
c.execute('''
set nocount on
declare @a varchar(max) = %s
select @a
''', [arg])
row = c.fetchone()
print repr(arg)
print repr(row[0])
With bytes_to_unicode=True (the default) I get the following result:
'weird chars \xee\xb4\xba'
u'weird chars ?'
Setting bytes_to_unicode=False fails with an exception:
Traceback (most recent call last):
  File "./test.py", line 17, in <module>
    ''', [arg])
  File "pytds/__init__.py", line 634, in execute
    self._execute(operation, params)
  File "pytds/__init__.py", line 615, in _execute
    self._exec_with_retry(lambda: self._session.submit_rpc(
  File "pytds/__init__.py", line 563, in _exec_with_retry
    return fun()
  File "pytds/__init__.py", line 618, in <lambda>
    0))
  File "pytds/tds.py", line 3273, in submit_rpc
    param.type.write(w, param.value)
  File "pytds/tds.py", line 1434, in write
    val, _ = self._codec.encode(val)
  File "/opt/python-2.7env/lib/python2.7/encodings/cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 12: ordinal not in range(128)
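Both results are consistent with how Python 2 treats byte strings. Calling .encode() on a byte string first decodes it implicitly with the ASCII codec, which fails on 0xee. The single ? in the default case would be explained if the bytes are first read as UTF-8: \xee\xb4\xba is then one code point (U+ED3A) that has no cp1252 equivalent. The sketch below reproduces both effects in Python 3, where the implicit step has to be written out. The helper names are hypothetical, and the UTF-8 step is an assumption about the default path, not something confirmed from the pytds source.

```python
# Python 3 sketch; helper names are hypothetical, not part of pytds.
RAW = b'weird chars \xee\xb4\xba'


def py2_style_encode(raw, encoding='cp1252'):
    """Mimic Python 2's str.encode(): implicit ASCII decode, then encode."""
    return raw.decode('ascii').encode(encoding)


def utf8_then_codepage(raw, encoding='cp1252'):
    """Assumed default path: read the bytes as UTF-8, then encode lossily."""
    return raw.decode('utf-8').encode(encoding, errors='replace')


# The implicit ASCII decode is what raises in the traceback above.
try:
    py2_style_encode(RAW)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# U+ED3A cannot be represented in cp1252, so it is replaced.
lossy = utf8_then_codepage(RAW)

print(error_message)
print(lossy)
```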
The current implementation of bytes_to_unicode seems to be rather useless.
I'd say it should not try to encode the input and should pass it to the database as is.
I propose one simple change to achieve that:
In tds.py, in method VarCharMax.write:
-        val = force_unicode(val)
-        val, _ = self._codec.encode(val)
+        if isinstance(val, unicode):
+            val, _ = self._codec.encode(val)
Result after the change:
'weird chars \xee\xb4\xba'
u'weird chars \xee\xb4\xba'
Still not perfect (I'd prefer it to return a non-unicode string), but at least it works.
I'm not sure whether this change affects anything else (I'm having trouble running the test suite).
Please let me know what you think.
I can still make a pull request if necessary.
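As long as the result comes back as a unicode string, the caller can presumably recover the original bytes by encoding it with the same single-byte code page. A minimal sketch in Python 3, assuming the column uses cp1252 (the codec that appears in the traceback); recover_raw_bytes is a hypothetical helper, not a pytds function.

```python
# Hypothetical helper, not part of pytds. Assumes a cp1252 code page.
def recover_raw_bytes(text, encoding='cp1252'):
    """Turn a decoded varchar value back into the bytes that were stored."""
    return text.encode(encoding)


original = b'weird chars \xee\xb4\xba'
# Stand-in for the value the patched driver returns.
as_returned = original.decode('cp1252')
recovered = recover_raw_bytes(as_returned)

print(repr(as_returned))
print(repr(recovered))
```

Python's cp1252 codec leaves bytes 0x81, 0x8d, 0x8f, 0x90 and 0x9d undefined, so this round trip is not guaranteed for arbitrary binary data.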
Okay, I'm apparently using a very old version of pytds.
Looking at the current source code, the corresponding changes would be:
- pass bytes_to_unicode to the varchar serializers (it can be extracted from reader/writer._session._login, but that's ugly)
- on write, only force unicode if bytes_to_unicode is true
- only encode the string being written if it's unicode
- on read, decode only if bytes_to_unicode is true
These changes will affect VarChar70Serializer, VarChar72Serializer and Text70Serializer.
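The four changes listed above could look roughly like this. It is a sketch only: SketchVarCharSerializer and its method names are hypothetical and do not mirror the real pytds classes, the UTF-8 decode stands in for an assumed force_unicode behaviour, and Python 3's bytes/str play the roles of Python 2's str/unicode.

```python
import codecs


class SketchVarCharSerializer:
    """Hypothetical stand-in for a varchar serializer; not pytds's real class."""

    def __init__(self, encoding='cp1252', bytes_to_unicode=True):
        # The flag is passed in explicitly instead of being looked up
        # through reader/writer._session._login.
        self._codec = codecs.lookup(encoding)
        self._bytes_to_unicode = bytes_to_unicode

    def encode_for_write(self, val):
        # On write: force to text only when bytes_to_unicode is true.
        if self._bytes_to_unicode and isinstance(val, bytes):
            val = val.decode('utf-8')  # assumed behaviour of force_unicode
        # Only encode values that are text; raw bytes pass through untouched.
        if isinstance(val, str):
            val, _ = self._codec.encode(val)
        return val

    def decode_after_read(self, raw):
        # On read: decode only when bytes_to_unicode is true.
        if self._bytes_to_unicode:
            val, _ = self._codec.decode(raw)
            return val
        return raw


raw_mode = SketchVarCharSerializer(bytes_to_unicode=False)
text_mode = SketchVarCharSerializer(bytes_to_unicode=True)

payload = b'weird chars \xee\xb4\xba'
print(raw_mode.encode_for_write(payload) == payload)
print(raw_mode.decode_after_read(payload) == payload)
```

With the flag off, bytes go in and come out unchanged, while text values are still encoded with the column's codec.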
Does this change make sense to you?
Do you want it in your code?
As I mentioned before, I have severe administrative problems running your test suite.
On top of that, I'm not quite sure how to pass bytes_to_unicode to the serializers properly.
I can still try to implement it, though.
Sounds reasonable to me. Is there anything I can do to help with running the test suite?
I'll try to make it work sometime next week.
What would be the best way to access bytes_to_unicode from the serializers?
As of now I just get it through reader/writer->connection->login, which doesn't look very nice.
Should I pass it through the constructor (and all the factories)?
Maybe we need some kind of Context object which would contain things like that, but that would probably be too much of a refactoring. I am OK with using reader/writer->connection->login for now; I think this is how other serializers do it currently anyway.
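For comparison, a Context object along those lines might be as small as the following. Every name here (Login, SerializerContext, ContextAwareVarChar) is hypothetical and only illustrates the shape of the idea; nothing is taken from the pytds code base.

```python
class Login:
    """Hypothetical stand-in for the login settings object."""

    def __init__(self, bytes_to_unicode=True):
        self.bytes_to_unicode = bytes_to_unicode


class SerializerContext:
    """Hypothetical Context object: only the settings serializers need."""

    def __init__(self, bytes_to_unicode, encoding='cp1252'):
        self.bytes_to_unicode = bytes_to_unicode
        self.encoding = encoding

    @classmethod
    def from_login(cls, login, encoding='cp1252'):
        # Built once per connection, so serializers never have to walk
        # reader/writer -> connection -> login themselves.
        return cls(login.bytes_to_unicode, encoding)


class ContextAwareVarChar:
    """Hypothetical serializer that receives the context via its constructor."""

    def __init__(self, context):
        self._context = context

    def decode_after_read(self, raw):
        if self._context.bytes_to_unicode:
            return raw.decode(self._context.encoding)
        return raw


raw_serializer = ContextAwareVarChar(
    SerializerContext.from_login(Login(bytes_to_unicode=False)))
text_serializer = ContextAwareVarChar(
    SerializerContext.from_login(Login(bytes_to_unicode=True)))

print(repr(raw_serializer.decode_after_read(b'\xee\xb4\xba')))
print(repr(text_serializer.decode_after_read(b'caf\xe9')))
```

The trade-off is the one mentioned above: every serializer factory has to accept and forward the context, which is a larger change than reading the flag through the existing chain.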