Passing raw bytes to sql server

Question

VFLashM opened this issue 7 years ago · comments

A legacy part of my project pipes raw bytes into SQL varchars.
My code is written in Python 2, and I need to ensure interoperability with that legacy code.

pytds has a bytes_to_unicode option, but setting it to False does not work as expected.

Example:

import pytds
con = pytds.connect(<credentials here>, bytes_to_unicode=False)
c = con.cursor()
arg = 'weird chars \xee\xb4\xba'
c.execute('''
set nocount on
declare @a varchar(max) = %s
select @a
''', [arg])
row = c.fetchone()
print repr(arg)
print repr(row[0])

With bytes_to_unicode=True (the default) I get the following result:

'weird chars \xee\xb4\xba'
u'weird chars ?'

Setting bytes_to_unicode=False fails with an exception:

Traceback (most recent call last):
  File "./test.py", line 17, in <module>
    ''', [arg])
  File "pytds/__init__.py", line 634, in execute
    self._execute(operation, params)
  File "pytds/__init__.py", line 615, in _execute
    self._exec_with_retry(lambda: self._session.submit_rpc(
  File "pytds/__init__.py", line 563, in _exec_with_retry
    return fun()
  File "pytds/__init__.py", line 618, in <lambda>
    0))
  File "pytds/tds.py", line 3273, in submit_rpc
    param.type.write(w, param.value)
  File "pytds/tds.py", line 1434, in write
    val, _ = self._codec.encode(val)
  File "/opt/python-2.7env/lib/python2.7/encodings/cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 12: ordinal not in range(128)
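
A note on why an encode call ends in a UnicodeDecodeError: in Python 2, encoding a byte str makes the interpreter decode it as ASCII first, and that implicit step is what fails here. The sketch below only reproduces that step by decoding the same bytes explicitly. The helper name implicit_ascii_decode is made up for illustration and is not part of pytds.

```python
# Same bytes as in the example; the b'' prefix keeps this valid on Python 2 and 3.
raw = b'weird chars \xee\xb4\xba'


def implicit_ascii_decode(data):
    """Mimic the ASCII decode that Python 2 applies before encoding a byte str.

    Returns the UnicodeDecodeError if the decode fails, otherwise None.
    """
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return exc
    return None


err = implicit_ascii_decode(raw)
# err.start is the offset of the first byte that is not valid ASCII.
print(err)
```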

The current implementation of bytes_to_unicode seems rather useless.
I'd say it should not try to encode the input and should pass it to the database as is.
I propose one simple change to achieve that,
in the VarCharMax.write method in tds.py:

-            val = force_unicode(val)
-            val, _ = self._codec.encode(val)
+            if isinstance(val, unicode):
+                val, _ = self._codec.encode(val)

Result after the change:

'weird chars \xee\xb4\xba'
u'weird chars \xee\xb4\xba'

Still not perfect (I'd prefer it to return a non-unicode string), but at least it works.

I'm not sure whether this change affects anything else, since I'm having trouble running the test suite.
Please let me know what you think.

I can still make a pull request if necessary.
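
To make the intent of the change concrete, here is a minimal standalone sketch of the write-side rule: encode only unicode text and let byte strings through untouched. The function name encode_for_varchar is hypothetical, and cp1252 is assumed only because that is the codec shown in the traceback. This is not the actual pytds code.

```python
import codecs

# unicode on Python 2, str on Python 3
text_type = type(u'')


def encode_for_varchar(val, codec_name='cp1252'):
    """Encode unicode text with the given codec; return byte strings as they are."""
    if isinstance(val, text_type):
        val, _ = codecs.lookup(codec_name).encode(val)
    return val


raw = b'weird chars \xee\xb4\xba'
# Byte strings are passed through without any decode/encode round trip.
print(repr(encode_for_varchar(raw)))
# Unicode text is still encoded with the column codec.
print(repr(encode_for_varchar(u'caf\xe9')))
```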

Valerii Lashmanov · Answer 1 · Wed May 31 2017 04:46:23 GMT+0800 (China Standard Time)

Okay, I'm apparently using a very old version of pytds.

Looking at the current source code, the corresponding changes would be:

pass bytes_to_unicode to the varchar serializers (it can be extracted from reader/writer._session._login, but that's ugly)
on write, force unicode only if bytes_to_unicode is true
encode the string being written only if it's unicode
on read, decode only if bytes_to_unicode is true

These changes will affect VarChar70Serializer, VarChar72Serializer and Text70Serializer.
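
Roughly, the four changes listed above could look like the sketch below. VarCharCodecSketch and its method names are hypothetical and only model the encode/decode part of a serializer, not the real pytds classes. Decoding bytes as UTF-8 when forcing unicode is an assumption about what force_unicode does.

```python
import codecs

# unicode on Python 2, str on Python 3
text_type = type(u'')


class VarCharCodecSketch(object):
    """Hypothetical stand-in for the encode/decode logic of a varchar serializer."""

    def __init__(self, codec_name, bytes_to_unicode):
        self._codec = codecs.lookup(codec_name)
        self._bytes_to_unicode = bytes_to_unicode

    def to_wire(self, val):
        # On write: force unicode only if bytes_to_unicode is true.
        if self._bytes_to_unicode and isinstance(val, bytes):
            val = val.decode('utf8')  # assumption: mirrors force_unicode
        # Encode only if the value is unicode; raw bytes go out unchanged.
        if isinstance(val, text_type):
            val, _ = self._codec.encode(val)
        return val

    def from_wire(self, data):
        # On read: decode only if bytes_to_unicode is true.
        if self._bytes_to_unicode:
            data, _ = self._codec.decode(data)
        return data


raw_mode = VarCharCodecSketch('cp1252', bytes_to_unicode=False)
text_mode = VarCharCodecSketch('cp1252', bytes_to_unicode=True)
payload = b'weird chars \xee\xb4\xba'
# With the flag off, bytes survive a write/read round trip untouched.
print(repr(raw_mode.from_wire(raw_mode.to_wire(payload))))
```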

Does this change make sense to you?
Do you want it in your code?
As I mentioned before, I have severe administrative problems running your test suite.
On top of that, I'm not quite sure how to pass bytes_to_unicode to the serializers properly.

I can still try to implement it, though.

denisenkom · Answer 2 · Mon Jun 05 2017 05:17:39 GMT+0800 (China Standard Time)

Sounds reasonable to me. Is there anything I can do to help with running the test suite?

Valerii Lashmanov · Answer 3 · Thu Jun 08 2017 05:49:34 GMT+0800 (China Standard Time)

I'll try to make it work sometime next week.

What would be the best way to access bytes_to_unicode from the serializers?
As of now I just get it through reader/writer->connection->login, which doesn't look very nice.
Should I pass it through the constructor (and all the factories)?

denisenkom · Answer 4 · Fri Jun 09 2017 09:31:31 GMT+0800 (China Standard Time)

Maybe we need some kind of Context object that would contain things like that, but that would probably be too much of a refactoring. So I am OK with using reader/writer->connection->login for now. I think this is how other serializers do it currently anyway.
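
For illustration only, a Context object along those lines might look like the sketch below. SerializerContext and VarCharSerializerSketch are hypothetical names, nothing like this exists in pytds as far as this thread shows, and the cp1252 default is assumed from the traceback earlier in the issue.

```python
class SerializerContext(object):
    """Hypothetical bundle of connection-level settings that serializers need."""

    def __init__(self, bytes_to_unicode=True):
        self.bytes_to_unicode = bytes_to_unicode


class VarCharSerializerSketch(object):
    """Hypothetical serializer that takes its settings from a context object
    instead of reaching through reader/writer->connection->login."""

    def __init__(self, context, encoding='cp1252'):
        self._context = context
        self._encoding = encoding

    def read(self, data):
        # Decode only when the connection asked for unicode results.
        if self._context.bytes_to_unicode:
            return data.decode(self._encoding)
        return data


raw_serializer = VarCharSerializerSketch(SerializerContext(bytes_to_unicode=False))
text_serializer = VarCharSerializerSketch(SerializerContext())
print(repr(raw_serializer.read(b'weird chars \xee\xb4\xba')))
```

The trade-off is the one mentioned above: the context has to be threaded through every constructor and factory, which is the larger refactoring.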