denisenkom / pytds

Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation

Passing raw bytes to sql server

VFLashM opened this issue.

A legacy part of my project pipes raw bytes into SQL varchar columns.
My code is written in Python 2, and I need to ensure interoperability with that code.

pytds has a bytes_to_unicode option, but setting it to False does not work as expected.

Example:

import pytds

con = pytds.connect(<credentials here>, bytes_to_unicode=False)
c = con.cursor()
arg = 'weird chars \xee\xb4\xba'  # Python 2 byte string (str), not unicode
c.execute('''
set nocount on
declare @a varchar(max) = %s
select @a
''', [arg])
row = c.fetchone()
print repr(arg)
print repr(row[0])

With bytes_to_unicode=True (the default), I get the following result:

'weird chars \xee\xb4\xba'
u'weird chars ?'
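One plausible reading of the single ? above, assuming (not confirmed in this thread) that the driver decodes byte strings as UTF-8 before sending them and that the text is then mapped to the cp1252 code page with ? substituted for unrepresentable characters: the three bytes \xee\xb4\xba form one valid UTF-8 sequence, so they collapse into a single private-use character that cp1252 cannot represent. A quick Python 3 check of that arithmetic:

```python
# Hypothesis only: bytes decoded as UTF-8, then re-encoded as cp1252 with "replace".
raw = b"weird chars \xee\xb4\xba"

text = raw.decode("utf-8")       # \xee\xb4\xba is one 3-byte UTF-8 sequence
assert text[-1] == "\ued3a"      # a single private-use code point, U+ED3A

# cp1252 has no mapping for U+ED3A, so the "replace" handler substitutes b"?"
wire = text.encode("cp1252", errors="replace")
print(repr(wire.decode("cp1252")))
```

Three input bytes turning into exactly one ? is consistent with this reading, but it does not prove where in the pipeline the replacement happens.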

Setting bytes_to_unicode=False fails with an exception:

Traceback (most recent call last):
  File "./test.py", line 17, in <module>
    ''', [arg])
  File "pytds/__init__.py", line 634, in execute
    self._execute(operation, params)
  File "pytds/__init__.py", line 615, in _execute
    self._exec_with_retry(lambda: self._session.submit_rpc(
  File "pytds/__init__.py", line 563, in _exec_with_retry
    return fun()
  File "pytds/__init__.py", line 618, in <lambda>
    0))
  File "pytds/tds.py", line 3273, in submit_rpc
    param.type.write(w, param.value)
  File "pytds/tds.py", line 1434, in write
    val, _ = self._codec.encode(val)
  File "/opt/python-2.7env/lib/python2.7/encodings/cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 12: ordinal not in range(128)
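Note that the error is a UnicodeDecodeError even though the failing call is encode. In Python 2, encoding a byte str first decodes it implicitly with the default ASCII codec, which rejects any byte above 0x7f. The Python 3 sketch below makes that hidden step explicit; py2_encode is a hypothetical helper that only models the Python 2 behaviour and is not pytds code.

```python
import codecs


def py2_encode(val, encoding="cp1252"):
    """Model Python 2's implicit str -> unicode coercion before encoding."""
    if isinstance(val, bytes):
        # Python 2 performs this step silently, using the default 'ascii' codec.
        val = val.decode("ascii")
    return codecs.encode(val, encoding)


arg = b"weird chars \xee\xb4\xba"
message = None
try:
    py2_encode(arg)
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)
```

The resulting message matches the last line of the traceback, including position 12 (the first byte after "weird chars ").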

The current implementation of bytes_to_unicode seems rather useless.
I'd say it should not try to encode byte-string input, and should instead pass it to the database as is.
I propose one simple change to achieve that, in tds.py, method VarCharMax.write:

-            val = force_unicode(val)
-            val, _ = self._codec.encode(val)
+            if isinstance(val, unicode):
+                val, _ = self._codec.encode(val)
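For comparison, here is the intent of the patch translated to Python 3 types (Python 2 unicode becomes str, and the Python 2 byte str becomes bytes). It is a sketch under that assumption: write_varchar and the cp1252 default are illustrative, while codec.encode mirrors the self._codec.encode(val) call in the diff and returns an (output, length) pair.

```python
import codecs

CP1252 = codecs.lookup("cp1252")  # CodecInfo; .encode returns (bytes, length consumed)


def write_varchar(val, codec=CP1252):
    """Hypothetical write path: encode text, pass raw bytes through unchanged."""
    if isinstance(val, str):
        val, _ = codec.encode(val)
    return val


legacy = b"weird chars \xee\xb4\xba"
assert write_varchar(legacy) is legacy            # bytes are not re-encoded
assert write_varchar("caf\u00e9") == b"caf\xe9"   # text is encoded as cp1252
```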

Result after the change:

'weird chars \xee\xb4\xba'
u'weird chars \xee\xb4\xba'

Still not perfect (I'd prefer it to return a non-unicode string), but at least it works.

I'm not sure whether this change affects anything else (I'm having trouble running the test suite).
Please let me know what you think.

I'm having trouble running the test suite, but I can still make a pull request if necessary.

Okay, apparently I'm using a very old version of pytds.

Looking at the current source code, the corresponding changes would be:

  1. pass bytes_to_unicode to the varchar serializers (it can be extracted from reader/writer._session._login, but that's ugly)
  2. on write, only force unicode if bytes_to_unicode is true
  3. only encode the string being written if it is unicode
  4. on read, decode only if bytes_to_unicode is true
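A sketch of how the four steps might fit together in one serializer, assuming the flag is handed to the serializer at construction time (step 1). VarCharSerializerSketch, to_wire and from_wire are hypothetical names, not pytds classes or methods, and decoding bytes as UTF-8 when the flag is on is an assumption standing in for whatever force_unicode actually does.

```python
import codecs


class VarCharSerializerSketch:
    """Hypothetical stand-in for a pytds varchar serializer; not the real class."""

    def __init__(self, codec_name="cp1252", bytes_to_unicode=True):
        self._codec = codecs.lookup(codec_name)
        self._bytes_to_unicode = bytes_to_unicode  # step 1: flag passed in

    def to_wire(self, val):
        if self._bytes_to_unicode and isinstance(val, bytes):
            # step 2: force text only when the flag is set (UTF-8 is an assumption)
            val = val.decode("utf-8")
        if isinstance(val, str):
            # step 3: encode only text; raw bytes pass through untouched
            val, _ = self._codec.encode(val)
        return val

    def from_wire(self, raw):
        # step 4: decode only when the flag is set
        if self._bytes_to_unicode:
            text, _ = self._codec.decode(raw)
            return text
        return raw


raw = b"weird chars \xee\xb4\xba"
legacy_mode = VarCharSerializerSketch(bytes_to_unicode=False)
assert legacy_mode.from_wire(legacy_mode.to_wire(raw)) == raw
```

With bytes_to_unicode=False the bytes from the issue round-trip unchanged and come back as a byte string, which is the behaviour asked for earlier in the thread.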

These changes will affect VarChar70Serializer, VarChar72Serializer and Text70Serializer.

Does this change make sense to you?
Do you want it in your code?
As I mentioned before, I have severe administrative problems running your test suite.
On top of that, I'm not quite sure how to pass bytes_to_unicode to the serializers properly.

I can still try to implement it, though.

Sounds reasonable to me. Is there anything I can help with in running the test suite?

I'll try to make it work sometime next week.

What would be the best way to access bytes_to_unicode from serializers?
As of now, I just get it through reader/writer->connection->login, which doesn't look very nice.
Should I pass it through the constructor (and all the factories)?

Maybe we need some kind of Context object to hold settings like that, but that would probably be too much refactoring. I'm OK with using reader/writer->connection->login for now; I think that is how other serializers currently do it anyway.
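The two options discussed above, sketched side by side. Everything here is hypothetical: Login, Session, Reader, SerializerContext and both serializer classes are stand-ins for illustration, not the real pytds objects. The first serializer reaches the flag through an attribute chain, like the reader/writer->connection->login approach; the second receives a small context object in its constructor.

```python
from dataclasses import dataclass


@dataclass
class Login:
    bytes_to_unicode: bool = True


@dataclass
class Session:
    login: Login


@dataclass
class Reader:
    session: Session


class ChainLookupSerializer:
    """Option 1: dig the flag out of reader -> session -> login on each call."""

    def read(self, reader, raw):
        if reader.session.login.bytes_to_unicode:
            return raw.decode("cp1252")
        return raw


@dataclass(frozen=True)
class SerializerContext:
    """Option 2: per-connection settings gathered once and passed in."""
    bytes_to_unicode: bool = True
    codec_name: str = "cp1252"


class ContextSerializer:
    def __init__(self, ctx):
        self._ctx = ctx

    def read(self, reader, raw):
        # The reader is still available for I/O, but settings come from ctx.
        if self._ctx.bytes_to_unicode:
            return raw.decode(self._ctx.codec_name)
        return raw


reader = Reader(Session(Login(bytes_to_unicode=False)))
raw = b"\xee\xb4\xba"
assert ChainLookupSerializer().read(reader, raw) == raw
assert ContextSerializer(SerializerContext(bytes_to_unicode=False)).read(reader, raw) == raw
```

The context version keeps serializers independent of the reader's internals, at the cost of threading one more object through every factory, which is the refactoring cost mentioned above.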