ctypesgen / ctypesgen

Pure-python wrapper generator for ctypes

String seems incorrect

Alan-R opened this issue

I ran into some problems, and need help with them (version 1.0.1-1.0.2 on python 3.7.4):

  File "/home/alanr/monitor/src/cma/AssimCtypes.py", line 1323, in <module>
    proj_class_new.argtypes = [gsize, String]
TypeError: item 2 in _argtypes_ passes a union by value, which is unsupported

The C source in question looks like this:

gpointer
proj_class_new(gsize objsize, ///< Size of object to be allocated
       const char * static_classname) ///< Static string giving name of class

The corresponding header file looks like this:
WINEXPORT gpointer      proj_class_new(gsize objsize, const char * static_classname);

The second argument is clearly a pointer, not a call by value - of anything.
The generated Python looks like this:

1320 # /home/alanr/monitor/src/include/proj_classes.h: 28
1321 if hasattr(_libs['libassimilationclientlib.so'], 'proj_class_new'):
1322     proj_class_new = _libs['libassimilationclientlib.so'].proj_class_new
1323     proj_class_new.argtypes = [gsize, String]
1324     proj_class_new.restype = gpointer

String is a generated class, which looks like this:

class String(MutableString, Union):

    _fields_ = [("raw", POINTER(c_char)), ("data", c_char_p)]

    def __init__(self, obj=""):
        if isinstance(obj, (bytes, UserString)):
            self.data = bytes(obj)
        else:
            self.raw = obj

As an experiment, I removed the Union from the String definition, and  then it seemed to work a bit better...
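
The failure can be probed without the Assimilation headers. The sketch below is a stand-in, not ctypesgen's generated code: `StringLike` is a hypothetical union with the same two fields as `String`, and `argtypes_accepted` is a helper name made up here. It builds a function prototype with `ctypes.CFUNCTYPE`, which as far as I can tell runs the same argument-type validation as assigning `.argtypes`, and reports whether the running interpreter accepts the type. The result depends on the Python version.

```python
import ctypes
import sys


class StringLike(ctypes.Union):
    # Hypothetical stand-in mirroring the two fields of the generated String class
    _fields_ = [("raw", ctypes.POINTER(ctypes.c_char)), ("data", ctypes.c_char_p)]


def argtypes_accepted(argtype):
    """Return True if this interpreter accepts `argtype` as a by-value argument type."""
    try:
        # Building a prototype validates the argument types up front
        ctypes.CFUNCTYPE(ctypes.c_void_p, ctypes.c_size_t, argtype)
    except TypeError:
        return False
    return True


print(sys.version_info[:3])
print("union by value accepted:", argtypes_accepted(StringLike))
print("c_char_p accepted:", argtypes_accepted(ctypes.c_char_p))
```

Running this under each interpreter of interest gives a quick way to see which ones reject the union.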

A few milliseconds later, I got this error:
  File "/home/alanr/monitor/src/cma/AssimCtypes.py", line 1028, in <module>
    ('priv', POINTER(GSourcePrivate)),
TypeError: second item in fields tuple (index 11) must be a C type

The code in question looks like this:

1015 struct__GSource._fields_ = [
1016     ('callback_data', gpointer),
1017     ('callback_funcs', POINTER(GSourceCallbackFuncs)),
1018     ('source_funcs', POINTER(GSourceFuncs)),
1019     ('ref_count', guint),
1020     ('context', POINTER(GMainContext)),
1021     ('priority', gint),
1022     ('flags', guint),
1023     ('source_id', guint),
1024     ('poll_fds', POINTER(GSList)),
1025     ('prev', POINTER(GSource)),
1026     ('next', POINTER(GSource)),
1027     ('name', String),
1028     ('priv', POINTER(GSourcePrivate)),
1029 ]

It appears that String is not a legal field type. So, this is a problem...
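
For what it's worth, this second error is what I'd expect once `Union` is gone: `String` is then only an ordinary Python class, and ctypes requires every entry in `_fields_` to be a ctypes type. A minimal sketch with hypothetical names (`PlainString` and `try_define_struct` are made up for illustration):

```python
import ctypes


class PlainString(object):
    """Hypothetical stand-in for String after the Union base class is removed."""


def try_define_struct(field_type):
    """Try to declare a Structure with one field of `field_type`.

    Returns the TypeError message on failure, or None on success.
    """
    try:
        class Holder(ctypes.Structure):
            _fields_ = [("name", field_type)]
    except TypeError as exc:
        return str(exc)
    return None


print(try_define_struct(PlainString))      # a TypeError message
print(try_define_struct(ctypes.c_char_p))  # None: c_char_p is a valid field type
```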

  -- Alan

And, I really am confused here, since it looks like these definitions have been around quite a while...

I want to blame me - but can't see anything I changed on my end... I'll try and create a smaller test case...

OK. I ran the tests. They all pass on 2.7.17 and 3.6.8, but 6 of them fail on Python 3.7.4. The failed tests are:

ctypesgen/test/testsuite.py::StdlibTest::test_getenv_returns_null FAILED [  2%]
ctypesgen/test/testsuite.py::StdlibTest::test_getenv_returns_string FAILED [  4%]
ctypesgen/test/testsuite.py::MathTest::test_bad_args_string_not_number FAILED [ 74%]
ctypesgen/test/testsuite.py::MathTest::test_sin FAILED                   [ 76%]
ctypesgen/test/testsuite.py::MathTest::test_sqrt FAILED                  [ 78%]
ctypesgen/test/testsuite.py::MathTest::test_subcall_sin FAILED           [ 80%]

Curiously, the test I added while attempting to reproduce the problem works so far (but it's overly simplistic). There is a 3.7.6 out there, and of course, there's 3.8 to test against. I'll see if I can try those.

OK. It fails on 3.8.1 just like on 3.7.6.

OK. It looks like the other tests fail for similar reasons as my complaint:
E TypeError: item 1 in _argtypes_ passes a union by value, which is unsupported.

All 6 fail for the same reason.

I'll try and figure out why the code passed our CI processes. My guess is that it wasn't running 3.7 or 3.8.

OK. It appears that this worked with 3.7.1, but fails with 3.7.4 and beyond...

OK... In Travis, it seems to work with Python 3.7.5. I appear to have been testing with 3.7.6.

There appear to have been a number of ctypes fixes recently. Not sure where the problem was introduced...

It passes on 3.6.9 also.

The particular piece of code that is causing the exception is here:

        if (stgdict != NULL) {
            if (stgdict->flags & TYPEFLAG_HASUNION) {
                Py_DECREF(converters);
                Py_DECREF(ob);
                if (!PyErr_Occurred()) {
                    PyErr_Format(PyExc_TypeError,
                                 "item %zd in _argtypes_ passes a union by "
                                 "value, which is unsupported.",
                                 i + 1);
                }
                return NULL;
            }

This is in the function
static PyObject *converters_from_argtypes(PyObject *ob) which can be found here: https://github.com/python/cpython/blob/master/Modules/_ctypes/_ctypes.c

And here's where that behavior/bug was introduced:
python/cpython@79d4ed1#diff-998bfefaefe2ab83d5f523e18f158fa4

OK. This appears to have been introduced with this change:
python/cpython@79d4ed1#diff-998bfefaefe2ab83d5f523e18f158fa4

A very deliberate change. I'm guessing that ctypesgen is generating technically incorrect object descriptions.

Python 3.6 seems to work fine too. So, it is broken for all versions of Python >= 3.7.6.
I can't easily downgrade my version of python 3.7 and keep it downgraded, but I can use Python 3.6.9, and it should be good for a few more months. Although I use "f-strings" - they came in to Python 3.6. Guess I'll find out if I used any 3.7 features :-)

Yes, this was a deliberate change to fix long-standing Python issue bpo-16575. If you think there is a bug in the logic of the fix, feel free to open a new Python issue with a minimal script demonstrating the bug.

I understood that. I'm not sure why the ctypesgen code generates String references the way it does, but it definitely broke every ctypesgen-generated string reference.

@vsajip I'll try and create a small example that reproduces the problem. Then you can tell me if the example is wrong, or the patch could be improved. But I have a workaround for the Assimilation project - now that I know what the issue is, and I have an Assimilation release to get out in the next few days.

I have actually wondered about this bit of string handling done by ctypesgen. I am not entirely sure if this special handling is currently necessary or if the original purpose it served is warranted.

Taking a closer look at this, it seems that, as an argument, the "String" class at best helps to automatically convert various things into a byte array. I personally think that the bytes type should be required at the C interface when a "const char *" parameter is required -- let ctypes handle the automatic conversion to c_char_p; there is no real sense in trying to reimplement the automatic type conversion already happening in ctypes. Other auto-conversions are, I think, misplaced, especially in light of the very specific differences between str and bytes.
(It is a one-line change to move to my suggested pattern)
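
A minimal sketch of what that pattern means for callers, assuming Python 3. `c_char_p.from_param` is the conversion hook ctypes applies to an argument declared as `c_char_p` in `argtypes`; the helper `accepts` is a name introduced here for illustration only.

```python
import ctypes


def accepts(ctype, value):
    """Return True if ctypes would convert `value` for a parameter declared as `ctype`."""
    try:
        ctype.from_param(value)
    except TypeError:
        return False
    return True


# bytes are converted by ctypes itself
print(accepts(ctypes.c_char_p, b"HOME"))
# str is rejected under Python 3
print(accepts(ctypes.c_char_p, "HOME"))
# the caller encodes explicitly at the call site
print(accepts(ctypes.c_char_p, "HOME".encode("utf-8")))
```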

As a return value, it appears that the intent was to allow non-const char* types to be returned as mutable strings. Playing around with this---it does not seem that this actually works correctly. Replacing this with a simpler ctypes.Union appears to work. Since I don't currently have >py36, I'd appreciate someone else trying this for me for newer python versions to see what happens:

import ctypes, ctypes.util                                                       
                                                                                 
class Ustr(ctypes.Union):                                                        
    _fields_ = [("raw", ctypes.POINTER(ctypes.c_char)), ("data", ctypes.c_char_p)]
    def __bytes__(self):                                                         
        return self.data                                                         
    def __str__(self):                                                           
        return self.data.decode()                                                
    def __repr__(self):                                                          
        return repr(self.data)                                                   
    def __getitem__(self, index):
        if isinstance(index, int):
            # Wrap the integer byte value so a single byte comes back as bytes (Py3)
            return bytes([self.data[index]])
        return self.data[index]
    def __setitem__(self, index, sub):
        if isinstance(index, int):
            self.raw[index] = sub
            return
        # Resolve omitted slice bounds (None) against the current string length
        indices = range(*index.indices(len(self.data)))
        assert len(indices) == len(sub), 'bytes array size mismatch'
        for i, c in zip(indices, sub):
            self.raw[i] = c
                                                                                 
clib = ctypes.cdll.LoadLibrary(ctypes.util.find_library('c'))                    
clib.getenv.argtypes = [ctypes.c_char_p]                                         
clib.getenv.restype = Ustr                                                       
                                                                                 
# now try changing the mutable environmental string:                             
orig_home = clib.getenv(b'HOME')                                                 
                                                                                 
print('HOME was originally: ', orig_home)                                        
orig_home[2] = b'*'                                                              
print('Modified HOME variable to: ', orig_home)                                  
print('Fresh from environment: ', clib.getenv(b'HOME'))                          
                                                                                 
if bytes(orig_home) == bytes(clib.getenv(b'HOME')):                              
    print('SUCCESS:  was able to change mutable string')                                   
else:                                                                            
    print('FAIL!!!:  could not change supposedly mutable string')

@olsonse: If you're running Linux, it's easy to get a variety of Python versions available. If you send me a separate email, I can help you with that. It seems like it would be handy in order to be able to run more test environments both inside and outside of Tox.

(venv) kpoman@kpoman-T460s ./PycharmProjects/expybio/expybio/tests$ python -v --version
Python 3.7.6
(venv) kpoman@kpoman-T460s:./PycharmProjects/expybio/expybio/tests$ python ctypesgen_String.py 
HOME was originally:  /home/kpoman
Modified HOME variable to:  /h*me/kpoman
Fresh from environment:  /h*me/kpoman
SUCCESS:  was able to change mutable string
(venv) kpoman@kpoman-T460s:./PycharmProjects/expybio/expybio/tests$ 

The GRASS GIS project has been using ctypesgen for a long time (thanks!) but is now facing problems, as it fails with Python 3.7.6+.

While we could work around it, it would be nice to see this fixed here. Please see the related bug report:

https://trac.osgeo.org/grass/ticket/4018

Strange. Just tried this with Py 3.7.5 and it seems to have had no problems, including the return value being mutable as expected.

On the other hand: Any thoughts on replacing this interface with something less wonky?

I propose the following changes:

  1. Replace return values with the "Ustr" object above, so that the return value is a much simpler ctypes object referring to the pointer (and the memory referenced by it) returned by a function.
  2. Whenever a "const char *" occurs, don't do anything special--just define the interface as c_char_p and let ctypes handle the autoconversion. This implies that only bytes will be allowed (under Py3), and str must be encoded to be passed in as a parameter value. Likewise, all "const char *" return values will come back as bytes in Py3 and will have to be decoded to get a str object.
  3. Whenever a mutable "char *" is used as an argument, replace the argument type with something more native to ctypes and much less involved than the current string class: ctypes.POINTER(ctypes.c_char). This helps the user know they have to pass in something that resolves to a "char *"--there are several things that ctypes will automatically understand as a mutable "char *".

These changes should:

  1. Make the generated code more robust and forward compatible
    • Makes it easier to use because it is closer to normal ctypes use
    • Avoids keeping around our own string implementation that has to be maintained
  2. Use the automatic type conversion already done in ctypes rather than introducing our own complicated string representation
  3. Imply some backwards incompatibilities:
    • Users will now have to explicitly send in bytes in Py3 (Py2 is unaffected here),
    • Users will have to explicitly type cast the results of any functions that return mutable strings (to str or bytes if that is what the user needs--unless the user wants access to the mutable char array).
    • All char buffers passed in as mutable char * parameters will need to be specifically allocated by the user (via routines such as ctypes.create_string_buffer() or perhaps directly by (ctypes.c_char*LEN)()). (Where memory is concerned here for mutable parameters, I much prefer this approach.)

Update:
In case this isn't already apparent, I was hoping for a comment from at least some ctypesgen devs/users: @Alan-R , @neteler , @betatim , @dazzag24 , ...

I think this was caused by a regression in Python. It took a long time to get fixed everywhere.

Revisiting the situation now that I'm a bit less new to ctypes, I think I strongly approve of @olsonse's suggestions.
ctypesgen's current string handling is way too involved indeed, and the code in question appears quite displeasing (re-implementation of Python string methods, incompatibility with Python 3.7.6 and 3.8.1).

IMO, explicit string encoding/decoding would be much better anyway, because implicit conversion is not guaranteed to work in the general case and can well produce inappropriate/confusing results. It always depends on the encoding the C function expects, whether NUL-termination is needed, etc.
(Another problem is that implicit conversion may encourage repeated encoding of the same string if the caller isn't aware of the operation.)
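
To illustrate why the encoding has to be the caller's decision: the same text produces different byte sequences under different encodings, and an embedded NUL truncates what a C function sees. The helper `to_c_bytes` below is hypothetical, not part of ctypesgen or ctypes.

```python
import ctypes


def to_c_bytes(text, encoding):
    """Encode `text` for a `const char *` parameter, refusing embedded NULs."""
    data = text.encode(encoding)
    if b"\0" in data:
        raise ValueError("embedded NUL would truncate the C string")
    return data


utf8 = to_c_bytes("caf\u00e9", "utf-8")
latin1 = to_c_bytes("caf\u00e9", "latin-1")
print(utf8, latin1)  # different bytes for the same text

# c_char_p reads up to the first NUL; decoding needs the matching codec
print(ctypes.c_char_p(utf8).value.decode("utf-8"))
```

Encoding once and reusing the resulting bytes also avoids the repeated conversion mentioned above.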

Concerning backwards (in)compatibility, I see two options:

  • Either replace the current string classes with something lean and make string auto-conversion opt-out, deprecating auto-conversion and switching to opt-in eventually
  • Or simply make the backwards-incompatible change, with corresponding notes in the next release (possibly incrementing the major version).