uqfoundation / dill

serialize all of Python

Home Page: http://dill.rtfd.io

`save_function()` can't save function in a submodule that has the same name as an attribute of the parent module

kelvinburke opened this issue

Hi,
Suppose there is a package whose test_module/__init__.py contains:

from .test import *

and test_module/test.py contains:

def test():
    pass
def test_function():
    pass

After importing test_module, the name test_module.test refers to the function test_module.test() rather than the submodule test_module.test, because the star import in __init__.py rebinds the name to the function.
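The shadowing itself can be reproduced without dill. The following is a minimal sketch that builds the same layout in a temporary directory; the package name shadow_demo_pkg is a hypothetical stand-in for test_module, chosen only to avoid clashing with an installed module.

```python
import importlib
import pathlib
import sys
import tempfile
import types

# Hypothetical package name (stand-in for test_module from the report).
PKG = "shadow_demo_pkg"

# Build the layout from the report in a temporary directory:
#   shadow_demo_pkg/__init__.py  ->  from .test import *
#   shadow_demo_pkg/test.py      ->  defines test() and test_function()
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / PKG
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("from .test import *\n")
(pkg_dir / "test.py").write_text(
    "def test():\n    pass\n\ndef test_function():\n    pass\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
pkg = importlib.import_module(PKG)

# The attribute on the package is the function...
attr_is_function = isinstance(pkg.test, types.FunctionType)

# ...while the real submodule is still registered in sys.modules.
submodule = sys.modules[PKG + ".test"]
submodule_is_module = isinstance(submodule, types.ModuleType)

print(type(pkg.test).__name__, type(submodule).__name__)  # function module
```

The submodule is not lost, only unreachable through attribute access on the package; looking it up in sys.modules still returns the module object.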
Pickling either of the functions through dill's StockPickler (or pickle._Pickler) then raises an error:

import dill
import tempfile
import test_module
file = tempfile.TemporaryFile()
dill._dill.StockPickler(file).save(test_module.test_function)
# OR
dill._dill.StockPickler(file).save(test_module.test)
# OR
import pickle
pickle._Pickler(file).save(test_module.test_function)

Each of the three calls above raises an error:

Traceback (most recent call last):
  File "C:\code\test_bug3.py", line 5, in <module>
    dill._dill.StockPickler(file).save(test_module.test_function)
  File "C:\...\AppData\Local\Programs\Python\Python311\Lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
    ^^^^^^^^^^^^
  File "C:\...\.venv\Lib\site-packages\dill\_dill.py", line 1940, in save_function
    for stack_element in _postproc:
TypeError: 'NoneType' object is not iterable

This happens with dill==0.3.7 on Python 3.11, and also on the master branch (f66ed3b). A repo that reproduces it: https://github.com/kelvinburke/dill-issue

I think this is because the function _import_module() returns the function test_module.test instead of the submodule of the same name.
I think this can be fixed easily by checking that getattr(__import__(module, None, None, [obj]), obj) returns the right type.
See commit: kelvinburke@228a700
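To make the proposed type check concrete, here is a minimal sketch of the idea. It is not the code from the linked commit or from dill: import_module_checked is a hypothetical helper, shadow_fix_pkg is a hypothetical package built only for the demonstration, and the fallback through importlib.import_module is my assumption about one reasonable way to recover the submodule.

```python
import importlib
import pathlib
import sys
import tempfile
import types


def import_module_checked(import_name):
    """Hypothetical helper: resolve a dotted name and always return a module."""
    if "." not in import_name:
        return importlib.import_module(import_name)
    parent, _, leaf = import_name.rpartition(".")
    # Same lookup as quoted in the report: attribute access on the parent.
    candidate = getattr(__import__(parent, None, None, [leaf]), leaf)
    if isinstance(candidate, types.ModuleType):
        return candidate
    # The attribute was shadowed by a non-module object, so ask the import
    # system for the submodule by its full dotted name instead.
    return importlib.import_module(import_name)


# Demonstration package (hypothetical name) with the layout from the report.
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "shadow_fix_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("from .test import *\n")
(pkg_dir / "test.py").write_text(
    "def test():\n    pass\n\ndef test_function():\n    pass\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Unchecked lookup: returns the function that shadows the submodule.
naive = getattr(__import__("shadow_fix_pkg", None, None, ["test"]), "test")
# Checked lookup: returns the submodule itself.
fixed = import_module_checked("shadow_fix_pkg.test")

print(type(naive).__name__, type(fixed).__name__)  # function module
```

The check only changes behaviour when the attribute is not a module, so ordinary packages without shadowed names take the same path as before.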

Note: I think this is the same problem that causes #604, but this is a slightly different error and simpler to reproduce.

I will open a PR soon that I think will fix both.

I can reproduce the error. However, note that if you use the StockPickler, you are essentially using the Pickler from pickle and not the one from dill. With the intended dill.Pickler, it seems to work as expected (anything in dill._dill is not intended to be used directly).

Python 3.11.6 (main, Oct  2 2023, 18:01:19) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import tempfile
>>> import test_module
>>> file = tempfile.TemporaryFile()
>>> dill.copy(file)
<_io.BufferedRandom name=3>
>>> dill.copy(test_module.test)
<function test at 0x101ee28e0>
>>> dill.copy(test_module.test_function)
<function test_function at 0x101d0c5e0>
>>> 
>>> dill.Pickler(file).save(test_module.test)
>>> dill.Pickler(file).save(test_module.test_function)
>>> dill._dill.StockPickler(file).save(test_module.test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
    ^^^^^^^^^^^^
  File "/Users/mmckerns/lib/python3.11/site-packages/dill/_dill.py", line 1940, in save_function
    for stack_element in _postproc:
TypeError: 'NoneType' object is not iterable
>>> 

Yes, agreed, but some downstream packages use pickle._Pickler, which dill overrides (with _extend()) once it has been imported, and that causes this particular problem.
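The reason an override reaches downstream callers is that the pure-Python pickle._Pickler looks up its save functions in a class-level dispatch dictionary shared by every instance. The sketch below shows only that mechanism using the standard library; it does not reproduce what dill's _extend() does internally, and recording_save_function is a hypothetical handler written for this illustration.

```python
import io
import pickle
import textwrap
import types

calls = []


def recording_save_function(pickler, obj):
    # Hypothetical handler: note that it was called, then fall back to the
    # stock behaviour of pickling the function by reference.
    calls.append(obj.__name__)
    pickler.save_global(obj)


# Class-level table, shared by every pickle._Pickler instance in the process.
dispatch = pickle._Pickler.dispatch
original = dispatch.get(types.FunctionType)
dispatch[types.FunctionType] = recording_save_function
try:
    # This caller knows nothing about the handler, yet it is routed through it.
    pickle._Pickler(io.BytesIO()).dump(textwrap.dedent)
finally:
    # Restore the previous entry so the rest of the process is unaffected.
    if original is None:
        del dispatch[types.FunctionType]
    else:
        dispatch[types.FunctionType] = original

print(calls)  # ['dedent']
```

Any library that instantiates pickle._Pickler after the table has been modified picks up the replaced handler, which is consistent with the failure showing up in code that never references dill directly.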

In my case (with the same setup as before) I am calling:

import joblib
joblib.hash(test_module.test_function)

which fails with the same error as above.