dask / dask-docker

Docker images for dask

Home Page: https://hub.docker.com/u/daskdev


AttributeError: 'MaterializedLayer' object has no attribute 'pack_annotations' when running example notebooks

lforesta opened this issue

What happened:
I am not fully sure whether this is a bug or the result of an incorrect setup/installation.
In any case, I am using the provided docker-compose setup to test a local dockerized Dask instance, but I cannot execute any job on it.

So far I have simply tried a few of the provided example notebooks (e.g. number 4), and they did not run correctly. The following error is returned: AttributeError: 'MaterializedLayer' object has no attribute 'pack_annotations'
Here is the stack trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-a7bc8667f5ea> in <module>
----> 1 x = x.persist()
      2 progress(x)

/opt/conda/lib/python3.8/site-packages/dask/base.py in persist(self, **kwargs)
    253         dask.base.persist
    254         """
--> 255         (result,) = persist(self, traverse=False, **kwargs)
    256         return result
    257 

/opt/conda/lib/python3.8/site-packages/dask/base.py in persist(*args, **kwargs)
    754             else:
    755                 if client.get == schedule:
--> 756                     results = client.persist(
    757                         collections, optimize_graph=optimize_graph, **kwargs
    758                     )

/opt/conda/lib/python3.8/site-packages/distributed/client.py in persist(self, collections, optimize_graph, workers, allow_other_workers, resources, retries, priority, fifo_timeout, actors, **kwargs)
   2942         names = {k for c in collections for k in flatten(c.__dask_keys__())}
   2943 
-> 2944         futures = self._graph_to_futures(
   2945             dsk,
   2946             names,

/opt/conda/lib/python3.8/site-packages/distributed/client.py in _graph_to_futures(self, dsk, keys, workers, allow_other_workers, priority, user_priority, resources, retries, fifo_timeout, actors)
   2541                 dsk = HighLevelGraph.from_collections(id(dsk), dsk, dependencies=())
   2542 
-> 2543             dsk = highlevelgraph_pack(dsk, self, keyset)
   2544 
   2545             annotations = {}

/opt/conda/lib/python3.8/site-packages/distributed/protocol/highlevelgraph.py in highlevelgraph_pack(hlg, client, client_keys)
    113                 "__module__": None,
    114                 "__name__": None,
--> 115                 "state": _materialized_layer_pack(
    116                     layer,
    117                     hlg.get_all_external_keys(),

/opt/conda/lib/python3.8/site-packages/distributed/protocol/highlevelgraph.py in _materialized_layer_pack(layer, all_keys, known_key_dependencies, client, client_keys)
     63     }
     64 
---> 65     annotations = layer.pack_annotations()
     66     all_keys = all_keys.union(dsk)
     67     dsk = {stringify(k): stringify(v, exclusive=all_keys) for k, v in dsk.items()}

AttributeError: 'MaterializedLayer' object has no attribute 'pack_annotations'

What you expected to happen:
Computation should start on the dask cluster

Minimal Complete Verifiable Example:
Run docker-compose up, connect to the Jupyter Notebook server, and execute e.g. notebook 04, or paste this:

from dask.distributed import Client, progress
c = Client()  # picks up the scheduler started by docker-compose

import dask.array as da
x = da.random.random(size=(10000, 10000), chunks=(1000, 1000))

x = x.persist()  # fails here with the AttributeError above
progress(x)

Anything else we need to know?:

Environment:
Printing the distributed Client object shows the following version-mismatch warning:

/opt/conda/lib/python3.8/site-packages/distributed/client.py:1135: VersionMismatchWarning: Mismatched versions found

+---------+---------------+---------------+---------------+
| Package | client        | scheduler     | workers       |
+---------+---------------+---------------+---------------+
| blosc   | 1.10.2        | 1.9.2         | 1.9.2         |
| lz4     | 3.1.3         | 3.1.1         | 3.1.1         |
| msgpack | 1.0.2         | 1.0.0         | 1.0.0         |
| python  | 3.8.6.final.0 | 3.8.0.final.0 | 3.8.0.final.0 |
+---------+---------------+---------------+---------------+
Notes: 
-  msgpack: Variation is ok, as long as everything is above 0.6
  warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))
  • Dask version: 2021.2.0 (from conda-forge)
  • Python version: 3.8
  • Operating System: Ubuntu 18.04 (but I run dask in docker)
  • Install method (conda, pip, source): docker

Thanks for raising an issue @lforesta. MaterializedLayer was recently added to Dask but hasn't been released yet, so I suspect you're using an unreleased development version of Dask. Could you inspect the output of client.get_versions() to see which versions of Dask and Distributed are being used on the cluster?
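
A minimal sketch of that check (assuming, as in this docker-compose setup, that the notebook container gets the scheduler address from its environment, so Client() needs no arguments):

from dask.distributed import Client

c = Client()  # scheduler address assumed to come from the container environment
versions = c.get_versions(check=False)  # check=True would instead raise on a mismatch

# The result is a nested dict keyed by "client", "scheduler", and "workers"
print(versions["client"]["packages"]["dask"])
print(versions["scheduler"]["packages"]["dask"])
for worker, info in versions["workers"].items():
    print(worker, info["packages"]["dask"])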

@jrbourbeau thanks for the answer
This is the output of client.get_versions():

{'scheduler': {'host': {'python': '3.8.0.final.0',
   'python-bits': 64,
   'OS': 'Linux',
   'OS-release': '4.15.0-135-generic',
   'machine': 'x86_64',
   'processor': '',
   'byteorder': 'little',
   'LC_ALL': 'C.UTF-8',
   'LANG': 'C.UTF-8'},
  'packages': {'python': '3.8.0.final.0',
   'dask': '2021.02.0+37.g61b578f5',
   'distributed': '2021.02.0',
   'msgpack': '1.0.0',
   'cloudpickle': '1.6.0',
   'tornado': '6.1',
   'toolz': '0.11.1',
   'numpy': '1.18.1',
   'lz4': '3.1.1',
   'blosc': '1.9.2'}},
 'workers': {'tcp://172.18.0.3:33619': {'host': {'python': '3.8.0.final.0',
    'python-bits': 64,
    'OS': 'Linux',
    'OS-release': '4.15.0-135-generic',
    'machine': 'x86_64',
    'processor': '',
    'byteorder': 'little',
    'LC_ALL': 'C.UTF-8',
    'LANG': 'C.UTF-8'},
   'packages': {'python': '3.8.0.final.0',
    'dask': '2021.02.0+37.g61b578f5',
    'distributed': '2021.02.0',
    'msgpack': '1.0.0',
    'cloudpickle': '1.6.0',
    'tornado': '6.1',
    'toolz': '0.11.1',
    'numpy': '1.18.1',
    'lz4': '3.1.1',
    'blosc': '1.9.2'}}},
 'client': {'host': {'python': '3.8.6.final.0',
   'python-bits': 64,
   'OS': 'Linux',
   'OS-release': '4.15.0-135-generic',
   'machine': 'x86_64',
   'processor': 'x86_64',
   'byteorder': 'little',
   'LC_ALL': 'en_US.UTF-8',
   'LANG': 'en.UTF-8'},
  'packages': {'python': '3.8.6.final.0',
   'dask': '2021.02.0+37.g61b578f5',
   'distributed': '2021.02.0',
   'msgpack': '1.0.2',
   'cloudpickle': '1.6.0',
   'tornado': '6.1',
   'toolz': '0.11.1',
   'numpy': '1.18.1',
   'lz4': '3.1.3',
   'blosc': '1.10.2'}}}

The version of dask/distributed is pinned in the Dockerfile itself; I have not modified that.

Thanks @lforesta, that helps. I've opened #147 to fix this issue.

In the meantime, should downgrading dask to the dask==2021.1.1 conda release fix the issue for my local setup?

I would downgrade to dask==2021.02.0 to match the distributed version already in the container

Thanks! I'll try that then

@jrbourbeau I have tried updating the Dockerfile with both dask==2021.02.0 and dask==2021.1.1, but I get the same error.
In case it is useful, this is the output of client.get_versions() for the dask==2021.1.1 case:

{'scheduler': {'host': {'python': '3.8.0.final.0',
   'python-bits': 64,
   'OS': 'Linux',
   'OS-release': '4.15.0-135-generic',
   'machine': 'x86_64',
   'processor': '',
   'byteorder': 'little',
   'LC_ALL': 'C.UTF-8',
   'LANG': 'C.UTF-8'},
  'packages': {'python': '3.8.0.final.0',
   'dask': '2021.02.0+38.g8663c6b7',
   'distributed': '2021.01.1',
   'msgpack': '1.0.0',
   'cloudpickle': '1.6.0',
   'tornado': '6.1',
   'toolz': '0.11.1',
   'numpy': '1.18.1',
   'lz4': '3.1.1',
   'blosc': '1.9.2'}},
 'workers': {'tcp://172.18.0.4:38197': {'host': {'python': '3.8.0.final.0',
    'python-bits': 64,
    'OS': 'Linux',
    'OS-release': '4.15.0-135-generic',
    'machine': 'x86_64',
    'processor': '',
    'byteorder': 'little',
    'LC_ALL': 'C.UTF-8',
    'LANG': 'C.UTF-8'},
   'packages': {'python': '3.8.0.final.0',
    'dask': '2021.02.0+38.g8663c6b7',
    'distributed': '2021.01.1',
    'msgpack': '1.0.0',
    'cloudpickle': '1.6.0',
    'tornado': '6.1',
    'toolz': '0.11.1',
    'numpy': '1.18.1',
    'lz4': '3.1.1',
    'blosc': '1.9.2'}}},
 'client': {'host': {'python': '3.8.6.final.0',
   'python-bits': 64,
   'OS': 'Linux',
   'OS-release': '4.15.0-135-generic',
   'machine': 'x86_64',
   'processor': 'x86_64',
   'byteorder': 'little',
   'LC_ALL': 'en_US.UTF-8',
   'LANG': 'en.UTF-8'},
  'packages': {'python': '3.8.6.final.0',
   'dask': '2021.02.0+38.g8663c6b7',
   'distributed': '2021.02.0',
   'msgpack': '1.0.2',
   'cloudpickle': '1.6.0',
   'tornado': '6.1',
   'toolz': '0.11.1',
   'numpy': '1.18.1',
   'lz4': '3.1.3',
   'blosc': '1.10.2'}}}

This is because EXTRA_PIP_PACKAGES is still pointing to the development version of Dask:

if [ "$EXTRA_PIP_PACKAGES" ]; then
echo "EXTRA_PIP_PACKAGES environment variable found. Installing".
/opt/conda/bin/pip install $EXTRA_PIP_PACKAGES
fi
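
(For context: EXTRA_PIP_PACKAGES is read from the container environment, typically set per service in the docker-compose file, so pinning a release there, e.g. EXTRA_PIP_PACKAGES="dask==2021.2.0" on the scheduler and worker services, should override the development version the compose setup currently installs. The exact service names and placement depend on your compose file.)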

Could you try out the changes in #147?

Indeed, I should have tried that first. Thanks, it worked :)

Good to hear, thanks again for reporting this issue!