Dask and Pandera Installation Problems via conda-forge
cmarshak opened this issue · comments
Describe the bug
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandera.
- (optional) I have confirmed this bug exists on the master branch of pandera.
This might be a feedstock issue, but I don't think dask is required for installation and use. I am using Python 3.9.18 via mamba/conda-forge.
In [1]: import pandera
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:22, in _dask_expr_enabled()
21 try:
---> 22 import dask_expr # noqa: F401
23 except ImportError:
ModuleNotFoundError: No module named 'dask_expr'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import pandera
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/__init__.py:4
1 """A flexible and expressive pandas validation library."""
2 import platform
----> 4 import pandera.backends
5 from pandera import errors, external_config, typing
6 from pandera.accessors import pandas_accessor
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/backends/__init__.py:6
4 import pandera.backends.base.builtin_checks
5 import pandera.backends.base.builtin_hypotheses
----> 6 import pandera.backends.pandas
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/backends/pandas/__init__.py:5
1 """Pandas backend implementation for schemas and checks."""
3 import pandas as pd
----> 5 import pandera.typing
6 from pandera.api.checks import Check
7 from pandera.api.hypotheses import Hypothesis
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/typing/__init__.py:9
1 """Typing module.
2
3 For backwards compatibility, pandas types are exposed to the top-level scope of
4 the typing module.
5 """
7 from typing import Set, Type
----> 9 from pandera.typing import (
10 dask,
11 fastapi,
12 geopandas,
13 modin,
14 pyspark,
15 pyspark_sql,
16 )
17 from pandera.typing.common import (
18 BOOL,
19 INT8,
(...)
49 UInt64,
50 )
51 from pandera.typing.pandas import DataFrame, Index, Series
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/typing/dask.py:9
6 from pandera.typing.pandas import DataFrameModel, GenericDtype
8 try:
----> 9 import dask.dataframe as dd
11 DASK_INSTALLED = True
12 except ImportError:
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:87
84 except ImportError:
85 pass
---> 87 if _dask_expr_enabled():
88 import dask_expr as dd
90 # trigger loading of dask-expr which will in-turn import dask.dataframe and run remainder
91 # of this module's init updating attributes to be dask-expr
92 # note: needs reload, incase dask-expr imported before dask.dataframe; works fine otherwise
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:24, in _dask_expr_enabled()
22 import dask_expr # noqa: F401
23 except ImportError:
---> 24 raise ValueError("Must install dask-expr to activate query planning.")
25 return True
ValueError: Must install dask-expr to activate query planning.
Code Sample, a copy-pastable example
import pandera
Expected behavior
Package imports.
Desktop (please complete the following information):
- OS: OSX
- Browser: Chrome
- Version: 14.3.1
Additional context
Might be a feedstock issue - but it appears the software is incorrectly assuming that dask is installed when it's not.
The issue goes away after I install dask.
Can you provide repro instructions? A fresh install of pandera (without the dask extra) doesn't reproduce this issue.
I am using mambaforge. I saw this cropping up in our integration testing via github actions.
The reproduction instructions would be:
mamba env update -f environment.yml # https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/blob/42038c0bce2b03fab531a9f5de10400d1766432c/environment.yml
conda activate topsapp_env
python -c "import pandera"
I reproduced the error on my M2 mac via rosetta.
It's a bigger environment and pandera is being installed via the tile-mate recipe.
Is it possible to share the environment.yml file?
Also, would you mind editing the title to be more descriptive? @cmarshak
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/typing/dask.py:9
6 from pandera.typing.pandas import DataFrameModel, GenericDtype
8 try:
----> 9 import dask.dataframe as dd
11 DASK_INSTALLED = True
12 except ImportError:
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:87
84 except ImportError:
85 pass
---> 87 if _dask_expr_enabled():
88 import dask_expr as dd
90 # trigger loading of dask-expr which will in-turn import dask.dataframe and run remainder
91 # of this module's init updating attributes to be dask-expr
92 # note: needs reload, incase dask-expr imported before dask.dataframe; works fine otherwise
Doesn't this part of the stack trace imply that dask is installed?
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:87
Which doesn't line up with this statement:
The issue goes away after I install dask.
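For anyone reading the trace later, here is a minimal sketch of why the guard in pandera/typing/dask.py does not help in this situation. It only mimics the behaviour visible in the traceback: `import_dask_dataframe` and `guarded_import` are hypothetical stand-ins, not pandera or dask functions. The widened `except` is shown as one possible mitigation, not as what either library actually does.

```python
def import_dask_dataframe():
    """Hypothetical stand-in for `import dask.dataframe` in the traceback.

    The inner import of dask_expr fails with an ImportError subclass, and the
    handler converts it into a ValueError, as dask/dataframe/__init__.py does
    in the trace above.
    """
    try:
        # Simulate the missing dask_expr module.
        raise ModuleNotFoundError("No module named 'dask_expr'")
    except ImportError:
        raise ValueError("Must install dask-expr to activate query planning.")


def guarded_import(catch):
    """Mimic an optional-dependency guard that catches only `catch`."""
    try:
        import_dask_dataframe()
        return True
    except catch:
        return False


# A guard that also tolerates ValueError degrades to "not installed".
DASK_INSTALLED = guarded_import((ImportError, ValueError))

# A guard that catches only ImportError lets the ValueError escape,
# which is what surfaces on `import pandera` in the report.
try:
    guarded_import(ImportError)
    escaped = None
except ValueError as exc:
    escaped = str(exc)

print(DASK_INSTALLED, escaped)
```

So the trace is consistent with a dask module being importable while dask_expr is missing: the failure is a ValueError, which an `except ImportError` clause never sees.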
@cosmicBboy - hope this is not taking too much of your time. I can investigate further next week. Thank you for your help thus far.
I put the link in the comment of the code block; here it is spelled out:
name: topsapp_env
channels:
- conda-forge
dependencies:
- python>=3.9,<3.10
- pip
- affine
- asf_search>=5.0.0
- boto3
- dateparser
- flake8
- flake8-blind-except
- flake8-builtins
- flake8-import-order
- fsspec
- gdal
- geopandas
- hyp3lib>=3,<4
- ipykernel
- isce2==2.6.1
- jinja2
- joblib
- jsonschema==3.2.0
- jupyter
- lxml
- matplotlib
- netcdf4
- notebook
- numpy<1.24
- pandas
- pysolid
- papermill
- pytest
- pydantic
- pytest-cov
- pytest-mock
- rasterio
- rioxarray<0.14.0
- xarray
- scipy<1.10
- setuptools
- setuptools_scm
- shapely
- tqdm
- dem_stitcher>=2.5.5
- aiohttp # only needed for manifest and swath download
- tile_mate>=0.0.8
tile_mate is what requires pandera.
I am not as adept at navigating conda-forge as I would like. It's unclear why the program thinks dask is installed when it is clearly not. That's what I am seeing too.
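One quick way to check what the environment really contains is to ask Python directly, without importing anything that might raise. This is a rough sketch using only the standard library; `describe` is a hypothetical helper name, and the distribution name `dask-expr` is assumed to match how the package is registered in the environment.

```python
from importlib import metadata
from importlib.util import find_spec


def describe(module_name, dist_name):
    """Report whether a module is importable and which distribution version is recorded."""
    # find_spec on a top-level name locates the module without executing it.
    importable = find_spec(module_name) is not None
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"module": module_name, "importable": importable, "version": version}


report = [describe("dask", "dask"), describe("dask_expr", "dask-expr")]
for row in report:
    print(row)
```

If `dask` shows up as importable while `dask_expr` does not, that would match the traceback. Run it with the interpreter from the activated topsapp_env.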
You can try creating a conda-lock file from this to see which dependency is causing dask (and, it seems, dask_expr) to be installed.
Friendly ping @cmarshak. With conda-lock we can figure out which of the dependencies requires dask.
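If conda-lock is awkward to run, a rough alternative is to read the package records that conda and mamba leave in the environment's conda-meta directory and list which installed packages declare a dependency on a given name. This is a sketch under assumptions: `who_needs` is a hypothetical helper, and it only looks at the `name` and `depends` fields of each record.

```python
import json
from pathlib import Path


def who_needs(prefix, target):
    """Return sorted names of installed conda packages that depend on `target`.

    `prefix` is the environment root (for example the value of sys.prefix
    inside the activated environment).
    """
    needing = []
    for record_path in Path(prefix, "conda-meta").glob("*.json"):
        record = json.loads(record_path.read_text())
        for spec in record.get("depends", []):
            # A spec looks like "dask >=2023.1"; the first token is the package name.
            if spec.split(maxsplit=1)[:1] == [target]:
                needing.append(record["name"])
                break
    return sorted(needing)
```

Calling `who_needs(sys.prefix, "dask")` from inside topsapp_env would list the direct dependents. It may also be worth trying `"dask-core"`, on the assumption that the importable dask module can come from a package with that name on conda-forge.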
I am using Rosetta on my M1 Mac for this Python 3.9 environment. conda-lock is not playing nicely in this environment. For the time being, I am going to close this issue, as installing dask fixed this and this environment is on 3.9 to support this monolithic code base (oy)!
Pandera is fantastic and thank you for your attention/help!