unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

Home Page: https://www.union.ai/pandera

Dask and Pandera Installation Problems via conda-forge

cmarshak opened this issue

Describe the bug

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

This might be a feedstock issue, but I don't think dask is required for installation and use. I am using Python 3.9.18 via mamba/conda-forge.

In [1]: import pandera
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:22, in _dask_expr_enabled()
     21 try:
---> 22     import dask_expr  # noqa: F401
     23 except ImportError:

ModuleNotFoundError: No module named 'dask_expr'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[1], line 1
----> 1 import pandera

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/__init__.py:4
      1 """A flexible and expressive pandas validation library."""
      2 import platform
----> 4 import pandera.backends
      5 from pandera import errors, external_config, typing
      6 from pandera.accessors import pandas_accessor

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/backends/__init__.py:6
      4 import pandera.backends.base.builtin_checks
      5 import pandera.backends.base.builtin_hypotheses
----> 6 import pandera.backends.pandas

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/backends/pandas/__init__.py:5
      1 """Pandas backend implementation for schemas and checks."""
      3 import pandas as pd
----> 5 import pandera.typing
      6 from pandera.api.checks import Check
      7 from pandera.api.hypotheses import Hypothesis

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/typing/__init__.py:9
      1 """Typing module.
      2
      3 For backwards compatibility, pandas types are exposed to the top-level scope of
      4 the typing module.
      5 """
      7 from typing import Set, Type
----> 9 from pandera.typing import (
     10     dask,
     11     fastapi,
     12     geopandas,
     13     modin,
     14     pyspark,
     15     pyspark_sql,
     16 )
     17 from pandera.typing.common import (
     18     BOOL,
     19     INT8,
   (...)
     49     UInt64,
     50 )
     51 from pandera.typing.pandas import DataFrame, Index, Series

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/typing/dask.py:9
      6 from pandera.typing.pandas import DataFrameModel, GenericDtype
      8 try:
----> 9     import dask.dataframe as dd
     11     DASK_INSTALLED = True
     12 except ImportError:

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:87
     84 except ImportError:
     85     pass
---> 87 if _dask_expr_enabled():
     88     import dask_expr as dd
     90     # trigger loading of dask-expr which will in-turn import dask.dataframe and run remainder
     91     # of this module's init updating attributes to be dask-expr
     92     # note: needs reload, incase dask-expr imported before dask.dataframe; works fine otherwise

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:24, in _dask_expr_enabled()
     22     import dask_expr  # noqa: F401
     23 except ImportError:
---> 24     raise ValueError("Must install dask-expr to activate query planning.")
     25 return True

ValueError: Must install dask-expr to activate query planning.
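
One possible reading of the traceback above: the file paths show that dask itself is present in site-packages, the guard in pandera/typing/dask.py catches only ImportError, and dask.dataframe turns the missing dask_expr into a ValueError, which passes straight through that guard. The sketch below imitates that interaction with a hypothetical stand-in function instead of the real dask import. All function names are illustrative and are not part of pandera or dask, and the broader except clause is shown only for comparison, not as the fix pandera adopted.

```python
def import_dask_dataframe_stub():
    """Hypothetical stand-in for `import dask.dataframe`.

    Mirrors the traceback: a failed import of the optional planner module
    is re-raised as ValueError instead of ImportError.
    """
    try:
        import _module_that_does_not_exist_  # noqa: F401  (deliberately missing)
    except ImportError:
        raise ValueError("Must install dask-expr to activate query planning.")


def optional_backend_available(exceptions):
    """Same shape as an optional-dependency guard: try the import, catch `exceptions`."""
    try:
        import_dask_dataframe_stub()
        return True
    except exceptions:
        return False


def probe():
    """Compare a guard that catches ImportError only with a broader one."""
    results = {}
    try:
        results["import_error_only"] = optional_backend_available(ImportError)
    except ValueError as exc:
        # The narrow guard does not intercept ValueError, so it propagates.
        results["import_error_only"] = f"escaped: {exc}"
    results["import_and_value_error"] = optional_backend_available(
        (ImportError, ValueError)
    )
    return results


if __name__ == "__main__":
    for guard, outcome in probe().items():
        print(guard, "->", outcome)
```

With the narrow guard the ValueError escapes, as `import pandera` does in the traceback. With the broader guard the optional backend is simply reported as unavailable.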

Code Sample, a copy-pastable example

import pandera

Expected behavior

Package imports.

Desktop (please complete the following information):

  • OS: OSX
  • Browser: Chrome
  • Version: 14.3.1

Additional context

Might be a feedstock issue - but it appears the software is incorrectly assuming that dask is installed when it's not.

The issue goes away after I install dask.

Can you provide repro instructions? A fresh install of pandera (without the dask extra) doesn't reproduce this issue.

I am using mambaforge. I saw this cropping up in our integration testing via GitHub Actions.

The reproduction instructions would be:

mamba env update -f environment.yml  # https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/blob/42038c0bce2b03fab531a9f5de10400d1766432c/environment.yml
conda activate topsapp_env
python -c "import pandera"

I reproduced the error on my M2 mac via rosetta.

It's a bigger environment and pandera is being installed via the tile-mate recipe.

Is it possible to share the environment.yml file?

Also, would you mind editing the title to be more descriptive? @cmarshak

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/pandera/typing/dask.py:9
      6 from pandera.typing.pandas import DataFrameModel, GenericDtype
      8 try:
----> 9     import dask.dataframe as dd
     11     DASK_INSTALLED = True
     12 except ImportError:

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:87
     84 except ImportError:
     85     pass
---> 87 if _dask_expr_enabled():
     88     import dask_expr as dd
     90     # trigger loading of dask-expr which will in-turn import dask.dataframe and run remainder
     91     # of this module's init updating attributes to be dask-expr
     92     # note: needs reload, incase dask-expr imported before dask.dataframe; works fine otherwise

Doesn't this part of the stack trace imply that dask is installed?

File ~/miniforge3/envs/topsapp_env/lib/python3.9/site-packages/dask/dataframe/__init__.py:87

Which doesn't line up with this statement:

The issue goes away after I install dask.
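
A quick way to check which of these modules the interpreter can actually locate, without importing them (and so without triggering the error), is the standard-library importlib.util.find_spec. This is a minimal sketch; the helper name module_status is hypothetical, and it assumes it is run inside the activated topsapp_env environment.

```python
import importlib.util


def module_status(names=("dask", "dask_expr", "pandera")):
    """Report whether each top-level module can be located, without importing it."""
    status = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        status[name] = {
            "found": spec is not None,
            # origin is the file the module would be loaded from, when known
            "location": getattr(spec, "origin", None),
        }
    return status


if __name__ == "__main__":
    for name, info in module_status().items():
        print(f"{name}: found={info['found']} location={info['location']}")
```

If dask is reported as found while dask_expr is not, that would match the stack trace: dask is present, and only dask_expr is missing.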

@cosmicBboy - hope this is not taking too much of your time. I can investigate further next week. Thank you for your help thus far.

https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/blob/42038c0bce2b03fab531a9f5de10400d1766432c/environment.yml

I put it in the comment of the code block; here it is spelled out:

name: topsapp_env
channels:
 - conda-forge
dependencies:
 - python>=3.9,<3.10
 - pip
 - affine
 - asf_search>=5.0.0
 - boto3
 - dateparser
 - flake8
 - flake8-blind-except
 - flake8-builtins
 - flake8-import-order
 - fsspec
 - gdal
 - geopandas
 - hyp3lib>=3,<4
 - ipykernel
 - isce2==2.6.1
 - jinja2
 - joblib
 - jsonschema==3.2.0
 - jupyter
 - lxml
 - matplotlib
 - netcdf4
 - notebook
 - numpy<1.24
 - pandas
 - pysolid
 - papermill
 - pytest
 - pydantic
 - pytest-cov
 - pytest-mock
 - rasterio
 - rioxarray<0.14.0
 - xarray
 - scipy<1.10
 - setuptools
 - setuptools_scm
 - shapely
 - tqdm
 - dem_stitcher>=2.5.5
 - aiohttp  # only needed for manifest and swath download
 - tile_mate>=0.0.8

tile_mate is what requires pandera.

I am not as adept at navigating conda-forge as I would like. It's unclear why the program thinks dask is installed when it is clearly not. That's what I am seeing too.

You can try creating a conda-lock file from this to see which dependency is causing dask (and, it seems, dask_expr) to be installed.

Friendly ping @cmarshak. With conda-lock we can figure out which of the dependencies requires dask.
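
If conda-lock is not an option, a rough alternative is to ask the installed Python package metadata which distributions declare dask as a requirement. This is only a sketch under a significant assumption: it reads the Requires-Dist entries exposed through importlib.metadata, which can differ from the dependencies declared in the conda-forge recipes, so an empty result does not rule out a conda-level dependency. The helper names below are hypothetical.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name (lowercase; runs of '-', '_', '.' become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a requirement string such as 'dask[dataframe]>=2024.1'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def dependents_of(target):
    """Map each installed distribution that requires `target` to its matching requirement strings."""
    target = normalize(target)
    found = {}
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            if requirement_name(requirement) == target:
                dist_name = dist.metadata.get("Name") or "unknown"
                found.setdefault(dist_name, []).append(requirement)
    return found


if __name__ == "__main__":
    for dist_name, requirements in sorted(dependents_of("dask").items()):
        print(dist_name, "->", requirements)
```

Requirements that carry an `extra == ...` marker are optional and would not by themselves pull dask into the environment, so they are worth separating from unconditional ones when reading the output.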

I am using Rosetta on my M1 Mac for this Python 3.9 environment, and conda-lock is not playing nice there. For the time being, I am going to close this issue, as installing dask fixed it and this environment is on 3.9 to support this monolithic code base (oy)!

Pandera is fantastic and thank you for your attention/help!