DataExploration.profile results in 'ErrorMessage': "unsupported operand type(s) for -: 'int' and 'NoneType'"
paulochf opened this issue
Hey all!
I'm trying to use the package but I'm getting that message.
import luminaire
import pandas as pd
from luminaire.exploration.data_exploration import DataExploration
past = pd.read_csv("dataset.csv").set_index("index")
de = DataExploration(freq='D')
past_prof, profile = de.profile(df=past)
#(None,
#{'success': False,
# 'ErrorMessage': "unsupported operand type(s) for -: 'int' and 'NoneType'"})
Is this data-related?
Here is my setup:
- Python 3.7.10
- requirements.txt: see below, result from
pip install -U jupyterlab numpy pandas matplotlib luminaire pip setuptools pyarrow
Thanks!
anyio==3.6.1
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
attrs==22.1.0
Babel==2.10.3
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
boto3==1.24.76
botocore==1.27.76
certifi==2022.9.14
cffi==1.15.1
changepy==0.3.1
charset-normalizer==2.1.1
cloudpickle==2.2.0
cycler==0.11.0
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
entrypoints==0.4
fastjsonschema==2.16.2
fonttools==4.37.2
future==0.18.2
hyperopt==0.2.7
idna==3.4
importlib-metadata==4.12.0
importlib-resources==5.9.0
ipykernel==6.15.3
ipython==7.34.0
ipython-genutils==0.2.0
jedi==0.18.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.10
jsonschema==4.16.0
jupyter-core==4.11.1
jupyter-server==1.18.1
jupyter_client==7.3.5
jupyterlab==3.4.7
jupyterlab-pygments==0.2.2
jupyterlab_server==2.15.1
kiwisolver==1.4.4
luminaire==0.4.0
lxml==4.9.1
MarkupSafe==2.1.1
matplotlib==3.5.3
matplotlib-inline==0.1.6
mistune==2.0.4
nbclassic==0.4.3
nbclient==0.6.8
nbconvert==7.0.0
nbformat==5.5.0
nest-asyncio==1.5.5
networkx==2.6.3
notebook==6.4.12
notebook-shim==0.1.0
numpy==1.21.6
packaging==21.3
pandas==1.3.5
pandas-redshift==2.0.5
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
pkgutil_resolve_name==1.3.10
prometheus-client==0.14.1
prompt-toolkit==3.0.31
psutil==5.9.2
psycopg2-binary==2.9.3
ptyprocess==0.7.0
py4j==0.10.9.7
pyarrow==9.0.0
pycparser==2.21
Pygments==2.13.0
pykalman==0.9.5
pyparsing==3.0.9
pyrsistent==0.18.1
python-dateutil==2.8.2
pytz==2022.2.1
pyzmq==24.0.0
requests==2.28.1
s3transfer==0.6.0
scikit-learn==1.0.2
scipy==1.7.3
Send2Trash==1.8.0
six==1.16.0
sniffio==1.3.0
soupsieve==2.3.2.post1
statsmodels==0.13.2
terminado==0.15.0
threadpoolctl==3.1.0
tinycss2==1.1.1
tomli==2.0.1
tornado==6.2
tqdm==4.64.1
traitlets==5.4.0
typing_extensions==4.3.0
urllib3==1.26.12
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.1
zipp==3.8.1
It is difficult to understand the issue from the screenshot alone. Can you explicitly cast the raw column to np.float and see if you can still reproduce the error? Otherwise, can you reproduce it with any other shareable data? I doubt it could possibly be data related.
I added the code as text and as an image. The image shows that the data is already float.
I debugged the code and found that fill_rate has None as its default value when it should be a float.
Is that expected? If the argument is required, it should be indicated as one and checked beforehand.
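For illustration, here is a minimal sketch of how a None default produces exactly this message, and what an up-front check could look like. The function names and the `1 - fill_rate` arithmetic are hypothetical stand-ins, not luminaire's actual code, and the accepted range is an assumption.

```python
def missing_share(fill_rate=None):
    # Hypothetical stand-in for the failing arithmetic: an int minus the unset default.
    return 1 - fill_rate


try:
    missing_share()
    message = None
except TypeError as exc:
    # Same text as the 'ErrorMessage' value in the report.
    message = str(exc)


def checked_missing_share(fill_rate=None):
    # Fail early with a clear message instead of a TypeError deep in the call.
    if fill_rate is None:
        raise ValueError("fill_rate is required; pass a float such as 0.9")
    # Assumption: fill_rate is a fraction greater than 0 and at most 1.
    if not 0 < fill_rate <= 1:
        raise ValueError(f"fill_rate must be in (0, 1], got {fill_rate!r}")
    return 1 - fill_rate
```

With a check like this, a missing argument would surface as a ValueError that names the parameter, rather than as an opaque operand-type error.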
@paulochf I think your debugging caught the right issue. This example shows the required attributes for the DataExploration class. You can also bypass it and run the hyperparameter optimization, which gives you the best fill rate based on the missingness pattern in the data. Please refer to this example for more details.
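As a rough sketch of the first option, the call from the report can be wrapped so that fill_rate is always passed explicitly. The helper name `profile_with_fill_rate` is hypothetical, and 0.9 is only an example value to tune for your data; the sketch assumes the `fill_rate` constructor argument discussed in this thread.

```python
def profile_with_fill_rate(df, freq="D", fill_rate=0.9):
    """Profile `df` with an explicit fill_rate instead of the None default."""
    # Imported lazily so the helper can be defined without luminaire installed.
    from luminaire.exploration.data_exploration import DataExploration

    de = DataExploration(freq=freq, fill_rate=fill_rate)
    # Same call as in the report: returns the processed frame and a summary dict.
    return de.profile(df=df)
```

Usage would mirror the original snippet: `past_prof, profile = profile_with_fill_rate(past)`, then check `profile['success']` before using `past_prof`.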
The purpose of the fill_rate parameter is to capture the missingness pattern over the longer history, so that the model does not generate overconfident predictions (with narrower confidence bounds) from a recent time window alone and does not rely on too much synthetic data.
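To make the idea of a fill rate concrete, the sketch below computes the share of expected daily timestamps that actually have an observation. This only illustrates the concept under that assumed definition; it is not luminaire's internal calculation, and the function name and sample dates are made up.

```python
from datetime import date, timedelta


def observed_fill_rate(observed_days, start, end):
    """Share of calendar days in [start, end] that have an observation."""
    total_days = (end - start).days + 1
    if total_days <= 0:
        raise ValueError("end must not be earlier than start")
    expected = {start + timedelta(days=i) for i in range(total_days)}
    present = expected & set(observed_days)
    return len(present) / total_days


start, end = date(2022, 9, 1), date(2022, 9, 10)
# Hypothetical daily series in which two of the ten days are missing.
observed = [start + timedelta(days=i) for i in range(10) if i not in (2, 6)]
rate = observed_fill_rate(observed, start, end)
```

A series whose observed rate falls below the configured fill_rate would need a large share of imputed values, which is the situation the parameter is meant to guard against.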
That's a nice piece of information! I wish I had seen it before, as I went directly to the API doc page. It could be marked there as a required parameter.
Thank you for your replies!