zillow / luminaire

Luminaire is a Python package that provides ML-driven solutions for monitoring time series data.

Home Page: https://zillow.github.io/luminaire

DataExploration.profile results in 'ErrorMessage': "unsupported operand type(s) for -: 'int' and 'NoneType'"

paulochf opened this issue · comments

Hey all!

I'm trying to use the package but I'm getting that message.

import luminaire
import pandas as pd

from luminaire.exploration.data_exploration import DataExploration

past = pd.read_csv("dataset.csv").set_index("index")

de = DataExploration(freq='D')

past_prof, profile = de.profile(df=past)
#(None,
#{'success': False,
# 'ErrorMessage': "unsupported operand type(s) for -: 'int' and 'NoneType'"})

image

Is this data-related?

Here is my environment:

  • Python 3.7.10
  • requirements.txt: see below (the result of pip install -U jupyterlab numpy pandas matplotlib luminaire pip setuptools pyarrow)

Thanks!


anyio==3.6.1
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
attrs==22.1.0
Babel==2.10.3
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
boto3==1.24.76
botocore==1.27.76
certifi==2022.9.14
cffi==1.15.1
changepy==0.3.1
charset-normalizer==2.1.1
cloudpickle==2.2.0
cycler==0.11.0
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
entrypoints==0.4
fastjsonschema==2.16.2
fonttools==4.37.2
future==0.18.2
hyperopt==0.2.7
idna==3.4
importlib-metadata==4.12.0
importlib-resources==5.9.0
ipykernel==6.15.3
ipython==7.34.0
ipython-genutils==0.2.0
jedi==0.18.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.10
jsonschema==4.16.0
jupyter-core==4.11.1
jupyter-server==1.18.1
jupyter_client==7.3.5
jupyterlab==3.4.7
jupyterlab-pygments==0.2.2
jupyterlab_server==2.15.1
kiwisolver==1.4.4
luminaire==0.4.0
lxml==4.9.1
MarkupSafe==2.1.1
matplotlib==3.5.3
matplotlib-inline==0.1.6
mistune==2.0.4
nbclassic==0.4.3
nbclient==0.6.8
nbconvert==7.0.0
nbformat==5.5.0
nest-asyncio==1.5.5
networkx==2.6.3
notebook==6.4.12
notebook-shim==0.1.0
numpy==1.21.6
packaging==21.3
pandas==1.3.5
pandas-redshift==2.0.5
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
pkgutil_resolve_name==1.3.10
prometheus-client==0.14.1
prompt-toolkit==3.0.31
psutil==5.9.2
psycopg2-binary==2.9.3
ptyprocess==0.7.0
py4j==0.10.9.7
pyarrow==9.0.0
pycparser==2.21
Pygments==2.13.0
pykalman==0.9.5
pyparsing==3.0.9
pyrsistent==0.18.1
python-dateutil==2.8.2
pytz==2022.2.1
pyzmq==24.0.0
requests==2.28.1
s3transfer==0.6.0
scikit-learn==1.0.2
scipy==1.7.3
Send2Trash==1.8.0
six==1.16.0
sniffio==1.3.0
soupsieve==2.3.2.post1
statsmodels==0.13.2
terminado==0.15.0
threadpoolctl==3.1.0
tinycss2==1.1.1
tomli==2.0.1
tornado==6.2
tqdm==4.64.1
traitlets==5.4.0
typing_extensions==4.3.0
urllib3==1.26.12
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.1
zipp==3.8.1

It is difficult to understand the issue from the screenshot alone. Can you explicitly cast the raw column to np.float and see if you can reproduce the error? Otherwise, can you reproduce it with any other shareable data? I doubt it could possibly be data related.

I added the code as both text and an image. The image shows that the data is already float.

I debugged the code and realized that fill_rate has None as its default value when it should be a float.

Is that expected? If the argument is required, it should be indicated as one and checked beforehand.

@paulochf I think your debugging caught the right issue. This example shows the required attributes for the DataExploration class. You can also bypass it and run the hyperparameter optimization, which gives you the best fill rate based on the missingness pattern in the data. Please refer to this example for more details.
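
As a minimal sketch of the fix discussed above, the fill rate can be passed explicitly and checked before anything else runs. The helper name make_explorer and the accepted range (0, 1] are assumptions made for illustration and are not part of luminaire; the sketch also assumes that DataExploration accepts fill_rate as a keyword argument in its constructor, as the replies in this thread imply.

```python
def make_explorer(freq, fill_rate):
    """Validate fill_rate, then build a DataExploration object.

    Hypothetical helper for illustration; not part of luminaire.
    """
    # Reject None and non-numeric values up front, instead of failing later
    # with "unsupported operand type(s) for -: 'int' and 'NoneType'".
    if isinstance(fill_rate, bool) or not isinstance(fill_rate, (int, float)):
        raise ValueError(f"fill_rate must be a number, got {fill_rate!r}")
    # Assumption: a fill rate is a fraction, so it should lie in (0, 1].
    if not 0 < fill_rate <= 1:
        raise ValueError(f"fill_rate must be in (0, 1], got {fill_rate!r}")

    # Imported lazily so the checks above run even without luminaire installed.
    from luminaire.exploration.data_exploration import DataExploration

    # Assumption: the constructor takes fill_rate as a keyword argument.
    return DataExploration(freq=freq, fill_rate=fill_rate)
```

With this in place, something like de = make_explorer("D", 0.9) followed by de.profile(df=past) would either run with an explicit fill rate or fail early with a readable message. The value 0.9 is only illustrative.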

The purpose of the fill_rate parameter is to capture the missingness pattern over the longer history, so that the model does not generate overconfident predictions (with narrower confidence bounds) from a recent time window alone and does not rely on too much synthetic data.
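
To make the idea of a missingness pattern concrete, here is a rough sketch that measures how much of a recent daily window actually has observations. It assumes a fill rate can be read as a minimum fraction of non-missing points in a window. That reading, the function name availability, and the sample dates are all assumptions for illustration; this is not luminaire's internal logic.

```python
from datetime import date, timedelta


def availability(observed_days, end, window_days):
    """Fraction of the last `window_days` calendar days ending at `end`
    (inclusive) that have an observation."""
    window = {end - timedelta(days=i) for i in range(window_days)}
    return len(window & set(observed_days)) / window_days


# Made-up daily series with two missing days (Sep 4 and Sep 7).
observed = [date(2022, 9, d) for d in (1, 2, 3, 5, 6, 8, 9, 10)]
ratio = availability(observed, end=date(2022, 9, 10), window_days=10)
print(ratio)
```

Comparing such a ratio against a chosen threshold is one way to decide whether a window has enough real data to be trusted, or whether too much of it would have to be imputed.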

That's a nice piece of information! I wish I had seen it before, as I went directly to the API doc page. It could be indicated there as a required parameter.

Thank you for your replies!