Bugging creation of report
machtom1 opened this issue · comments
Current Behaviour
Hi,
right now, when I create a report, some columns are labeled as numeric instead of categorical, as I would expect. When I create the report without casting the columns, it takes less than one minute. After casting the columns to string values, the report takes forever, and I had to stop it after waiting for more than 30 minutes. It gets stuck at columns B and F (see the column descriptions under 'Data Description'). I tried casting the data types both in read_csv and with .astype(str), as shown in the code section.
Expected Behaviour
I expect the report to show the correct numerical and categorical values
Data Description
The data is a mix of date values and categorical values. Here is a chart of the missing values and the data types, before and after casting them:
Column  Missing values  dtype
A       0               Date
B       0               Date
C       3               object
D       0               object
E       0               object
F       0               object
G       86317           object
H       39              date
I       6871            object
J       0               object
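An overview like the one above can be produced with a few pandas calls. The following is only a minimal sketch: the helper name summarize_columns and the small sample frame are hypothetical stand-ins, since the real data comes from example.csv. The extra distinct-value count is an addition that may help spot columns with very many different values.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing-value count, dtype, and distinct-value count."""
    return pd.DataFrame(
        {
            "missing": df.isna().sum(),
            "dtype": df.dtypes.astype(str),
            "unique": df.nunique(),  # missing values are not counted here
        }
    )


# Hypothetical stand-in data; the real columns come from example.csv.
sample = pd.DataFrame(
    {
        "A": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "C": ["x", None, "y"],
        "G": [None, None, "z"],
    }
)
summary = summarize_columns(sample)
print(summary)
```

Running the same helper before and after casting makes it easy to compare the two states of the frame side by side.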
Code that reproduces the bug
import pandas as pd
from ydata_profiling import ProfileReport

# Way one: cast while reading the CSV
df_data = pd.read_csv(
    'example.csv',
    parse_dates=['A', 'B'],
    dtype={'C': 'string', 'D': 'string', 'E': 'string'},
)

# Way two: cast each column after reading
df_data = pd.read_csv('example.csv')
df_data['C'] = df_data['C'].astype(str)
df_data['D'] = df_data['D'].astype(str)
df_data['E'] = df_data['E'].astype(str)

# And then:
profile = ProfileReport(df_data)
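For the cast after reading, a single astype call with a column-to-dtype mapping converts each listed column from its own values. This is a minimal sketch, assuming the columns to convert are C, D and E; the small frame is a hypothetical stand-in for example.csv, and whether this cast changes the report runtime is not confirmed here.

```python
import pandas as pd

# Hypothetical stand-in for the frame loaded from example.csv.
df_data = pd.DataFrame(
    {
        "A": ["2024-01-01", "2024-01-02"],
        "C": ["x", None],
        "D": ["1", "2"],
        "E": ["u", "v"],
    }
)

# Columns assumed to hold text; adjust to the real data.
text_columns = ["C", "D", "E"]

# Each column is cast from its own values; the "string" dtype keeps
# missing entries as missing instead of turning them into text.
df_data = df_data.astype({col: "string" for col in text_columns})

print(df_data.dtypes)
```

The mapping form leaves all other columns untouched, so column A keeps whatever dtype it had.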
pandas-profiling version
v. 4.6.4
Dependencies
absl-py==2.0.0
adagio==0.2.4
aiohttp==3.9.2
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.11.1
appdirs==1.4.4
asttokens==2.4.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.2.0
cachetools==5.3.2
certifi==2023.11.17
charset-normalizer==3.3.2
cloudpickle==3.0.0
colorama==0.4.6
comm==0.2.1
contourpy==1.2.0
cycler==0.12.1
Cython==3.0.8
dacite==1.8.1
darts==0.27.2
debugpy==1.8.0
decorator==5.1.1
et-xmlfile==1.1.0
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
filelock==3.13.1
flatbuffers==23.5.26
fonttools==4.47.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2023.12.2
fugue==0.8.7
fugue-sql-antlr==0.2.0
gast==0.5.4
google-auth==2.26.2
google-auth-oauthlib==1.2.0
google-pasta==0.2.0
grpcio==1.60.0
h5py==3.10.0
holidays==0.41
htmlmin==0.1.12
idna==3.6
ImageHash==4.3.1
importlib-metadata==7.0.1
importlib-resources==6.1.1
ipykernel==6.28.0
ipython==8.18.1
ipywidgets==8.1.1
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyterlab-widgets==3.0.9
keras==2.15.0
kiwisolver==1.4.5
libclang==16.0.6
lightgbm==4.3.0
lightning-utilities==0.10.1
llvmlite==0.41.1
Markdown==3.5.2
MarkupSafe==2.1.3
matplotlib==3.8.2
matplotlib-inline==0.1.6
ml-dtypes==0.2.0
MouseInfo==0.1.3
mpmath==1.3.0
multidict==6.0.4
multimethod==1.11
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.2.1
nfoursid==1.0.1
numba==0.58.1
numpy==1.25.2
oauthlib==3.2.2
openpyxl==3.1.2
opt-einsum==3.3.0
packaging==23.2
pandas==2.1.4
parso==0.8.3
patsy==0.5.6
phik==0.12.4
pillow==10.2.0
platformdirs==4.1.0
plotly==5.18.0
pmdarima==2.0.4
prompt-toolkit==3.0.43
protobuf==4.23.4
psutil==5.9.7
pure-eval==0.2.2
pyarrow==15.0.0
pyasn1==0.5.1
pyasn1-modules==0.3.0
PyAutoGUI==0.9.54
pydantic==2.6.1
pydantic_core==2.16.2
PyGetWindow==0.0.9
Pygments==2.17.2
PyMsgBox==1.0.9
pyod==1.1.2
pyparsing==3.1.1
pyperclip==1.8.2
PyRect==0.2.0
PyScreeze==0.1.30
python-dateutil==2.8.2
pytorch-lightning==2.1.2
pytweening==1.0.7
pytz==2023.3.post1
PyWavelets==1.5.0
pywin32==306
PyYAML==6.0.1
pyzmq==25.1.2
qpd==0.4.4
referencing==0.32.1
requests==2.31.0
requests-oauthlib==1.3.1
rpds-py==0.17.1
rsa==4.9
scikit-learn==1.3.2
scipy==1.11.4
seaborn==0.12.2
shap==0.44.1
six==1.16.0
slicer==0.0.7
sqlglot==20.10.0
stack-data==0.6.3
statsforecast==1.7.1
statsmodels==0.14.1
sweetviz==2.3.1
sympy==1.12
tangled-up-in-unicode==0.2.0
tbats==1.1.3
tenacity==8.2.3
tensorboard==2.15.1
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
tensorflow==2.15.0
tensorflow-estimator==2.15.0
tensorflow-intel==2.15.0
tensorflow-io-gcs-filesystem==0.31.0
termcolor==2.4.0
threadpoolctl==3.2.0
torch==2.1.2
torchmetrics==1.3.0.post0
tornado==6.4
tqdm==4.66.1
traitlets==5.14.1
triad==0.9.5
typeguard==4.1.5
typing_extensions==4.9.0
tzdata==2023.4
urllib3==2.1.0
utilsforecast==0.0.26
visions==0.7.5
wcwidth==0.2.13
Werkzeug==3.0.1
widgetsnbextension==4.0.9
wordcloud==1.9.3
wrapt==1.14.1
xarray==2024.1.1
xgboost==2.0.3
yarl==1.9.4
ydata-profiling==4.6.4
zipp==3.17.0
OS
No response
Checklist
- There is not yet another bug report for this issue in the issue tracker
- The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
- The issue has not been resolved by the entries listed under Common Issues.