ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

Home Page: https://docs.profiling.ydata.ai

Bug Report

ronfisher21 opened this issue

Current Behaviour

Hi, my code is pretty simple: I read two parquet files, created a report for each pandas DataFrame, and used the compare method to generate a comparison report. When I tried to convert the comparison report to JSON with the 'to_json()' method, I got the following error:
"TypeError: to_dict() got an unexpected keyword argument 'orient'"

I saw that you already resolved this issue in:
fix: comparison to_json pd.Series encoding error #1538

I upgraded the package to the latest version and I still get the same error.
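
For context, the error message quoted above is what pandas raises when `to_dict(orient=...)` is called on a `pd.Series` rather than a `pd.DataFrame`, because only the DataFrame method accepts `orient`. Whether this is exactly what happens inside the comparison report's JSON encoder is an assumption based on the title of the linked fix. The sketch below only reproduces the pandas behaviour in isolation, with made-up data:

```python
import pandas as pd

# Illustrative data only; not taken from the report above.
frame = pd.DataFrame({"a": [1, 2]})
series = frame["a"]

# DataFrame.to_dict accepts the `orient` keyword.
frame_records = frame.to_dict(orient="records")

# Series.to_dict has no `orient` parameter, so this call raises TypeError.
try:
    series.to_dict(orient="records")
    series_error = None
except TypeError as exc:
    series_error = str(exc)

print(frame_records)
print(series_error)
```

If the traceback ends in a call like the second one, the installed package most likely still contains the code path that the linked fix was meant to change.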

Expected Behaviour

I expected to convert my report to JSON successfully.

Data Description

The datasets I am using are confidential, but the data format is parquet.

Code that reproduces the bug

import pandas as pd
from ydata_profiling import ProfileReport

df_ref = pd.read_parquet('dir/to/my/data/df_ref.parquet')
df_old = pd.read_parquet('dir/to/my/data/df_old.parquet')

ref_report = ProfileReport(df_ref, title='df ref report')
old_report = ProfileReport(df_old, title='df old report')

comparison_report = ref_report.compare(old_report)
comparison_report.to_json()

pandas-profiling version

v4.6.4
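
To confirm which version the active environment actually picks up (for instance after an upgrade), a minimal check using only the standard library might look like the sketch below. The helper name `installed_version` is hypothetical and introduced only for illustration; `ydata-profiling` is the distribution name as it appears in the dependency list.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string seen by this interpreter, or None if the package is absent.
print(installed_version("ydata-profiling"))
```

Running this with the same interpreter (or notebook kernel) that produces the error helps rule out the case where the upgrade landed in a different environment.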

Dependencies

adagio==0.2.4

aiofiles==23.2.1

aiosignal==1.3.1

alabaster==0.7.13

alembic==1.12.0

altair==5.1.1

annotated-types==0.6.0

ansi2html==1.8.0

antlr4-python3-runtime==4.11.1

anyio==3.7.1

appdirs==1.4.4

argon2-cffi==23.1.0

argon2-cffi-bindings==21.2.0

ast_decompiler==0.7.0

astatine==0.3.3

astor==0.8.1

astpretty==3.0.0

astroid==2.15.8

asttokens==2.4.0

async-lru==2.0.4

attrs==23.1.0

autoflake==1.7.8

autoviz==0.1.730

aws-secretsmanager-caching==1.1.1.5

awscli==1.32.37

Babel==2.12.1

backcall==0.2.0

bandit==1.7.7

beautifulsoup4==4.12.2

black==22.12.0

bleach==6.0.0

bokeh==2.4.3

boto3==1.34.37

botocore==1.34.37

cachetools==5.3.2

catboost==1.2.1

category-encoders==2.6.2

certifi==2023.7.22

cffi==1.15.1

charset-normalizer==3.2.0

click==8.1.7

cloudpickle==2.2.1

cmaes==0.10.0

cognitive-complexity==1.3.0

colorama==0.4.4

colorcet==3.0.1

colorlog==6.7.0

colour==0.1.5

comm==0.1.4

contourpy==1.1.0

coverage==6.5.0

cryptography==41.0.3

cycler==0.11.0

Cython==3.0.2

daal==2023.2.1

daal4py==2023.2.1

dacite==1.8.1

darglint==1.8.1

dash==2.13.0

dash-auth==2.0.0

dash-bootstrap-components==1.4.2

dash-core-components==2.0.0

dash-cytoscape==0.3.0

dash-html-components==2.0.0

dash-table==5.0.0

dash-testing-stub==0.0.2

dask==2023.5.0

databricks-cli==0.17.7

debugpy==1.6.7.post1

decorator==5.1.1

deepchecks==0.17.4

defusedxml==0.7.1

deprecation==2.1.0

dill==0.3.7

distlib==0.3.8

distributed==2023.5.0

dlint==0.14.1

doc8==0.11.2

docformatter==1.7.5

docker==6.1.3

docutils==0.16

domdf-python-tools==3.8.0.post2

dtreeviz==2.2.2

eli5==0.13.0

emoji==2.8.0

entrypoints==0.4

eradicate==2.3.0

evidently==0.2.8

exceptiongroup==1.1.3

executing==1.2.0

explainerdashboard==0.4.3

fairlearn==0.7.0

fastapi==0.103.1

fastjsonschema==2.18.0

ffmpy==0.3.1

filelock==3.12.3

flake8==4.0.1

flake8-2020==1.6.1

flake8-aaa==0.17.0

flake8-annotations==2.9.1

flake8-annotations-complexity==0.0.8

flake8-annotations-coverage==0.0.6

flake8-bandit==3.0.0

flake8-black==0.3.6

flake8-blind-except==0.2.1

flake8-breakpoint==1.1.0

flake8-broken-line==0.4.0

flake8-bugbear==22.12.6

flake8-builtins==1.5.3

flake8-class-attributes-order==0.1.3

flake8-coding==1.3.2

flake8-cognitive-complexity==0.1.0

flake8-commas==2.1.0

flake8-comments==0.1.2

flake8-comprehensions==3.14.0

flake8-debugger==4.1.2

flake8-django==1.4

flake8-docstrings==1.7.0

flake8-encodings==0.5.1

flake8-eradicate==1.4.0

flake8-executable==2.1.3

flake8-expression-complexity==0.0.11

flake8-fixme==1.1.1

flake8-functions==0.0.8

flake8-functions-names==0.4.0

flake8-future-annotations==0.0.5

flake8-helper==0.2.2

flake8-isort==4.2.0

flake8-literal==1.4.0

flake8-logging-format==0.9.0

flake8-markdown==0.3.0

flake8-mutable==1.2.0

flake8-no-pep420==2.7.0

flake8-noqa==1.4.0

flake8-pie==0.16.0

flake8-plugin-utils==1.3.3

flake8-polyfill==1.0.2

flake8-pyi==22.11.0

flake8-pylint==0.2.1

flake8-pytest-style==1.7.2

flake8-quotes==3.3.2

flake8-rst-docstrings==0.2.7

flake8-secure-coding-standard==1.4.1

flake8-slots==0.1.6

flake8-string-format==0.3.0

flake8-tidy-imports==4.10.0

flake8-typing-imports==1.12.0

flake8-use-fstring==1.4

flake8-use-pathlib==0.3.0

flake8-useless-assert==0.4.4

flake8-variables-names==0.0.6

flake8-warnings==0.4.0

flake8_simplify==0.21.0

Flask==2.2.3

flask-simplelogin==0.1.2

Flask-WTF==1.1.1

fonttools==4.42.1

frozenlist==1.4.0

fs==2.4.16

fsspec==2023.9.0

fugue==0.8.6

fugue-sql-antlr==0.1.6

future==0.18.3

gevent==23.9.0.post1

gitdb==4.0.10

GitPython==3.1.34

gradio==3.42.0

gradio_client==0.5.0

graphviz==0.20.1

greenlet==2.0.2

grpcio==1.57.0

gunicorn==20.1.0

h11==0.14.0

holoviews==1.14.9

htmlmin==0.1.12

httpcore==0.17.3

httpx==0.24.1

huggingface-hub==0.16.4

hvplot==0.7.3

hyperopt==0.2.7

hypothesis==6.97.1

hypothesmith==0.1.9

idna==3.4

ImageHash==4.3.1

imageio==2.31.3

imagesize==1.4.1

imbalanced-learn==0.11.0

importlib-metadata==5.2.0

importlib-resources==6.0.1

iniconfig==2.0.0

interpret==0.4.4

interpret-core==0.4.4

ipykernel==6.25.2

ipython==7.34.0

ipython-genutils==0.2.0

ipywidgets==7.8.1

isort==5.13.2

itsdangerous==2.1.2

jedi==0.19.0

Jinja2==3.1.2

jmespath==1.0.1

joblib==1.3.2

json5==0.9.14

jsonpickle==3.0.2

jsonschema==4.19.0

jsonschema-specifications==2023.7.1

jupyter==1.0.0

jupyter-console==6.6.3

jupyter-dash==0.4.2

jupyter-events==0.7.0

jupyter-lsp==2.2.0

jupyter-server==1.24.0

jupyter_client==7.4.9

jupyter_core==5.3.1

jupyter_server_terminals==0.4.4

jupyterlab==4.0.5

jupyterlab-flake8==0.7.1

jupyterlab-pygments==0.2.2

jupyterlab-widgets==1.1.7

jupyterlab_server==2.24.0

kaleido==0.2.1

kiwisolver==1.4.5

kmodes==0.12.2

lark-parser==0.12.0

lazy-object-proxy==1.10.0

lazy_loader==0.3

libcst==0.4.10

lightgbm==4.1.0

lime==0.2.0.1

linkify-it-py==2.0.2

llvmlite==0.40.1

locket==1.0.0

lxml==4.9.3

m2cgen==0.10.0

Mako==1.2.4

Markdown==3.4.4

markdown-it-py==3.0.0

MarkupSafe==2.1.3

matplotlib==3.7.2

matplotlib-inline==0.1.6

mccabe==0.6.1

mdit-py-plugins==0.4.0

mdurl==0.1.2

mistune==3.0.1

mlflow==1.30.1

mlxtend==0.22.0

moto==4.2.2

mr-proper==0.0.7

msgpack==1.0.5

multimethod==1.9.1

multiprocess==0.70.15

mypy-extensions==1.0.0

natsort==8.4.0

nbclassic==1.0.0

nbclient==0.8.0

nbconvert==7.8.0

nbformat==5.9.2

nest-asyncio==1.5.7

networkx==3.1

nltk==3.8.1

notebook==6.5.6

notebook_shim==0.2.3

numba==0.57.1

numpy==1.23.5

nvidia-ml-py==12.535.133

nvitop==1.3.2

oauthlib==3.2.2

optuna==3.3.0

orjson==3.9.5

outcome==1.2.0

overrides==7.4.0

oyaml==1.0

packaging==21.3

pandas==2.0.3

pandas-dq==1.28

pandas-vet==0.2.3

pandocfilters==1.5.0

panel==0.14.4

param==1.13.0

parso==0.8.3

partd==1.4.1

pathspec==0.9.0

patsy==0.5.3

pbr==6.0.0

pep8-naming==0.12.1

percy==2.0.2

pexpect==4.8.0

phik==0.12.3

pickleshare==0.7.5

Pillow==10.0.0

pkg_resources==0.0.0

pkgutil_resolve_name==1.3.10

platformdirs==3.10.0

plotly==5.16.1

plotly-resampler==0.9.1

pluggy==1.3.0

pmdarima==2.0.3

polars==0.19.2

prometheus-client==0.17.1

prometheus-flask-exporter==0.22.4

prompt-toolkit==3.0.39

protobuf==4.24.2

psutil==5.9.5

psycopg2-binary==2.9.9

ptyprocess==0.7.0

pure-eval==0.2.2

py==1.11.0

py4j==0.10.9.7

pyamg==5.0.1

pyaml==23.9.2

pyarrow==13.0.0

pyasn1==0.5.0

pybetter==0.4.1

pycaret==3.0.4

pycln==1.3.5

pycodestyle==2.8.0

pycparser==2.21

pyct==0.5.0

pydantic==2.6.2

pydantic-settings==2.1.0

pydantic_core==2.16.3

pydocstyle==6.3.0

pydub==0.25.1

pyemojify==0.2.0

pyflakes==2.4.0

Pygments==2.16.1

PyJWT==2.8.0

pylint==2.17.7

PyMySQL==1.1.0

PyNaCl==1.5.0

pynndescent==0.5.10

PyNomaly==0.3.3

pyod==1.1.0

pyOpenSSL==23.2.0

pyparsing==3.0.9

PySocks==1.7.1

pytest==7.4.1

pytest-cov==3.0.0

pytest-sugar==0.9.7

python-dateutil==2.8.2

python-dev-tools==2022.5.27

python-dotenv==1.0.1

python-json-logger==2.0.7

python-multipart==0.0.6

python-utils==3.7.0

pytz==2022.7.1

pyupgrade==2.38.4

pyviz_comms==3.0.0

PyWavelets==1.4.1

PyYAML==6.0.1

pyzmq==23.2.1

qpd==0.4.4

qtconsole==5.4.4

QtPy==2.4.0

querystring-parser==1.2.4

ray==2.6.3

referencing==0.30.2

regex==2023.8.8

removestar==1.5

requests==2.31.0

responses==0.23.3

restructuredtext-lint==1.4.0

retrying==1.3.4

rfc3339-validator==0.1.4

rfc3986-validator==0.1.1

rich==13.7.0

rpds-py==0.10.2

rsa==4.7.2

s3transfer==0.10.0

SALib==1.4.7

schemdraw==0.15

scikit-base==0.5.1

scikit-image==0.21.0

scikit-learn==1.2.2

scikit-learn-intelex==2023.2.1

scikit-optimize==0.9.0

scikit-plot==0.3.7

scipy==1.10.1

seaborn==0.12.2

selenium==4.2.0

semantic-version==2.10.0

Send2Trash==1.8.2

setuptools-scm==7.1.0

shap==0.42.1

six==1.16.0

skope-rules==1.0.1

sktime==0.22.0

slicer==0.0.7

smmap==5.0.0

sniffio==1.3.0

snowballstemmer==2.2.0

sortedcontainers==2.4.0

soupsieve==2.5

Sphinx==4.5.0

sphinxcontrib-applehelp==1.0.4

sphinxcontrib-devhelp==1.0.2

sphinxcontrib-htmlhelp==2.0.1

sphinxcontrib-jsmath==1.0.1

sphinxcontrib-qthelp==1.0.3

sphinxcontrib-serializinghtml==1.1.5

SQLAlchemy==1.4.49

sqlglot==18.2.0

sqlparse==0.4.4

ssort==0.12.3

stack-data==0.6.2

starlette==0.27.0

statsforecast==1.6.0

statsmodels==0.14.0

stdlib-list==0.10.0

stevedore==5.1.0

tabulate==0.9.0

tangled-up-in-unicode==0.2.0

tbats==1.1.3

tbb==2021.10.0

tblib==2.0.0

tenacity==8.2.3

tensorboardX==2.6.2.2

termcolor==2.4.0

terminado==0.17.1

textblob==0.17.1

threadpoolctl==3.2.0

tifffile==2023.7.10

tinycss2==1.2.1

tokenize-rt==4.2.1

toml==0.10.2

tomli==2.0.1

tomlkit==0.12.3

toolz==0.12.0

tornado==6.3.3

tox==3.28.0

tox-travis==0.13

tqdm==4.66.1

trace-updater==0.0.9.1

traitlets==5.9.0

treeinterpreter==0.2.3

triad==0.9.1

trio==0.22.2

trio-websocket==0.10.3

tsdownsample==0.1.2

tune-sklearn==0.4.6

typeguard==4.1.5

typer==0.4.2

types-PyYAML==6.0.12.11

typing-inspect==0.9.0

typing_extensions==4.7.1

tzdata==2024.1

uc-micro-py==1.0.2

umap-learn==0.5.3

untokenize==0.1.1

urllib3==1.26.16

urllib3-secure-extra==0.1.0

uvicorn==0.23.2

virtualenv==20.25.0

virtualenv-clone==0.5.7

visions==0.7.5

waitress==2.1.2

wcwidth==0.2.6

webencodings==0.5.1

websocket-client==1.6.2

websockets==11.0.3

wemake-python-styleguide==0.16.1

Werkzeug==2.2.3

widgetsnbextension==3.6.6

wordcloud==1.9.2

wrapt==1.16.0

wsproto==1.2.0

WTForms==3.0.1

wurlitzer==3.0.3

xgboost==1.7.6

xlrd==2.0.1

xmltodict==0.13.0

xxhash==3.3.0

xyzservices==2023.7.0

ydata-profiling==4.6.4

yellowbrick==1.5

zict==3.0.0

zipp==3.16.2

zope.event==5.0

zope.interface==6.0

OS

ubuntu

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.

Hi @ronfisher21,

Can you please test with the latest version of the package? We have just double-checked, and we are able to extract the information from the comparison report with no errors.

Example of the code used:
`import pandas as pd
from ydata_profiling import ProfileReport

og_df = pd.read_csv('sample_data/california_housing_train.csv')
df = pd.read_csv('sample_data/california_housing_test.csv')

report = ProfileReport(og_df, title='Train dataset houses')
report_test = ProfileReport(df, title='Test dataset houses')

compare = report.compare(report_test)

# Store the JSON output in a variable
compare_json = compare.to_json()

# Store the JSON output as a file
compare.to_file('compare.json')`

Attached you can find the JSON output.

compare.json