deepchecks / deepchecks

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

Home Page: https://docs.deepchecks.com/stable

[BUG] Dataset select method sometimes mutates original columns list

PushaBe opened this issue

Describe the bug
The Dataset.select method mutates the original columns sequence if it was provided as a mutable Python sequence (e.g. a list).

To Reproduce
Steps to reproduce the behavior:

  1. Define the features as a Python list of strings and the label as a string.
  2. Pass them to the Dataset class constructor.
  3. Create an instance of the ConflictingLabels check and pass the features list (features without the label) as the columns parameter of its constructor.
  4. Run the check (under the hood it invokes the Dataset.select method with keep_label=True).
  5. Inspect the original features list: it now contains the label column too.
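
The mechanism behind these steps can be illustrated without deepchecks installed. The snippet below is a minimal stand-in, not the library's source: `select_columns_buggy` and the column names are hypothetical, and it only assumes that the label is appended in place to the list the caller passed in.

```python
from typing import List, Optional


def select_columns_buggy(
    columns: List[str], label: Optional[str], keep_label: bool = True
) -> List[str]:
    """Hypothetical stand-in for the reported behavior (not deepchecks code)."""
    if keep_label and label is not None and label not in columns:
        # In-place append: this modifies the very list object the caller owns.
        columns.append(label)
    return columns


features = ["age", "income"]  # illustrative column names
label = "target"

selected = select_columns_buggy(features, label)

# The caller's list has silently gained the label column.
print(features)  # ['age', 'income', 'target']
```

Because `selected` and `features` refer to the same list object, any later code that reuses `features` sees the label as if it were a feature.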

Expected behavior
I think it shouldn't mutate the original list because it's quite unexpected. Even if, indeed, I didn't provide all the columns to the check (forgot about label), I wouldn't expect it to mutate my list by adding the label to it instead of throwing an error. I used it as a part of Data Integrity Suite and was really surprised by that behaviour.
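
One way to get the expected behavior is to copy the incoming sequence before modifying it. This is only a sketch under that assumption, not necessarily how the library fixed it; `select_columns_safe` and the column names are hypothetical.

```python
from typing import List, Optional, Sequence


def select_columns_safe(
    columns: Sequence[str], label: Optional[str], keep_label: bool = True
) -> List[str]:
    """Hypothetical non-mutating variant: works on a copy of the input."""
    selected = list(columns)  # defensive copy; also accepts tuples
    if keep_label and label is not None and label not in selected:
        selected.append(label)
    return selected


features = ["age", "income"]  # illustrative column names
result = select_columns_safe(features, "target")

print(features)  # ['age', 'income']
print(result)    # ['age', 'income', 'target']
```

Raising an error when the label is missing from the columns would be an alternative design; either way the caller's list stays untouched.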

Environment (please complete the following information):

  • OS: CentOS Linux
  • Python Version: 3.7.16
  • Deepchecks Version: 0.13.1

Hey @PushaBe, thanks for letting us know! This is definitely not the desired behavior and we will try to fix it ASAP.

@noamzbr you can assign this to me!!

@noamzbr I think this has been fixed 2 weeks back by @yromanyshyn.

Solved by #2544.