[BUG] Dataset select method sometimes mutates original columns list
PushaBe opened this issue
Describe the bug
The Dataset.select method mutates the original columns sequence if it was provided as a mutable Python sequence (e.g. a list).
To Reproduce
Steps to reproduce the behavior:
- Define the features as a Python list of strings and the label as a string
- Pass them to the Dataset class constructor
- Create an instance of the ConflictingLabels check and pass the features list (features without the label) to the columns parameter of its constructor
- Run the check (under the hood it will invoke the Dataset.select method with keep_label=True)
- Check the original features list: it will now contain the label column too
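The steps above come down to a function appending to a list that the caller still owns. Below is a minimal sketch of that pattern. It uses a hypothetical stand-in function, `select_columns`, and made-up column names; it is not the deepchecks source and only illustrates how this kind of mutation can happen.

```python
from typing import List


def select_columns(columns: List[str], label: str, keep_label: bool = True) -> List[str]:
    """Hypothetical stand-in for the column selection logic (not deepchecks code)."""
    if keep_label and label not in columns:
        # In-place append: this modifies the very list object the caller passed in.
        columns.append(label)
    return columns


features = ["age", "income"]  # assumed example column names
selected = select_columns(features, label="target")

print(features)  # ['age', 'income', 'target'] - the caller's list now contains the label
```

Because `selected` and `features` refer to the same list object, any later code that reuses `features` silently sees the extra label column.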
Expected behavior
I think it shouldn't mutate the original list because it's quite unexpected. Even if, indeed, I didn't provide all the columns to the check (forgot about label), I wouldn't expect it to mutate my list by adding the label to it instead of throwing an error. I used it as a part of Data Integrity Suite and was really surprised by that behaviour.
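One way to get the non-mutating behaviour described above is to copy the incoming sequence before extending it. The sketch below assumes a hypothetical helper named `select_columns_safe` and made-up column names; it is not necessarily how the library handles this.

```python
from typing import List, Sequence


def select_columns_safe(columns: Sequence[str], label: str, keep_label: bool = True) -> List[str]:
    """Hypothetical non-mutating variant (not deepchecks code)."""
    # Copy first, so the caller's sequence is never touched.
    selected = list(columns)
    if keep_label and label not in selected:
        selected.append(label)
    return selected


features = ["age", "income"]  # assumed example column names
selected = select_columns_safe(features, label="target")

print(features)  # ['age', 'income'] - unchanged
print(selected)  # ['age', 'income', 'target']
```

Copying also lets the function accept immutable sequences such as tuples. Raising an error when the label is missing from the provided columns would be another reasonable design, as suggested above.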
Environment (please complete the following information):
- OS: CentOS Linux
- Python Version: 3.7.16
- Deepchecks Version: 0.13.1
Hey @PushaBe, thanks for letting us know! This is definitely not desired behavior, and we will try to fix it ASAP.
@noamzbr you can assign this to me!!
@noamzbr I think this has been fixed 2 weeks back by @yromanyshyn.
solved by #2544