Replace load_boston with alternate regression dataset
gaugup opened this issue
Describe the bug
As noted in the scikit-learn documentation at https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html, the Boston housing dataset is being deprecated because of a significant ethical concern:
Warning: The Boston housing prices dataset has an ethical problem: as investigated in [1], the authors of this dataset engineered a non-invertible variable “B” assuming that racial self-segregation had a positive impact on house prices [2]. Furthermore the goal of the research that led to the creation of this dataset was to study the impact of air quality but it did not give adequate demonstration of the validity of this assumption.
The scikit-learn maintainers therefore strongly discourage the use of this dataset unless the purpose of the code is to study and educate about ethical issues in data science and machine learning.
scikit-learn suggests the California housing dataset or the Ames housing dataset as reasonable alternatives [https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html]. We should change the references in interpret-community accordingly.
To Reproduce
Steps to reproduce the behavior:
- Search for load_boston in the interpret-community repository.
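A minimal sketch of the search step in Python, assuming a local checkout of the repository. The function name find_load_boston and the file-extension filter are illustrative choices, not part of interpret-community; from a shell, `git grep load_boston` does the same job.

```python
import os


def find_load_boston(root, pattern="load_boston", exts=(".py", ".ipynb", ".md", ".rst")):
    """Hypothetical helper: list (path, line_number, line) for every match under root."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into version-control metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # Only scan file types likely to contain code or documentation.
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if pattern in line:
                        hits.append((path, lineno, line.rstrip("\n")))
    # Sort so the report order does not depend on os.walk traversal order.
    return sorted(hits)


if __name__ == "__main__":
    import sys

    repo_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, lineno, line in find_load_boston(repo_root):
        print(f"{path}:{lineno}: {line.strip()}")
```

Running it as `python find_load_boston.py path/to/interpret-community` prints one `path:line: text` entry per reference.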
Expected behavior
All references to load_boston in interpret-community are replaced with one of the alternatives scikit-learn suggests, such as the California housing dataset (fetch_california_housing) or the Ames housing dataset.
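A minimal sketch of the swap for an example that only needs a feature matrix and a numeric target; load_regression_data is a hypothetical helper, not an interpret-community function. California housing has 8 features (Boston has 13) and its target is the median house value in units of $100,000, so code that hard-codes Boston feature names such as LSTAT or RM must be updated as well. fetch_california_housing downloads the data on first use and then reads it from the local scikit-learn cache.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split


def load_regression_data(test_size=0.2, random_state=0):
    """Hypothetical helper: train/test split of the California housing data.

    Replaces a call that previously looked like:
        X, y = load_boston(return_X_y=True)
    """
    # as_frame=True returns a pandas DataFrame/Series (requires pandas).
    housing = fetch_california_housing(as_frame=True)
    # 8 feature columns; the target is median house value in $100,000s.
    X, y = housing.data, housing.target
    return train_test_split(X, y, test_size=test_size, random_state=random_state)
```

Calling `X_train, X_test, y_train, y_test = load_regression_data()` returns pandas objects whose columns include MedInc and HouseAge, which explainers can consume in place of the Boston features.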