marcotcr / checklist

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Scale checklist to classical ML models

kumar-abhishek opened this issue

Thanks for creating an awesome framework for behavioral testing. I am still diving into it, but as I understand it, the framework applies really well to NLP models. I am really interested in knowing whether it can scale to classical machine learning models, such as decision trees or logistic regression on tabular data.

It seems a bit hard to generalize to me, but I wanted to hear your thoughts on this and see whether you have suggestions on how one could use similar tools to generate a large number of test cases under various categories.

I actually started this project with tabular data, and came up with a variety of interesting invariance and monotonicity tests for various tasks. The difficulty there was generalizing from each task to the others, a difficulty that is less pronounced in language because of linguistic phenomena (e.g. names, verbs, negation) that manifest in pretty much every NLP task.
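For readers wondering what an invariance or monotonicity test could look like on tabular data, here is a minimal sketch in plain Python. The model, its weights, the feature names (`income`, `debt`, `zip_code`), and the helpers `invariance_failures` / `monotonicity_failures` are all hypothetical and are not part of the CheckList library. Each test perturbs one column of existing rows and checks how the model's score responds.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]

# Hypothetical toy credit model with fixed logistic-regression weights (illustrative only).
WEIGHTS = {"income": 0.8, "debt": -1.2, "zip_code": 0.0}
BIAS = -0.5


def predict_proba(row: Row) -> float:
    """Probability of approval under the toy logistic model."""
    z = BIAS + sum(WEIGHTS[k] * row[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def invariance_failures(model: Callable[[Row], float], rows: List[Row],
                        feature: str, new_values: Sequence[float],
                        tol: float = 1e-6) -> List[Row]:
    """Invariance test: replacing `feature` with other values should not change the score."""
    failures = []
    for row in rows:
        base = model(row)
        for value in new_values:
            perturbed = {**row, feature: value}
            if abs(model(perturbed) - base) > tol:
                failures.append(perturbed)
    return failures


def monotonicity_failures(model: Callable[[Row], float], rows: List[Row],
                          feature: str, delta: float,
                          increasing: bool = True) -> List[Row]:
    """Monotonicity test: adding `delta` to `feature` should move the score in one direction."""
    failures = []
    for row in rows:
        base = model(row)
        perturbed = {**row, feature: row[feature] + delta}
        new = model(perturbed)
        ok = new >= base if increasing else new <= base
        if not ok:
            failures.append(perturbed)
    return failures


rows = [
    {"income": 1.0, "debt": 0.5, "zip_code": 94110.0},
    {"income": 0.2, "debt": 2.0, "zip_code": 10001.0},
]
print(invariance_failures(predict_proba, rows, "zip_code", [0.0, 60601.0]))     # expect no failures
print(monotonicity_failures(predict_proba, rows, "debt", 0.5, increasing=False))  # expect no failures
```

The hard part the comment above describes is not the test harness but deciding, per task, which columns should be invariant and which should be monotonic; that knowledge does not carry over from one tabular task to another the way negation or names carry over between NLP tasks.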

The concepts of MFT (Minimum Functionality test), INV (Invariance test) and DIR (Directional Expectation test) translate directly to tabular data testing, but the capabilities I listed obviously have to do with language.
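To make the MFT idea concrete for tabular data, the sketch below generates many test cases from a small grid of feature values (a rough analogue of filling templates with lexicon values, as CheckList does for text) and checks each case against an expected label. Everything here is hypothetical: `predict_label`, `generate_cases`, `run_mft`, and the feature names are illustrative, not CheckList's API.

```python
import itertools
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def predict_label(row: Row) -> int:
    """Hypothetical toy classifier: logistic score z >= 0 (i.e. probability >= 0.5) means approve."""
    z = -0.5 + 0.8 * row["income"] - 1.2 * row["debt"]
    return int(z >= 0.0)


def generate_cases(grid: Dict[str, Sequence[float]]) -> List[Row]:
    """Template-style generation: one row for every combination of the listed values."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(grid[k] for k in keys))]


def run_mft(model: Callable[[Row], int], cases: List[Row],
            expected: Callable[[Row], int]) -> List[Row]:
    """MFT: return the generated cases whose prediction differs from the expected label."""
    return [case for case in cases if model(case) != expected(case)]


# Capability: "high income, low debt applicants are approved".
approve_cases = generate_cases({"income": [3.0, 4.0, 5.0], "debt": [0.0, 0.5]})
print(len(approve_cases), run_mft(predict_label, approve_cases, lambda c: 1))
```

Growing the value lists grows the number of cases combinatorially, which is one way to get a large number of test cases per capability; the expected-label function is where task-specific knowledge has to be supplied.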