Make predictions in a new or held out dataset

raamana opened this issue 4 years ago · comments

Pradeep Reddy Raamana commented 4 years ago

Add the ability to input a new dataset (for example, from a different site, study, or country) and use the best model to report performance on it.

Alternatively, add an option to specify an attribute-based criterion that holds a certain subset out completely, so that performance can be reported on that subset.
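To make the attribute-based idea concrete, here is a minimal sketch in plain Python. It is not an existing API of this project: the function name `split_by_attribute`, the `site` attribute, and the toy samples are all hypothetical and only illustrate the request.

```python
def split_by_attribute(samples, attr, held_out_values):
    """Partition samples into (development, held_out) lists.

    A sample goes to the held-out list when its value for `attr`
    is one of `held_out_values`; everything else stays available
    for model selection and training.
    """
    held_out_values = set(held_out_values)
    development, held_out = [], []
    for sample in samples:
        if sample[attr] in held_out_values:
            held_out.append(sample)
        else:
            development.append(sample)
    return development, held_out


# Hypothetical toy data: each sample carries a 'site' attribute.
samples = [
    {"id": "s1", "site": "A", "label": 0},
    {"id": "s2", "site": "B", "label": 1},
    {"id": "s3", "site": "C", "label": 0},
    {"id": "s4", "site": "A", "label": 1},
    {"id": "s5", "site": "C", "label": 1},
]

# Hold out every sample from site C; it is never seen during CV.
development, held_out = split_by_attribute(samples, "site", {"C"})
```

The held-out subset would be scored only once, after the best model has been chosen on the development samples.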

Pradeep Reddy Raamana · Answer 1 · Mon Jan 20 2020 04:01:10 GMT+0800 (China Standard Time)

An obvious issue to solve first is the definition of the best model: each parameter combination is evaluated only once, and a simple numerical comparison of accuracy isn't a robust way to pick it.

The best model could be defined as the parameter combination most frequently selected over N > 100 repetitions of the inner CV loop (I already report it for the user's information). However, there are often many combinations within the same frequency range of 30-40%, so we could employ some non-parametric statistics to pick one.
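The frequency-based definition could be sketched as follows. This is only an illustration under stated assumptions: the names `rank_param_combos` and `near_top`, the `tolerance` threshold, and the example parameters are hypothetical. The sketch does not perform the non-parametric test mentioned above; it only lists the near-tied candidates that such a test would then have to choose between.

```python
from collections import Counter


def rank_param_combos(selected_per_rep):
    """Rank parameter combinations by how often the inner CV chose them.

    `selected_per_rep` holds one dict per repetition (the combination
    selected in that repetition). Parameter values must be hashable.
    Returns a list of (combo_dict, frequency), most frequent first.
    """
    counts = Counter(tuple(sorted(p.items())) for p in selected_per_rep)
    n_reps = len(selected_per_rep)
    return [(dict(combo), count / n_reps)
            for combo, count in counts.most_common()]


def near_top(ranked, tolerance=0.05):
    """Return combos whose frequency is within `tolerance` of the top one."""
    top_freq = ranked[0][1]
    return [combo for combo, freq in ranked if top_freq - freq <= tolerance]


# Hypothetical selections from 10 repetitions of the inner CV loop.
selections = (
    [{"C": 1, "gamma": 0.1}] * 4
    + [{"C": 10, "gamma": 0.1}] * 4
    + [{"C": 1, "gamma": 1.0}] * 2
)

ranked = rank_param_combos(selections)
candidates = near_top(ranked)  # two combos are tied at the top here
```

When `candidates` holds more than one combination, frequency alone cannot separate them, which is where an additional statistical comparison would come in.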

The CLI option could be --report_on