hendrycks/test Issues
can not download
Updated 1
Hrvatski
Updated
why ["top_logprobs"][-1]
Closed 1
Dismatch dataset categories
Updated
Human level performance?
Closed 5
Dataset size mismatched with paper
Closed 2
Measuring Massive Multitask Language Understanding | ICLR 2021