hendrycks / test

Measuring Massive Multitask Language Understanding | ICLR 2021

Home Page: https://arxiv.org/abs/2009.03300

Human-level performance?

rodrigonogueira4 opened this issue

Hi, first of all, thanks for releasing this great dataset!

In the abstract you wrote:
"on every one of the 57 tasks, the best models still need substantial improvements before they can reach human-level accuracy",
but I could not find human performance numbers in the paper. Do you plan to include them anytime soon?

Thanks!

We have changed the abstract to say "expert-level accuracy." For nearly all tasks this is >= 90%, so an average score of 90% should eventually be possible.
Human-level performance would vary substantially from human to human. I surmise that most high school graduates would get <= 40%. Colleges with a broad core curriculum (e.g., Columbia, UChicago) might have graduates who score <= 60%.
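For concreteness, here is a minimal sketch of how per-task scores roll up into the overall figure discussed above, assuming an unweighted mean over the 57 tasks. The task names and accuracy values are hypothetical placeholders, not numbers from the paper.

```python
# Sketch: macro-average over MMLU's 57 tasks (unweighted mean).
# The per-task accuracies below are hypothetical placeholders.
per_task_accuracy = {
    "abstract_algebra": 0.92,
    "anatomy": 0.95,
    "astronomy": 0.88,
    # ... one entry per task, 57 in total
}

# The overall score reaches ~90% only if accuracy on nearly
# every individual task is at or above that level.
average = sum(per_task_accuracy.values()) / len(per_task_accuracy)
print(f"Macro-averaged accuracy: {average:.1%}")
```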

Great, thanks for the prompt reply!

Hi @hendrycks and team, thanks for releasing such an awesome dataset! I was wondering whether any publicly releasable MTurk/crowdsourced worker performance data from the paper is available. Specifically, the paper mentions: "Unspecialized humans from Amazon Mechanical Turk obtain 34.5% accuracy on this test". Are there details on the experiment(s) used to get this number? How many participants were run? And did each participant answer questions for just one topic or many? No worries if this is not available, just wanted to check. Thank you!

Got it, thanks for the speedy response @hendrycks ! That's useful to know.