OpenBioLink / ThoughtSource

A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/

Datasets: MedQA, MedMCQA, PubmedQA

matthias-samwald opened this issue

The CoTs for these datasets come from Liévin et al. (2022): https://arxiv.org/abs/2207.08143

Just a minor observation about the MedMCQA source data (not an issue pertaining to our code): in the gold-standard CoTs, certain citations reappear a lot (e.g. "Ref Harrison20th edition pg 2456" appears more than 60 times). I'm pretty sure that some of these citations are not correct, since they appear in a wide variety of contexts.
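For anyone who wants to check this, here is a rough sketch of how repeated citations could be counted. It assumes the gold-standard CoTs are available as a plain list of strings and that citations start with the token "Ref"; the function name `count_citations`, the regex, and the sample strings are illustrative assumptions, not part of the ThoughtSource code or the MedMCQA schema.

```python
import re
from collections import Counter

# Assumption: a citation starts with the word "Ref" (optionally followed by
# ":" or ".") and runs to the end of the line. Real data may need a looser rule.
REF_PATTERN = re.compile(r"\bRef\b[:.]?\s*(.+)", re.IGNORECASE)


def count_citations(cots):
    """Count how often each normalized citation string occurs across CoTs."""
    counts = Counter()
    for cot in cots:
        for raw in REF_PATTERN.findall(cot):
            # Normalize: collapse whitespace, lowercase, drop trailing dots.
            normalized = " ".join(raw.split()).lower().rstrip(" .")
            if normalized:
                counts[normalized] += 1
    return counts


# Hypothetical sample input, only to show the shape of the data.
sample_cots = [
    "The answer is B. Ref Harrison20th edition pg 2456",
    "Insulin lowers glucose.\nRef Harrison20th edition pg 2456",
    "Ref  harrison20th edition pg 2456.",
    "No citation here.",
]

counts = count_citations(sample_cots)
# Citations that recur at least `threshold` times are candidates for review.
threshold = 2
suspicious = [(ref, n) for ref, n in counts.most_common() if n >= threshold]
print(suspicious)
```

A high count alone does not show that a citation is wrong; it only flags entries worth comparing against the question topics they are attached to.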