[Question] Code Synthesis eval functions?

TheExGenesis opened this issue a year ago

My understanding is you didn't include the program synthesis datasets for IP reasons, but could you include the eval functions you used? Were the eval environments from the original repos hooked up to python functions to be called by seqio?

shayne-longpre · Answer 1 · Tue Mar 21 2023 03:13:40 GMT+0800 (China Standard Time)

@TheExGenesis Good question. We did not do any Program Synthesis evaluations, but that would be the right way to do it. I would suggest scaling up the Program Synthesis mixture rate and the number of datasets to ensure a 3B-sized model can perform well on them. It's a harder task and has less in common with most of the other NLP tasks. We added it in training purely for even greater task diversity.

Francisco Carvalho · Answer 2 · Tue Mar 21 2023 18:14:35 GMT+0800 (China Standard Time)

@shayne-longpre Can I ask what eval functions you used instead?

shayne-longpre · Answer 3 · Tue Mar 21 2023 23:51:57 GMT+0800 (China Standard Time)

@TheExGenesis We were not approved to release the evaluation code unfortunately, but it should be simple to replicate. MMLU is the evaluation benchmark of 57 tasks we used most consistently.

Here is an example of a few-shot MMLU prompt we have templatized for one of the tests:

The following are multiple choice questions (with answers) about security studies.

What are the frameworks of analysis within which terrorism has been considered (as of 2020)?
(A) Competition between larger nations has resulted in some countries actively supporting terrorist groups to undermine the strength of rival states. Terrorist networks are extended patronage clubs maintained and paid for by their donor states and are conceptualised as being like state actors, to be dealt with using military force. (B) Globalization has enabled the internationalization of terrorist activities by opening up their operational space, although coordination is still managed from a geographical base. This suggests that terrorist groups are nationally structured which means that terrorism cannot be considered in terms of a war to be defeated militarily without having serious implications on the indigenous population. (C) Terrorism can be viewed as a problem to be resolved by military means (war on terrorism), by normal police techniques (terrorism as crime), or as a medical problem with underlying causes and symptoms (terrorism as disease). (D) Terrorism is viewed as a criminal problem. The criminalization of terrorism has two important implications. Firstly, it suggests that terrorism can be eradicated - terrorists can be caught and brought to trial by normal judicial proceedings thereby removing the threat from society - and secondly, it suggests that preventative crime techniques are applicable to prevent its development.
Answer:(C)

Which of the following is the best lens through which to investigate the role of child soldiers?
(A) Child soldiers are victims of combat that need re-education and rehabilitation. (B) Children and their mothers are not active subjects in warfare and are best considered as subjects in the private sphere. (C) Children are most often innocent bystanders in war and are best used as signifiers of peace. (D) Children have political subjecthood that is missed when they are considered as passive victims of warfare.
Answer:(D)

How can we best describe the relationship between the state-centric approach and the concept of human security?
(A) There are such wide divisions within the human security framework regarding the nature of threats and referent objects that no widely applicable comparisons between state-centric approaches and human security can be drawn. (B) By adopting the framework of human security, the limitations of the realist state-centric approach become evident. Whilst human security defines the referent object as the person or population, state-centric approaches prioritise the security of the state, de-prioritizing the pursuit of human security. (C) The state-centric approach to security is a faction of human security, usually defined within the broad school of human security. By being state-centric this approach prioritises the individual as the referent object in security studies. (D) Both the state-centric and human-centric approaches to security are mutually exclusive and offer a sufficient analytic framework with which to understand the international security system. It is therefore the role of security analysts to determine which of these substantial concepts is correct, and which should be discarded.
Answer:(B)

In order to become securitized, a threat must be presented in which of these ways?
(A) As an existential threat that requires immediate and extraordinary action, posing a threat to the survival of the state or to societal security. (B) As requiring immediate and extraordinary action by the state, threatening the survival of a referent object and therefore warranting the use of measures not normally employed in the political realm. (C) As an urgent threat to the survival of the referent object, so serious that it legitimises the employment of extraordinary action in response. (D) As an urgent threat to the survival of the audience that requires extraordinary or emergency measures.
Answer:(C)

What distinguishes coercive diplomacy from military force?
(A) Compellence is another term for coercive diplomacy, but covering a narrower set of criteria; compellence covers those threats aimed at initiating adversary action. A threat to coerce a state to give up part of its territory would count as coercive diplomacy, as long as that threat proactively initiates action before reactive diplomacy is taken. (B) Coercive diplomacy constitutes the threats of limited force to induce adversary's incentive to comply with the coercer's demands. It is an influence strategy that is intended to obtain compliance: the use of force to defeat an opponent first does not count. It leaves an element of choice with the target to comply, or to continue. (C) Military force, or the threat of military force, utilises fear to achieve strategic objectives. Coercive diplomacy is differentiated from this approach, because it does not use fear as a tool for coercing an adversary. (D) Coercive diplomacy is employed to use force but to limit its effects on the international community. Coercive diplomacy is an aggressive strategy that is intended to obtain compliance through defeat. It does not leave an element of choice with the target, the target either being forced to comply or engage in conflict. It seeks to control by imposing compliance by removing any opportunity for negotiation or concession.
Answer:(B)

In what ways can the environment be linked to human insecurity?
(A) Human insecurity is an interchangeable concept with environmental insecurity; environmental change invariably undermines human security because its impact is always 'human' and acts as a constraining or facilitating factor that determines the extent of human development. Environmental change and conditions will therefore be the primary determinant of a person's or community's capacity to adapt to their surroundings. (B) The ways in which environmental change can threaten the welfare of the international system is dependent on the extensity of poverty as the key variable determining a population's reactive capability. Environmental change would have a negative impact if resources were available to adapt to environmental change to sustain their existing income levels. (C) In terms of the social determinants of insecurity, environmental change does not undermine human security in isolation; larger scale processes affect people's sensitivity to environmental changes and their capacity to adapt, whilst past processes shape present insecurities and ongoing processes shape future insecurities. (D) The concept of environmental human security is an essentially contested concept lacking empirical credibility of the ways in which specific environmental changes affect individuals or communities in particular times/ places and how this alters over a period of time. The lack of an agreed definition on what constitutes human security makes the possibility of developing a framework unlikely.
Answer:
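A minimal sketch of how a prompt in this layout could be assembled, since the evaluation code itself was not released. This is not the original templating code: `format_example`, `build_prompt`, and the `Example` tuple are hypothetical names, and the spacing is inferred only from the example above.

```python
from typing import List, Optional, Sequence, Tuple

LETTERS = "ABCD"

# (question, choices, answer letter). The answer is None for the final query.
Example = Tuple[str, Sequence[str], Optional[str]]


def format_example(question: str, choices: Sequence[str], answer: Optional[str]) -> str:
    """Render one question in the layout shown above (hypothetical helper)."""
    options = " ".join(f"({letter}) {text}" for letter, text in zip(LETTERS, choices))
    # Solved shots end with "Answer:(X)"; the query ends with a bare "Answer:".
    suffix = f"({answer})" if answer is not None else ""
    return f"{question}\n{options}\nAnswer:{suffix}"


def build_prompt(subject: str, shots: List[Example], query: Example) -> str:
    """Join the header, the solved shots, and the unanswered query."""
    header = f"The following are multiple choice questions (with answers) about {subject}."
    blocks = [format_example(q, c, a) for q, c, a in shots]
    blocks.append(format_example(query[0], query[1], None))
    return header + "\n\n" + "\n\n".join(blocks)
```

The model's continuation after the final `Answer:` would then be compared against the gold letter to score the item.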

Francisco Carvalho · Answer 4 · Wed Mar 22 2023 00:21:36 GMT+0800 (China Standard Time)

@shayne-longpre Oh I'm sorry, I meant to ask about the metric functions used for code tasks. Some language tasks use BLEU, others ROUGE, others edit distance.
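For readers unfamiliar with these metrics, here is a rough sketch of two of the simpler string-level ones: exact-match accuracy and Levenshtein edit distance. The function names are hypothetical and this is not code from the repository; it only illustrates what such a metric function computes on predicted and target strings.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions equal to their target after stripping whitespace."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return hits / len(targets)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```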

shayne-longpre · Answer 5 · Wed Mar 22 2023 04:10:33 GMT+0800 (China Standard Time)

@TheExGenesis Oh sorry, I see your question now. Since we didn't evaluate code tasks, the metric wouldn't have mattered for us (we probably just put accuracy or ROUGE as a placeholder); the training loss function is the same as for any other task.

Sorry I can't be more helpful here.

Francisco Carvalho · Answer 6 · Wed Mar 22 2023 19:13:16 GMT+0800 (China Standard Time)

@shayne-longpre Are you saying you didn't use the code data for training? In that case, why include it in the dataset?

shayne-longpre · Answer 7 · Wed Mar 22 2023 22:14:56 GMT+0800 (China Standard Time)

@TheExGenesis We did use the code datasets in training, just as input-output text pairs, but not for evaluation (and as such didn't compute any metrics on it).

Francisco Carvalho · Answer 8 · Wed Mar 22 2023 22:52:45 GMT+0800 (China Standard Time)

@shayne-longpre Forgive a possibly dumb question but how do you calculate training loss on the text? Is it just masking tokens in the answer and advancing token-by-token while taking cross-entropy between prediction and ground truth?

shayne-longpre · Answer 9 · Wed Mar 22 2023 22:59:59 GMT+0800 (China Standard Time)

@TheExGenesis Yes exactly, training loss is computed the same way for all tasks, and the same way as in pre-training.
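The loss described in this exchange can be sketched in plain Python as below. This is only an illustration of token-level cross-entropy against the ground-truth target, not the actual training code; `target_cross_entropy` and the optional `loss_mask` (for example, to skip padding positions) are assumptions made for the sketch.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def target_cross_entropy(
    step_logits: Sequence[Sequence[float]],
    target_ids: Sequence[int],
    loss_mask: Optional[Sequence[int]] = None,
) -> float:
    """Mean negative log-likelihood of the ground-truth target tokens.

    step_logits[t] are the vocabulary logits predicted for target position t,
    given the input and the ground-truth tokens before t (teacher forcing).
    Positions where loss_mask is 0 are excluded from the average.
    """
    if loss_mask is None:
        loss_mask = [1] * len(target_ids)
    total, count = 0.0, 0
    for logits, token_id, keep in zip(step_logits, target_ids, loss_mask):
        if not keep:
            continue
        total -= log_softmax(logits)[token_id]
        count += 1
    return total / count if count else 0.0
```

Because the same function applies to any input-output text pair, a code dataset needs no task-specific metric to contribute to training.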

Francisco Carvalho · Answer 10 · Wed Mar 22 2023 23:07:03 GMT+0800 (China Standard Time)

Thanks! I appreciate the helpful answers, you can close the issue if you'd like :)