cpondoc/word-shifts

investigating word shifts -> hypernetworks. cs 224n final project


Adaptation, Sensitivity, and Introspection: Investigating the Capabilities of LLMs as Hypernetworks

By: Christopher Pondoc, Joseph Guman, and Joseph O'Brien

Abstract

Much work has been done to evaluate Large Language Models (LLMs) on common sense reasoning tasks, such as the Winograd Schema Challenge (Levesque et al., 2012). However, the underlying embeddings these models learn are static with respect to language, causing LLMs to perform poorly on benchmarks such as WinoDict that test lexical semantic word shifts (Eisenschlos et al., 2022b). These shifts occur when a word changes meaning over time or when a brand-new word emerges. While retraining the entire model is a natural approach to this problem, doing so is expensive in both computation and time (Patterson et al., 2021) and could degrade the model. Hypernetworks (Ha et al., 2017), neural networks that learn parameters for another model, have shown promise in editing language model behavior without retraining (Cao et al., 2021; Kim and Jeong, 2021). We therefore trained a separate LLM hypernetwork to predict GPT-2 (Radford et al., 2019) token embeddings from each token's definition. After achieving poor results on both the WinoDict and Children's Book Test (CBT) (Hill et al., 2016) benchmarks, we investigated the sensitivity of GPT-2's embedding layer further. Through several "forcing" experiments, we found that even slight perturbations to a single embedding can lead to drastic differences in model performance on downstream tasks. Overall, we argue that a path forward involves either training transformer models to be embedding-agnostic or equipping them with a joint understanding of both language and their own embedding space. Our code for the project milestone is open-source.
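As a rough illustration of the two ideas above, the sketch below (a minimal sketch, not the project's actual implementation) shows (1) a toy hypernetwork that maps a word's definition to a GPT-2 token embedding and splices it into the embedding matrix, and (2) a small "forcing"-style perturbation of that embedding. The `DefinitionToEmbedding` module, the mean-pooling encoder, and the 0.01 noise scale are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (illustrative, not the paper's implementation) of:
#  (1) a hypernetwork that predicts a GPT-2 token embedding from a definition
#  (2) a "forcing"-style perturbation of that embedding
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
embed_dim = gpt2.transformer.wte.weight.shape[1]  # 768 for base GPT-2

class DefinitionToEmbedding(nn.Module):
    """Toy hypernetwork: encode a definition, emit one GPT-2 embedding.
    (Hypothetical module; the real project used an LLM as the hypernetwork.)"""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.token_encoder = nn.Embedding(vocab_size, dim)
        self.project = nn.Linear(dim, dim)

    def forward(self, definition_ids):
        pooled = self.token_encoder(definition_ids).mean(dim=0)  # mean-pool
        return self.project(pooled)

hyper = DefinitionToEmbedding(len(tok), embed_dim)

# Predict an embedding for an existing token from a dictionary definition.
definition_ids = torch.tensor(tok.encode("a small domesticated feline animal"))
predicted = hyper(definition_ids)

target_id = tok.encode(" cat")[0]  # single BPE token for " cat"
prompt = torch.tensor([tok.encode("The cat sat on the")])

with torch.no_grad():
    # Splice the predicted embedding into GPT-2's input embedding matrix.
    gpt2.transformer.wte.weight[target_id] = predicted
    before = gpt2(prompt).logits[0, -1].argmax().item()

    # "Forcing" experiment: add a slight Gaussian perturbation (scale assumed)
    # and check whether the next-token prediction changes.
    gpt2.transformer.wte.weight[target_id] += 0.01 * torch.randn(embed_dim)
    after = gpt2(prompt).logits[0, -1].argmax().item()

print(tok.decode([before]), "->", tok.decode([after]))
```

Even the tiny perturbation in the last step can change the model's next-token prediction, illustrating the kind of embedding-layer sensitivity the abstract describes.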

Languages

Jupyter Notebook 100.0%, Python 0.0%