The quest to ensure fairness, robustness, and utility in large language models (LMs) has led to keen interest in understanding how modifications to their inputs affect a model's behavior. In open-ended text generation, evaluating these effects is far from straightforward. The proposal of Contrastive Input Decoding (CID) by Gal Yona, Or Honovich, Itay Laish, and Roee Aharoni (Weizmann Institute of Science, Tel Aviv University, and Google) is therefore a welcome development.
Paper (arXiv): https://arxiv.org/abs/2305.07378
Hugging Face paper page: https://huggingface.co/papers/2305.07378
CID is a decoding algorithm that generates text which is likely given one input but unlikely given a second, contrastive input. This contrast highlights subtle differences in the model's output for the two inputs in an easily understandable manner. CID can thus expose context-specific biases that are difficult to detect with standard decoding strategies, and it can quantify the impact of different input perturbations.
The sensitivity of large pre-trained language models to minor input perturbations, including those that humans would deem insignificant, presents a challenge. For instance, given a medical question such as “What happens if listeria is left untreated?”, it may not be clear what effect specifying demographic information (e.g., “left untreated in men?” vs. “left untreated in women?”) has on the generated answer.
CID was developed to address this issue by introducing a decoding strategy that accepts a regular input and a “contrastive” input. The objective is to generate sequences that are likely given the regular input but unlikely given the contrastive input. This highlights the differences in how the model treats these two inputs in an easily interpretable way.
CID uses a hyper-parameter λ that controls the degree of contrast; increasing λ surfaces differences that may otherwise be difficult to detect. The researchers demonstrated two applications for CID: surfacing context-specific biases in autoregressive LMs, and quantifying the effect of different input perturbations.
Concretely, CID uses the additional contrastive input to inform generation: at each decoding step, the next-token distribution is modified so that tokens which are also probable under the contrastive input are down-weighted. The generated continuation is therefore likely under the regular input but less likely under the contrastive one, making visible exactly where the model treats the two inputs differently.
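To make the mechanism concrete, here is a minimal sketch of such a contrastive reweighting step in NumPy. It assumes a multiplicative penalty of the form p(t | regular) · (1 − p(t | contrastive))^λ; the exact formulation and implementation details are given in the paper, so treat this as an illustrative approximation, not the authors' code.

```python
import numpy as np

def cid_reweight(p_regular, p_contrastive, lam=1.0):
    """Sketch of a CID-style next-token reweighting step.

    Assumed form (an illustration, not the paper's exact definition):
        score(t) = p(t | regular) * (1 - p(t | contrastive)) ** lam
    Scores are renormalized into a probability distribution.
    With lam = 0 this recovers the regular next-token distribution.
    """
    p_regular = np.asarray(p_regular, dtype=float)
    p_contrastive = np.asarray(p_contrastive, dtype=float)
    scores = p_regular * (1.0 - p_contrastive) ** lam
    return scores / scores.sum()

# Toy next-token distributions over a 3-token vocabulary.
p_reg = [0.5, 0.3, 0.2]   # distribution given the regular input
p_con = [0.5, 0.1, 0.4]   # distribution given the contrastive input

print(cid_reweight(p_reg, p_con, lam=0.0))  # lam = 0: unchanged regular distribution
print(cid_reweight(p_reg, p_con, lam=4.0))  # larger lam favors tokens unlikely under the contrast
```

Here token 0 is equally probable under both inputs, so raising λ shifts mass toward token 1, the token whose probability differs most in the regular input's favor. This is the sense in which λ "turns up" the contrast.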
In conclusion, Contrastive Input Decoding (CID) provides a means to understand and quantify the impact of input modifications on language models. By using a regular input and a contrastive input, it is possible to highlight subtle differences and biases in the model's output in a straightforward and interpretable manner. This is a significant step towards ensuring the fairness and robustness of large language models.