
Discovering Reward Functions for Language Models

  • Author / Creator
    Hao, Yongchang
  • Language models are a fundamental component of natural language processing (NLP) systems. Many successful modern artificial intelligence systems, including GPT-4 and ChatGPT, are built on language models. In practice, language models are often trained with the teacher forcing objective, where the goal is to predict the next token given the context. One problem with this objective is the inconsistency between training and inference, which leads to compounding errors during inference. Reinforcement learning (RL) is considered a promising approach to address this issue. However, it is usually difficult to design reward functions for language models: previous attempts are typically sparse and handcrafted for a single specific task.

    In this thesis, we propose a task-agnostic approach that derives a step-wise reward function directly from a model trained with the teacher forcing objective. Our work shows that, under a common assumption, language models trained in this way implicitly capture the rewards. This simple connection allows us to derive a reward function from any pre-trained language model, and the derived reward function can then be used to conduct reinforcement learning for language modeling. We conduct experiments on different sequence-to-sequence tasks, including dialogue generation and paraphrase generation. Empirical results show that the model trained with our reward function outperforms self-training and reward regression methods on both tasks, confirming the effectiveness of our derivation.
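    To give a flavor of the idea sketched in the abstract, the snippet below is a minimal, hypothetical illustration (not the thesis's actual derivation): a model trained with teacher forcing assigns a probability to each next token, and those token-level log-probabilities can serve as step-wise rewards for an RL learner. The toy bigram distribution `PROBS` and the helper `step_rewards` are invented for this sketch; in practice the probabilities would come from a pre-trained language model.

    ```python
    import math

    # Toy stand-in for a teacher-forced language model: a bigram
    # next-token distribution p(token | previous token). Purely
    # illustrative; a real setup would query a pre-trained LM.
    PROBS = {
        "<s>": {"the": 0.6, "a": 0.4},
        "the": {"cat": 0.7, "dog": 0.3},
        "a":   {"cat": 0.5, "dog": 0.5},
    }

    def step_rewards(tokens):
        """Assign each generation step the reward log p(token | context)."""
        rewards = []
        prev = "<s>"
        for tok in tokens:
            rewards.append(math.log(PROBS[prev][tok]))
            prev = tok
        return rewards

    # Each step of the sequence "the cat" gets a dense, per-token reward,
    # in contrast to a sparse, end-of-sequence reward.
    rewards = step_rewards(["the", "cat"])
    ```

    The point of the sketch is only that the reward is *dense* (one value per token) and *task-agnostic* (it comes from the model's own probabilities, not from a handcrafted, task-specific scorer).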

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-a49r-k971
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.