
Leveraging Foundation Models for Video Game Quality Assurance

  • Author / Creator
    Taesiri, Mohammad Reza
  • The video game industry has become a powerhouse in the global entertainment economy. Creating engaging, high-quality games demands intricate development processes and significant resources. As projects grow in complexity and scale, developers often grapple with demanding schedules, tight deadlines, and the risk of burnout. These pressures highlight the need for more efficient development strategies, with quality assurance (QA) emerging as a critical area for optimization.

    Artificial Intelligence (AI) has the potential to address these challenges by enhancing game QA processes, particularly in large gaming companies. Specifically, foundation models (large pre-trained AI models) offer promising applications for improving these processes. Exploring novel uses of these models could reveal their potential and limitations in optimizing game development workflows, potentially alleviating some of the industry's pressing issues and facilitating the creation of high-quality, engaging games.

    In this thesis, my goal is to improve video game testing processes by leveraging foundation models so that the final product reaches the desired quality. Through three studies, I explore new opportunities that foundation models bring to game testing, from searching video repositories for instances of game bugs to assisting human testers in catching bugs:

    First, I investigate the utility of image-text foundation models for retrieving gameplay videos. In this study, I create a video search engine that helps developers efficiently search video repositories for examples of video game bugs using textual descriptions. For example, developers can find all instances of a bug by querying with a description of it, such as "a horse flying in the air". This study lays the groundwork for AI-based game QA processes, and its results demonstrate significant potential.

    Next, I introduce GlitchBench, a benchmark dataset of video game glitches and anomalies designed to assess how well state-of-the-art large multimodal models, such as GPT-4V, detect and understand game bugs. This extensive dataset includes a wide range of images depicting various glitches, sourced both from online platforms and from synthetic sets created within the Unity game engine. GlitchBench covers both common and rare glitches encountered in the video game quality assurance process. The findings from this study highlight both the promise and the limitations of existing models, particularly in unusual and rare cases.

    Lastly, I introduce VideoGameBunny, a large multimodal model specifically trained on video game content, accompanied by a dataset of 389,565 image-instruction pairs. My analysis demonstrates that VideoGameBunny outperforms much larger models on video game understanding tasks despite having 4.2 times fewer parameters. This result underscores the effectiveness of a high-quality dataset in improving models' understanding of video games, thus making them more effective in the game QA process.
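    The text-to-video search described in the first study can be illustrated with a minimal sketch. It assumes that an image-text foundation model (for example, a CLIP-style model) has already embedded both the query text and frames sampled from each video into a shared vector space; the embedding step is not shown, and `rank_videos` and the toy vectors are hypothetical. Each video is scored by its best-matching frame under cosine similarity, which is one common aggregation choice and not necessarily the one used in the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(
    query_emb: Vector,
    video_frames: Dict[str, List[Vector]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score each video by its best-matching frame, return the top_k."""
    scores: List[Tuple[str, float]] = []
    for video_id, frames in video_frames.items():
        if not frames:
            continue  # skip videos with no sampled frames
        best = max(cosine(query_emb, frame) for frame in frames)
        scores.append((video_id, best))
    scores.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
    return scores[:top_k]


if __name__ == "__main__":
    # Toy 3-dim embeddings standing in for real model outputs (assumption).
    query = [1.0, 0.0, 0.0]  # e.g. the embedding of "a horse flying in the air"
    library = {
        "clip_a.mp4": [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]],
        "clip_b.mp4": [[0.0, 0.0, 1.0]],
    }
    print(rank_videos(query, library, top_k=2))
```

    Taking the maximum over frames lets a brief glitch surface an otherwise normal video; averaging frame scores would instead dilute a short bug among many unremarkable frames.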

    Future work should focus on enhancing the generalization and robustness of AI models in the gaming context, particularly through better integration of vision and language components. This integration could be achieved using either early or late fusion methods. In late fusion methods, where two pre-trained models are connected, better alignment between the components can be achieved through improved training data and training strategies. Alternatively, early fusion techniques, which train both components simultaneously to strengthen their integration, can overcome many of the limitations of existing models.
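    To make the late-fusion setup concrete, the sketch below shows one common way of connecting two pre-trained models: a linear projector maps vision-encoder features into the language model's token-embedding space, and the resulting visual tokens are prepended to the text embeddings. The function names, dimensions, and values are hypothetical. Real systems typically learn the projector weights (and may use a small MLP instead of a single linear layer); here the weights are passed in by hand to keep the example self-contained.

```python
from typing import List

Matrix = List[List[float]]


def project(features: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Map each d_v-dim vision feature x to a d_t-dim embedding: y = x @ W + b."""
    d_t = len(bias)
    return [
        [sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j] for j in range(d_t)]
        for x in features
    ]


def late_fuse(
    vision_features: Matrix,
    text_embeddings: Matrix,
    weight: Matrix,
    bias: List[float],
) -> Matrix:
    """Hypothetical helper: prepend projected visual tokens to the text token embeddings."""
    visual_tokens = project(vision_features, weight, bias)
    return visual_tokens + text_embeddings  # one sequence fed to the language model


if __name__ == "__main__":
    # Toy sizes (assumption): 2-dim vision features, 3-dim language-model embeddings.
    vision = [[1.0, 2.0]]       # output of a pre-trained vision encoder
    text = [[0.5, 0.5, 0.5]]    # embeddings of the prompt tokens
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # d_v x d_t projector weights
    b = [0.0, 0.0, 1.0]
    print(late_fuse(vision, text, W, b))
```

    In this setting, better alignment mostly means training the projector (and possibly the models it connects) on better data. Early fusion instead trains the vision and language components together from the start, so the two are not joined after the fact.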

  • Graduation date
    Fall 2024
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-86t1-7t45
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.