Leveraging Natural Language Processing Techniques to Improve Manual Game Testing

Viggiato De Almeida, Markos

doi:10.7939/r3-we6p-6k21

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Viggiato De Almeida, Markos
The gaming industry has grown sharply in recent years, surpassing other popular entertainment segments such as the film industry. With the ever-increasing scale of the industry and players who are extremely difficult to satisfy, developing a successful game has become very challenging. In this context, the quality of games has become a critical issue. Game testing is widely performed to ensure that games meet the desired quality criteria. However, despite recent advancements in test automation, manual game testing is still prevalent in the gaming industry: test cases are often described in natural language only and consist of one or more test steps that a Quality Assurance (QA) engineer (i.e., the tester) must perform manually. This makes game testing challenging and costly. Issues such as redundancy (i.e., different test cases having the same testing objective) and incompleteness (i.e., test cases missing one or more steps) become a bigger concern in a manual game testing scenario. In addition, as games grow and the number of required test cases increases, it becomes impractical to execute all test cases, for example when game release cycles are short.

Prior work proposed several approaches to analyze and improve test cases with associated source code. However, there is little research on improving manual game testing. Higher-quality test cases and optimized test execution reduce wasted developer time and let testers use testing resources more effectively, which makes game testing more efficient. In addition, even though players are extremely difficult to satisfy, their priorities are not considered during game testing. In this thesis, we investigate how to improve manual game testing from different perspectives.

In the first part of the thesis, we investigated how we can reduce redundancy in the test suite by identifying similar natural language test cases. We evaluated several unsupervised approaches using text embedding, text similarity, and clustering techniques and showed that we can successfully identify similar test cases with high performance. We also investigated how we can improve test case descriptions to reduce the number of unclear, ambiguous, and incomplete test cases. We proposed and evaluated an automated framework that leverages statistical and neural language models and (1) provides recommendations to improve test case descriptions, (2) recommends potentially missing steps, and (3) suggests existing similar test cases.
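
As a rough illustration of the similarity-based grouping idea described above, the following minimal sketch compares test case descriptions with cosine similarity and groups those above a threshold. It is not the thesis's method: it swaps in plain term-frequency vectors for text embeddings and single-link threshold grouping for the clustering techniques that were evaluated. The function names, the 0.6 threshold, and the sample test cases are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a test case description and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar(test_cases, threshold=0.6):
    """Return groups of test case indices linked by similarity >= threshold.

    Assumption: single-link grouping via union-find stands in for the
    clustering techniques mentioned in the abstract.
    """
    vectors = [Counter(tokenize(t)) for t in test_cases]
    parent = list(range(len(test_cases)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(test_cases)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical natural language test cases.
    cases = [
        "Open the shop and buy a sword with gold",
        "Open the shop and buy a shield with gold",
        "Log in with a new account and check the tutorial starts",
    ]
    print(group_similar(cases))
```

With these sample descriptions, the two shop test cases share eight of nine tokens and end up in one group, while the login test case stays on its own. Groups with more than one member are candidates for a tester to review as potentially redundant.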

In the second part of the thesis, we investigated how player priorities can be included in the game testing process. We first proposed an approach to prioritize test cases that cover the game features that players use the most, which helps to avoid bugs that could affect a very large number of players. Our approach (1) identifies the game features covered by test cases using an ensemble of zero-shot techniques with high performance and (2) optimizes the test execution based on highly-used game features covered by test cases. Finally, we investigated how sentiment classifiers perform on game reviews and what issues affect those classifiers. High-performing classifiers can be used to obtain players’ sentiments about games and guide testing based on the game features that players like or dislike. We show that, while traditional sentiment classifiers do not perform well, a modern classifier (the OPT-175B Large Language Model) performs far better.
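
To make the usage-based prioritization idea above more concrete, here is a minimal sketch that orders test cases by how much not-yet-covered player usage they exercise per minute of execution time, within a fixed time budget. The abstract does not specify the optimization technique used in the thesis, so the greedy heuristic here is an assumption, as are the function name, the usage figures, the execution times, and the budget.

```python
def prioritize(test_cases, feature_usage, budget):
    """Greedily select test cases by newly covered usage per minute.

    test_cases: dict mapping name -> (set of covered features, minutes > 0)
    feature_usage: dict mapping feature -> usage weight (hypothetical units)
    budget: total execution minutes available
    Assumption: a greedy heuristic, not the thesis's optimization method.
    """
    remaining = dict(test_cases)
    covered = set()
    selected = []
    time_left = budget
    while remaining:
        best_name, best_score = None, 0.0
        for name, (features, minutes) in remaining.items():
            if minutes > time_left:
                continue  # does not fit in the remaining budget
            # Only count usage of features no selected test case covers yet.
            gain = sum(feature_usage.get(f, 0) for f in features - covered)
            score = gain / minutes
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            break  # nothing fits, or nothing adds new coverage
        features, minutes = remaining.pop(best_name)
        covered |= features
        time_left -= minutes
        selected.append(best_name)
    return selected


if __name__ == "__main__":
    # Hypothetical share of players (in percent) using each feature.
    usage = {"combat": 50, "shop": 30, "chat": 15, "settings": 5}
    # Hypothetical test cases: covered features and execution minutes.
    cases = {
        "tc_combat": ({"combat"}, 5),
        "tc_shop": ({"shop"}, 5),
        "tc_combat_shop": ({"combat", "shop"}, 10),
        "tc_chat": ({"chat"}, 3),
    }
    print(prioritize(cases, usage, budget=13))
```

In this sketch the covered features are given directly; in the approach described above they would come from the zero-shot feature identification step. Test cases that add no new coverage are never selected, which is a design choice of this heuristic rather than something the abstract states.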

The research presented in this thesis provides deep insights, actionable recommendations, and effective, thoroughly evaluated approaches that support QA engineers and developers in improving manual game testing.
Subjects / Keywords
Graduation date

Spring 2023
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/r3-we6p-6k21
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Electrical and Computer Engineering
Specialization
- Software Engineering and Intelligent Systems
Supervisor / co-supervisor and their department(s)
- Bezemer, Cor-Paul (Electrical & Computer Engineering)