Automatic topic classification of test cases using text mining at an Android smartphone vendor

Shimagaki, J.; Kamei, Y.; Ubayashi, N.; Hindle, Abram

doi:10.7939/r3-pmyb-2486

Communities and Collections

Computing Science, Department of / Conference Papers (Computing Science)

Abstract
Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The software is large, and the number of test cases needed to cover the functionality of an Android system is substantial. Enormous effort has already been spent on properly quantifying "what features and apps were tested and verified?". This insight is provided by dashboards that summarize test coverage and results per feature. One way to achieve this is to manually tag or label test cases with the topic or function they cover, much like function points. At the studied Android smartphone vendor, tests are labeled with manually defined tags, so-called "feature labels (FLs)", which categorize hundreds to thousands of test cases into 10 to 50 groups.
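To make the per-feature dashboard concrete, here is a minimal Python sketch of summarizing test results by feature label. The record layout `(test_id, feature_label, result)`, the function `summarize_by_feature`, and the sample labels are hypothetical illustrations, not the vendor's actual data model; the sketch only shows that every per-feature number depends on each test carrying the correct FL.

```python
from collections import Counter, defaultdict

# Hypothetical test records: (test case ID, feature label, result).
# IDs and labels are illustrative, not taken from the studied vendor.
RESULTS = [
    ("cam_001", "Camera", "pass"),
    ("cam_002", "Camera", "fail"),
    ("wifi_001", "WiFi", "pass"),
    ("wifi_002", "WiFi", "pass"),
    ("bt_001", "Bluetooth", "fail"),
]


def summarize_by_feature(records):
    """Count results (e.g. 'pass'/'fail') per feature label."""
    summary = defaultdict(Counter)
    for _test_id, label, result in records:
        summary[label][result] += 1
    return dict(summary)


if __name__ == "__main__":
    for label, counts in sorted(summarize_by_feature(RESULTS).items()):
        total = sum(counts.values())
        # Counter returns 0 for a missing key, so labels with no passes print 0.
        print(f"{label}: {counts['pass']}/{total} passed")
```

A test filed under the wrong FL silently moves its result into another row, which is why mislabeling undermines the dashboard.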
Aim: Unfortunately for developers, manually assigning FLs to thousands of test cases is time-consuming and leads to inaccurately labeled test cases, which will render the dashboard useless. Instead of relying on manual labeling, we created an automated system that suggests tags/labels to developers for their test cases.
Method: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company.
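The abstract does not name the specific models used, so the following is only a sketch of one common text-classification technique, multinomial naive Bayes with add-one (Laplace) smoothing, applied to word tokens taken from a test's path and name. The helpers `tokenize` and `NaiveBayesLabeler`, and the training paths and labels, are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training data: (test path, feature label). Illustrative only.
TRAIN = [
    ("camera/video/RecordVideoTest.testStartRecording", "Camera"),
    ("camera/photo/CapturePhotoTest.testFlash", "Camera"),
    ("connectivity/wifi/WifiScanTest.testScanResults", "WiFi"),
    ("connectivity/wifi/WifiConnectTest.testWpa2", "WiFi"),
    ("connectivity/bluetooth/PairingTest.testPairHeadset", "Bluetooth"),
]


def tokenize(test_path):
    """Split a test path into lowercase word tokens.

    Separators such as '/' and '.' are skipped and camelCase is split,
    e.g. 'RecordVideoTest' -> ['record', 'video', 'test'].
    """
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", test_path)
    return [w.lower() for w in words]


class NaiveBayesLabeler:
    """Multinomial naive Bayes over path tokens, with add-one smoothing."""

    def fit(self, paths, labels):
        self.label_counts = Counter(labels)       # test cases per label (priors)
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()
        for path, label in zip(paths, labels):
            tokens = tokenize(path)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, path):
        """Return the label with the highest log-posterior for this path."""
        tokens = tokenize(path)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / self.total_docs)
            for tok in tokens:
                # Unseen tokens get count 0 + 1 rather than zero probability.
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    paths, labels = zip(*TRAIN)
    model = NaiveBayesLabeler().fit(paths, labels)
    print(model.predict("camera/video/SlowMotionTest.testRecord"))  # expected: Camera
```

In a suggestion workflow like the one described in the Aim, the predicted label would be offered to a developer for confirmation rather than applied blindly.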
Results: In quantitative experiments, our models achieved acceptable F1 scores of 0.3 to 0.88. In qualitative studies with expert teams, we also showed that the hierarchy and path of a test were good predictors of its feature label.
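F1 is the harmonic mean of precision and recall. The abstract does not say how the 0.3 to 0.88 range was aggregated (per label, per model, or otherwise), so this sketch simply computes one-vs-rest F1 for each label and its unweighted (macro) average over a hypothetical set of predictions; `per_label_f1` and `macro_f1` are illustrative names.

```python
# Hypothetical ground-truth and predicted feature labels for five test cases.
Y_TRUE = ["Camera", "Camera", "WiFi", "WiFi", "Bluetooth"]
Y_PRED = ["Camera", "WiFi", "WiFi", "WiFi", "Camera"]


def per_label_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    for label, f1 in per_label_f1(Y_TRUE, Y_PRED).items():
        print(f"{label}: F1 = {f1:.2f}")
    print(f"macro F1 = {macro_f1(Y_TRUE, Y_PRED):.2f}")
```

With these sample predictions, Camera scores 0.5, WiFi 0.8, and Bluetooth 0.0 (it is never predicted correctly), giving a macro average of about 0.43; a wide spread like this is one way a range of per-label scores can arise.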
Conclusions: We find that this method can reduce the tedious manual effort software developers spend classifying test cases, while providing more accurate classification results.
Date created

2018-01-01
Type of Item

Conference/Workshop Presentation
DOI

https://doi.org/10.7939/r3-pmyb-2486
License

Attribution 4.0 International

Language
- English