Exploring Preferential Label Smoothing for Neural Network based classifiers

  • Author / Creator
    Goyal, Paritosh
  • Overfitting is a phenomenon in which a machine learning system learns the patterns in the training data so well that its performance on unseen data is adversely affected. In practice, machine learning systems that overfit are not deployable; systems that generalize well and perform well on both train and test data are the ones deployed. One of the strategies used to prevent overfitting and help models generalize is regularization. For neural-network-based machine learning systems, regularization can be applied through the network architecture, the loss function, or the training algorithm.
    One of the losses used to train neural-network-based classifiers is the Cross Entropy (CE) loss. With this loss, the loss for a given data sample is computed solely from that sample’s ground truth label, i.e., full concentration is placed on the ground truth label and the effect of the other labels is neglected. This makes the classifier overconfident in the single ground truth label and degrades generalization. One method of regularization is to take some of this concentration (called the Smoothing Ratio (SR)) from the sample’s ground truth label and distribute it uniformly among all the other labels. This method is called label smoothing and has been found to be quite effective. For brevity, we call the approach of distributing the SR uniformly Uniform Label Smoothing (ULS).
    In this work, we explore what happens if we distribute the SR to the non-ground-truth labels based on how closely they are related to the ground truth label. The relation between labels may come from an external source: learnt from external data or provided by a subject matter expert. We call this approach of distributing the SR based on the relation between labels Preferential Label Smoothing (PLS). PLS represents a more unified approach to label smoothing, since ULS is a special case of PLS. Previous works on ULS suggest that ULS becomes redundant when the number of labels is high. Conversely, when there are only two labels (i.e., binary classification), there is no point in using PLS, since there is only one non-ground-truth label to distribute the SR to. We therefore investigate the effects of PLS when the number of labels in the dataset is high. Another gap we study is the effect of PLS and ULS on training dynamics, and how those dynamics differ from training without any label smoothing. We demonstrate our study on image classification and text classification. Experimenting on text classification fills one more gap in previous works: ULS had not been studied in the context of text classification.
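    The construction of soft targets described above can be sketched as follows. This is a minimal illustrative sketch, not code from the thesis: the function name and the relation-matrix format are assumptions, and the relation matrix stands in for whatever external source (learnt or expert-provided) supplies the label relations. ULS falls out as the special case where no relation matrix is given.

```python
import numpy as np

def smoothed_targets(labels, num_classes, sr, relation=None):
    """Build soft target distributions for label smoothing.

    labels:      array of ground-truth class indices, shape (N,)
    sr:          smoothing ratio taken away from the ground-truth label
    relation:    optional (num_classes, num_classes) non-negative matrix
                 where relation[g, j] scores how related label j is to
                 ground truth g (hypothetical format, for illustration);
                 None reproduces Uniform Label Smoothing (ULS).
    """
    targets = np.zeros((len(labels), num_classes))
    for i, g in enumerate(labels):
        # Ground-truth label keeps 1 - SR of the probability mass.
        targets[i, g] = 1.0 - sr
        if relation is None:
            # ULS: spread SR uniformly over the other labels.
            share = np.full(num_classes, sr / (num_classes - 1))
        else:
            # PLS: spread SR in proportion to relatedness to g.
            w = relation[g].astype(float).copy()
            w[g] = 0.0  # no mass flows back to the ground truth
            share = sr * w / w.sum()
        share[g] = 0.0
        targets[i] += share
    return targets
```

    Each row of the returned matrix sums to 1 and can be plugged into a cross-entropy loss against the model's predicted log-probabilities in place of a one-hot target.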

  • Subjects / Keywords
  • Graduation date
    Fall 2022
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.