Learning Sparse Representations for Computer Vision Applications

Foroughi, Homa

doi:10.7939/R3H41K01S

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Foroughi, Homa
At the core of many computer vision methods lies the question of how to represent data. Representing the data in a meaningful way, one that highlights its most useful properties, can significantly affect the performance of any vision-based application. Traditional systems rely heavily on hand-designed representations that are mostly domain-specific and require significant domain knowledge and human effort. Recently, there has been much research on learning representations from data, and one successful approach is sparse representation, which tries to represent data as a linear combination of a few elements of a basis or dictionary. A good sparse representation of an image is expected to have high fidelity to the observed image content and, at the same time, reveal its underlying structure and semantic information. In this thesis, we address the problem of how to learn such a representation, or dictionary, from training images, particularly for crowd counting, image classification, and dimensionality reduction tasks.

Counting pedestrians in videos is a topic of great interest in areas such as visual surveillance, public resource management and security. Crowd counting can be a challenging task due to severe occlusions, scene perspective distortions and diverse crowd distributions. In this thesis, we propose two methods for crowd counting based on compressed sensing and sparse representation theories, each of which is capable of resolving some of the aforementioned issues. First, we present a counting method based on an image retrieval framework, and introduce a compact global image descriptor based on compressed sensing theory, to estimate the crowd count. Next, we propose a crowd counting method based on sparse representation-based classification and random projection. We adopt a semi-supervised elastic-net to provide a rich training set that can span the variations encountered under testing conditions. By exploiting the sequential information in the vast quantity of readily available unlabeled data, we are able to annotate a large portion of the data with just a handful of labeled images. Experiments on crowd counting benchmark datasets demonstrate the effectiveness and reliability of the proposed methods, especially on large-scale datasets.

Image classification based on visual content is a challenging task, mainly because there is usually a large amount of intra-class variability, arising from illumination and viewpoint variations, occlusion and corruption. In addition, many real-world vision applications face the problem of high-dimensional data and a small number of training samples. To address all these issues, we propose a joint learning framework in which the subspace projection matrix, the dictionary and the sparse coefficients are learned simultaneously. By incorporating appropriate constraints such as low-rank, incoherence and neighborhood preservation, we are able to learn discriminative and robust sparse representations of images, especially for challenging classification scenarios. Experimental results on several benchmark datasets verify the superior performance of our method for object classification on small datasets that exhibit considerable variation of different kinds.

Feature selection is another way to deal with high-dimensional data, and sparsity constraints have recently been used to select a subset of features. We propose a feature selection method based on the decision rule of dictionary learning, and integrate low-rank matrix recovery, reconstruction residuals, and row-sparsity constraints into the framework. As a result, the proposed method selects an optimal subset of features simultaneously and provides well-separated classes in the reduced space. Our method is capable of selecting discriminative features even when the data are contaminated by occlusion, illumination or pose variations, and corruption. Extensive experiments on benchmark datasets verify the superior performance of the proposed method for feature selection, image/video classification and counting specific populations of tumor cells in microscopic images.
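
The abstract describes sparse representation as expressing data as a linear combination of a few dictionary elements. As a rough illustration of that idea only, the sketch below uses plain matching pursuit, a standard greedy sparse coding technique that is not necessarily what the thesis uses. The function name `matching_pursuit`, the toy dictionary `D` and the signal `x` are hypothetical and chosen for this example; the atoms are assumed to have unit norm.

```python
import math


def matching_pursuit(signal, dictionary, max_atoms, tol=1e-10):
    """Greedy sparse coding over unit-norm atoms (illustrative sketch).

    Runs at most `max_atoms` selection steps, so the returned coefficient
    list has at most `max_atoms` non-zero entries.
    """
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(max_atoms):
        # Correlate the current residual with every atom.
        scores = [sum(r * a for r, a in zip(residual, atom))
                  for atom in dictionary]
        # Pick the atom that explains the most of the residual.
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        if abs(scores[best]) < tol:
            break  # nothing left to explain
        coeffs[best] += scores[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - scores[best] * a
                    for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


# Toy overcomplete dictionary: 4 unit-norm atoms in a 3-dimensional space.
s = 1.0 / math.sqrt(2.0)
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]

# Signal built from only two atoms: 2 * D[2] + 3 * D[3].
x = [3.0 * s, 3.0 * s, 2.0]

coeffs, residual = matching_pursuit(x, D, max_atoms=2)
print(coeffs)    # only the entries for D[2] and D[3] are non-zero
print(residual)  # close to the zero vector
```

In this toy case two atoms reconstruct the signal, whereas the standard basis alone would need three. Learning the dictionary itself from training images, which is the subject of the thesis, is a separate and harder problem that this sketch does not attempt.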
Subjects / Keywords
Graduation date

Spring 2017
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R3H41K01S
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Ray, Nilanjan (Computing Science)
- Zhang, Hong (Computing Science)
Examining committee members and their departments
- Boulanger, Pierre (Computing Science)
- Jagersand, Martin (Computing Science)
- Jepson, Allan (Computer Science, University of Toronto)