






This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Learning Sparse Representations for Computer Vision Applications Open Access


Other title
Crowd Counting
Feature Selection
Object Classification
People Counting
Multi-Modal Dictionary Learning
Low-rank Learning
Sparse Representation
Compressed Sensing
Dictionary Learning
Dimensionality Reduction
Joint Optimization
Joint Dictionary Learning and Dimensionality Reduction
Type of item
Degree grantor
University of Alberta
Author or creator
Foroughi, Homa
Supervisor and department
Zhang, Hong (Computing Science)
Ray, Nilanjan (Computing Science)
Examining committee member and department
Jagersand, Martin (Computing Science)
Boulanger, Pierre (Computing Science)
Jepson, Allan (Computer Science, University of Toronto)
Department of Computing Science

Date accepted
Graduation date
2017-06:Spring 2017
Doctor of Philosophy
Degree level
At the core of many computer vision methods lies the question of how to represent data. Representing the data in a meaningful way that highlights its most useful properties can significantly affect the performance of any vision-based application. Traditional systems rely heavily on hand-designed representations, which are mostly domain-specific and require significant domain knowledge and human effort. Recently, there has been much research on learning representations from data, and one of the successful approaches is sparse representation, which tries to represent data as a linear combination of a few elements of a basis or dictionary. A good sparse representation of an image is expected to have high fidelity to the observed image content and, at the same time, reveal its underlying structure and semantic information. In this thesis, we address the problem of how to learn such a representation, or dictionary, from training images, particularly for crowd counting, image classification, and dimensionality reduction tasks.

Counting pedestrians in videos is a topic of great interest in areas such as visual surveillance, public resource management, and security. Crowd counting can be a challenging task due to severe occlusions, scene perspective distortions, and diverse crowd distributions. In this thesis, we propose two methods for crowd counting based on compressed sensing and sparse representation theories, each of which is capable of resolving some of the aforementioned issues. First, we present a counting method based on an image retrieval framework and introduce a compact global image descriptor, built using compressed sensing theory, to estimate the crowd count. Next, we propose a crowd counting method based on sparse representation-based classification and random projection. We adopt a semi-supervised elastic net to provide a rich training set that can span the variations encountered under testing conditions. By exploiting the sequential information in the vast quantity of readily available unlabeled data, we are able to annotate a large portion of the data with just a handful of labeled images. Experiments on crowd counting benchmark datasets demonstrate the effectiveness and reliability of the proposed methods, especially on large-scale datasets.

Image classification based on visual content is a challenging task, mainly because there is usually a large amount of intra-class variability arising from illumination and viewpoint variations, occlusion, and corruption. In addition, many real-world vision applications face the problem of high-dimensional data and a small number of training samples. To address all these issues, we propose a joint learning framework in which the subspace projection matrix, the dictionary, and the sparse coefficients are learned simultaneously. By incorporating suitable constraints such as low rank, incoherence, and neighborhood preservation, we are able to learn discriminative and robust sparse representations of images, especially for challenging classification scenarios. Experimental results on several benchmark datasets verify the superior performance of our method for object classification on small datasets that contain a considerable amount of variation of different kinds.

Feature selection is another way to deal with high-dimensional data, and sparsity constraints have recently been used to select a subset of features. We propose a feature selection method based on the decision rule of dictionary learning, and integrate low-rank matrix recovery, reconstruction residuals, and row-sparsity constraints into the framework. As a result, the proposed method selects an optimal subset of features jointly and provides well-separated classes in the reduced space. Our method is capable of selecting discriminative features even when the data are contaminated by occlusion, illumination or pose variations, and corruption. Extensive experiments on benchmark datasets verify the superior performance of the proposed method for feature selection, image/video classification, and counting specific populations of tumor cells in microscopic images.
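The abstract describes sparse representation as expressing a signal as a linear combination of a few dictionary elements. As a rough illustration of that idea only, the sketch below uses orthogonal matching pursuit (OMP), a standard greedy solver chosen here for brevity; the abstract does not state which solver the thesis uses. The function name `omp` and the parameters `D` (dictionary with atoms as columns), `x` (signal), and `k` (maximum number of atoms) are hypothetical names introduced for this example.

```python
import numpy as np


def omp(D, x, k):
    """Approximate x with at most k columns (atoms) of D using greedy OMP.

    Illustrative sketch only: D, x and k are assumed names, and OMP is a
    generic solver, not necessarily the one used in the thesis.
    """
    x = np.asarray(x, dtype=float)
    coef = np.zeros(D.shape[1])
    support = []          # indices of the atoms selected so far
    sol = np.zeros(0)     # least-squares weights for the selected atoms
    residual = x.copy()
    for _ in range(k):
        # Pick the unused atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        # Refit all selected atoms to x by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        coef[support] = sol
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Overcomplete dictionary with unit-norm atoms (synthetic data).
    D = rng.standard_normal((30, 60))
    D /= np.linalg.norm(D, axis=0)
    # Signal built from three atoms; weights are arbitrary.
    x = 2.0 * D[:, 5] - 1.5 * D[:, 17] + 0.5 * D[:, 42]
    code = omp(D, x, k=3)
    print("selected atoms:", np.flatnonzero(code))
    print("reconstruction error:", np.linalg.norm(x - D @ code))
```

Here the dictionary is fixed and random; learning the dictionary itself from training images, which is the subject of the thesis, is a separate step that this sketch does not attempt.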
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
Citation for previous publication
Foroughi, Homa, Nilanjan Ray, and Hong Zhang. "People counting with image retrieval using compressed sensing." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
Foroughi, Homa, Nilanjan Ray, and Hong Zhang. "Robust people counting using sparse representation and random projection." Pattern Recognition 48.10 (2015): 3038-3052.
Foroughi, Homa, Moein Shakeri, Nilanjan Ray, and Hong Zhang. "Joint Feature Selection with Low-rank Dictionary Learning." In BMVC, pp. 97-1. 2015.
Foroughi, Homa, Nilanjan Ray, and Hong Zhang. "Object Classification with Joint Projection and Low-rank Dictionary Learning." arXiv preprint arXiv:1612.01594 (2016).

File Details

File format: pdf (PDF/A)
Mime type: application/pdf
File size: 17234565
Last modified: 2017:06:13 12:12:51-06:00
Filename: Foroughi_Homa_201703_PhD.pdf
Original checksum: 7db68ac7cdbca9eef46b2d68d1b20320
Well formed: true
Valid: true
Page count: 147