




Community
Faculty of Graduate Studies and Research

Collection
Theses and Dissertations

Deep Synthetic Viewpoint Prediction (Open Access)


Subject/Keyword
Convolutional Neural Networks
Computer Vision
Viewpoint Prediction
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
Hess, Andy T
Supervisor and department
Nilanjan Ray (Computing Science)
Hong Zhang (Computing Science)
Examining committee member and department
Hong Zhang (Computing Science)
Nilanjan Ray (Computing Science)
Pierre Boulanger (Computing Science)
Department
Department of Computing Science

Degree
Master of Science
Abstract
Determining the viewpoint (pose) of rigid objects in images is a classic vision problem with applications to robotic grasping, autonomous navigation, augmented reality, semantic SLAM, and scene understanding in general. While most existing work is characterized by phrases such as "coarse pose estimation", alluding to low accuracy and a reliance on discrete classification approaches, modern applications increasingly demand full 3D continuous viewpoint at much higher accuracy and at real-time speeds. To this end, we decouple localization and viewpoint prediction, which are often considered jointly, and focus on answering the question: how accurately can we predict full 3D continuous viewpoint for rigid objects, at real-time speeds, given that the objects have already been localized? Using vehicles as a case study, we train our model using only black-and-white synthetic renders of fifteen cars and demonstrate its ability to generalize the concept of "vehicle viewpoint" to color, real-world images of not just cars but vehicles in general, even in the presence of clutter and occlusion. We report detailed results on numerous datasets, some of which we have painstakingly annotated and one of which is new, in the hope of providing the community with new baselines for continuous 3D viewpoint prediction. We show that deep representations (from convolutional networks) can bridge the large divide between purely synthetic training data and real-world test data, achieving near state-of-the-art viewpoint prediction at real-time speeds.
License
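A note on the continuous-viewpoint formulation mentioned in the abstract: regressing an angle directly is awkward because 359° and 1° are numerically far apart but angularly adjacent. A common workaround (shown here as an illustrative sketch, not the thesis's actual model) is to encode each viewpoint angle as a point on the unit circle and decode predictions with atan2; the helper names below are hypothetical.

```python
import math

def angle_to_vec(theta_deg):
    """Encode an angle as (sin, cos) on the unit circle.

    This avoids the 359 deg / 1 deg wrap-around discontinuity that
    hurts direct angle regression."""
    t = math.radians(theta_deg)
    return (math.sin(t), math.cos(t))

def vec_to_angle(s, c):
    """Decode a (possibly unnormalized) prediction back to degrees in [0, 360)."""
    return math.degrees(math.atan2(s, c)) % 360.0

def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)
```

With this encoding, a network regresses two values per angle and the evaluation metric wraps correctly: `angular_error(359.0, 1.0)` is 2°, not 358°.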
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

File Details

File format: pdf (PDF/A)
Mime type: application/pdf
File size: 2,660,646 bytes (approx. 2.5 MB)
Last modified: 2016-06-24 17:07:01-06:00
Filename: Hess_Andy_T_201509_MSc.pdf
Original checksum: 66986ab6f1d617c4341702eeb65d238d