Robust and Accurate Generic Visual Object Tracking Using Deep Neural Networks in Unconstrained Environments

  • Author / Creator
  • The availability of affordable cameras and video-sharing platforms has provided a massive number of low-cost videos. Automatic tracking of objects of interest in these videos is an essential step for complex visual analyses. As a fundamental computer vision task, Visual Object Tracking aims to accurately (and efficiently) locate a target in an arbitrary video, given an initial bounding box in the first frame. While state-of-the-art deep trackers provide promising results, they still suffer from performance degradation in challenging scenarios, including small targets, occlusion, and viewpoint change. In addition, estimating an axis-aligned bounding box that encloses the target cannot provide full details about its boundaries. Moreover, the performance of a tracker relies on its well-crafted modules, which typically consist of manually designed network architectures. In this thesis, first, a context-aware IoU-guided tracker is proposed that exploits a multitask two-stream network and an offline reference proposal generation strategy to improve accuracy when tracking class-agnostic small objects in aerial videos captured at medium to high altitudes. Then, a two-stage segmentation tracker is developed to provide a better semantic interpretation of the target in videos. Finally, a novel cell-level differentiable architecture search with early stopping is introduced into the Siamese tracking framework to automate the network design of the tracking module, aiming to adapt backbone features to the objective of the network. Extensive experimental evaluations on widely used generic and aerial visual tracking benchmarks demonstrate the effectiveness of the proposed methods.

  • Subjects / Keywords
  • Graduation date
    Spring 2022
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.