Sparse and Dense Visual SLAM with Single-Image Depth Prediction

  • Author / Creator
    Loo, Shing Yan
  • In this thesis, we investigate the use of single-image depth prediction from convolutional neural networks (CNNs) in sparse and dense monocular visual simultaneous localization and mapping (SLAM). Specifically, we address three problems: (1) data association, (2) dense mapping, and (3) long-term adaptation.
    Accordingly, the thesis is divided into three parts, one for each of these contributions.

    To improve the robustness of data association in visual SLAM, our first proposal extends the state-of-the-art semi-direct visual SLAM algorithm with single-image depth prediction to improve the reliability of feature matching. We use the additional depth information to initialize new features with a small uncertainty centred at the predicted depth. With the reduced depth uncertainty, feature correspondences can be identified within a shorter search range along the epipolar line, resulting in faster convergence of the feature depth and improved mapping performance. As a result, our method outperforms state-of-the-art visual SLAM algorithms in camera tracking error.
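
    As a rough illustration (a Python sketch, not code from the thesis: the function names, the relative-uncertainty factor rel_sigma, the n_sigma bound and the fallback prior are all assumptions), a depth-prior feature initialization and the resulting shorter epipolar search range could look like this:

        def init_feature_depth(d_cnn, rel_sigma=0.1, d_default=2.0, sigma_default=2.0):
            # If a CNN depth prediction is available, centre the depth prior at the
            # predicted value with a small standard deviation; otherwise fall back
            # to a wide, uninformative prior as in standard depth-filter schemes.
            if d_cnn is not None and d_cnn > 0:
                return d_cnn, rel_sigma * d_cnn
            return d_default, sigma_default

        def epipolar_search_range(mu, sigma, n_sigma=2.0):
            # Depth interval searched along the epipolar line: a smaller sigma
            # directly shortens the searched segment, reducing false matches.
            d_min = max(mu - n_sigma * sigma, 1e-3)
            d_max = mu + n_sigma * sigma
            return d_min, d_max

        # Example: a tight prior from a 3 m CNN prediction vs. the wide default.
        print(epipolar_search_range(*init_feature_depth(3.0)))   # ~ (2.4, 3.6)
        print(epipolar_search_range(*init_feature_depth(None)))  # (0.001, 6.0)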

    To recover a dense structure, we densify the semi-dense reconstruction produced by the state-of-the-art direct SLAM algorithm LSD-SLAM. To this end, our second proposal exploits the local depth-gradient consistency of single-image relative depth prediction as a spatial regularizer to densify the semi-dense depth maps. In addition, we propose an adaptive filtering scheme that incorporates the depth and pixel intensity within a local window to reduce noise in the semi-dense structure, which yields a substantial gain in densification accuracy. The optimized semi-dense and densified structures are, in turn, used to refine the pose graph and hence the pose estimates. Experimental results show that our dense reconstruction accuracy outperforms state-of-the-art methods by a large margin.
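
    A minimal sketch of the gradient-consistency regularizer follows (Python/NumPy; the plain gradient-descent solver, step size and weight lam are assumptions of this sketch, and the adaptive depth/intensity filtering of the semi-dense input is omitted): the data term pins the result to the semi-dense measurements, while the regularizer makes local depth gradients follow those of the CNN's relative depth prediction.

        import numpy as np

        def densify(depth_semi, valid, depth_rel, lam=1.0, iters=200, step=0.2):
            # depth_semi: HxW semi-dense depth (defined only where `valid` is True)
            # valid:      HxW boolean mask of semi-dense measurements
            # depth_rel:  HxW relative depth predicted by the CNN
            D = np.where(valid, depth_semi, depth_rel).astype(np.float64)
            gx_R, gy_R = np.gradient(depth_rel)
            for _ in range(iters):
                gx_D, gy_D = np.gradient(D)
                # Gradient of the regularizer: divergence of the gradient mismatch.
                reg = (np.gradient(gx_D - gx_R, axis=0)
                       + np.gradient(gy_D - gy_R, axis=1))
                # Gradient of the data term, active only at semi-dense pixels.
                data = np.where(valid, D - depth_semi, 0.0)
                D -= step * (data - lam * reg)
            return D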

    Nevertheless, single-image depth prediction from CNNs tends to be accurate only on images similar to those seen during training. Therefore, to improve the generality of single-image depth prediction in visual SLAM, our third proposal introduces a long-term adaptation framework that fine-tunes the depth prediction CNN online to improve its accuracy, while leveraging the improved depth predictions to globally optimize the structure and camera pose estimates. In particular, we propose a novel online adaptation method in which fine-tuning is regularized to retain previously learned knowledge while the CNN is continually trained. We demonstrate the use of the fine-tuned depth predictions for map point culling before running global photometric bundle adjustment (BA), resulting in a more accurate map reconstruction than running global photometric BA on all map points.
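
    The sketch below (Python/PyTorch, an illustration rather than the thesis implementation; the importance weights, the penalty weight lam and the culling tolerance rel_tol are assumptions) shows the two ingredients: a fine-tuning step whose loss is augmented with a penalty that keeps parameters close to a pre-adaptation snapshot, and a culling test that keeps only map points whose depths agree with the fine-tuned CNN prediction before global photometric BA.

        import torch

        def regularized_finetune_step(model, batch, task_loss_fn, theta_ref,
                                      importance, optimizer, lam=1.0):
            # theta_ref:  dict of parameter tensors snapshotted before adaptation
            # importance: dict of per-parameter importance weights (e.g. uniform,
            #             or estimated from previously seen data)
            optimizer.zero_grad()
            loss = task_loss_fn(model, batch)   # e.g. a self-supervised depth loss
            penalty = 0.0
            for name, p in model.named_parameters():
                # Penalize drift from the snapshot to retain earlier knowledge.
                penalty = penalty + (importance[name] * (p - theta_ref[name]) ** 2).sum()
            (loss + lam * penalty).backward()
            optimizer.step()

        def cull_map_points(point_depths, cnn_depths, rel_tol=0.15):
            # Keep map points whose depth (projected into a keyframe) agrees with
            # the fine-tuned CNN depth within a relative tolerance; the rest are
            # culled before running global photometric bundle adjustment.
            return (point_depths - cnn_depths).abs() <= rel_tol * cnn_depths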

  • Subjects / Keywords
  • Graduation date
    Spring 2022
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-tp4b-ke15
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.