Deep Learning-based Framework of Summarizing Construction Videos for Vision-based Monitoring of Construction Sites

  • Author / Creator
    Xiao, Bo
  • In recent years, video monitoring of construction sites has become increasingly popular worldwide, and the captured footage contains important visual information about the progress of the given project. Video monitoring also improves security at construction sites, serving as a deterrent against theft of materials and equipment. Furthermore, vision-based analysis of video footage benefits construction management by facilitating crew productivity, reducing safety risks, and optimizing site layouts. Despite these potential benefits, the efficient use of raw jobsite videos by construction professionals remains a challenge. In current practice, construction engineers have to manually browse the entire video to retrieve the desired information from a particular period of footage, and this manual inspection is time-consuming and error-prone. Storing the footage is also challenging, especially given the high resolution and long streaming time typical of construction site video. Consequently, project managers have to recycle video footage every one or two weeks to free up digital storage space, discarding construction documentation that would have been invaluable as a long-term point of reference. To address these issues, this research proposes a deep learning-based framework that automatically distills raw video footage from construction sites into video highlights and text descriptions. To achieve this overarching goal, three specific objectives are pursued: (1) dataset development: developing an image dataset of construction machines for deep learning object detection; (2) highlights detection: proposing a deep learning-based method for detecting video highlights in raw construction video footage; and (3) text generation: deploying deep learning image captioning methods to generate text descriptions from construction images.
The outputs of the proposed framework (i.e., video highlights and text descriptions) will help construction engineers efficiently ascertain what is happening on the construction site without the need to manually browse the original construction videos. Compared with the original raw footage, the video highlights and text descriptions require much less storage space, making it practical to retain them for a period of years rather than weeks. The proposed framework provides the foundation for several advanced applications that will benefit construction management, including: (1) auto-generating reports from daily construction videos; (2) building a querying system that searches for clips of interest based on text descriptions; and (3) quantitatively analyzing construction productivity based on video highlights. The framework proposed in this research focuses on summarizing videos of construction machines captured by stationary cameras, and it can be expanded to process other types of construction videos (e.g., workers and materials) in the future.

  • Subjects / Keywords
  • Graduation date
    Fall 2021
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.