Application of Machine Learning to Automate Classification and Information Extraction in Industrial Construction Documents

  • Author / Creator
    Sajadfar, Narges
  • Industrial construction projects are usually mega-projects that involve millions of labour person-hours and generate hundreds of thousands of documents. Construction documents are a vital source of information and knowledge about the project scope. They come in different types and contain both structured information, such as data tables, and unstructured information, such as text, images, and drawings. They may include contract forms, drawings that define the quantities and qualities of materials, and the standards and specifications required to carry out the project. Documents usually exist in multiple versions and address different systems in a project, such as architectural, structural, electrical, and mechanical systems. Extracting and organizing structured and unstructured information from these documents is time-consuming, yet it is critical for effective project control and decision making. The task is even more challenging and labour-intensive when documents are provided in image formats, which require human intervention to extract the required information. The objective of this research is to address this challenge by introducing an automated approach for managing and extracting information from construction documents. This research describes the development of automatic classification and information extraction methods based on both the text and the images in industrial construction documents. The development of the proposed method includes testing various deep learning classification algorithms to identify suitable models for construction documents.
    The results of the research confirmed the effectiveness of machine learning algorithms for classifying and extracting information from unstructured construction documents with limited text. This dissertation makes a major contribution by presenting a high-precision classification approach for construction documents that incorporates scanned images of different sizes and resolutions. Furthermore, this research demonstrated a method for automatic title block detection in unstructured construction documents.

  • Graduation date
    Spring 2022
  • Degree
    Doctor of Philosophy
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.