Deep Learning-Based Multi-Class Semantic Segmentation and Natural Language Scene Description of Multilane Rural Highways Using LiDAR Data
-
- Author / Creator
- Jiang, Honglin
-
The increasing adoption of light detection and ranging (LiDAR) technology offers a promising avenue for automating the identification of road features. However, because LiDAR point clouds are dense and complex, most research extracts only one or two classes of road elements. Previous work that classifies more than two categories often achieves poor accuracy, on public datasets such as SemanticKITTI and nuScenes and on private datasets alike.
Concurrently, artificial intelligence and natural language processing have emerged as prominent research areas. This thesis demonstrates the effectiveness of machine learning and large language models for addressing challenges in the transportation field, and proposes novel frameworks for asset management, scene understanding, maintenance planning, traffic safety analysis, and intelligent transportation systems.
Recent advances in Transformer architectures, known for their success in natural language processing, have shown impressive results on point cloud data. This thesis aims to identify and enhance the best Transformer-based model for simultaneously extracting multiple highway infrastructure elements, addressing the current gap in multi-object segmentation. The proposed methods are intended to replace tedious, time-consuming manual processes with deep learning models that extract valuable features from high-density LiDAR point clouds.
The thesis presents two semantic segmentation approaches that leverage Transformer architectures and state-of-the-art natural language models to automate the extraction of rural multilane highway infrastructure elements and to generate scene descriptions from LiDAR data. The first approach employs Point Transformer v2, a Transformer-based architecture tailored for 3D point cloud processing, which takes 50-meter highway segments as input together with four additional attributes. It relies on the self-attention mechanism of the Transformer to capture long-range dependencies and contextual information within the point cloud. The second approach adapts the self-attention and cross-attention mechanisms of the Transformer specifically for point cloud data and operates on individual LiDAR points for point-wise classification. It carries the sequence-processing ability of natural language models over to the spatial domain of point clouds, enabling efficient feature extraction and classification.
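To illustrate what "self-attention over a point cloud" means in the paragraph above, the following is a minimal sketch of plain scaled dot-product self-attention over per-point feature vectors, written in pure Python. It is an assumption-laden illustration, not the thesis's model: Point Transformer v2 uses grouped vector attention over local neighbourhoods, whereas this sketch lets every point attend to every other point. The projection matrices `w_q`, `w_k`, `w_v` and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(points: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Global scaled dot-product self-attention over n point feature vectors.

    points: n x d features per point (e.g. coordinates plus extra attributes).
    Each output row is a softmax-weighted mix of all points' value vectors,
    which is how a point can pick up context from distant parts of the scene.
    """
    q, k, v = matmul(points, w_q), matmul(points, w_k), matmul(points, w_v)
    d_k = len(w_k[0])
    out = []
    for qi in q:
        # similarity of this point's query to every point's key
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

Because the attention weights in each row sum to 1, every output feature lies within the range of the corresponding value features. A real model would learn the projections, use multiple heads, and restrict attention to neighbourhoods to keep the cost manageable on dense LiDAR data.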
Experiments on 2.5 kilometres of highway segments in Alberta, Canada, demonstrate the effectiveness of the proposed approaches. The first approach achieved an overall mean Intersection over Union (IoU) of 78.29% and a mean F1 score of 86.48%, with most individual class accuracies exceeding 95%. The second approach achieved an overall mean IoU of 86.03% and a mean F1 score of 92.21%.
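The mean IoU and mean F1 scores reported above are commonly computed per class from a point-level confusion matrix and then averaged over classes. The sketch below assumes that standard macro-averaged definition, with IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN); the thesis may differ in details such as how absent classes are handled (here they score 0.0). Function names are hypothetical.

```python
from typing import List, Tuple

def per_class_scores(conf: List[List[int]]) -> Tuple[List[float], List[float]]:
    """conf[i][j] = number of points whose true class is i and predicted class is j."""
    n = len(conf)
    ious, f1s = [], []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, actually another class
        fn = sum(conf[c]) - tp                       # actually c, predicted another class
        denom_iou = tp + fp + fn
        ious.append(tp / denom_iou if denom_iou else 0.0)
        denom_f1 = 2 * tp + fp + fn
        f1s.append(2 * tp / denom_f1 if denom_f1 else 0.0)
    return ious, f1s

def mean_iou_f1(conf: List[List[int]]) -> Tuple[float, float]:
    """Macro averages: every class counts equally, regardless of its point count."""
    ious, f1s = per_class_scores(conf)
    return sum(ious) / len(ious), sum(f1s) / len(f1s)

# Toy two-class example (not thesis data): mean IoU ≈ 0.739, mean F1 ≈ 0.850
print(mean_iou_f1([[8, 2], [1, 9]]))
```

Per class, F1 = 2·IoU / (1 + IoU) ≥ IoU, so the mean F1 is never lower than the mean IoU, which is consistent with both pairs of results reported for the two approaches.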
This thesis also addresses the critical gap between interpreting 3D point cloud data and generating natural language descriptions of it. It proposes a novel approach that converts the semantic segmentation output into multi-view images and integrates the GPT-4o model under specific restrictive conditions, with the aim of generating accurate and contextually rich textual descriptions of 3D highway scenes.
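One way to turn a labelled point cloud into multi-view images, as described above, is orthographic projection onto several axis-aligned planes. The sketch below is an assumption, not the thesis's rendering pipeline: it bins points into a sparse pixel grid with a hypothetical cell size and keeps, per pixel, the label of the point nearest the viewer. Colouring the label images and prompting GPT-4o are not modelled here, and the view names, axis choices, and function name are illustrative only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float, int]  # x, y, z, predicted class label

# view name -> (horizontal axis, vertical axis, depth axis); viewer looks from +depth
VIEWS = {"top": (0, 1, 2), "side": (0, 2, 1), "front": (1, 2, 0)}

def render_view(points: List[Point], view: str,
                cell: float = 0.5) -> Dict[Tuple[int, int], int]:
    """Orthographically project labelled points into a sparse 2D label image.

    Pixels are (u, v) grid indices of size `cell` (same units as the coordinates).
    Each pixel keeps the label of the point with the largest depth coordinate,
    i.e. the point closest to a viewer looking along the negative depth axis.
    """
    u_ax, v_ax, d_ax = VIEWS[view]
    best: Dict[Tuple[int, int], Tuple[float, int]] = {}
    for p in points:
        pix = (int(p[u_ax] // cell), int(p[v_ax] // cell))
        depth, label = p[d_ax], p[3]
        if pix not in best or depth > best[pix][0]:
            best[pix] = (depth, label)
    return {pix: lab for pix, (_, lab) in best.items()}

pts = [(0.1, 0.1, 0.0, 1), (0.2, 0.2, 2.0, 5), (3.0, 0.1, 0.0, 2)]
print(render_view(pts, "top"))   # {(0, 0): 5, (6, 0): 2}
```

In the top view the higher point (label 5) occludes the lower one in the same pixel, while the side view separates them by height; combining several such views gives a language model complementary evidence about the scene layout.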
This research advances automated infrastructure extraction, giving transportation agencies a more efficient way to inventory rural highway infrastructure elements. These advances have direct implications for future autonomous driving, crash environment reproduction and highway safety scene understanding, big data analysis, maintenance planning, and asset management.
-
- Graduation date
- Fall 2024
-
- Type of Item
- Thesis
-
- Degree
- Master of Science
-
- License
- This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.