Human Pose Estimation and Shape Modeling in 3D: New Cameras, Datasets and Approaches

  • Author / Creator
    Zou, Shihao
  • Human pose estimation and shape modeling are critical elements in a wide range of computer vision applications. While most existing research employs RGB cameras for their accessibility and cost-effectiveness, emerging camera technologies and imaging modalities remain relatively underexplored. These technologies often introduce unique features that open new avenues for advancement in human pose estimation and shape modeling. This thesis therefore investigates human pose estimation and shape modeling with RGB cameras and, in particular, beyond them, by exploring the opportunities presented by emerging camera technologies. Our research is organized into three key areas: the exploration of new cameras, the development of novel approaches, and the creation of large-scale multi-modality datasets for human pose estimation and shape modeling.

    1) Our research in 3D skeletal pose estimation, tracking, and motion forecasting for multi-person scenarios using RGB cameras addresses complexities like intra-frame occlusions. We propose a unified spatiotemporal transformer with spatiotemporal deformable attention that executes these tasks simultaneously in one computational pass. 2) We further explore event cameras, innovative sensors that combine high temporal resolution with low energy consumption, for energy-efficient parametric shape estimation and tracking. Our work here begins with a two-stage deep learning method that relies primarily on event data and requires only the first gray-scale frame, and then moves to an end-to-end approach using Spiking Neural Networks (SNNs) for efficient pose tracking from events alone. 3) Utilizing polarization cameras, which capture robust geometric surface cues, we propose a framework for reconstructing detailed, clothed human shapes that go beyond skeletal poses or basic parametric shapes. 4) Finally, we turn to the complex task of animating clothed humans with natural clothing deformations, leveraging point-cloud sequences captured by depth sensors, which provide valuable geometric insight into the structure of the clothing. We introduce a diffusion-based method for clothed human modeling that integrates dynamic, progressive, and diversified modeling, addressing gaps in current data-driven approaches.

    To overcome the limitations of existing datasets, which are primarily based on RGB cameras, we developed a cost-effective motion capture system that synchronizes multi-modality cameras, together with a pipeline for annotating 3D parametric pose and shape. This led to the creation of several large-scale datasets for human pose estimation and shape modeling: 1) PHSPD, with 527K frames featuring polarization and multi-view RGB-Depth images, 2) MMHPSD, with 240K frames containing event streams and RGB-Depth images, and 3) SynEventHPD, a synthesized event-based dataset. Together, PHSPD, MMHPSD, and SynEventHPD form the most extensive and varied 3D human motion capture datasets available, and their multi-modality property holds significant potential for driving existing and new research directions in the computer vision community.

    In summary, this thesis demonstrates that emerging camera technologies such as polarization cameras, event cameras, and depth sensors that capture point clouds provide new perspectives and effective solutions for tasks in human pose estimation and shape modeling. Extensive experiments across the projects further validate the effectiveness of the novel approaches we propose for these tasks.

  • Subjects / Keywords
  • Graduation date
    Spring 2024
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-pyrm-sg51
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.