A Novel Framework for Unique People Count from Monocular Videos

  • Author / Creator
    Mukherjee, Satarupa
  • Counting the unique number of people in a video (i.e., counting a person only once as that person passes through the field of view (FOV)) is required in many video analytics applications, such as transit passenger and pedestrian volume counts at railway stations, malls, and road intersections, which in turn aid security and resource management, urban planning, advertising, and many other tasks. In this PhD thesis I develop a robust algorithm that generates a unique people count from monocular videos taken from an arbitrary angle. From an applications point of view, my algorithm is among the most economical, because it works with existing, already-mounted video cameras. Within a region of interest (ROI) in the camera's FOV, I compute the influx/outflux rate of people, i.e., the number of people entering or leaving the ROI per unit time. Summing the influx/outflux rate between any two time points then estimates the number of people who entered and/or left the ROI within that interval. I employ two well-known computer vision techniques for this purpose: Gaussian process regression (GPR), to estimate the number of people present within the ROI, and optical flow-based tracking of the ROI boundary. The principal roadblock in most computer vision problems is occlusion. To avoid this bottleneck, I combine (a) the concept of influx and outflux of fluid mass from computational fluid dynamics, (b) GPR to estimate the number of people within the ROI, and (c) short-term tracking of the ROI boundary (as opposed to object or feature tracking). The principal contribution of the thesis is thus to handle occlusions successfully by computing the average influx and/or outflux of people, thereby avoiding explicit people detection and tracking. I validate the proposed algorithm on 19 publicly available monocular benchmark videos. Occlusions are abundant in these videos, yet the method achieves more than 95% accuracy on most of them. I also extend the proposed framework beyond monocular videos and apply it to multiple views of a publicly available dataset, with about 99% accuracy.
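The two ingredients of the abstract's pipeline can be sketched briefly: a GPR that maps per-frame features to an estimated people count within the ROI, and an accumulator that sums per-frame influx rates to obtain the unique count over an interval. This is a minimal illustration, not the thesis's implementation: the feature extraction, kernel choice, and flux computation here are placeholder assumptions.

```python
import numpy as np

def gpr_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """Minimal Gaussian process regression with an RBF kernel:
    predicts a people count within the ROI from a per-frame feature
    vector. A hypothetical stand-in for the thesis's GPR estimator."""
    def rbf(A, B):
        # Squared Euclidean distances between all row pairs of A and B.
        d2 = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-0.5 * d2 / length_scale**2)
    # Posterior mean: k(X*, X) (K + sigma^2 I)^{-1} y
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    return rbf(X_test, X_train) @ np.linalg.solve(K, y_train)

def unique_count(influx_rates, dt=1.0):
    """Unique people entering the ROI over an interval: the sum of
    per-frame influx rates (people per unit time, e.g. derived from
    GPR count estimates and ROI boundary flow) times the frame
    duration dt."""
    return float(np.sum(influx_rates) * dt)
```

For example, with influx rates of 2, 3, and 1 people per frame over three frames, `unique_count` returns 6: three frames' worth of arrivals accumulated into a single total, with no individual ever detected or tracked.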

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Ray, Nilanjan (Computing Science)
  • Examining committee members and their departments
    • Cheng, Irene (Computing Science)
    • Mandal, Mrinal (Electrical & Computer Engineering)
    • Boulanger, Pierre (Computing Science)
    • Saha, Punam (Electrical & Computer Engineering, The University of Iowa, USA)
    • Zhang, Hong (Computing Science)