Visual Similarity Analysis of Web Pages based on Gestalt Theory

  • Author / Creator
    Xu, Zhen
  • With the rapid development of internet technology, the web page has evolved from a traditional rich-text information source into a multi-functional tool that can serve images, audio, and video, act as the GUI (Graphical User Interface) component of distributed applications, and so on. Similarity evaluation of modern web pages has become more essential yet more difficult. On the one hand, while many search engines rely on keyword search, text plays a less important role in web pages. On the other hand, a variety of browsers and platforms support HTML/CSS/JavaScript at different levels, causing a web page to be displayed differently across browsers. To address these issues, we propose four research topics. The first topic is to identify semantic blocks on web pages. We propose a model for merging web page content into semantic blocks based on human perception. To achieve this goal, we construct a layer tree to remove hierarchical inconsistencies between the visual layout and the DOM (Document Object Model) tree of web pages; we translate the Gestalt laws of grouping into computer-compatible rules and train a classifier to combine the laws into a unified rule for detecting semantic blocks. The second topic is to estimate the visual similarity of web pages. Existing approaches use DOM trees or images, but they either focus only on the structure of web pages or ignore the inner connections among web page features. Therefore, we provide the block tree to combine both the structural and the visual information of web pages. Using this block tree structure, we propose a visual similarity measurement. The purpose of the third topic is to improve the visual similarity measurement and use it to detect visual differences in web pages when they are rendered in different browsers. To improve precision, we introduce the extended subtree model, which maps subtrees rather than single nodes. The fourth topic uses the improved visual similarity measurement to create an automated testing framework for cross-browser visual incompatibility detection. An automated testing tool is also designed. The major contribution of this thesis is twofold. On the one hand, it enriches the theoretical analysis of the detection of semantic content, visual similarity, and cross-browser differences for web pages. On the other hand, it provides insight into testing cross-browser incompatibilities in practice.

  • Subjects / Keywords
  • Graduation date
    Fall 2017
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
  • Specialization
    • Software Engineering & Intelligent Systems
  • Supervisor / co-supervisor and their department(s)
  • Examining committee members and their departments
    • Mesbah, Ali (Electrical and Computer Engineering, UBC)
    • Dick, Scott (Electrical and Computer Engineering)
    • Miller, James (Electrical and Computer Engineering)
    • Wong, Ken (Computing Science)
    • Musilek, Petr (Electrical and Computer Engineering)