






This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Visual Similarity Analysis of Web Pages based on Gestalt Theory
Open Access


Subject / Keyword: Visual Incompatibility; Visual Similarity; Gestalt Law of Grouping; Extended Subtree
Degree grantor: University of Alberta
Author or creator: Xu, Zhen
Supervisor and department: Miller, James (Electrical and Computer Engineering)
Examining committee members and departments:
Musilek, Petr (Electrical and Computer Engineering)
Miller, James (Electrical and Computer Engineering)
Mesbah, Ali (Electrical and Computer Engineering, UBC)
Dick, Scott (Electrical and Computer Engineering)
Wong, Ken (Computing Science)
Department: Electrical and Computer Engineering
Specialization: Software Engineering & Intelligent Systems
Graduation date: 2017-11 (Fall 2017)
Degree: Doctor of Philosophy
Abstract
With the rapid development of internet technology, the web page has evolved from a traditional rich-text information source into a multi-functional tool that can serve images, audio, and video, act as the GUI (Graphical User Interface) component of distributed applications, and more. Evaluating the similarity of modern web pages has therefore become both more essential and more difficult. On the one hand, while many search engines rely on keyword search, text plays a less important role in web pages. On the other hand, browsers and platforms support HTML/CSS/JavaScript to different degrees, so the same web page is displayed differently across browsers. To address these issues, we propose four research topics.

The first topic is to identify semantic blocks on web pages. We propose a model, based on human perception, for merging web page content into semantic blocks. To achieve this goal, we construct a layer tree to remove hierarchical inconsistencies between the visual layout and the DOM (Document Object Model) tree of a web page; we then translate the Gestalt laws of grouping into computer-compatible rules and train a classifier that combines these laws into a unified rule for detecting semantic blocks.

The second topic is to estimate the visual similarity of web pages. Existing approaches use DOM trees or images, but they either focus only on the structure of web pages or ignore the inner connections among web page features. We therefore propose the block tree, which combines both the structural and the visual information of a web page, and use this block tree structure to define a visual similarity measurement.

The purpose of the third topic is to improve this visual similarity measurement and use it to detect visual differences in web pages rendered in different browsers. To improve precision, we introduce an extended subtree model that maps subtrees instead of individual nodes.
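The first topic translates the Gestalt laws of grouping into computer-compatible rules. As a rough illustration of what one such rule can look like, the sketch below encodes only the law of proximity: rendered boxes that lie closer together than a threshold are merged into the same block, transitively. The `Box` type, the `gap` distance, the `threshold` parameter, and the union-find grouping are hypothetical choices made for this example; they are not the rules, features, or classifier defined in the thesis.

```python
from dataclasses import dataclass

# Hypothetical illustration of a single Gestalt rule (proximity only);
# not the thesis's rule set or its trained classifier.


@dataclass(frozen=True)
class Box:
    """A rendered element's bounding box in page coordinates (pixels)."""
    x: float
    y: float
    w: float
    h: float


def gap(a: Box, b: Box) -> float:
    """Smallest Euclidean distance between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(0.0, max(a.x, b.x) - min(a.x + a.w, b.x + b.w))
    dy = max(0.0, max(a.y, b.y) - min(a.y + a.h, b.y + b.h))
    return (dx * dx + dy * dy) ** 0.5


def group_by_proximity(boxes: list[Box], threshold: float) -> list[list[int]]:
    """Group box indices so that boxes closer than `threshold` share a block.

    Grouping is transitive: if A is near B and B is near C, all three are merged
    (union-find over the pairwise "near" relation).
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if gap(boxes[i], boxes[j]) < threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, with `boxes = [Box(0, 0, 100, 20), Box(0, 25, 100, 20), Box(0, 200, 100, 20)]`, calling `group_by_proximity(boxes, threshold=10)` returns `[[0, 1], [2]]`: the first two boxes are 5 px apart and merge, while the third is far below them. A single fixed threshold is the simplest possible rule; combining several such cues (proximity, similarity, alignment, and so on) into one decision is the kind of step the abstract assigns to a trained classifier.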
The fourth topic uses the improved visual similarity measurement to create an automated testing framework for detecting cross-browser visual incompatibilities. We also design an automated testing tool. The major contribution of this thesis is twofold. On the one hand, it enriches the theoretical analysis of semantic content detection, visual similarity, and cross-browser differences in web pages. On the other hand, it provides practical insight into testing for cross-browser incompatibilities.
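To make the cross-browser detection task concrete, the sketch below shows a naive baseline that compares the bounding box of each element across two renderings of the same page. This is a simpler, swapped-in technique, not the thesis's extended-subtree similarity measurement. The input format (an element identifier such as an XPath mapped to an `(x, y, width, height)` tuple, for example gathered from each browser through Selenium WebDriver), the 3-pixel `tolerance`, and the name `layout_differences` are all hypothetical.

```python
# Hypothetical baseline for cross-browser layout comparison; not the
# thesis's extended-subtree method.

# Assumed input format: element id (e.g. an XPath) -> (x, y, width, height) in pixels.
Layout = dict[str, tuple[float, float, float, float]]


def layout_differences(
    reference: Layout, candidate: Layout, tolerance: float = 3.0
) -> list[tuple[str, str]]:
    """Report elements whose rendering differs between two browsers.

    Returns (element_id, reason) pairs sorted by element id, where reason is
    "missing" (only in reference), "extra" (only in candidate), or
    "moved/resized" (any box coordinate differs by more than `tolerance`).
    """
    issues: list[tuple[str, str]] = []
    for key in sorted(reference.keys() | candidate.keys()):
        if key not in candidate:
            issues.append((key, "missing"))
        elif key not in reference:
            issues.append((key, "extra"))
        else:
            deltas = (abs(a - b) for a, b in zip(reference[key], candidate[key]))
            if max(deltas) > tolerance:
                issues.append((key, "moved/resized"))
    return issues
```

For example, if `chrome = {"/html/body/div[1]": (0, 0, 800, 60), "/html/body/div[2]": (0, 60, 800, 400)}` and `firefox = {"/html/body/div[1]": (0, 0, 800, 62), "/html/body/div[2]": (0, 90, 800, 400), "/html/body/div[3]": (0, 490, 800, 20)}`, then `layout_differences(chrome, firefox)` returns `[("/html/body/div[2]", "moved/resized"), ("/html/body/div[3]", "extra")]`; the 2-pixel height change of `div[1]` stays within tolerance. Per-element comparison like this ignores how elements relate to one another, which is one reason a structure-aware measurement that matches whole subtrees can be more precise.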
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
Citations for previous publications
Xu, Zhen, and James Miller. "Identifying semantic blocks in Web pages using Gestalt laws of grouping." World Wide Web 19.5 (2016): 957-978.
Xu, Zhen, and James Miller. "Estimating Similarity of Rich Internet Pages Using Visual Information." International Journal of Web Engineering and Technology (accepted May 2017).

File Details

File format: pdf (PDF/A)
Mime type: application/pdf
File size: 7746193 bytes
Last modified: 2017-11-08 17:21:35-07:00
Filename: Zhen_Xu_201707_PhD.pdf
Original checksum: 14062fcb6d1c4775fbdb6efaf0ca4b40