This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Web-assisted anaphora resolution Open Access


Other title
anaphora resolution
Type of item
Degree grantor
University of Alberta
Author or creator
Li, Yifan
Supervisor and department
Musilek, Petr (Electrical and Computer Engineering)
Reformat, Marek (Electrical and Computer Engineering)
Examining committee member and department
Pedrycz, Witold (Electrical and Computer Engineering)
Fair, Ivan (Electrical and Computer Engineering)
Zadrozny, Slawomir (Polish Academy of Sciences)
Sutton, Richard (Computing Science)
Electrical and Computer Engineering

Date accepted
Graduation date
Doctor of Philosophy
Degree level
This dissertation investigates the utility of the web for anaphora resolution. Aside from offering a highly accurate, web-based method for pleonastic it detection, which eliminates up to 4% of errors in pronominal anaphora resolution, it also introduces a web-assisted model for definite description anaphoricity determination and a prototype system of anaphora resolution that uses the web for virtually all subtasks.

The thesis starts with a thorough analysis of the relationship between anaphora and definiteness, a study that bridges the gap between previously reported empirical studies of definite description anaphora and the linguistic theories developed around the concept of definiteness. Various naturally-occurring definite descriptions found in the WSJ corpus are analyzed from both perspectives of familiarity and uniqueness, and a new classification scheme for definite descriptions is developed.

With the fundamental issues solved, the rest of the thesis focuses on the various ways the web can be exploited for the purpose of anaphora resolution. This thesis presents methods of high-precision, high-recall anaphoricity determination for both pronouns and definite descriptions. Evaluation results suggest that the performance of the pleonastic it identification module is on par with casually-trained human annotators. When used together with a pronominal anaphora resolution system, the module offers a statistically significant performance gain of 4%. The performance of the anaphoricity determination module for definite descriptions, which benefits from both the insight gained from the study on anaphora and definiteness and the significantly expanded coverage offered by the web, is also one of the highest among existing studies.

The thesis also introduces a web-centric anaphora resolution system. Aside from serving as the information source for implementing selectional restrictions and discovering hyponym/synonym relationships, the web is additionally used for gender/number determination and many other auxiliary tasks, such as determining the semantic subjects of as-prepositions, identifying antecedents for certain empty categories, and assigning appropriate labels for proper names using information available from the text itself. With a design that specifically leaves room for the application of verb-argument and genitive co-occurrence statistics, the web-based features provide statistically significant gains to the system's performance.
License granted by Yifan Li on 2010-01-11T20:56:41Z (GMT): Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of the above terms. The author reserves all other publication and other rights in association with the copyright in the thesis, and except as herein provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
File Details

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 2551171
Last modified: 2015:10:12 15:12:49-06:00
Filename: Li_Yifan_Spring_2010.pdf
Original checksum: d96e7a485f5622d83998c753b71d8adc
Well formed: false
Valid: false
Status message: Lexical error offset=2445630
Page count: 201