This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Strategies for gazetteer improvement and enrichment Open Access


Subject / Keyword
Gazetteer enrichment
Bounding box detection
Gazetteer refinement
Type of item
Degree grantor
University of Alberta
Author or creator
Singh, Sanket Kumar
Supervisor and department
Rafiei, Davood (Computing Science)
Examining committee member and department
Reformat, Marek (Electrical and Computer Engineering)
Sander, Joerg (Computing Science)
Rafiei, Davood (Computing Science)
Department
Department of Computing Science

Graduation date
2017-11 (Fall 2017)
Degree
Master of Science
Many applications that use geographical databases (a.k.a. gazetteers) rely on the accuracy of the information in the database. However, poor data quality is an issue in gazetteers: data is often integrated from multiple sources with different quality constraints, and there may be little detail on the sources or the quality of the data. One major consequence is that the geographical scope of a location and/or its position may be unknown or inaccurate. In this thesis, we develop novel strategies to accurately derive the geographical scope of places. Our strategies use the spatial hierarchy of a gazetteer as well as other public information (such as area) to construct a bounding box for each place. We present a probabilistic model of our approach and demonstrate the effectiveness of the bounding boxes in refining the spatial hierarchy of a gazetteer and augmenting it with other public data. Experimental evaluation on two public-domain gazetteers shows that the proposed approaches significantly outperform, in terms of bounding-box accuracy, a baseline based on the parent-child relationships of a gazetteer. More specifically, our approaches outperform the baseline by 19-33% in accuracy across a wide range of settings. In terms of applications, we show how these bounding boxes provide a generic way to improve the accuracy and usability of a gazetteer.
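To make the idea of a bounding box concrete, the following is a minimal sketch of two simple ways a box could be built from gazetteer data: one from the coordinates of a place's children in the spatial hierarchy, and one from a centre point plus a known area. It is not the thesis's probabilistic model. The names `BoundingBox`, `bbox_from_children`, and `bbox_from_area` are hypothetical, the square-box and flat-earth approximations are assumptions made for illustration, and the sample coordinates are made up.

```python
import math
from dataclasses import dataclass

# Approximate length of one degree of latitude in kilometres (assumption).
KM_PER_DEG_LAT = 111.32


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in decimal degrees (hypothetical helper type)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        """True if the point lies inside or on the edge of the box."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def bbox_from_children(points):
    """Smallest box enclosing the (lat, lon) points of a place's children.

    Ignores boxes that would cross the antimeridian.
    """
    if not points:
        raise ValueError("at least one child point is required")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return BoundingBox(min(lats), min(lons), max(lats), max(lons))


def bbox_from_area(lat, lon, area_km2):
    """Square box with the given area, centred on (lat, lon).

    Assumes the place is roughly square and far from the poles, where the
    longitude scaling by cos(latitude) breaks down.
    """
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    half_side_km = math.sqrt(area_km2) / 2
    dlat = half_side_km / KM_PER_DEG_LAT
    dlon = half_side_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return BoundingBox(lat - dlat, lon - dlon, lat + dlat, lon + dlon)


if __name__ == "__main__":
    # Made-up child coordinates and area, for illustration only.
    children = [(53.5, -113.5), (51.0, -114.1), (56.7, -111.4)]
    print(bbox_from_children(children))
    print(bbox_from_area(53.5, -113.5, 700.0))
```

A box built only from child points can be too small when few children are recorded, while a box built only from area ignores the place's actual shape; the sketch shows each source of information in isolation rather than how they might be combined.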
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
Citation for previous publication
Sanket Kumar Singh and Davood Rafiei. Geotagging flickr photos and videos using language models. In MediaEval, 2016.

File Details

File format: pdf (PDF/A)
Mime type: application/pdf
File size: 7453969 bytes
Last modified: 2017-11-08 16:45:35-07:00
Filename: Singh_Sanket Kumar_201708_MSc.pdf
Original checksum: 84978a4bb74b07cbefeef0c6a49457b5