Geotagging Named Entities in Web Pages

  • Author / Creator
    Yu, Jiangwei
  • We study the problem of geotagging named entities, where the goal is to identify
    the most relevant location of a named entity based on the content of the Web pages
    where the entity is mentioned. We hypothesize a relationship between the mentions
    of an entity in Web pages and its geo-center, and propose a framework that
    explores this hypothesis and provides a model that produces a ranked list of locations
    at different location granularities for an entity. We further study the problem
    of dispersion, and show that the dispersion of a name can be estimated and a geo-center
    can be detected at an exact dispersion level.
    Two key features of our approach are: (i) minimal assumptions are made about the
    structure of the mentions, hence the approach can be applied to a diverse and heterogeneous
    set of Web pages, and (ii) the approach is unsupervised, leveraging shallow
    English linguistic features and large gazetteers.
    We evaluate our methods under different settings and with different categories
    of named entities. Our evaluation reveals that the geo-center of a name can be
    estimated with good accuracy based on simple statistics of the mentions,
    and that the accuracy of the estimation varies with the categories of the names.
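
    The idea of ranking candidate locations from simple mention statistics can be
    illustrated with a minimal sketch. This is not the model from the thesis: the toy
    gazetteer, the names rank_locations and geo_center, the whitespace tokenization,
    and the min_share threshold are all assumptions made for illustration. The sketch
    counts gazetteer place names that co-occur with an entity on a page, ranks them
    at three granularities, and picks the finest level at which one location dominates.

```python
from collections import Counter

# Hypothetical toy gazetteer: place name -> (city, region, country).
# A real gazetteer would be far larger and would need disambiguation.
GAZETTEER = {
    "edmonton": ("Edmonton", "Alberta", "Canada"),
    "calgary": ("Calgary", "Alberta", "Canada"),
    "toronto": ("Toronto", "Ontario", "Canada"),
    "seattle": ("Seattle", "Washington", "United States"),
}
LEVELS = ("city", "region", "country")  # finest to coarsest


def rank_locations(pages, entity):
    """Rank locations co-mentioned with `entity`, per granularity level.

    Returns {level: [(location, share), ...]}, sorted by descending share,
    where share is the fraction of place mentions resolving to that location.
    """
    counts = {level: Counter() for level in LEVELS}
    for text in pages:
        tokens = [t.strip(".,;:()").lower() for t in text.split()]
        if entity.lower() not in tokens:
            continue  # only pages that mention the entity contribute
        for tok in tokens:
            if tok in GAZETTEER:
                for level, name in zip(LEVELS, GAZETTEER[tok]):
                    counts[level][name] += 1
    ranked = {}
    for level in LEVELS:
        total = sum(counts[level].values())
        ranked[level] = [
            (name, n / total) for name, n in counts[level].most_common()
        ] if total else []
    return ranked


def geo_center(ranked, min_share=0.5):
    """Return (level, location) for the finest level whose top location
    holds at least `min_share` of the mentions, or None if no level does."""
    for level in LEVELS:
        if ranked[level] and ranked[level][0][1] >= min_share:
            return level, ranked[level][0][0]
    return None
```

    In this sketch, raising min_share pushes the detected geo-center to a coarser
    level when the mentions of a name are spread over several cities, which is one
    simple way to picture the effect of dispersion.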

  • Subjects / Keywords
  • Graduation date
    Fall 2014
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.