ERA

Permanent link (DOI): https://doi.org/10.7939/R3VT1H13C

Communities

This file is in the following communities:

Graduate Studies and Research, Faculty of

Collections

This file is in the following collections:

Theses and Dissertations

Annotating Web Tables Using Surface Text Patterns (Open Access)

Descriptions

Subject/Keyword
text processing
database
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
Wang, Andong
Supervisor and department
Rafiei, Davood (Computing Science)
Examining committee member and department
Rafiei, Davood (Computing Science)
Barbosa, Denilson (Computing Science)
Goebel, Randy (Computing Science)
Department
Department of Computing Science
Date accepted
2016-01-19T08:46:48Z
Graduation date
2016-06
Degree
Master of Science
Degree level
Master's
Abstract
While the World Wide Web has long been treated as an immense source of data, most of the information it provides is generally regarded as unstructured and sometimes ambiguous, which in turn makes it unreliable. However, the Web also contains a relatively large amount of structured data in the form of tables, which are carefully constructed by humans. Unfortunately, each relational table on the Web carries its own "schema". The semantics of the columns and the relationships between them are often ill-defined, which makes machine interpretation of the schema difficult and sometimes impossible. We study the problem of annotating Web tables: given a table and a set of relevant documents, each describing or mentioning the element(s) of a row, the goal is to find surface text patterns that best describe the context of each column or combination of columns. The problem is challenging because of the large number of potential patterns, the amount of noise in the text, and the many ways a row can be mentioned. We develop a two-stage framework: in the first stage, candidate patterns are generated from sliding windows over the text; in the second stage, the patterns are generalized and redundant patterns are removed. Experiments evaluate the quality of the resulting annotations by comparing them with human annotations.
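The abstract does not give implementation details, so the following is only a rough illustration of what a two-stage approach of this kind could look like. It is not the thesis's algorithm. The function names (`candidate_patterns`, `generalize`, `covers`), the `<A>`/`<B>` placeholders, the one-document-per-row pairing, the `max_gap` window limit, the single-token `*` wildcard used for generalization, and the support-based redundancy rule are all hypothetical choices made for this sketch.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def find_spans(tokens, phrase):
    """Return the (start, end) token spans where `phrase` occurs in `tokens`."""
    target = tokenize(phrase)
    n = len(target)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == target]


def candidate_patterns(rows, docs, col_a, col_b, max_gap=4):
    """Stage 1 (hypothetical): pair row r with docs[r]; wherever the col_a
    value is followed within max_gap tokens by the col_b value, emit the text
    in between with the two cells replaced by <A> and <B>.
    Returns a map from pattern to the set of row indices that produced it."""
    support = defaultdict(set)
    for r, (row, doc) in enumerate(zip(rows, docs)):
        tokens = tokenize(doc)
        for _, a_end in find_spans(tokens, row[col_a]):
            for b_start, _ in find_spans(tokens, row[col_b]):
                # Only the order "A ... B" is handled in this sketch.
                if a_end <= b_start and b_start - a_end <= max_gap:
                    pattern = ("<A>", *tokens[a_end:b_start], "<B>")
                    support[pattern].add(r)
    return support


def covers(general, specific):
    """True if `general` equals `specific` except at '*' wildcard positions."""
    return len(general) == len(specific) and all(
        g == "*" or g == s for g, s in zip(general, specific))


def generalize(support, min_support=2):
    """Stage 2 (hypothetical): merge same-length patterns that differ in
    exactly one token into a wildcard pattern, then drop patterns seen with
    fewer than min_support rows and patterns made redundant by a more general
    pattern that covers them and all of their rows."""
    merged = {p: set(rs) for p, rs in support.items()}
    for p, q in combinations(list(support), 2):
        if len(p) != len(q):
            continue
        diffs = [i for i, (x, y) in enumerate(zip(p, q)) if x != y]
        if len(diffs) == 1:
            i = diffs[0]
            wildcard = p[:i] + ("*",) + p[i + 1:]
            merged.setdefault(wildcard, set()).update(support[p] | support[q])
    kept = {}
    for p, rs in merged.items():
        if len(rs) < min_support:
            continue
        if any(q != p and covers(q, p) and merged[q] >= rs for q in merged):
            continue
        kept[p] = len(rs)
    return kept


# Toy table (city, province) with one hypothetical document per row.
rows = [("Edmonton", "Alberta"), ("Regina", "Saskatchewan"),
        ("Victoria", "British Columbia")]
docs = ["Edmonton is the capital of Alberta.",
        "Regina, the capital of Saskatchewan, lies on the prairies.",
        "Victoria is the capital of British Columbia."]

support = candidate_patterns(rows, docs, col_a=0, col_b=1)
for pattern, count in generalize(support, min_support=2).items():
    print(" ".join(pattern), count)
```

On this toy data the script prints `<A> * the capital of <B> 3`. The "is" and "," variants are merged into one wildcard pattern supported by all three rows, and the more specific variants are dropped as redundant. A real system would also need to handle reversed column order, fuzzy cell matching, and far noisier text.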
Language
English
DOI
doi:10.7939/R3VT1H13C
Rights
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
File Details

Date Modified
2016-01-19T15:46:59.667+00:00
Characterization
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 1116637 bytes (about 1.1 MB)
Last modified: 2016:06:16 16:59:52-06:00
Filename: Wang_Andong_201601_MSc.pdf
Original checksum: 1317b118610e67175e753144088d44ae
Well formed: true
Valid: true
Page count: 65