ERA

Permanent link (DOI): https://doi.org/10.7939/R3QB9VJ17

Communities

This file is in the following communities:

Graduate Studies and Research, Faculty of

Collections

This file is in the following collections:

Theses and Dissertations

Relation Extraction and its Application to Question Answering Open Access

Descriptions

Subject/Keyword
implicit relation extraction
information extraction
question answering
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
Xu, Ying
Supervisor and department
Goebel, Randy (Computing Science)
Examining committee member and department
Schuurmans, Dale (Computing Science)
Sun, Maosong (Computer Science)
Barbosa, Denilson (Computing Science)
Ringlstetter, Christoph (Linguistic)
Kondrak, Grzegorz (Computing Science)
Department
Department of Computing Science
Date accepted
2017-01-31T11:31:06Z
Graduation date
2017-06:Spring 2017
Degree
Doctor of Philosophy
Degree level
Doctoral
Abstract
Information extraction, the task of extracting structured information from text, is a vital component of many natural language tasks such as question answering. It generally consists of two components: (1) named entity recognition (NER), which identifies noun phrases that are names of organizations, persons, or countries; and (2) relation extraction, which extracts relations between entities. In this dissertation, we assume the entities are given and concentrate on the relation extraction task. The traditional relation extraction task seeks to confirm a predefined set of relations in a text, such as the employment or family relation. Such systems are difficult to extend with additional relations. In contrast, the open information extraction (Open IE) task attempts to extract all relations, using words in sentences to represent the relations. This dissertation focuses on Open IE. We first proposed a tree kernel-based Open IE system that achieved state-of-the-art performance. One advantage of the tree kernel model is that it exploits information in syntactic parse trees without feature engineering. After observing the importance of words in relation extraction, we then incorporated word embeddings into the tree kernel and improved the system’s performance. However, previous systems have not considered implicit relations, i.e., relations implied in noun phrase structures such as Germany’s people, Google Images, and Shakespeare’s book. We call this type of structure nested named entities. To study the implicit relation phenomenon, we automatically extracted thousands of training instances from Wikipedia. We demonstrate the feasibility of recovering implicit relations with a supervised classification model. Our data and model provide a baseline for future work on this task. Finally, to show the effect of our relation extraction systems, we built an Open IE-based question answering system and achieved promising results. Our analysis identifies the weaknesses of current Open IE systems, clarifies the role of our information extraction results, and gives directions for improvement.
Language
English
DOI
doi:10.7939/R3QB9VJ17
Rights
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

File Details

Date Uploaded
Date Modified
2017-01-31T18:31:07.615+00:00
Audit Status
Audits have not yet been run on this file.
Characterization
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 1297477
Last modified: 2017:06:13 12:29:36-06:00
Filename: thesis2_Achive.pdf
Original checksum: 795e43258e44f7860fac3770c99dd500
Well formed: false
Valid: false
Status message: Invalid page tree node offset=1002375
Status message: Unexpected error in findFonts java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary offset=3157
Page count: 72