Towards Neural Information Extraction without Manual Annotated Data

Xu, Peng

doi:10.7939/R3G44J63V

Author / Creator

Xu, Peng
Information extraction (IE) is one of the most important technologies in the information age. Applying information extraction to text is linked to the problem of text simplification in order to create a structured view of the information present in free text. However, information extraction is a very challenging task, due to the inherent difficulty of understanding natural language and the high cost of obtaining large manually annotated training data. In this thesis, we build on the premise of performing automatic information extraction without manually annotated data, following the distant supervision paradigm, and present novel neural models for different IE tasks which are particularly suited to this setting.
In the first part of the thesis, we focus on one IE task, fine-grained entity type classification (FETC), and propose the NFETC model: a single, much simpler and more elegant neural network model that attempts FETC “end-to-end” without post-processing or ad-hoc features. We study two kinds of noise in noisy type labels, namely out-of-context noise and overly-specific noise, and investigate their effects on FETC systems. The proposed neural network model jointly learns representations for entity mentions and their context. A variant of the cross-entropy loss function is used to handle out-of-context noise, and hierarchical loss normalization is introduced to alleviate the effect of overly-specific noise.
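The idea of hierarchical loss normalization can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes that each type's predicted probability is boosted by a fraction beta of the probability assigned to its ancestors in the type hierarchy, and that the result is renormalized before the cross-entropy loss is computed. The type hierarchy, the function names, and the value of beta are all hypothetical.

```python
import math

# Hypothetical type hierarchy: child type -> parent type (None for top level).
PARENT = {
    "/person": None,
    "/person/artist": "/person",
    "/person/artist/actor": "/person/artist",
    "/organization": None,
}


def ancestors(type_name):
    """Return all ancestors of a type, nearest first."""
    result = []
    parent = PARENT[type_name]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def hierarchical_normalize(probs, beta):
    """Add beta times the ancestors' probability to each type, then renormalize.

    Assumed formulation, for illustration only.
    """
    adjusted = {
        t: p + beta * sum(probs[a] for a in ancestors(t))
        for t, p in probs.items()
    }
    total = sum(adjusted.values())
    return {t: v / total for t, v in adjusted.items()}


def cross_entropy(probs, gold_type):
    """Negative log-probability of the (possibly noisy) gold type."""
    return -math.log(probs[gold_type])


# Made-up model output for one entity mention.
raw = {
    "/person": 0.5,
    "/person/artist": 0.3,
    "/person/artist/actor": 0.1,
    "/organization": 0.1,
}
normalized = hierarchical_normalize(raw, beta=0.3)
```

Under these assumptions, a model that already places probability on `/person` and `/person/artist` is penalized less when the distant label is the more specific `/person/artist/actor`, which is one way to soften overly-specific noise.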
In the second part of the thesis, we focus on another IE task, relation extraction (RE), and propose a neural model with multiple levels of attention mechanisms. The model can make full use of all informative words and sentences and alleviate the wrong labelling problem in distantly supervised relation extraction.
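A rough sketch of attention applied at two levels, over the words of a sentence and then over the sentences in a bag that mention the same entity pair, might look like the following. It assumes dot-product scoring against query vectors and softmax pooling, which are common choices but are not confirmed by the abstract. The vectors, the queries, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(vectors, query):
    """Pool vectors into one, weighting each by softmax(dot(vector, query))."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def encode_bag(bag, word_query, relation_query):
    """Word-level attention inside each sentence, then sentence-level attention."""
    sentence_vectors = [attend(words, word_query)[0] for words in bag]
    return attend(sentence_vectors, relation_query)


# A bag of two sentences, each a list of toy 2-dimensional word vectors.
bag = [
    [[1.0, 0.0], [0.9, 0.1]],  # sentence that matches the relation query
    [[0.0, 1.0], [0.1, 0.9]],  # sentence that likely carries a wrong label
]
bag_vector, sentence_weights = encode_bag(
    bag, word_query=[1.0, 1.0], relation_query=[2.0, 0.0]
)
```

In this toy setup the first sentence receives the larger attention weight, so a sentence that does not express the relation contributes less to the bag representation instead of being treated as an equally reliable training example.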
In the third part of the thesis, we attempt to leverage knowledge base embedding methods to facilitate relation extraction and describe a novel neural framework, HRERE, that jointly learns heterogeneous representations from both text information and facts in an existing knowledge base. A novel loss function is introduced to connect the heterogeneous representations seamlessly, allowing them to enhance each other.
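One way such a connecting loss could be structured is sketched below. This is an assumption for illustration, not the loss defined in the thesis: it sums a cross-entropy term for the text-based prediction, a cross-entropy term for the knowledge-base-based prediction, and a weighted KL-divergence term that penalizes disagreement between the two distributions. The function names and the `weight` parameter are hypothetical.

```python
import math


def cross_entropy(probs, gold_index):
    """Negative log-probability of the gold relation."""
    return -math.log(probs[gold_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_loss(text_probs, kb_probs, gold_index, weight=1.0):
    """Two supervised terms plus a disagreement term tying the models together.

    Assumed structure, for illustration only.
    """
    return (
        cross_entropy(text_probs, gold_index)
        + cross_entropy(kb_probs, gold_index)
        + weight * kl_divergence(kb_probs, text_probs)
    )


# Toy distributions over three relations; the gold relation has index 0.
agree = joint_loss([0.7, 0.2, 0.1], [0.7, 0.2, 0.1], gold_index=0)
disagree = joint_loss([0.7, 0.2, 0.1], [0.7, 0.1, 0.2], gold_index=0)
```

With this structure, the disagreement term is zero when the two models predict the same distribution and grows as they diverge, so gradients from either representation can influence the other.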
Overall, the work in this thesis tackles different IE tasks under the distant supervision setting using deep neural networks (DNNs), different attention mechanisms, different loss functions, and the help of knowledge base embeddings. All the proposed models achieved state-of-the-art performance on representative tasks.
Subjects / Keywords
- Machine Learning
- Natural Language Processing
Graduation date

Fall 2018
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/R3G44J63V
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Denilson Barbosa