ERA

Permanent link (DOI): https://doi.org/10.7939/R37T09

Communities

This file is in the following communities:

Graduate Studies and Research, Faculty of

Collections

This file is in the following collections:

Theses and Dissertations

Extracting Information Networks from Text (Open Access)

Descriptions

Other title
Subject/Keyword
Named Entity Recognition
Natural Language Processing
Information Networks
Information Extraction
Relation Extraction
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
de Sa Mesquita, Filipe
Supervisor and department
Barbosa, Denilson (Computing Science)
Examining committee member and department
Carenini, Giuseppe (Computer Science, UBC)
Reformat, Marek (Electrical and Computer Engineering)
Goebel, Randolph (Computing Science)
Rafiei, Davood (Computing Science)
Department
Department of Computing Science
Specialization

Date accepted
2015-04-01T13:47:20Z
Graduation date
2015-06
Degree
Doctor of Philosophy
Degree level
Doctoral
Abstract
This work is concerned with the problem of extracting structured information networks from a text corpus. The nodes of the network are recognizable entities, typically people, locations, or organizations, while the edges denote relations among such entities. We use state-of-the-art natural language processing (NLP) tools to identify the entities and focus on extracting instances of relations. The first relation extraction approaches were supervised and relation-specific, producing new instances of relations known a priori. While effective, this paradigm is not applicable when the relations are not known a priori or when the number of relations is high. More recently, open relation extraction (ORE) techniques were developed to extract instances of arbitrary relations while requiring fewer training examples. Because ORE appeals to applications that rely on large-scale relation extraction, a major requirement for ORE methods is low computational cost. The ORE approaches proposed so far cover a wide range of NLP machinery, from "shallow" (e.g., part-of-speech tagging) to "deep" (e.g., semantic role labeling, or SRL), which raises the question of what the trade-off is between NLP depth (and the associated computational cost) and effectiveness. We study this trade-off in depth and make the following contributions. First, we introduce a fair and objective benchmark for this task and report on an experimental comparison of 11 ORE methods, shedding some light on the state of the art. Next, we propose rule-based methods that achieve higher effectiveness at lower computational cost than the previous best approaches. We also address the problem of extracting nested relations (i.e., relations that accept relation instances as arguments) and n-ary relations (i.e., relations with n > 2 arguments). Previously, all methods for extracting these types of relations were based on SRL, which can be up to 1000 times slower than methods based on shallow NLP. Finally, we describe an elegant solution that starts with shallow extraction methods and decides, on the fly and on a per-sentence basis, whether or not to deploy deeper extraction methods based on dependency parsing and SRL. Our solution prioritizes extra computational resources for sentences describing relation instances that are likely to be extracted by deeper methods. We show experimentally that this solution can achieve much higher effectiveness at a fraction of the cost of SRL.
Language
English
DOI
doi:10.7939/R37T09
Rights
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
Citation for previous publication
Yuval Merhav, Filipe Mesquita, Denilson Barbosa, Wai Gen Yee, and Ophir Frieder. 2012. Extracting Information Networks from the Blogosphere. ACM Trans. Web 6, 3 (2012), 11.
Filipe Mesquita. 2012. Clustering Techniques for Open Relation Extraction. In Proceedings of the SIGMOD/PODS 2012 PhD Symposium (PhD ’12). ACM Press, New York, NY, USA, 27–32. DOI: http://dx.doi.org/10.1145/2213598.2213607
Filipe Mesquita and Denilson Barbosa. 2011. Extracting Meta Statements from the Blogosphere. In Proceedings of the Fifth International Conference on Weblogs and Social Media (ICWSM ’11). AAAI Press, Menlo Park, CA, USA, 225–232.
Filipe Mesquita, Yuval Merhav, and Denilson Barbosa. 2010. Extracting Information Networks from the Blogosphere: State-of-the-Art and Challenges. In Proceedings of the Fourth International Conference on Weblogs and Social Media, Data Challenge Workshop (ICWSM ’10). AAAI Press, Menlo Park, CA, USA, Article 3, 8 pages.
Filipe Mesquita, Jordan Schmidek, and Denilson Barbosa. 2013. Effectiveness and Efficiency of Open Relation Extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, PA, USA, 447–457.
Filipe Mesquita, Ying Xu, Aditya Bhargava, Mirko Bronzi, Denilson Barbosa, and Grzegorz Kondrak. 2011. The Effectiveness of Traditional and Open Relation Extraction for the Slot Filling Task at TAC 2011. In Proceedings of the Fourth Text Analysis Conference. NIST, Gaithersburg, MD, USA, Article 66, 7 pages.

File Details

Date Uploaded
Date Modified
2015-06-15T07:10:20.018+00:00
Characterization
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 2682023
Last modified: 2015:10:21 22:27:27-06:00
Filename: de_Sa_Mesquita_Filipe_201503_PhD.pdf
Original checksum: 51bba55efd53b1ea647168cecff1b3df