Leveraging Translations for Lexical Semantics

Arnob Mallik

doi:10.7939/r3-v61k-9e05

Author / Creator

Arnob Mallik
We leverage multilingual translations from parallel corpora to improve sense annotations, build end-to-end Word Sense Disambiguation (WSD) pipelines, and detect cross-lingual lexical entailment. Based on theories of translational equivalence, we propose novel algorithms that correct noisy sense annotations on a parallel corpus. We show that, when applied to bilingual slices of a parallel corpus, these algorithms rectify noisy sense annotations and thereby produce multilingual sense-annotated training data of improved quality. Furthermore, we propose novel end-to-end pipelines that produce high-quality sense annotations from scratch in a fully unsupervised manner. Among unsupervised approaches, our methods achieve state-of-the-art results on standard WSD datasets in several languages. Additionally, by exploring the generalization property of translations, we develop novel approaches that detect cross-lingual lexical entailment by combining word embeddings with translations. We evaluate our methods on a standard shared task dataset and achieve encouraging results that constitute a strong proof of concept. In summary, our results on three different tasks in lexical semantics confirm the utility of translations in this field.
Graduation date

Fall 2021
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-v61k-9e05
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Kondrak, Greg (Computing Science)