Leveraging Translations for Lexical Semantics

  • Author / Creator
    Arnob Mallik
  • Abstract
    We leverage multilingual translations from parallel corpora to improve sense annotations, build end-to-end Word Sense Disambiguation (WSD) pipelines, and detect cross-lingual lexical entailment. Drawing on theories of translational equivalence, we propose novel algorithms that correct noisy sense annotations in a parallel corpus. We show that, when applied to bilingual slices of a parallel corpus, these algorithms rectify noisy sense annotations and thereby produce multilingual sense-annotated training data of improved quality. We also propose novel end-to-end pipelines that produce high-quality sense annotations from scratch in a fully unsupervised manner. Our methods achieve state-of-the-art results among unsupervised approaches on standard WSD datasets in several languages. Finally, by exploiting the generalization property of translations, we develop novel approaches that detect cross-lingual lexical entailment by combining word embeddings with translations. We evaluate these methods on a standard shared-task dataset and obtain encouraging results that constitute a strong proof of concept. In summary, our results on three different lexical semantics tasks confirm the utility of translations in this field.

  • Graduation date
    Fall 2021
  • Degree
    Master of Science
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.