Leveraging supplementary transcriptions and transliterations via re-ranking

  • Author / Creator
    Bhargava, Aditya
  • Grapheme-to-phoneme conversion (G2P) and machine transliteration are important tasks in natural language processing. Supplemental data can often help resolve difficult ambiguities: existing transliterations of the same word can help choose among a G2P system’s candidate output transcriptions; similarly, transliterations from other languages can help choose among candidate transliterations in a given language. Transcriptions can be leveraged in this way as well. In this thesis, I investigate the problem of applying supplemental data to improve G2P and machine transliteration results. I present a unified method for leveraging related transliteration or transcription data to improve the performance of a base G2P or machine transliteration system. My approach constructs features with the supplemental data, which are then used in an SVM re-ranker. This re-ranking approach is shown to work across multiple base systems and achieves error reductions ranging from 8% to 43% over state-of-the-art base systems in cases where supplemental data are available.

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.