ERA

Permanent link (DOI): https://doi.org/10.7939/R3NV99Q98

Communities

This file is in the following communities:

Graduate Studies and Research, Faculty of

Collections

This file is in the following collections:

Theses and Dissertations

Identifying Cognate Sets Across Dictionaries of Related Languages (Open Access)

Descriptions

Subject/Keyword
cognates
machine learning
natural language processing
computational linguistics
computational diachronic linguistics
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
St Arnaud, Adam, J.J.
Supervisor and department
Kondrak, Grzegorz (Computing Science)
Examining committee member and department
Amaral, J. Nelson (Computing Science)
Beck, David (Linguistics)
Department
Department of Computing Science
Date accepted
2017-04-26T11:14:28Z
Graduation date
Fall 2017 (2017-11)
Degree
Master of Science
Degree level
Master's
Abstract
Cognates are words in related languages that have originated from the same word in an ancestor language, such as the English/German word pair father/Vater. Cognate information is critical in the field of historical linguistics, where it is used to determine the relationships between languages and to construct the ancestor languages they originated from. Most recent work in cognate identification focuses on the task of clustering cognates within lists of words each having an identical definition. In that task, only orthographic or phonetic information about a word is utilized when making cognate judgments. We present a system for the more challenging task of identifying cognate sets across dictionaries of related languages. The likelihood of a cognate relationship is calculated on the basis of a rich set of features that capture both phonetic and semantic similarity, as well as the presence of regular sound correspondences. The pairwise similarity scores are combined with an average-score clustering algorithm to create sets of words from different languages that may originate from a common proto-word. When tested on the Algonquian language family, our system detects 63% of cognate sets while maintaining cluster purity of 70%.
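The clustering step mentioned in the abstract can be illustrated with a minimal sketch of average-score (average-linkage) agglomerative clustering. This is not the thesis implementation: the function names, the toy similarity scores, and the 0.5 threshold are all assumptions made for illustration, and the real system derives its pairwise scores from phonetic, semantic, and sound-correspondence features.

```python
from itertools import combinations


def average_score(a, b, score):
    """Mean pairwise score between the members of clusters a and b."""
    total = sum(score(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster(words, score, threshold):
    """Greedily merge the two clusters with the highest average score
    until no pair of clusters reaches the threshold."""
    clusters = [frozenset([w]) for w in words]
    while len(clusters) > 1:
        best_pair, best = None, threshold
        for a, b in combinations(clusters, 2):
            s = average_score(a, b, score)
            if s >= best:
                best_pair, best = (a, b), s
        if best_pair is None:
            break  # no remaining pair is similar enough to merge
        a, b = best_pair
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Hypothetical toy data: (language, word) entries and made-up scores.
WORDS = [("en", "father"), ("de", "Vater"), ("en", "fish"), ("de", "Fisch")]
TOY_SCORES = {
    frozenset([("en", "father"), ("de", "Vater")]): 0.9,
    frozenset([("en", "fish"), ("de", "Fisch")]): 0.8,
}


def toy_score(x, y):
    """Look up a made-up similarity; unrelated pairs default to 0.1."""
    return TOY_SCORES.get(frozenset([x, y]), 0.1)


result = cluster(WORDS, toy_score, threshold=0.5)
```

With these toy scores, the two high-scoring pairs are merged into separate sets, and the final merge is rejected because the average score between the two sets falls below the threshold.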
Language
English
DOI
doi:10.7939/R3NV99Q98
Rights
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
File Details

Date Uploaded
Date Modified
2017-04-26T17:14:29.039+00:00
Audit Status
Audits have not yet been run on this file.
Characterization
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 556241 bytes
Last modified: 2017-11-08 17:24:26-07:00
Filename: StArnaud_Adam_JJ_201704_MSc.pdf
Original checksum: 59e2e26ae7c4df903a87fadd1367cda1