Identifying Cognate Sets Across Dictionaries of Related Languages
- Subject/Keyword
natural language processing
computational diachronic linguistics
- Degree grantor
University of Alberta
- Author or creator
St Arnaud, Adam, J.J.
- Supervisor and department
Kondrak, Grzegorz (Computing Science)
- Examining committee member and department
Amaral, J. Nelson (Computing Science)
Beck, David (Linguistics)
- Department
Department of Computing Science
- Degree
Master of Science
- Abstract
Cognates are words in related languages that originate from the same word in an ancestor language, such as the English/German word pair father/Vater. Cognate information is critical in historical linguistics, where it is used to determine the relationships between languages and to reconstruct the ancestor languages from which they descend. Most recent work in cognate identification focuses on clustering cognates within word lists in which all words share the same meaning. In that task, only orthographic or phonetic information about a word is used when making cognate judgments. We present a system for the more challenging task of identifying cognate sets across dictionaries of related languages. The likelihood of a cognate relationship is calculated from a rich set of features that capture both phonetic and semantic similarity, as well as the presence of regular sound correspondences. The pairwise similarity scores are combined with an average-score clustering algorithm to create sets of words from different languages that may originate from a common proto-word. When tested on the Algonquian language family, our system detects 63% of cognate sets while maintaining a cluster purity of 70%.
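The clustering step described in the abstract can be illustrated with a minimal Python sketch. It assumes that a pairwise similarity function already exists. Here it is a hypothetical `toy_score` backed by invented numbers, standing in for the thesis's feature-based model. The sketch implements generic average-linkage agglomerative clustering with a hypothetical stopping `threshold`, plus a standard cluster purity measure. The thesis's own average-score algorithm, its stopping criterion, and its exact definition of purity may differ in detail.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, List

Score = Callable[[str, str], float]


def average_score(a: List[str], b: List[str], score: Score) -> float:
    """Mean pairwise similarity between every word in cluster a and every word in cluster b."""
    return sum(score(x, y) for x in a for y in b) / (len(a) * len(b))


def average_score_clustering(words: List[str], score: Score, threshold: float) -> List[List[str]]:
    """Greedy agglomerative clustering: repeatedly merge the two clusters with the
    highest average pairwise score, stopping once that score drops below threshold
    (threshold is an assumed, hypothetical parameter)."""
    clusters = [[w] for w in words]  # start with one singleton cluster per word
    while len(clusters) > 1:
        best_pair, best = None, float("-inf")
        for i, j in combinations(range(len(clusters)), 2):
            s = average_score(clusters[i], clusters[j], score)
            if s > best:
                best, best_pair = s, (i, j)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair  # combinations() guarantees i < j
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: index i < j is unaffected
    return clusters


def purity(clusters: List[List[str]], gold: Dict[str, Hashable]) -> float:
    """Fraction of words that belong to the majority gold cognate set of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(Counter(gold[w] for w in c).most_common(1)[0][1] for c in clusters)
    return majority / total


# Toy, invented similarity scores (hypothetical; not data from the thesis).
TOY = {
    frozenset(("en:father", "de:Vater")): 0.9,
    frozenset(("en:mother", "de:Mutter")): 0.85,
    frozenset(("en:father", "en:mother")): 0.3,
    frozenset(("de:Vater", "de:Mutter")): 0.3,
    frozenset(("en:father", "de:Mutter")): 0.1,
    frozenset(("de:Vater", "en:mother")): 0.1,
}


def toy_score(x: str, y: str) -> float:
    """Symmetric lookup of the invented pairwise scores above."""
    return TOY.get(frozenset((x, y)), 0.0)


if __name__ == "__main__":
    words = ["en:father", "de:Vater", "en:mother", "de:Mutter"]
    gold = {"en:father": 1, "de:Vater": 1, "en:mother": 2, "de:Mutter": 2}
    clusters = average_score_clustering(words, toy_score, threshold=0.5)
    print(clusters, purity(clusters, gold))
```

With these toy scores, the two English/German pairs end up in separate clusters because the average cross-pair similarity (0.2) falls below the threshold of 0.5. Lowering the threshold to 0.1 would merge all four words into a single cluster with a purity of 0.5. This illustrates the trade-off between detecting more cognate links and keeping clusters pure.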
- This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
File format: pdf (PDF/A)
Mime type: application/pdf
File size: 556241 bytes
Last modified: 2017-11-08 17:24:26 -07:00
Original checksum: 59e2e26ae7c4df903a87fadd1367cda1