
Syntactic Features and Text Types in 20th Century Plains Cree: A Constraint Grammar Approach

  • Author / Creator
    Schmirler, Katherine M
  • This dissertation describes the creation of a morphosyntactically tagged corpus of Plains Cree (nêhiyawêwin), an Indigenous language of North America, and demonstrates three ways in which this corpus can be used to explore morphosyntactic variation in the language on a larger scale than previously feasible. The corpus includes ~152,000 words of Plains Cree drawn from several published volumes of transcribed oral text, collected in the 1920s and 1980s-1990s, offering a variety of text types and time periods to consider. These texts are divided by time period into two subcorpora, the Bloomfield subcorpus and the Ahenakew-Wolfart subcorpus.
    The tagged corpus is created using two main tools: 1) a Finite State Transducer-based morphological model for Plains Cree, and 2) a Constraint Grammar-based syntactic parser. Though the initial development of the former predates the dissertation, the manual validation of the model's morphological analyses, undertaken as part of this work, has contributed to its ongoing development. The latter is a core component of the present work, and aims to disambiguate ambiguous wordforms and assign basic syntactic functions. Chapter 2 describes the morphosyntactic features needed to build the syntactic parser, as well as some that are not currently implemented but will contribute to an improved model in the future. Chapter 3 describes the creation of the syntactic parser using the Constraint Grammar formalism, including improvements from an earlier iteration. Chapter 4 evaluates the effectiveness of the syntactic parser and describes the corpus in more detail, including the texts it contains and the morphosyntactic feature tags assigned by the models. Chapter 5 focuses on the argument tags, exploring the variation of where and when arguments occur in a language with flexible word order. Even with only a morphosyntactically tagged corpus, the pragmatic influences on argument realisation and word order can be observed. In both Chapters 4 and 5, variation between the subcorpora is explored as well, demonstrating the ways in which the subcorpora differ (in verb classes, noun classes, persons, word order patterns, etc.) and how these differences are obscured when the corpus is examined as a whole. Chapter 6 then offers an example of how variation between the subcorpora and the different text types they contain can be explored, using Principal Component Analysis (PCA) to undertake a text type analysis. Differences between the subcorpora are apparent, though similarities in narratives are also demonstrated; the primary contrasts are found between narratives and speeches and, within narratives, between dialogue and non-dialogue narrative.
    Among the first large corpora for under-resourced Indigenous languages, this automatically tagged corpus allows for the exploration of oral language use on a large scale. The corpus serves as the basis of a searchable online corpus for academics and community members, as a tool for research and as a supplement to language education.

  • Graduation date
    Spring 2023
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-pz87-ye25
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.