Permanent link (DOI): https://doi.org/10.7939/R3KH0F719
This file is in the following communities:
Faculty of Graduate Studies and Research
This file is in the following collections:
Theses and Dissertations
Modelling phonetic reduction in a corpus of spoken English using Random Forests and Mixed-Effects Regression
Open Access
- Degree grantor
University of Alberta
- Author or creator
Dilts, Philip C
- Supervisor and department
Baayen, R. Harald. (Linguistics)
Tucker, Benjamin V. (Linguistics)
- Examining committee member and department
Arppe, Antti (Linguistics)
Gahl, Susanne (Linguistics, University of California, Berkeley)
Kondrak, Grzegorz (Computing Science)
- Department
Department of Linguistics
- Degree
Doctor of Philosophy
- Abstract
In this thesis, phonetic reduction in the Buckeye Corpus (Pitt et al. 2005) of conversational speech is modelled using advanced statistical techniques.
Two measures of phonetic reduction are modelled: reduction in the duration of words and deletion of segments from words. Statistical modelling techniques are used to predict how much of each type of reduction is observed in the corpus. Predictor variables are selected from a number of broad classes, including demographic, phonetic, predictability, syntactic, semantic, and pragmatic variables. The broad scope of these variables leads to a generalizable picture of the factors leading to reduction in spontaneous speech.
Two modelling techniques with complementary properties are applied to the modelling task: Random Forest (RF) models (Breiman 2001) and Linear Mixed-Effects Regression (LMER) models. RF models handle complex interactions and highly collinear predictor variables much more easily than LMER models do. Conversely, LMER models allow each word form and speaker to differ in their response to reduction-predicting variables, and they can easily incorporate predictor variables composed of a large number of unordered categories. Neither of these properties can practically be incorporated into current RF models on the scale required for the present study.
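As a rough illustration of what it means for each speaker and word form to "differ in their response" to a predictor, a mixed-effects model with crossed random intercepts and slopes can be written as follows. The notation is an assumption made for this sketch (a single predictor x, speaker index i, word-form index j) and is not taken from the thesis, whose models include many more predictors.

```latex
% Response y_{ij}: a reduction measure for speaker i producing word form j.
% Fixed effects: \beta_0 (intercept), \beta_1 (slope of predictor x).
% Random effects: by-speaker (b) and by-word (c) adjustments to both.
\begin{aligned}
y_{ij} &= (\beta_0 + b_{0i} + c_{0j}) + (\beta_1 + b_{1i} + c_{1j})\, x_{ij} + \varepsilon_{ij}, \\
(b_{0i}, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma_{\text{speaker}}), \qquad
(c_{0j}, c_{1j})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma_{\text{word}}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Here the b and c terms shift the intercept and the slope for individual speakers and word forms, which is the kind of grouping structure the abstract says is hard to express in RF models.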
Results relating to the variables or combinations of variables that correlate with reduction or improve model prediction are described. Possible explanations for the results and implications for the nature of the processes underlying reduction during spontaneous speech are explored. Results relating to the modelling process are also discussed. In particular, random forest modelling indicated that several potential interactions between variables were overlooked in initial LMER modelling. When these interactions were included in a second round of LMER modelling, several were found to improve prediction significantly.
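The effect of adding a previously overlooked interaction to a regression can be sketched on toy data. The example below is only an illustration under stated assumptions: it uses ordinary least squares rather than LMER, it does not fit a random forest, and the predictors `a` and `b`, the coefficient 2.0, and the noise level are all hypothetical, synthetic choices unrelated to the Buckeye Corpus.

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y, beta):
    """In-sample proportion of variance explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
                 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic data (hypothetical): the response depends only on the a*b interaction.
rng = random.Random(0)
a = [rng.uniform(-1, 1) for _ in range(500)]
b = [rng.uniform(-1, 1) for _ in range(500)]
y = [1.0 + 2.0 * ai * bi + rng.gauss(0, 0.1) for ai, bi in zip(a, b)]

# Round 1: main effects only. Round 2: the interaction term is added.
X_additive = [[1.0, ai, bi] for ai, bi in zip(a, b)]
X_interaction = [[1.0, ai, bi, ai * bi] for ai, bi in zip(a, b)]

beta_additive = ols_fit(X_additive, y)
beta_interaction = ols_fit(X_interaction, y)
r2_additive = r_squared(X_additive, y, beta_additive)
r2_interaction = r_squared(X_interaction, y, beta_interaction)

print("R^2, main effects only:", round(r2_additive, 3))
print("R^2, with interaction: ", round(r2_interaction, 3))
```

In this toy setting the main-effects model explains almost none of the variance, while the model with the interaction term explains most of it. In practice the candidate interactions would come from a separate exploratory step, such as the random forest modelling described in the abstract.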
The results of the present study may lead to improvements in speech recognition and speech production technologies. The results also suggest that random forests can be used to improve regression models of language data.
- License
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 4598692 bytes
Last modified: 2015-10-12 15:12:49-06:00
Filename: Dilts_Philip_Fall 2013.pdf
Original checksum: 41a48080aee994bf65b8403fc704b303
Well formed: true
Page count: 203