Cognitively-Active Speaker Normalization Based on Formant-Frequency Scaling Estimation

Barreda-Castanon, Santiago

doi:10.7939/R34Q7QZ5N

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Abstract

The acoustic characteristics associated with a vowel category may vary greatly when produced by different speakers. Despite this variation, human listeners are typically able to identify vowel sounds with a good degree of accuracy. One account of this ability is that listeners interpret vowel sounds relative to what might be expected for a given speaker, a theory known as speaker normalization. This thesis comprises three experiments meant to test specific aspects of a theory of speaker normalization in which the process is under active cognitive control on the part of the listener, and in which the information used by the process is organized around the detection of speaker changes. The first experiment investigates the role of f0 in vowel perception, with results indicating that f0 primarily affects vowel quality by influencing the listener’s expectations regarding the speaker. The second experiment investigates the interaction between the detection of speaker changes and the perception of vowel quality. Findings support the notion that the detection of speaker changes is a central component of speaker normalization, and that speaker normalization is a cognitively-active process. In the third experiment, listeners were trained to report the acoustic correlate associated with increases or decreases to the average formant frequencies produced by a voice (i.e., formant-frequency scaling). Results indicate that listeners are able to identify voices that differ on the basis of this parameter with good accuracy, and that the perceptual correlate of formant-frequency scaling is influenced by the fundamental frequency of vowel sounds. Finally, a model of cognitively-active speaker normalization, the Active Sliding Template Model (ASTM), is introduced. The ASTM predicts vowel quality on the basis of a speaker-specific representation that is refined in the absence of a detected speaker change, and re-estimated when a speaker change is detected. An implementation of this model was used to simulate the results of Experiments 1 and 2. The results of these simulations indicate that this relatively simple model of cognitively-active speaker normalization is able to generate a range of patterns of results similar to those observed for human listeners.
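The refine-or-re-estimate behaviour described for the ASTM can be illustrated with a toy sketch. This is not the thesis's implementation: the class name `SpeakerEstimate`, the single scalar `scaling` value, the running-mean update, and the assumption that a per-token scaling estimate and a speaker-change flag are supplied from outside are all hypothetical simplifications chosen for illustration.

```python
class SpeakerEstimate:
    """Toy speaker-specific representation: one scalar formant-frequency
    scaling value. Hypothetical simplification, not the ASTM itself."""

    def __init__(self):
        self.scaling = None   # current estimate for the speaker
        self.n_tokens = 0     # tokens contributing to the estimate

    def update(self, token_scaling, speaker_changed):
        """Refine the estimate, or re-estimate it on a speaker change.

        token_scaling: scaling value suggested by the current token
            (how it is obtained is outside this sketch).
        speaker_changed: True if a speaker change was detected
            (the detection step is also outside this sketch).
        """
        if self.scaling is None or speaker_changed:
            # Re-estimate: discard the old representation entirely.
            self.scaling = token_scaling
            self.n_tokens = 1
        else:
            # Refine: running mean over tokens from the same speaker.
            self.n_tokens += 1
            self.scaling += (token_scaling - self.scaling) / self.n_tokens
        return self.scaling


estimate = SpeakerEstimate()
first = estimate.update(1.0, speaker_changed=False)         # initial estimate
refined = estimate.update(1.2, speaker_changed=False)       # moves toward 1.2
re_estimated = estimate.update(0.8, speaker_changed=True)   # reset to 0.8
```

In this sketch the running mean stands in for "refinement"; any other update rule that accumulates evidence within a speaker and resets across speakers would illustrate the same idea.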
Graduation date

Fall 2013
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R34Q7QZ5N
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Linguistics
Supervisor / co-supervisor and their department(s)
- Tucker, Benjamin (Linguistics)
- Nearey, Terrance (Linguistics)
Examining committee members and their departments
- Hodge, Megan (Speech Pathology & Audiology)
- Nusbaum, Howard (Psychology, University of Chicago)
- Tessier, Anne-Michelle (Linguistics)