Cognitively-Active Speaker Normalization Based on Formant-Frequency Scaling Estimation

    Barreda-Castanon, Santiago
The acoustic characteristics associated with a vowel category may vary greatly across speakers. Despite this variation, human listeners are typically able to identify vowel sounds with a good degree of accuracy. One account of this ability is speaker normalization, the theory that listeners interpret vowel sounds relative to what might be expected for a given speaker. This thesis comprises three experiments that test specific aspects of a theory in which speaker normalization is under active cognitive control on the part of the listener, and in which the information used by the process is organized around the detection of speaker changes. The first experiment investigates the role of f0 in vowel perception, with results indicating that f0 primarily affects vowel quality by influencing the listener's expectations regarding the speaker. The second experiment investigates the interaction between the detection of speaker changes and the perception of vowel quality. Findings support the notion that the detection of speaker changes is a central component of speaker normalization, and that speaker normalization is a cognitively active process. In the third experiment, listeners were trained to report the acoustic correlate associated with increases or decreases in the average formant frequencies produced by a voice (i.e., formant-frequency scaling). Results indicate that listeners are able to identify voices that differ on the basis of this parameter with good accuracy, and that the perceptual correlate of formant-frequency scaling is influenced by the fundamental frequency of vowel sounds. Finally, a model of cognitively active speaker normalization, the Active Sliding Template Model (ASTM), is introduced. The ASTM predicts vowel quality on the basis of a speaker-specific representation that is refined in the absence of a detected speaker change, and re-estimated when a speaker change is detected. An implementation of this model was used to simulate the results of Experiments 1 and 2. The results of these simulations indicate that this relatively simple model of cognitively active speaker normalization is able to generate a range of patterns of results similar to those observed for human listeners.
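
The refine-or-re-estimate behaviour described for the ASTM can be illustrated with a toy sketch. This is not the thesis's implementation: the template values, the two-formant representation, the class name `SlidingTemplateListener`, and the running-mean update are all assumptions made for illustration, speaker-change detection is supplied by the caller rather than modelled, and f0 is ignored. The speaker-specific representation is reduced to a single log-scale offset `psi` (the formant-frequency scaling), which is averaged over tokens while no speaker change is reported and rebuilt from the current token when one is.

```python
import math

# Hypothetical reference template: log(F1), log(F2) per vowel for a reference
# voice whose scaling offset is 0. The Hz values are illustrative only.
TEMPLATE = {
    "i": (math.log(300.0), math.log(2300.0)),
    "a": (math.log(750.0), math.log(1300.0)),
    "u": (math.log(320.0), math.log(900.0)),
}


def implied_scaling(log_f, vowel):
    """Mean log-formant offset of a token from the template for `vowel`."""
    ref = TEMPLATE[vowel]
    return sum(x - r for x, r in zip(log_f, ref)) / len(ref)


def misfit(log_f, vowel, psi):
    """Squared distance between the token and the template slid by psi."""
    ref = TEMPLATE[vowel]
    return sum((x - psi - r) ** 2 for x, r in zip(log_f, ref))


class SlidingTemplateListener:
    """Toy listener (assumed design, not the ASTM as published)."""

    def __init__(self):
        self.psi = 0.0  # current estimate of the speaker's log scaling
        self.n = 0      # tokens attributed to the current speaker

    def hear(self, f1_hz, f2_hz, speaker_change=False):
        log_f = (math.log(f1_hz), math.log(f2_hz))
        if speaker_change or self.n == 0:
            # Re-estimate: choose the vowel whose template, slid by its own
            # implied scaling, fits this token best, and restart from it.
            vowel = min(
                TEMPLATE,
                key=lambda v: misfit(log_f, v, implied_scaling(log_f, v)),
            )
            self.psi, self.n = implied_scaling(log_f, vowel), 1
        else:
            # Refine: classify with the current estimate, then fold the
            # token's implied scaling into a running mean.
            vowel = min(TEMPLATE, key=lambda v: misfit(log_f, v, self.psi))
            self.n += 1
            self.psi += (implied_scaling(log_f, vowel) - self.psi) / self.n
        return vowel
```

Calling `hear` repeatedly with `speaker_change=False` accumulates evidence about one voice, while passing `speaker_change=True` discards the earlier estimate, which is the contrast the abstract draws between refinement and re-estimation.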

