Malagasy Speech Synthesis

Schnoor, Tyler T.

doi:10.7939/r3-a336-ba51

Communities and Collections

Linguistics, Department of / Honours Theses (Linguistics)

Author(s) / Creator(s)
- Schnoor, Tyler T.
Speech technologies can benefit people by making information and services more accessible, by increasing productivity, and by improving human-computer interaction generally. However, they are available for only a small portion of the world's languages. The present study investigates how contemporary machine learning approaches to speech synthesis may be adapted for under-resourced languages that lack abundant data.

The study has three objectives. The first is to develop a Malagasy speech synthesis model that is effective enough to have practical implications. The second is to explore whether adding crowd-sourced training data benefits the model. I develop a web application that facilitates the remote collection of speech data and use it to collect a small, multi-speaker Malagasy speech dataset for training. The merits of crowd-sourcing data from multiple speakers are compared with those of collecting data from a single speaker. The third and final objective is to explore the effects of cross-lingual transfer learning, data augmentation, and other methods that might facilitate the development of speech synthesis models for under-resourced languages. This is done by iteratively training models with these methods and comparing their outputs. The models are built with an open source implementation of the Tacotron framework (Tacotron 2 (without Wavenet), 2018/2022).

The results suggest that a combination of cross-lingual transfer learning and data augmentation may be used to train effective speech synthesis models from a small amount of speech data in an under-resourced language. Adding multi-speaker data to a small single-speaker training set is not found to improve results. Further investigation is needed to determine whether multi-speaker data can be incorporated in other ways to enhance model outputs.
Date created

2022-01-01
Type of Item

Report
DOI

https://doi.org/10.7939/r3-a336-ba51
License

Attribution-NonCommercial 4.0 International

Language
- English
- Other language