APhL Aligner: A Neural Network Forced-Alignment System

Matthew C. Kelley; Scott James Perry; Benjamin V. Tucker

doi:10.7939/r3-kb5b-nn12

Communities and Collections

Linguistics, Department of / Presentations (Linguistics)

Abstract
Forced alignment is increasingly used in phonetics to automatically produce boundaries between words and phones. These boundaries can contain significant errors and are often placed only at multiples of some predetermined time interval, such as every 10 ms. We discuss some potential remedies to these difficulties and test them in a new neural network-based forced-alignment system called the APhL Aligner, trained on the TIMIT and Buckeye speech corpora. In part, errors incurred during forced alignment can be attributed to the acoustic models that attempt to separate phones from each other. Even state-of-the-art neural network models struggle to acoustically separate phones. We examine the effect of relaxing the requirement to separate phones by instead training separate detectors for each phone class. Resolving the 10 ms interval difficulty requires a different approach. As with most aligners, we perform a Viterbi-style alignment to align windows of audio spaced at 10 ms to the phone string given by a pronunciation dictionary. We add an additional step, however, and use linear interpolation to determine an intermediate point after the 10 ms interval at which to place the boundary. We compare the results of these manipulations to the results of the Montreal Forced Aligner, custom-trained on the same data.
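
The two alignment steps described in the abstract can be illustrated with a minimal Python sketch. This is not the APhL Aligner's implementation. The function names (`viterbi_align`, `interpolate_boundary`, `refined_boundaries`) are hypothetical, and the interpolation criterion is an assumption: the boundary is placed where the score of the following phone overtakes the score of the preceding phone, with scores linearly interpolated between two adjacent frames. Frame t is also assumed to be centered at t times the frame shift.

```python
import math

def viterbi_align(log_probs, phones):
    """Return, for each frame, the index into `phones` it is aligned to.

    log_probs[t][p] is the log score of phone id p at frame t. The path
    starts on phones[0], ends on phones[-1], and at every frame either
    stays on the current phone or advances to the next one.
    """
    T, S = len(log_probs), len(phones)
    if S == 0 or T < S:
        raise ValueError("need at least one frame per phone")
    NEG = float("-inf")
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][phones[0]]
    for t in range(1, T):
        for s in range(min(t + 1, S)):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG
            if move > stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            score[t][s] = best + log_probs[t][phones[s]]
    # Trace back from the final phone at the last frame.
    path, s = [0] * T, S - 1
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]
    return path

def interpolate_boundary(t, left, right, probs, frame_shift=0.010):
    """Boundary time (s) between frame t-1 (phone `left`) and frame t (`right`).

    Assumed criterion: the zero crossing of (p_right - p_left), linearly
    interpolated between the two frames. Falls back to the frame grid
    when there is no crossing.
    """
    d0 = probs[t - 1][right] - probs[t - 1][left]
    d1 = probs[t][right] - probs[t][left]
    frac = -d0 / (d1 - d0) if d0 < 0 < d1 else 1.0
    return (t - 1 + frac) * frame_shift

def refined_boundaries(path, phones, probs, frame_shift=0.010):
    """Interpolated boundary times for every phone change along `path`."""
    times = []
    for t in range(1, len(path)):
        if path[t] != path[t - 1]:
            left, right = phones[path[t - 1]], phones[path[t]]
            times.append(interpolate_boundary(t, left, right, probs, frame_shift))
    return times

# Toy example: four 10 ms frames, two phone classes (ids 0 and 1).
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
phones = [0, 1]
log_probs = [[math.log(p) for p in frame] for frame in probs]
path = viterbi_align(log_probs, phones)
times = refined_boundaries(path, phones, probs)
print(path, times)
```

In this toy example the Viterbi path switches phones between frames 1 and 2, and the interpolated boundary falls at roughly 16 ms instead of on the 20 ms frame grid.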
Date created

2021-12-03
Type of Item

Conference/Workshop Presentation
DOI

https://doi.org/10.7939/r3-kb5b-nn12
License

Attribution 4.0 International

Language
- English