A Framework for Synthesis of Musical Training Examples for Polyphonic Instrument Recognition

Sethi, Rameel

doi:10.7939/R3M32NS1W

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Abstract

Music information retrieval (MIR), an interdisciplinary field concerned with classifying or detecting structure in music, is essential for processing, indexing, querying and making recommendations from the vast amount of musical data available on the web and in audio library collections. Deep neural networks have yielded state-of-the-art results in several MIR tasks, but are often limited by their reliance on large amounts of annotated training data. Thus, deep learning may prove difficult to apply to music databases with limited labelling. This thesis addresses the question of whether compositions generated algorithmically from a specification of instruments and note events may serve as a viable alternative to real labeled music recordings for use as training data in MIR classification tasks. We propose a simple framework for generating synthetic musical compositions as training data using the popular Musical Instrument Digital Interface (MIDI) protocol; the compositions may be rendered as audio using commonly available synthesizers. In addition, we apply a variety of audio transformations to the generated audio samples for data augmentation purposes. We apply this music synthesis algorithm to the MIR tasks of polyphony estimation (the number of instruments sounding) and instrument recognition (which instruments are playing) in polyphonic tracks, where multiple instruments may sound simultaneously in each analysis frame, and evaluate our framework on publicly available annotated music datasets. We empirically demonstrate that pure synthesis of a musical training set, without using a training set of real music, yields statistically significant improvements over a random or majority classifier. The main contribution of this thesis is to show that synthetic musical composition generation coupled with data augmentation has the potential to aid content-based MIR in music collections with limited amounts of annotation.
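The idea of generating a composition from a specification of instruments and note events, and obtaining per-frame labels for free, can be illustrated with a small Python sketch. This is not the thesis's algorithm: the `NoteEvent` type, the functions `generate_events` and `frame_labels`, the uniform random choices, and the pitch and duration ranges are all assumptions made for illustration. Instruments are identified by 0-based General MIDI program numbers, and rendering the events to MIDI or audio is left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteEvent:
    """One synthetic note (hypothetical structure, for illustration only)."""
    program: int     # 0-based General MIDI program number, e.g. 0 = piano, 40 = violin
    pitch: int       # MIDI note number
    onset: float     # start time in seconds
    duration: float  # length in seconds


def generate_events(programs, notes_per_instrument, total_seconds, rng):
    """Draw random note events for each instrument in the specification."""
    events = []
    for program in programs:
        for _ in range(notes_per_instrument):
            duration = rng.uniform(0.25, 2.0)          # assumed duration range
            onset = rng.uniform(0.0, total_seconds - duration)
            pitch = rng.randint(48, 84)                # assumed pitch range (C3 to C6)
            events.append(NoteEvent(program, pitch, onset, duration))
    return sorted(events, key=lambda e: e.onset)


def frame_labels(events, total_seconds, frame_seconds):
    """Return, for each analysis frame, the sorted list of sounding programs.

    The list itself is the instrument-recognition label for the frame;
    its length is the polyphony (number of instruments sounding).
    """
    n_frames = int(total_seconds / frame_seconds)
    labels = []
    for i in range(n_frames):
        start = i * frame_seconds
        end = start + frame_seconds
        # A note counts as sounding if it overlaps the frame interval [start, end).
        active = {e.program for e in events
                  if e.onset < end and e.onset + e.duration > start}
        labels.append(sorted(active))
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    events = generate_events([0, 40], notes_per_instrument=5,
                             total_seconds=10.0, rng=rng)
    for frame, active in enumerate(frame_labels(events, 10.0, 0.5)):
        print(frame, len(active), active)
```

Because the labels are derived from the same event list that would drive the synthesizer, no manual annotation is needed, which is the property the abstract relies on.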
Subjects / Keywords
- Deep learning
- Music information retrieval
Graduation date

Fall 2018
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/R3M32NS1W
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Specialization
- Statistical Machine Learning
Supervisor / co-supervisor and their department(s)
- Hindle, Abram (Computing Science)
- Bulitko, Vadim (Computing Science)