Multi-modal piano note detection using audio and video

Patel, Nirmalkumar Laxmanbhai

doi:10.7939/r3-g28j-f890

Communities and Collections

Concordia University of Edmonton / Master of Science in Information Technology Project Reports (Concordia University of Edmonton)

Author(s) / Creator(s)
- Patel, Nirmalkumar Laxmanbhai
Many people have been interested in music recognition. The automated transcription of musical compositions and the identification of sound sources, such as the kinds of instruments used, have taken considerable time and effort. With the rise of personal computers and multimedia systems in recent years, research in these areas has received a lot of attention. In this work, we chose a piano-based song for analysis and divided it into chunks, called frames, for note recognition. First, we analyzed the frames manually to identify the notes, so that we have the correct notes as a reference. We then used a fingertip-tracking technique to follow the notes being played; the tracked notes form our dataset for the image-based (video frame) input. Next, we extracted the audio, divided it into the same number of chunks as there are frames in the video, and performed audio frequency analysis to detect notes from the audio. When the variables of interest cannot be measured directly but an indirect measurement is available, Kalman filters and particle filters are used to estimate them as accurately as possible. They are also used to obtain the best approximation of a system's state in the presence of noise by integrating readings from multiple sensors. The novelty of our research is that we implemented a Kalman filter and a particle filter that take audio- and video-based inputs instead of sensor data, which has not been done before.
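The audio side of the pipeline (cutting the extracted audio so that each chunk lines up with one video frame, then detecting one note per chunk through frequency analysis) can be sketched as follows. This is a minimal illustration, not the report's implementation: the function names, the 44.1 kHz sample rate, the 30 fps frame rate, equal-tempered tuning at A4 = 440 Hz, and the use of the Goertzel algorithm to measure power at each of the 88 piano key fundamentals are all assumptions.

```python
import math

# Assumed inputs: mono float samples, 44.1 kHz audio, 30 fps video (illustrative values).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def split_audio_into_frames(samples, sample_rate, fps):
    """Split audio into consecutive chunks, one per video frame; a trailing partial chunk is dropped."""
    samples_per_frame = int(round(sample_rate / fps))
    last_start = len(samples) - samples_per_frame
    return [samples[i:i + samples_per_frame]
            for i in range(0, last_start + 1, samples_per_frame)]


def goertzel_power(chunk, sample_rate, freq):
    """Squared spectral magnitude of the chunk at one frequency (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in chunk:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def key_frequency(midi):
    """Equal-tempered fundamental of a MIDI key, tuned to A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def midi_to_name(midi):
    """Scientific pitch name, e.g. 69 -> 'A4', 60 -> 'C4'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def detect_note(chunk, sample_rate):
    """Return the piano key (MIDI 21..108, A0..C8) whose fundamental carries the most power."""
    best_midi, best_power = None, -1.0
    for midi in range(21, 109):
        freq = key_frequency(midi)
        if freq >= sample_rate / 2:
            continue  # above Nyquist: not representable at this sample rate
        power = goertzel_power(chunk, sample_rate, freq)
        if power > best_power:
            best_midi, best_power = midi, power
    return best_midi


if __name__ == "__main__":
    sr, fps = 44100, 30
    tone = [math.sin(2 * math.pi * 440.0 * n / sr) for n in range(sr)]  # 1 s of A4
    frames = split_audio_into_frames(tone, sr, fps)
    print(len(frames), [midi_to_name(detect_note(f, sr)) for f in frames[:3]])
```

At 30 fps each chunk is only about 33 ms long (1470 samples at 44.1 kHz), which gives a frequency resolution of about 30 Hz. That separates neighboring keys in the middle register but not in the bass, and a real piano note's harmonics may pull the strongest response to another octave, so a practical detector would need longer windows or harmonic-aware scoring.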
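One way to fuse the two per-frame note estimates with a Kalman filter is to treat the sounding key number as a scalar state that follows a random walk, with the audio detector and the fingertip tracker acting as two independent noisy measurements of it. The sketch below is a hypothetical illustration under those assumptions; the class and function names, the MIDI key numbering, and the noise variances (audio trusted more than video here) are placeholders, not values taken from the report.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    x: float  # current estimate of the sounding key (MIDI number, treated as continuous)
    p: float  # variance of that estimate
    q: float  # process noise: how much the key may change between frames

    def predict(self):
        # Random-walk model: the expected key stays the same, uncertainty grows by q.
        self.p += self.q

    def update(self, z, r):
        # Scalar Kalman update with measurement z whose noise variance is r.
        k = self.p / (self.p + r)  # Kalman gain
        self.x += k * (z - self.x)
        self.p *= 1.0 - k


def fuse_kalman(audio_notes, video_notes, r_audio=1.0, r_video=4.0, q=0.5, p0=10.0):
    """Fuse per-frame audio and video key estimates; returns one rounded key per frame."""
    kf = ScalarKalman(x=float(audio_notes[0]), p=p0, q=q)
    fused = []
    for z_audio, z_video in zip(audio_notes, video_notes):
        kf.predict()
        kf.update(z_audio, r_audio)  # audio measurement (assumed more reliable)
        kf.update(z_video, r_video)  # fingertip-tracking measurement
        fused.append(int(round(kf.x)))
    return fused


if __name__ == "__main__":
    audio = [69, 69, 69, 69, 69, 69]
    video = [69, 69, 69, 69, 72, 69]  # one spurious video reading
    print(fuse_kalman(audio, video))
```

Applying the two measurement updates one after the other is equivalent to a single joint update because the measurements are assumed independent. The random-walk model lets the filter smooth out an isolated disagreement, but it also delays a genuine note change by a frame or two; the process noise `q` controls that trade-off.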
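A particle filter can fuse the same two inputs without assuming a single Gaussian peak of uncertainty. The following bootstrap (sampling-importance-resampling) sketch represents each hypothesis as a discrete piano key; the jump probability, the Gaussian measurement model, the particle count, and all names are illustrative assumptions rather than details from the report.

```python
import math
import random
from collections import Counter


def gaussian_likelihood(z, x, sigma):
    """Unnormalized Gaussian likelihood of observing key z when key x is sounding."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)


def fuse_particle(audio_notes, video_notes, n_particles=500, sigma_audio=1.0,
                  sigma_video=2.0, jump_prob=0.1, seed=0):
    """Fuse per-frame audio and video key estimates with a bootstrap particle filter."""
    rng = random.Random(seed)
    # Each particle is a hypothesis about which key (MIDI 21..108) is sounding.
    particles = [rng.randint(21, 108) for _ in range(n_particles)]
    fused = []
    for z_audio, z_video in zip(audio_notes, video_notes):
        # Predict: keep the same key, or jump to a random key with probability jump_prob.
        particles = [rng.randint(21, 108) if rng.random() < jump_prob else p
                     for p in particles]
        # Weight: how well each hypothesis explains both (assumed independent) observations.
        weights = [gaussian_likelihood(z_audio, p, sigma_audio)
                   * gaussian_likelihood(z_video, p, sigma_video) for p in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # every hypothesis implausible: fall back to uniform
        # Resample in proportion to weight.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Report the most frequent hypothesis as this frame's fused key.
        fused.append(Counter(particles).most_common(1)[0][0])
    return fused


if __name__ == "__main__":
    notes = [69] * 10 + [72] * 10
    print(fuse_particle(notes, notes))
```

Reporting the most frequent particle keeps the output on a valid key, unlike rounding a mean, and seeding the random generator makes runs reproducible.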
Date created

2022
Subjects / Keywords
Type of Item

Research Material
DOI

https://doi.org/10.7939/r3-g28j-f890
License

Attribution-NonCommercial 4.0 International

Language
- English