Multi-modal piano note detection using audio and video

  • Author(s) / Creator(s)
  • Music recognition has long attracted interest. Automated transcription of musical compositions and identification of sound sources, such as the type of instrument used, have required considerable time and effort. With the rise of personal computers and multimedia systems in recent years, research in these areas has received a lot of attention. In our paper, we chose a piano-based song for analysis. We divided the song into chunks called frames for note recognition. We first analyzed the frames manually to obtain the correct notes for reference. We then used a fingertip-following technique to track the notes being played. This forms our input dataset for image (frame) based input. Subsequently, the audio is extracted and divided into chunks matching the number of frames in the video. We performed audio frequency analysis to detect notes from the audio. When the variables of interest cannot be measured directly but an indirect measurement is available, the Kalman filter and the particle filter are used to estimate them as well as possible. They are also used to obtain the best approximation of states in the presence of noise by integrating readings from multiple sensors. The novelty of our research is that we implemented the Kalman filter and the particle filter on audio- and video-based input instead of sensor data, which has not been done before.
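
    The abstract does not give the authors' model or parameters, so the following is only a minimal sketch of the two ideas it describes: mapping the dominant frequency of an audio chunk to a piano note, and fusing per-frame audio and video note estimates with a scalar Kalman filter. All function names, the random-walk state model, the A4 = 440 Hz tuning, and the variance values are assumptions made for illustration, not details taken from the paper. The particle filter is not sketched here.

```python
import math

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # MIDI note number of A4

def frequency_to_midi(freq_hz):
    """Map a frequency in Hz to the nearest equal-tempered MIDI note number."""
    return round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, skipping DC.

    Uses a naive O(n^2) DFT so the sketch needs only the standard library.
    """
    n = len(samples)
    best_bin, best_power = 1, 0.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n

def kalman_update(mean, var, measurement, meas_var):
    """One scalar Kalman measurement update; returns the new (mean, variance)."""
    gain = var / (var + meas_var)
    return mean + gain * (measurement - mean), (1 - gain) * var

def fuse_note_estimates(audio_notes, video_notes,
                        audio_var=0.5, video_var=2.0, process_var=4.0):
    """Fuse per-frame note estimates from audio and video.

    Assumed model: the note follows a random walk (process_var), and each
    modality is a noisy measurement of it (audio_var, video_var).
    """
    mean, var = float(audio_notes[0]), 10.0  # hypothetical initial belief
    fused = []
    for audio_note, video_note in zip(audio_notes, video_notes):
        var += process_var                                   # predict step
        mean, var = kalman_update(mean, var, audio_note, audio_var)
        mean, var = kalman_update(mean, var, video_note, video_var)
        fused.append(round(mean))
    return fused

# Usage: one synthetic 440 Hz chunk, then fusion of two short note sequences.
chunk = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(512)]
audio_note = frequency_to_midi(dominant_frequency(chunk, 8000))
fused_notes = fuse_note_estimates([69, 69, 71, 71], [69, 70, 71, 71])
```

    In this sketch the audio measurement is given a smaller variance than the video measurement, so a single disagreeing video frame shifts the fused estimate only slightly. That weighting is an arbitrary choice for the example and would need to be tuned against manually labelled frames.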

  • Date created
    2022
  • Subjects / Keywords
  • Type of Item
    Research Material
  • DOI
    https://doi.org/10.7939/r3-g28j-f890
  • License
    Attribution-NonCommercial 4.0 International