Abstract
More and more people use the Internet to search for, listen to, purchase, download and share music. Existing Music Information Retrieval (MIR) systems are either audio-based or symbolic-based. Audio-based MIR systems build on audio formats such as MP3. These formats represent continuous sound waves well, but they are limited in illustrating the content flow of a melody and they generate large files for high-fidelity music. In addition, audio-based MIR systems use audio fingerprints to find exact music pieces, but they have difficulty finding variations of a piece. Symbolic-based formats, on the other hand, represent the content of music effectively with small files and facilitate the identification of musical patterns, but they are not suitable for Electronic Music (EM), which is treated as a continuous set of sound waves. This is because symbolic-based representations are limited to modelling discrete sets, so they may introduce extra dummy notes into EM. These notes can affect the main melody flow and increase errors in symbolic-based retrieval, including the identification of an original piece from its variations. As a consequence, people get unsatisfactory results from both audio-based and symbolic-based music search engines and find it difficult to carry out music plagiarism checks. Therefore, a new way to describe, model and analyse music is needed.
In this project, we aim to retain the advantages of both the audio and symbolic approaches while addressing their shortcomings, as briefly described above, by proposing a new architecture named E3MSD (Expressive, Efficient, and Extendable Music Similarity Detection). E3MSD makes two contributions.
The first contribution is a new data model that describes music information using both the Music Definition Language (MDL) and the Music Manipulation Language (MML), which together encode and represent music effectively and efficiently. For evaluation, we have tested MDL&MML from the perspectives of music storage and music representation. In terms of storage efficiency, a sampled audio clip encoded with the proposed coding scheme requires less storage than in other popular audio-based formats. More precisely, a melody that occupies approximately 316 KB in the MP3 format requires only 9 KB of disk space in the MDL&MML format. In terms of music expressiveness, the proposed symbolic-based representation can model various timbres using less storage space without sacrificing quality. Finally, E3MSD includes the automatic generation of an MDL&MML file from audio sound waves. The derived MDL&MML file achieves around 94% melodic and 100% rhythmic accuracy relative to a manually generated one.
The second contribution is a hybrid mechanism built on the proposed musical data model, named MUsic Classification And Similarity Measurement (MUCASM), which combines contour, rhythm and audio fingerprints. This method features a modified reinforcement-based ensemble learning classification mechanism, which includes a decision tree that maps variations of music pieces to their corresponding originals, using variation types (for example, rhythm variation) as attributes. The experimental results show a stable accuracy of 84% when the types of variations are not taken into account, and 96% when our proposed ensemble learning is used.
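To make the idea of combining contour and rhythm concrete, the following is a minimal Python sketch. It is not the MUCASM implementation: the Parsons-style contour encoding, the duration-ratio rhythm profile, the equal weighting and all function names are assumptions chosen for illustration, and the audio fingerprints and the ensemble classifier are left out.

```python
from difflib import SequenceMatcher


def contour(pitches):
    """Parsons-style contour: U (up), D (down) or R (repeat) per interval."""
    return "".join(
        "U" if b > a else "D" if b < a else "R"
        for a, b in zip(pitches, pitches[1:])
    )


def rhythm_profile(durations):
    """Ratios between successive note durations, so a uniform tempo change cancels out."""
    return [round(b / a, 3) for a, b in zip(durations, durations[1:])]


def sequence_similarity(a, b):
    """Similarity in [0, 1] based on matching subsequences (difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def combined_score(notes_a, notes_b, contour_weight=0.5):
    """Weighted mix of contour and rhythm similarity.

    Each melody is a list of (pitch, duration) pairs; the weighting is an
    illustrative assumption, not a value taken from the thesis.
    """
    pitches_a, durations_a = zip(*notes_a)
    pitches_b, durations_b = zip(*notes_b)
    contour_sim = sequence_similarity(contour(pitches_a), contour(pitches_b))
    rhythm_sim = sequence_similarity(
        rhythm_profile(durations_a), rhythm_profile(durations_b)
    )
    return contour_weight * contour_sim + (1 - contour_weight) * rhythm_sim


# Hypothetical example: the variation is transposed up and played twice as fast.
original = [(60, 1.0), (62, 1.0), (64, 2.0), (62, 1.0)]
variation = [(65, 0.5), (67, 0.5), (69, 1.0), (67, 0.5)]
score = combined_score(original, variation)
```

Because both the contour string and the duration ratios are relative measures, a transposed or uniformly sped-up variation still matches its original in this sketch, which is the kind of variation that exact audio fingerprints tend to miss.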
E3MSD can be extended to explore its potential for improving the performance of existing music search engines, building plagiarism detection tools for music, and even generating remixes automatically based on similarity scores.
|Date of Award||May 2020|
|Supervisor||Xiang Fei (Supervisor), Kuo-Ming Chao (Supervisor) & Ming Yang (Supervisor)|
Student thesis: Doctoral Thesis › Doctor of Philosophy