ATEPP: A Dataset of Automatically Transcribed Expressive Piano Performance

Huan Zhang*, Jingjing Tang*, Syed RM Rafee*, Simon Dixon, George Fazekas
School of Electronic Engineering and Computer Science, Queen Mary University of London
The 23rd International Society for Music Information Retrieval Conference (ISMIR 2022)
Bengaluru, India

Paper | Dataset | Supplementary Files


NOTE: We have released a cleaned version of our dataset. Please check ATEPP-1.2 on GitHub for more details. The following introduces the first release of the dataset.

Abstract

Computational models of expressive piano performance rely on attributes like tempo, timing, dynamics and pedalling. Despite some promising models for performance assessment and performance rendering, results are limited by the scale, breadth and uniformity of existing datasets. In this paper, we present ATEPP, a dataset that contains 1000 hours of performances of standard piano repertoire by 49 world-renowned pianists, organized and aligned by compositions and movements for comparative studies. Scores in MusicXML format are also available for around half of the tracks. We first evaluate and verify the use of transcribed MIDI for representing expressive performance with a listening evaluation that involves recent transcription models. Then, the process of sourcing and curating the dataset is outlined, including composition entity resolution and a pipeline for audio matching and solo filtering. Finally, we conduct baseline experiments for performer identification and performance rendering on our dataset, demonstrating its potential in generalizing expressive features of individual performing style.

Main Contributions:

- ATEPP, a dataset of 1000 hours of performances of standard piano repertoire by 49 world-renowned pianists, organized and aligned by composition and movement, with MusicXML scores for around half of the tracks.
- A listening evaluation verifying that transcribed MIDI from recent transcription models can represent expressive performance.
- A data curation pipeline covering composition entity resolution, audio matching and solo filtering.
- Baseline experiments for performer identification and performance rendering.

Dataset Overview

Composition Entity Resolution (CER), the non-trivial task of automatically reconciling the complex naming schemes of classical music, is the main challenge in obtaining multiple performances of the same composition at scale. We developed a composition entity linker to address this challenge.
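
As a rough illustration of the idea only (not the actual ATEPP linker), the sketch below reduces noisy track titles to a canonical catalogue-based key so that recordings of the same composition can be grouped. The regular expressions and helper names are hypothetical simplifications; the real naming schemes additionally involve movement titles, nicknames, keys and multilingual metadata.

import re
from collections import defaultdict

# Hypothetical, simplified sketch of composition entity resolution (CER):
# reduce noisy track titles to a canonical (composer, catalogue, sub-number)
# key so that different recordings of the same composition can be grouped.

CATALOGUE_RE = re.compile(r"\b(Op|K|KV|BWV|D|Hob)\.?\s*(\d+[a-z]?)", re.IGNORECASE)
SUBNUMBER_RE = re.compile(r"\bNo\.?\s*(\d+)", re.IGNORECASE)

def composition_key(composer: str, title: str) -> tuple:
    """Canonical grouping key derived from free-text metadata."""
    catalogue, subnumber = "", ""
    cat = CATALOGUE_RE.search(title)
    if cat:
        catalogue = f"{cat.group(1).upper()}{cat.group(2)}"
        # Look for a sub-number only after the catalogue token, e.g. "Op. 27 No. 2".
        num = SUBNUMBER_RE.search(title, cat.end())
        if num:
            subnumber = num.group(1)
    return (composer.strip().lower(), catalogue, subnumber)

def group_performances(tracks):
    """Group (composer, title, path) records that resolve to the same composition."""
    groups = defaultdict(list)
    for composer, title, path in tracks:
        groups[composition_key(composer, title)].append(path)
    return dict(groups)

if __name__ == "__main__":
    tracks = [
        ("Beethoven", "Piano Sonata No. 14 in C-sharp minor, Op. 27 No. 2: I. Adagio sostenuto", "a.flac"),
        ("Beethoven", "Sonata quasi una fantasia, Op.27 No.2 - 1. Adagio sostenuto", "b.flac"),
    ]
    # Both titles resolve to ('beethoven', 'OP27', '2') and are grouped together.
    print(group_performances(tracks))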



Data Curation Pipeline
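
As summarised in the abstract, the curation pipeline combines composition entity resolution with audio matching and solo filtering before transcription. The snippet below is a generic, hypothetical sketch of one possible audio-matching step, aligning chroma features of a candidate recording against a reference rendition with dynamic time warping via librosa; it is not the method used in the paper, and the helper names and threshold are purely illustrative.

import librosa
import numpy as np

# Hypothetical audio-matching check (not the ATEPP pipeline): a candidate
# recording is compared to a reference rendition of the expected movement
# (e.g. a score synthesis) by DTW-aligning their chroma features.

def chroma(path: str, sr: int = 22050, hop: int = 4096) -> np.ndarray:
    y, sr = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.chroma_cqt(y=y, sr=sr, hop_length=hop)

def dtw_match_cost(candidate_path: str, reference_path: str) -> float:
    """Path-length-normalised DTW cost between candidate and reference chroma."""
    X, Y = chroma(candidate_path), chroma(reference_path)
    D, wp = librosa.sequence.dtw(X=X, Y=Y, metric="cosine")
    return float(D[-1, -1] / len(wp))

def is_same_piece(candidate_path: str, reference_path: str, threshold: float = 0.25) -> bool:
    # The threshold would have to be tuned on manually verified pairs.
    return dtw_match_cost(candidate_path, reference_path) < threshold

Solo filtering, i.e. discarding concerto, chamber and other non-solo tracks, is a separate step of the pipeline and is not shown here.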



Transcription Model

To reconcile the sustain effect produced by both the pedal and the keys, and to achieve more accurate note offsets, we modify the original High-Resolution model with joint note-pedal training. As shown in Figure 1, we model the pianist's key action rather than the string-damping time of the note (which can result from either the sustain pedal or the key release), a departure from the traditional transcription task.


Figure 1: Output piano-roll comparison of the original High-Resolution model (top) and the joint note-pedal version (bottom).
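
To make the distinction concrete, the toy example below (hypothetical names, not the model code) relates the conventional sounding offset of a note to the key-release offset and sustain-pedal intervals that the joint note-pedal model outputs: a note keeps sounding until the sustain pedal that was held at the moment of key release is lifted.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Note:
    pitch: int
    onset: float       # key press time (s)
    key_offset: float  # key release time (s), the quantity modelled here

def sounding_offset(note: Note, pedal_intervals: List[Tuple[float, float]]) -> float:
    """Conventional (string-damping) offset implied by key release plus sustain pedal."""
    for pedal_on, pedal_off in pedal_intervals:
        if pedal_on <= note.key_offset < pedal_off:
            return pedal_off
    return note.key_offset

if __name__ == "__main__":
    pedal = [(0.5, 2.0)]                      # sustain pedal held from 0.5 s to 2.0 s
    n = Note(pitch=60, onset=0.6, key_offset=1.0)
    print(sounding_offset(n, pedal))          # 2.0: the pedal prolongs the note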


Statistics


Figure 2: Distribution of movements by number of performances. For example, 12% of our data have more than 15 performances.



Figure 3: Distribution of the top 25 pianists’ performances in the ATEPP dataset.


Citation


@inproceedings{zhang2022atepp,
  title={ATEPP: A Dataset of Automatically Transcribed Expressive Piano Performance},
  author={Zhang, Huan and Tang, Jingjing and Rafee, Syed Rifat Mahmud and Dixon, Simon and Fazekas, Gy{\"o}rgy},
  booktitle={ISMIR 2022 Hybrid Conference},
  year={2022}
}

* Indicates Equal Contribution