Automated Transcription of Polyphonic Piano Music
A Brief Literature Review
Catherine Lai
MUMT-611 MIR
February 17, 2005
1 /14
Outline of Presentation
• Introduction
– transcription of polyphonic music
– systems targeted at specific instruments
• Current state-of-the-art: various approaches
• Recently published piano transcription systems
– Dixon, 2000
– Raphael, 2002
– Monti and Sandler, 2002
– Marolt, 2004
• Discussion and Conclusion
• Links to examples of transcription of piano music recordings
• Bibliography
2 /14
Introduction
• Transcription of polyphonic music
– acoustical waveform --> parametric representation
– extract pitches, starting times, durations
• First attempt by Moorer, 1975
– limited note range
– limited to two voices
• Martin, 1996
– piano transcription system for up to four voices
– chorale style of J.S. Bach (long durations with block chords)
• Later systems tackled these limitations
– systems targeted at specific instruments
• Focus of this literature review:
– automated transcription of polyphonic piano music
3 /14
Current State-of-the-Art: Various Approaches
• Automated transcription of polyphonic piano music
– input: audio files containing polyphonic piano music
– output: MIDI representing pitch, timing, volume
• Simon Dixon, 2000. “On the Computer Recognition of Solo Piano Music”
– standard signal processing (SP), adaptive peak-picking, pattern matching
• Christopher Raphael, 2002. “Automatic Transcription of Piano Music”
– hidden Markov model (HMM)
• Monti and Sandler, 2002. “Automatic Polyphonic Piano Note Extraction Using Fuzzy Logic in a Blackboard System”
– blackboard algorithm
• Matija Marolt, 2004. “A Connectionist Approach to Automatic Transcription of Polyphonic Piano Music”
– neural network models
4 /14
Published Piano Transcription System:
Simon Dixon, 2000. “On the computer recognition of solo piano music” [standard SP approach]
• First processing stage
– low-pass filtering --> down-sampling the signal (12 kHz)
• Time-frequency representation
– STFT --> power spectrum --> spectral peak extraction (local maxima > threshold, adaptive peak-picking algorithm)
– frequency tracks --> grouping partials --> musical notes
• Evaluation: 13 Mozart piano sonatas performed by a concert pianist
– Bösendorfer SE290 computer-monitored piano --> MIDI
• Results
– N = number of correctly identified notes
– FP = number of notes reported but not played
– FN = number of notes played but not reported by the system
– an incorrectly identified note counts as both an FP and an FN
– score = N/(FP + FN + N)
– recognition accuracy of 70-80%
• Future development:
– accuracy of dynamics and offset times
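The spectral peak extraction step (local maxima above an adaptive threshold) can be illustrated with a minimal sketch. This is not Dixon's algorithm: the function name `pick_peaks`, the neighbourhood size `window`, the multiplier `factor`, and the toy spectrum are all assumptions chosen for illustration. Here the threshold for each bin adapts to the mean power of its neighbours.

```python
def pick_peaks(power, window=2, factor=2.0):
    """Return indices of bins that are local maxima above an adaptive threshold.

    Hypothetical rule: the threshold for bin i is `factor` times the mean
    power of the bins within `window` positions on either side of i
    (bin i itself is excluded from the mean).
    """
    peaks = []
    for i in range(1, len(power) - 1):
        # Local maximum test against the two adjacent bins.
        if not (power[i] > power[i - 1] and power[i] >= power[i + 1]):
            continue
        lo, hi = max(0, i - window), min(len(power), i + window + 1)
        neighbours = power[lo:i] + power[i + 1:hi]
        threshold = factor * sum(neighbours) / len(neighbours)
        if power[i] > threshold:
            peaks.append(i)
    return peaks


# Toy power spectrum (made-up values) with two clear peaks.
spectrum = [0.1, 0.2, 5.0, 0.3, 0.1, 0.2, 0.15, 3.0, 0.2, 0.1]
print(pick_peaks(spectrum))  # [2, 7]
```

The small bump at bin 5 is a local maximum but is rejected because the nearby peak at bin 7 raises its threshold. In a full system the surviving peaks would then be linked across frames into frequency tracks.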
5 /14
Published Piano Transcription System:
Christopher Raphael, 2002. “Automatic transcription of piano music” [HMM]
• HMM with a trained likelihood model
– statistical pattern recognition and machine learning for structures
• Process
– segment the signal into frames; extract a feature vector from each frame; assign each frame a label describing its content
• Feature vector
– total energy (playing or silent)
– local “burstiness” (attack or steady behavior)
– pitch configuration
• Label
– collection of sounding pitches and re-articulation state (attack, sustain, rest)
• Model setup
– hidden process (label process); observable process (feature vector)
– generate reasonable hypotheses for each frame and construct a search graph of the hypotheses
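To make the hidden-label and observed-feature setup concrete, here is a minimal sketch of Viterbi decoding, a standard way to find the most likely label sequence in an HMM. It is not Raphael's implementation: the real labels combine pitch collections with articulation states, whereas this sketch uses only the three articulation labels for a single note. The transition probabilities, start distribution, and per-frame likelihoods are all hypothetical.

```python
import math

STATES = ["rest", "attack", "sustain"]

# Hypothetical transition probabilities: a sustain can only follow an attack
# or another sustain.
TRANS = {
    "rest": {"rest": 0.8, "attack": 0.2},
    "attack": {"sustain": 1.0},
    "sustain": {"sustain": 0.8, "rest": 0.1, "attack": 0.1},
}
START = {"rest": 1.0}  # assume every excerpt begins in silence


def log(p):
    """Log-probability, with log(0) mapped to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(emissions):
    """Return the most likely label sequence.

    emissions: one dict per frame mapping label -> P(feature vector | label).
    """
    best = {s: log(START.get(s, 0.0)) + log(emissions[0][s]) for s in STATES}
    back = []  # back[t][s] = best predecessor of label s at frame t + 1
    for frame in emissions[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: best[r] + log(TRANS[r].get(s, 0.0)))
            new_best[s] = best[prev] + log(TRANS[prev].get(s, 0.0)) + log(frame[s])
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final label.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Made-up per-frame likelihoods for five frames.
frames = [
    {"rest": 0.9, "attack": 0.05, "sustain": 0.05},
    {"rest": 0.1, "attack": 0.8, "sustain": 0.1},
    {"rest": 0.1, "attack": 0.2, "sustain": 0.7},
    {"rest": 0.1, "attack": 0.1, "sustain": 0.8},
    {"rest": 0.9, "attack": 0.05, "sustain": 0.05},
]
print(viterbi(frames))  # ['rest', 'attack', 'sustain', 'sustain', 'rest']
```

Working in log-probabilities avoids numerical underflow on long recordings, and the zero-probability transitions prune impossible label sequences from the search.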
6 /14
Published Piano Transcription System:
Christopher Raphael, 2002. “Automatic transcription of piano music” [HMM]
• Experiment
– Mozart piano sonata
– limited range (from the C two octaves below middle C to the F two and a half octaves above middle C)
– four voices or fewer
• Evaluation
– borrowed from the “Word Error Recognition Rate” used in speech evaluation
– Error Rate = 100 * (Insertions + Deletions + Substitutions) / (Total Words in Truth Sentence)
– preliminary results have a “Note Error Rate” of 39%
  • 184 substitutions, 241 deletions, 108 insertions out of 1360 notes
• Future improvement
– simple additions may yield better results
  • likelihood of chord sequences
  • more informative acoustic cues for note onsets
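The reported “Note Error Rate” can be checked directly from the counts on this slide. The short sketch below applies the error-rate formula with notes in place of words; the function name `note_error_rate` is just an illustrative choice.

```python
def note_error_rate(substitutions, deletions, insertions, total_notes):
    """Error rate in percent: 100 * (I + D + S) / total notes in the truth."""
    return 100.0 * (insertions + deletions + substitutions) / total_notes


# Counts reported on the slide: 184 substitutions, 241 deletions,
# 108 insertions out of 1360 notes.
rate = note_error_rate(184, 241, 108, 1360)
print(round(rate, 1))  # 39.2
```

The 533 errors out of 1360 notes give roughly 39.2%, consistent with the 39% quoted above.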
7 /14
Published Piano Transcription System:
Monti and Sandler, 2002. “Automatic polyphonic piano note extraction using fuzzy logic in a blackboard system” [Blackboard algorithm]
• Implementation
– polyphonic note recognition using a Fuzzy Inference System (FIS) as one of the Knowledge Sources (KSs) in a blackboard system
• Blackboard model arrangement
– hierarchy of data abstraction levels
– KSs dictate advancement and are activated by the Scheduler
• FIS
– takes spectral peaks not selected
– creates new Note Candidates
– evaluates each Candidate by its features
  • fundamental of the note
  • harmonic rate
  • difference between the max peak in the spectrum and the Candidate’s fundamental energy
[Figure: Blackboard system (Monti and Sandler, 2002)]
8 /14
Published Piano Transcription System:
Monti and Sandler, 2002. “Automatic polyphonic piano note extraction using fuzzy logic in a blackboard system” [Blackboard algorithm]
• Evaluation
– 14 piano pieces by various composers including Beethoven, Mozart, Debussy, Ravel, and Scarlatti
• Results
– N = correctly identified notes; FP = notes reported but not played; FN = notes played but not reported by the system
– score = N/(FP + FN + N) (Dixon’s score)
– detection success rate = 45% correct
– 75% = correctly detected notes / total transcribed notes
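The two figures above measure different things, which a small sketch makes explicit. The counts below are hypothetical and are not taken from either paper. The second function assumes that “total transcribed notes” means every note the system reported, i.e. N + FP.

```python
def dixon_score(n, fp, fn):
    """Score used by Dixon and by Monti and Sandler: N / (FP + FN + N)."""
    return n / (fp + fn + n)


def correct_over_transcribed(n, fp):
    """Correctly detected notes / total transcribed notes.

    Assumption: the transcribed notes are the N correct notes plus the FP
    notes that were reported but not played.
    """
    return n / (n + fp)


# Hypothetical counts for one piece: 80 correct, 10 spurious, 10 missed.
n, fp, fn = 80, 10, 10
print(dixon_score(n, fp, fn))                     # 0.8
print(round(correct_over_transcribed(n, fp), 3))  # 0.889
```

Because the first score also penalizes missed notes (FN), it can never exceed the second ratio for the same transcription, which is consistent with the 45% and 75% figures reported for the same system.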
9 /14
Published Piano Transcription System:
Matija Marolt, 2004. “A connectionist approach to automatic transcription of polyphonic piano music.” [Neural networks approach]
• A new model based on networks of adaptive oscillators was proposed and implemented in SONIC for partial tracking and note recognition
1. acoustical waveform --> time-frequency space with an auditory model
2. the auditory model outputs a set of frequency channels
3. periodicity in frequency channels is related to pitch perception
4. adaptive oscillators are used to calculate periodicity in the frequency channels
5. adaptive oscillators try to synchronize to the signals in the output frequency channels of the auditory model by adjusting their phase and frequency
6. when an oscillator synchronizes to an output frequency channel, this indicates that the channel is periodic and that a partial with a frequency similar to that of the filter is present
• 76 neural networks; other types tested: multilayer perceptron, radial basis function, etc.
(Marolt, 2004)
10 /14
Published Piano Transcription System:
Matija Marolt, 2004. “A connectionist approach to automatic transcription of polyphonic piano music.” [Neural networks approach]
• Evaluation
– tested on synthesized and real recordings of various genres
• Results
– synthesized recordings: around 90% of all notes correctly transcribed
– real recordings: results not as good (not available)
– most common errors (> 50%): octave errors and rapidly played notes (e.g. arpeggios, trills)
– greatest challenge: very expressive playing
  • Chopin’s Nocturnes
  • quiet and almost inaudible left hand
• Further development
– detecting repeated notes
(Marolt, 2004)
11 /14
Discussion and Conclusion
• Various approaches proposed
– standard SP techniques; HMM; blackboard algorithm; neural networks
• Common mistakes
– octave errors, rapid passages, and quiet notes
• Difficulties
– lack of a standard set of test examples
– evaluation function
  • varying constraints and formulas --> comparison difficult

Piano transcription system    Performance results
Dixon                         70-80% correct
SONIC                         80-95% correct
Raphael                       39% wrong
Monti and Sandler             74% correct
12 /14
Links to examples of transcription of piano music recordings
• http://lgm.fri.uni-lj.si/~matic/SONIC.html (Marolt)
• http://www.ai.univie.ac.at/~simon/ (Dixon)
13 /14
Bibliography
Dixon, S. 2000. On the Computer Recognition of Solo Piano Music. Australasian Computer Music Conference. 31-7.
Marolt, M. 2004. A connectionist approach to automatic transcription of polyphonic piano music. IEEE Transactions on Multimedia 6, no. 3 (June): 439-49.
Martin, K. 1996. A blackboard system for automatic transcription of simple polyphonic music. MIT Media Laboratory Perceptual Computing Section Technical Report No. 385.
Monti, G., and M. Sandler. 2002. Automatic Polyphonic Piano Note Extraction Using Fuzzy Logic in a Blackboard System. Proceedings of the International Conference on Digital Audio Effects. 39-44.
Moorer, J. 1975. On the segmentation and analysis of continuous musical sound by digital computer. Ph.D. thesis, Stanford University, CCRMA.
Raphael, C. 2002. Automatic Transcription of Piano Music. Proceedings of the International Conference on Music Information Retrieval.
14 /14