Content-based retrieval of audio
Francois Thibault
MUMT 614B, McGill University

Overview
Need effective ways to browse audio databases of growing size by content.
Approaches use descriptive sound parameters or query-by-example systems.
Determine similarity to the query in order to rank search results by relevance (AudioGoogle).
Feature selection is the sinews of war…

Cheng Yang Approach (1)
Audio files are preprocessed to identify local peaks in signal power (n = 100 to 200 per minute).
The spectrogram is computed with a 2048-sample STFT, using a 1024-sample Hamming window and an overlap factor of 2.
A spectral vector extracted around each peak makes up the (n, 180, k << 2048) feature space (200 to 2000 Hz range only).

Yang Approach (2)
Given an example query, compute the feature vectors for the query and look for similar audio in the database.
Compute the minimum distance between the query and database feature sets, saving time with dynamic programming (reusing results from previously computed pairs).
Linearity filtering favors time-scaled versions (matches that line up linearly in time) over erroneous matches whose orientation disagrees.

Yang’s Results
Database of 120 song excerpts (about 1 min each).
Good performance with varying tempos, audio quality, and performance variations.
Poor performance with transposed versions.
Slow response, which can be improved with indexing schemes.

Jonathan Foote Approach
Calculate feature vectors (12 MFCCs plus energy) for audio examples of the desired classes.
Supervised training of a quantization tree (partitions the feature space into regions whose class populations differ maximally).
The parameterized data is quantized with the tree for subsequent retrieval, creating a template.
To retrieve similar audio content, a template is constructed for the query audio and compared with the corpus templates using a cosine distance measure.

Foote’s Results
A good way of measuring subjective qualities of sound without using targeted features.
Less accurate than techniques that use psychoacoustic knowledge for finding similar timbres (e.g. instruments).
Sensitive to pitch (often returns different timbres at the same pitch).

Erling Wold et al. Approach (1)
Implemented several approaches in the Muscle Fish software.
In particular, explicit perceptual features are specified (loudness, pitch, brightness, bandwidth, harmonicity).
Statistics of the corresponding acoustic correlates (mean, variance, autocorrelation), computed over the entire sample, form a feature vector.
For a training set, the mean vector and covariance matrix computed from the examples become the system's model.

Wold Approach (2)
A weighted Euclidean distance is used for classification and similarity measurements.
Optionally, the distance is compared to a threshold to decide whether objects belong to the same class.

Wold Approach (3)
Segmentation is required beforehand; it is achieved with the same features by detecting strong discrepancies.

Wold and Foote comparison
What I retain: Wold has shown that statistical methods can be used for flexible classification.
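Yang's preprocessing (peak picking on signal power, then a band-limited spectral vector around each peak) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not Yang's implementation: a "peak" is taken to be a frame whose power is strictly greater than both neighbours, the frame and hop sizes are arbitrary, a naive DFT over only the 200 to 2000 Hz bins stands in for an FFT, and the names `frame_power`, `local_peaks`, `band_spectrum` and `extract_features` are hypothetical.

```python
import math

def frame_power(signal, frame_len, hop):
    """Mean squared amplitude of each analysis frame."""
    return [sum(x * x for x in signal[s:s + frame_len]) / frame_len
            for s in range(0, len(signal) - frame_len + 1, hop)]

def local_peaks(values):
    """Indices whose value is strictly greater than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]

def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def band_spectrum(signal, center, win_len, n_fft, sr, fmin=200.0, fmax=2000.0):
    """Magnitudes of the n_fft-point DFT of a Hamming-windowed segment centred on
    `center` (implicitly zero-padded to n_fft), keeping only bins in [fmin, fmax]."""
    start = max(0, center - win_len // 2)
    seg = list(signal[start:start + win_len])
    seg += [0.0] * (win_len - len(seg))  # pad if the segment runs past the end
    x = [s * w for s, w in zip(seg, hamming(win_len))]
    lo, hi = math.ceil(fmin * n_fft / sr), math.floor(fmax * n_fft / sr)
    mags = []
    for b in range(lo, hi + 1):
        re = sum(v * math.cos(2 * math.pi * b * n / n_fft) for n, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * b * n / n_fft) for n, v in enumerate(x))
        mags.append(math.hypot(re, im))
    return mags

def extract_features(signal, sr, frame_len=256, hop=128, win_len=1024, n_fft=2048):
    """One (sample position, band-limited spectral vector) pair per power peak."""
    powers = frame_power(signal, frame_len, hop)
    return [(i * hop + frame_len // 2,
             band_spectrum(signal, i * hop + frame_len // 2, win_len, n_fft, sr))
            for i in local_peaks(powers)]
```

The defaults `win_len=1024` and `n_fft=2048` mirror the window and STFT sizes on the slide; restricting the DFT to the 200 to 2000 Hz bins is what keeps each vector much shorter than 2048.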
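For Yang's matching step, the slide only says that dynamic programming saves time by reusing results from previous pairs. As an assumption, the sketch below substitutes classic dynamic time warping (DTW), one standard dynamic-programming alignment between two sequences of feature vectors; Yang's actual recurrence may differ. `dtw_distance` and `rank_database` are hypothetical names, and sorting by ascending distance gives the relevance ranking mentioned in the overview.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(query, candidate, dist=euclidean):
    """Minimum cumulative alignment cost between two sequences of feature vectors."""
    n, m = len(query), len(candidate)
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning query[:i] with candidate[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], candidate[j - 1])
            # reuse the already-solved shorter prefix pairs
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def rank_database(query, database):
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, dtw_distance(query, feats)) for name, feats in database.items()]
    return sorted(scored, key=lambda item: item[1])
```

Because DTW lets one query vector align with several consecutive database vectors, a slowed-down copy of the query still scores a distance of zero, which matches the tolerance to tempo changes reported in Yang's results.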
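Yang's linearity filtering can be illustrated with matched peak times: if a database excerpt is a time-scaled version of the query, the matched (query time, database time) pairs should lie near a line with positive slope. The sketch below is an assumption, not Yang's published filter. It fits a least-squares line, repeatedly drops the worst-fitting pair until every residual is within `tolerance`, and rejects the match outright if the fitted slope is not positive (reversed orientation). `fit_line` and `linearity_filter` are hypothetical names.

```python
def fit_line(pairs):
    """Least-squares fit t_db = slope * t_query + intercept; None if degenerate."""
    n = len(pairs)
    mq = sum(q for q, _ in pairs) / n
    md = sum(d for _, d in pairs) / n
    sqq = sum((q - mq) ** 2 for q, _ in pairs)
    if sqq == 0:
        return None  # all query times identical: no line can be fitted
    slope = sum((q - mq) * (d - md) for q, d in pairs) / sqq
    return slope, md - slope * mq

def linearity_filter(pairs, tolerance):
    """Drop the worst-fitting pair until all residuals are within tolerance.
    Returns the surviving pairs, or [] if the fitted slope is not positive."""
    kept = list(pairs)
    while len(kept) >= 2:
        fit = fit_line(kept)
        if fit is None:
            return []
        slope, intercept = fit
        residuals = [abs(d - (slope * q + intercept)) for q, d in kept]
        worst = max(range(len(kept)), key=residuals.__getitem__)
        if residuals[worst] <= tolerance:
            return kept if slope > 0 else []
        kept.pop(worst)
    return []
```

The number of surviving pairs (or the fitted slope itself, which estimates the tempo ratio) could then be folded into the match score.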
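Foote's pipeline (supervised quantization tree, per-file template, cosine distance) can be sketched with a small tree. This is a simplified illustration under stated assumptions: the tree greedily splits on the single dimension and threshold that maximize information gain about the class labels, a template is taken to be the histogram of how many frames land in each leaf, and all function names are hypothetical. MFCC extraction is out of scope, so frames are plain lists of numbers.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(vectors, labels):
    """(gain, dim, threshold) maximizing information gain about the labels, or None."""
    base, best = entropy(labels), None
    for dim in range(len(vectors[0])):
        values = sorted({v[dim] for v in vectors})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [l for v, l in zip(vectors, labels) if v[dim] <= thr]
            right = [l for v, l in zip(vectors, labels) if v[dim] > thr]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, dim, thr)
    return best

def build_tree(vectors, labels, leaves):
    """Grow until nodes are pure or no split helps; `leaves` collects leaf ids."""
    split = best_split(vectors, labels) if len(set(labels)) > 1 else None
    if split is None or split[0] <= 1e-12:
        leaves.append(len(leaves))
        return {"leaf": leaves[-1]}
    _, dim, thr = split
    go_left = [v[dim] <= thr for v in vectors]
    return {"dim": dim, "thr": thr,
            "left": build_tree([v for v, g in zip(vectors, go_left) if g],
                               [l for l, g in zip(labels, go_left) if g], leaves),
            "right": build_tree([v for v, g in zip(vectors, go_left) if not g],
                                [l for l, g in zip(labels, go_left) if not g], leaves)}

def quantize(tree, v):
    """Descend to the leaf that contains feature vector v."""
    while "leaf" not in tree:
        tree = tree["left"] if v[tree["dim"]] <= tree["thr"] else tree["right"]
    return tree["leaf"]

def template(tree, n_leaves, frames):
    """Histogram of leaf occupancy over all frames of one audio file."""
    counts = [0] * n_leaves
    for f in frames:
        counts[quantize(tree, f)] += 1
    return counts

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)
```

Usage: build the tree once from labelled training frames, compute `template(tree, len(leaves), frames)` for every corpus file and for the query, then rank corpus files by `cosine_distance` to the query template. Since the tree only sees whatever features it is trained on, nothing in it encodes timbre explicitly, which is consistent with the pitch sensitivity noted in Foote's results.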
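Wold et al.'s per-sound feature vector and class model can be sketched as follows, assuming the per-frame trajectories of the perceptual features (loudness, pitch, brightness, and so on) have already been computed. The choice of lag-1 autocorrelation, the population (divide-by-n) statistics, and the names `autocorr`, `stats_vector` and `class_model` are assumptions made for illustration.

```python
def autocorr(xs, lag=1):
    """Normalized autocorrelation at `lag` (0.0 for a constant trajectory)."""
    n = len(xs)
    m = sum(xs) / n
    denom = sum((x - m) ** 2 for x in xs)
    if denom == 0 or lag >= n:
        return 0.0
    return sum((xs[i] - m) * (xs[i + lag] - m) for i in range(n - lag)) / denom

def stats_vector(trajectories):
    """trajectories: feature name -> per-frame values over the whole sample.
    Returns [mean, variance, autocorrelation] for each feature, in sorted name order."""
    vec = []
    for name in sorted(trajectories):
        xs = trajectories[name]
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        vec.extend([m, var, autocorr(xs)])
    return vec

def class_model(examples):
    """Mean vector and population covariance matrix of the training vectors."""
    n, d = len(examples), len(examples[0])
    mean = [sum(v[k] for v in examples) / n for k in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in examples) / n
            for j in range(d)] for i in range(d)]
    return mean, cov
```

Sorting the feature names keeps the vector layout identical across sounds, which the model's mean and covariance rely on.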
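The slides do not state which weights Wold et al.'s weighted Euclidean distance uses. A common choice, assumed here, is to divide each squared difference by that dimension's variance in the class model (the diagonal of the covariance matrix). The optional threshold turns nearest-class matching into the same-class decision mentioned on the slide. `weighted_euclidean` and `classify` are hypothetical names.

```python
import math

def weighted_euclidean(x, mean, variances, eps=1e-12):
    """Euclidean distance with each dimension scaled by 1 / variance."""
    return math.sqrt(sum((xi - mi) ** 2 / (vi + eps)
                         for xi, mi, vi in zip(x, mean, variances)))

def classify(x, models, threshold=None):
    """models: class name -> (mean vector, per-dimension variances).
    Returns (nearest class, distance); the class is None if the distance
    exceeds the optional threshold."""
    best_name, best_d = None, float("inf")
    for name, (mean, variances) in models.items():
        d = weighted_euclidean(x, mean, variances)
        if d < best_d:
            best_name, best_d = name, d
    if threshold is not None and best_d > threshold:
        return None, best_d
    return best_name, best_d
```

Scaling by variance keeps a feature with a naturally large range (pitch in Hz, say) from dominating one with a small range; the small `eps` only guards against a zero variance.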
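Segmentation by detecting strong discrepancies in the same features might look like the following. This is an assumed novelty-style detector, not Wold et al.'s published procedure: it compares the mean feature vector of the `context` frames before each position with that of the `context` frames after it, and reports positions where the difference exceeds `threshold` and is a local maximum. `segment_boundaries` is a hypothetical name; `math.dist` requires Python 3.8 or later.

```python
import math

def _mean(block):
    """Per-dimension mean of a list of feature vectors."""
    return [sum(col) / len(block) for col in zip(*block)]

def segment_boundaries(frames, threshold, context=2):
    """Frame indices where the feature means before and after differ strongly."""
    novelty = {}
    for i in range(context, len(frames) - context + 1):
        novelty[i] = math.dist(_mean(frames[i - context:i]), _mean(frames[i:i + context]))
    # keep only peaks of the novelty curve, so one change yields one boundary
    return [i for i, v in novelty.items()
            if v > threshold
            and v >= novelty.get(i - 1, 0.0)
            and v >= novelty.get(i + 1, 0.0)]
```

A larger `context` smooths out frame-level noise at the cost of blurring boundaries between short sounds.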