Rethinking Algorithm Design and Development in Speech Processing
T. Stadelmann, Y. Wang, M. Smith, R. Ewerth, and B. Freisleben
Universities of Marburg and Hannover, Germany

Problem statement

What to do if algorithms do not behave as expected?
• Reimplementation does not reach published results
• Adaptation to new data & problem does not work
• Implementation does not show what theory suggests

How to select competing techniques and parameters?
• Effect of a particular choice on the whole process unclear
• Effect of a specific parameter combination unknown

How to arrive at a promising hypothesis?
Use intuition – but how?

Eidetic Design

• Other disciplines naturally gain intuition via visualization
• But visualization is not enough – it is just one possible transformation of the data that lets humans perceive meaning through their natural abilities
• Conceptualize a method like “know your data” from data mining – for speech processing
• Create a methodology for making failures in complex speech processing algorithms graspable by humans

Instead: recast algorithmic sub-results…
• …to the specific perceptual domain in which humans are experts in intuitively grasping the context, the character and the reasons of the issue at hand
• I.e., visualization, audibilization, “perceptualization”, …

Implement a culture of perceptually motivated speech research [Hill, 2007]
• Motivate the use of intuition beyond visualization
• Facilitate its use by conceptualizing a workflow
• Enable the use of intuition by providing free tools

Proposed workflow

1. Existing algorithm/process
2. Unexpected outcome => question/problem
3. Generate data from intermediate results
4. Find suitable domain
5. Use transformation tool & intuition

[Workflow diagram; labels: prerequisites, data, methodology (step 1, step 2, …, step n), result, data 1, data 2, …, data n, suitable domain]

Case study

Initial question: Why does MFCC+GMM not work reliably for speaker clustering whereas it does for speaker identification?
• Algorithm: MFCC extraction and GMM building algorithm
• Problem: techniques seem not expressive enough for the more difficult task => where is the bottleneck?
• Data: MFCC matrix, GMM parameter vectors
• Suitable domain: features and models originate from the auditory domain => resynthesize to the domain of auditory perception to hear whether they include what makes up a voice

Result: found the bottleneck in missing time coherence information in the GMM; improved DER by 56% in an experiment with a prototype [Stadelmann et al. 2009]

Available tools
• WebVoice: resynthesize speech and speaker features and models
• PlotGMM: plot Gaussian mixture models
• Visit http://www.informatik.uni-marburg.de/~stadelmann/eidetic.html

ICPR 2010 - 20th International Conference on Pattern Recognition, 23-26 August, Istanbul, Turkey
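
The case-study finding, that a GMM carries no time coherence information, can be illustrated with a minimal sketch. It is not the authors' code or tooling: the mixture parameters, the synthetic "feature frames" standing in for an MFCC matrix, and the helper names (`gmm_log_likelihood`, `mean_delta_energy`, `make_frames`) are all assumptions made for illustration. The sketch scores a frame sequence under a fixed diagonal-covariance GMM, then scores the same frames in shuffled order. The GMM score stays the same, while a simple frame-to-frame statistic changes clearly.

```python
import math
import random

# Hypothetical 2-component diagonal-covariance GMM over 2-D frames.
# All numbers are illustrative assumptions, not values from the poster.
WEIGHTS = [0.6, 0.4]
MEANS = [[0.0, 0.0], [3.0, 1.0]]
VARIANCES = [[1.0, 0.5], [0.8, 1.2]]


def log_gaussian_diag(frame, mean, var):
    """Log density of one frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def gmm_log_likelihood(frames):
    """Total log-likelihood of a frame sequence; frames are scored independently."""
    per_frame = []
    for frame in frames:
        mixture = sum(
            w * math.exp(log_gaussian_diag(frame, m, v))
            for w, m, v in zip(WEIGHTS, MEANS, VARIANCES)
        )
        per_frame.append(math.log(mixture))
    # fsum returns the correctly rounded sum, so it does not depend on order.
    return math.fsum(per_frame)


def mean_delta_energy(frames):
    """Mean squared difference between consecutive frames (depends on order)."""
    deltas = [
        sum((b - a) ** 2 for a, b in zip(prev, cur))
        for prev, cur in zip(frames, frames[1:])
    ]
    return math.fsum(deltas) / len(deltas)


def make_frames(n, seed=0):
    """A smooth synthetic trajectory standing in for an MFCC matrix."""
    rng = random.Random(seed)
    return [
        [math.sin(0.2 * t) + 0.1 * rng.gauss(0, 1),
         math.cos(0.2 * t) + 0.1 * rng.gauss(0, 1)]
        for t in range(n)
    ]


frames = make_frames(200)
shuffled = frames[:]
random.Random(1).shuffle(shuffled)

ll_ordered = gmm_log_likelihood(frames)
ll_shuffled = gmm_log_likelihood(shuffled)
print("GMM log-likelihood (ordered, shuffled):", ll_ordered, ll_shuffled)
print("Delta energy (ordered, shuffled):",
      mean_delta_energy(frames), mean_delta_energy(shuffled))
```

Because the model treats every frame as an independent draw, any reordering of the frames is indistinguishable to it. This is one way to read "missing time coherence information"; the perceptual route described on the poster (resynthesizing features and models and listening to them) reaches the same kind of insight by ear rather than by a numeric check.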