Multimodal Information Analysis for Emotion Recognition (Tele-Health Care Application)
Malika Meghjani, Gregory Dudek and Frank P. Ferrie

Content
1. Our Goal
2. Motivation
3. Proposed Approach
4. Results
5. Conclusion

Our Goal
• Automatic emotion recognition using audio-visual information analysis.
• Create video summaries by automatically labeling the emotions in a video sequence.

Motivation
• Map the emotional states of the patient to nursing interventions.
• Evaluate the role of nursing interventions in improving the patient's health.

Proposed Approach
[Block diagram: Visual Feature Extraction → Visual-based Emotion Classification; Audio Feature Extraction → Audio-based Emotion Classification; Decision-Level Data Fusion → Recognized Emotional State]

Visual Analysis (sketched in code after the Database and Training section)
1. Face Detection (Viola-Jones face detector using Haar wavelets and a boosting algorithm)
2. Feature Extraction (Gabor filters: 5 spatial frequencies and 4 orientations, giving 20 filter responses per frame)
3. Feature Selection (select the most discriminative features across all emotional classes)
4. SVM Classification (classification with probability estimates)

Visual Feature Extraction
[Figure: frequency-domain Gabor filter bank (5 frequencies × 4 orientations), followed by feature selection and automatic emotion classification]

Audio Analysis (sketched in code after the Database and Training section)
1. Audio Pre-Processing (remove leading and trailing edges)
2. Feature Extraction (statistics of the pitch and intensity contours, and Mel-frequency cepstral coefficients)
3. Feature Normalization (remove inter-speaker variability)
4. SVM Classification (classification with probability estimates)

Audio Feature Extraction
[Figure: audio signal → speech rate, pitch, intensity and spectrum analysis → Mel-frequency cepstral coefficients (short-term power spectrum of sound) → SVM classification → automatic emotion classification]

Feature Selection (sketched in code after the Database and Training section)
• The feature selection method is similar to the SVM classification (a wrapper method).
• It generates a separating plane by minimizing the weighted sum of the distances of misclassified data points to two parallel bounding planes.
• It suppresses as many components of the normal to the separating plane as possible while keeping the classification results consistent.
[Figure: average error distance vs. number of features selected; bounding planes]

Data Fusion (sketched in code after the Database and Training section)
1. Decision Level:
   • Obtain a probability estimate for each emotional class from the SVM margins.
   • The probability estimates from the two modalities are multiplied and renormalized to give the final decision-level emotion classification.
2. Feature Level:
   • Concatenate the audio and visual features and repeat the feature selection and SVM classification process.

Database and Training
Databases:
1. Visual-only posed database
2. Audio-visual posed database
Training:
• Audio segmentation based on the minimum window required for feature extraction.
• Extraction of the corresponding visual key frames within each segmented window.
• Image-based training and audio-segment-based training.
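A minimal sketch of the visual pipeline described above: Viola-Jones face detection followed by a Gabor filter bank of 5 spatial frequencies × 4 orientations (20 responses per frame). The slides do not name a toolkit, so OpenCV's bundled Haar cascade is assumed here, and the kernel size, wavelength range, and 48×48 face crop are illustrative choices rather than the authors' values.

import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def gabor_bank(n_freqs=5, n_orients=4, ksize=31):
    """Build the 5 x 4 = 20 Gabor kernels over log-spaced wavelengths."""
    kernels = []
    for lambd in np.geomspace(4, 16, n_freqs):                  # wavelengths (px)
        for theta in np.arange(n_orients) * np.pi / n_orients:  # orientations
            kernels.append(cv2.getGaborKernel(
                (ksize, ksize), sigma=0.56 * lambd, theta=theta,
                lambd=lambd, gamma=0.5, psi=0, ktype=cv2.CV_32F))
    return kernels

BANK = gabor_bank()

def visual_features(frame_bgr):
    """Detect the largest face, crop it, and stack the 20 Gabor responses."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep largest detection
    face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)).astype(np.float32)
    responses = [cv2.filter2D(face, cv2.CV_32F, k) for k in BANK]
    return np.concatenate([r.ravel() for r in responses])  # 20 * 48 * 48 dims

The resulting high-dimensional vector is what the feature-selection step (below) prunes before SVM classification.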
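A corresponding sketch of the audio pipeline: trim the leading and trailing edges, summarize the pitch and intensity contours with simple statistics, append per-coefficient MFCC means, and z-score each speaker's features to remove inter-speaker variability. librosa is an assumption (no toolkit is named in the slides), and the trim threshold, pitch range, and n_mfcc value are illustrative.

import numpy as np
import librosa

def audio_features(path, n_mfcc=13):
    """Pitch/intensity contour statistics plus MFCC means for one utterance."""
    y, sr = librosa.load(path, sr=None)
    y, _ = librosa.effects.trim(y, top_db=25)          # drop leading/trailing edges

    pitch = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # pitch contour (Hz)
    intensity = librosa.feature.rms(y=y)[0]            # intensity (RMS) contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    stats = lambda c: [c.mean(), c.std(), c.min(), c.max()]
    return np.array(stats(pitch) + stats(intensity) + list(mfcc.mean(axis=1)))

def normalize_per_speaker(X, speaker_ids):
    """Z-score each speaker's rows to remove inter-speaker variability."""
    X = X.copy()
    for s in np.unique(speaker_ids):
        m = speaker_ids == s
        X[m] = (X[m] - X[m].mean(axis=0)) / (X[m].std(axis=0) + 1e-8)
    return X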
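One common way to realize the wrapper-style selection described above is to fit a linear max-margin separator and suppress components of its normal vector w via an L1 penalty, keeping only features with non-zero weight. This scikit-learn sketch is a stand-in under that assumption; the slides' exact optimization (weighted misclassification distances to two parallel bounding planes) is not fully specified.

from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

def select_features(X, y, C=0.1):
    """L1-penalized linear SVM: sparse normal vector -> selected features."""
    svm = LinearSVC(C=C, penalty="l1", dual=False, max_iter=10000)
    selector = SelectFromModel(svm).fit(X, y)
    return selector          # selector.transform(X) keeps the chosen dimensions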
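Finally, a sketch of the two fusion schemes. For decision-level fusion, scikit-learn's SVC(probability=True) yields Platt-scaled per-class probability estimates from the SVM margins, matching the slides' description; the fused estimate is their product, renormalized. Feature-level fusion is plain concatenation followed by re-running selection and training. The classifier settings are assumptions.

import numpy as np
from sklearn.svm import SVC

def fuse_decisions(audio_clf, visual_clf, Xa, Xv):
    """Product-rule decision-level fusion of two fitted probabilistic SVMs."""
    p = audio_clf.predict_proba(Xa) * visual_clf.predict_proba(Xv)
    p /= p.sum(axis=1, keepdims=True)      # renormalize the fused estimates
    return p.argmax(axis=1), p             # fused labels and class probabilities

def fuse_features(Xa, Xv):
    """Feature-level fusion: concatenate audio and visual feature vectors."""
    return np.hstack([Xa, Xv])             # then repeat selection + SVM training

# Usage sketch: audio_clf = SVC(probability=True).fit(Xa_train, y_train), and
# likewise for visual_clf, before calling fuse_decisions on held-out data.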
Experimental Results

Database                      | Training Examples | Subjects | Emotional States | Recognition Rate                      | Validation Method
Posed visual-only data (CKDB) | 120               | 20       | 5 + Neutral      | 75%                                   | Leave-one-subject-out cross validation
Posed audio-visual data (EDB) | 270               | 9        | 6                | 82% decision level, 76% feature level | Leave-one-subject-out cross validation

[Figure: time-series plot of recognized emotions: Surprise, Sad, Angry, Disgust, Happy]
[Figure: leave-one-subject-out cross-validation results, 75% (Cohn-Kanade posed visual-only database)]
[Figure: feature-level fusion results (eNTERFACE 2005 posed audio-visual database)]
[Figure: decision-level fusion results (eNTERFACE 2005 posed audio-visual database)]
[Figure: confusion matrix]

Demo

Conclusion
• Combining the two modalities (audio and visual) improves the overall recognition rate by 11% with decision-level fusion and by 6% with feature-level fusion.
• Emotions where vision wins: Disgust, Happy and Surprise.
• Emotions where audio wins: Anger and Sadness.
• Fear was equally well recognized by the two modalities.
• Automated multimodal emotion recognition is effective: fusing the modalities outperforms either one alone.

Things to do…
• Inference based on the temporal relation between instantaneous classifications.
• Tests on a natural audio-visual database (ongoing).