
Biomedical engineering 1
________________________________________________________________________

1. INTRODUCTION

Schizophrenia is a chronic, severe, and disabling brain disease. Approximately 1% of the population develops schizophrenia during their lifetime. While schizophrenia is found worldwide, more than 2 million Americans suffer from the illness in a given year. The severity of the symptoms and the long-lasting, chronic pattern of schizophrenia often cause a high degree of disability. Although schizophrenia affects men and women with equal frequency, the disorder often appears earlier in men, usually in the late teens or early twenties, than in women, who are generally affected in the twenties to early thirties.

People with schizophrenia often suffer a range of symptoms. Their speech and behavior can become so disorganized that they may be incomprehensible to others. Available treatments can relieve many symptoms, but most people with schizophrenia continue to suffer some symptoms throughout their lives; it has been estimated that no more than one in five individuals recovers completely.

Research is gradually leading to new and safer medications through a better understanding of the complex causes of the disease. Approaches include the study of molecular genetics, while new methods of imaging the brain’s structure and function hold the promise of new insights into the disorder.

The first signs of schizophrenia often appear as confusing changes in behavior. The sudden onset of severe psychotic symptoms is referred to as an “acute” phase of schizophrenia. Less obvious symptoms, such as social isolation or withdrawal, or unusual speech, thinking, or behavior, may precede, be seen along with, or follow the more obvious psychotic symptoms.

CURRENT METHOD OF DIAGNOSIS

In 2004 the European Brain Council estimated that in Ireland the cost of schizophrenia was €37 million of a total brain disorders cost of €260 million.
Currently there are no objective biomarkers for the diagnosis of schizophrenia. Mental health professionals use the presence of specific symptom clusters during clinical interviews as a means of diagnosing psychiatric disorders. People with schizophrenia often show “blunted” or “flat” affect. This refers to a severe reduction in emotional expressiveness. A person with schizophrenia may not show the signs of normal emotion, may speak in a monotonous voice, have diminished facial expressions, and appear extremely apathetic. The person may withdraw socially, avoiding contact with others; and when forced to interact, he or she may have nothing to say, reflecting “impoverished thought.” Motivation can be greatly decreased, as can interest in or enjoyment of life. These problems with emotional expression and motivation are symptoms of schizophrenia.

2. AIM

The aim is to develop a new approach to diagnosing and managing serious mental illnesses (SMI) such as schizophrenia using the acoustic analysis of speech. The spontaneous speech of schizophrenia patients with negative symptoms (e.g. flat affect, loss of interest) contains many hesitations and pauses. Acoustic analysis demonstrates that the length of pauses, independent of other linguistic or paralinguistic parameters, is strongly associated with the clinician's opinion of the patient's negative symptoms. [1, 2] Employing automatic acoustic analysis is expected to result in the development of specific and sensitive acoustic diagnostic biomarkers for schizophrenia.
The ability to provide acoustic diagnosis would see significant advances in the following areas:

- Advanced Medical Diagnostics. Automating patient diagnosis would reduce the need to wait for appointments with either clinic or hospital staff. With the availability of acoustic signal processing on easily acquired speech signals, high-quality assessments can be carried out remotely, providing diagnostic prescreening of patients with a variety of psychiatric disorders. This will prioritize those in need of the most immediate attention, so that available medical resources are used efficiently and effectively.
- Advanced Personalized Care through Disease Management. Comparison of a client’s physiological data, automatically generated from biomedical signal processing, with local and international databases will increase the disease management capacities of primary care physicians and specialists. Such information processing will enhance treatment adherence and allow continuous remote monitoring of a variety of health variables.

Anticipated benefits of this acoustic diagnosis approach include:

- Provide critical diagnostic information objectively, efficiently, cost-effectively and non-invasively
- Reduce the time spent and the need for medical personnel to make diagnoses
- Manage mental health professionals’ time efficiently
- Reduce travel times and costs for patients
- Provide medical specialists with patient-specific audio signal interpretations
- Empower patients to become proactive participants in their health management.

In this thesis, Chapter 3 gives an overview of the literature on voice characteristics and their importance in psychiatric diagnosis. Chapter 4 takes a brief look at prior acoustic investigations on schizophrenia.
In Chapter 5 the audio database is presented and the methods used in the development of the classifier are discussed. The classifier performance is presented in Chapter 6. Chapter 7 follows with a discussion of the findings. Chapter 8 draws conclusions from this study.

3. SPEAKING BEHAVIOR AND VOICE SOUND CHARACTERISTICS OF SCHIZOPHRENIC PATIENTS

Symptoms such as affective flattening, blunted affect, the inability to experience or express a normal range of affective responses, emotional dullness, pervasive apathy, poverty of speech and psychomotor retardation have been constituents of clinical definitions of schizophrenia throughout this century and have led to the negative-positive model of schizophrenia. In this model, delusions, hallucinations and florid formal thought disorders are conceptualized as pathological excesses, whereas negative symptoms are conceptualized as pathological deficits. Based on this heuristic division, negative symptoms have been theorized to be unresponsive to antipsychotic medication, to be associated with a deteriorating course, and to reflect some structural brain abnormalities that would mediate the hypothesized association between negative symptoms and poor prognosis.

Recent findings, however, have indicated that positive symptoms are not uniformly responsive to treatment, and that negative symptoms are not universally immutable. Specifically, during recovery from an acute episode of schizophrenia, the key symptoms tend to disappear first while symptoms at lower levels persist longer. Non-specific and affective symptoms tend to recur as prodromal symptoms of relapse.
Furthermore, several authors have reported a diagnostic non-specificity of negative symptoms, particularly with regard to various clinical forms of depression, which occur relatively commonly in schizophrenic patients, and there is also some overlap between the specific phenomena of depressive illness and negative symptoms in schizophrenia. [1]

3.1. Formal thought disorder

In the psychiatric literature, many of the abnormalities of language in schizophrenia are lumped together as formal thought disorder (a disorder in the form of thought, not the content). Perhaps most commonly it is the moment-to-moment, logical sequencing of ideas which is at fault. At other times, the mechanisms of language production themselves appear to be disturbed, so that the meaning of individual words and phrases is obscured. At still other times, the fault seems to be at the level of discourse: individual words, sentences, and sequences of thought make sense, but there is no discernible thread to longer verbal productions.

Florid formal thought disorder is a relatively uncommon finding in acute schizophrenia, though it is somewhat more common in chronic cases. Manifestations of formal thought disorder include poverty of content (failure to express sufficient information), loss of goal (slippage away from the intended topic), clanging (chaining together similar sounding words as if distracted by them), and other kinds of incoherence and unintelligibility.

3.2. First study of linguistic structure

The study of schizophrenic language disorder by linguists began with Chaika (1974), who studied a single patient who spoke normally for weeks at a time, her deviant language coinciding with what her psychiatrists termed psychotic episodes.
Stripped of some mid-1970s theoretical terminology, and condensed somewhat, the abnormalities that Chaika observed were:

(1) Failure to utter the intended lexical item;
(2) Distraction by the sounds or sense of words, so that a discourse becomes a string of word associations rather than a presentation of previously intended information;
(3) Breakdown of syntax and/or discourse;
(4) Lack of awareness that the utterances are abnormal.

Of these, (2) is most characteristic of schizophrenia; (1) and (3) resemble ordinary speech errors, and (4) resembles some forms of aphasia.

3.3. Schizophrenic language vs. normal speech errors

Fromkin responded that, except for the disruption of discourse which can be attributed to non-linguistic factors, all the features [of schizophrenic language] are prevalent in normal speech as exemplified by speech errors and “slips of the tongue”. Mistaken lexical choices and minor scramblings of syntax are common in everyday speech. Indeed, speech errors are often triggered by the sounds or senses of recently uttered words, and speakers are commonly unaware of their fumbles. Thus, (1), (2), (3), and (4) of Chaika’s core abnormalities are all disposed of, except for derailment of discourse, which Fromkin considers extralinguistic.

This claim has not held up. Although there are obvious similarities between Chaika’s samples of schizophrenic language and Fromkin’s corpus of speech errors, there are also obvious differences. Normal speakers make occasional errors like those seen in schizophrenia, but not whole strings of errors. A representative patch of gibberish from Chaika’s patient comprised 9 syllables, and uncorrected speech errors of such length and unintelligibility do not occur in normal speech.
What’s more, normal speakers, when an error is pointed out, immediately correct it; speakers with schizophrenia do not. Moreover, Chaika’s patient would commonly string together 10 or 20 sentences connected, as far as one can see, only by word associations; ordinary people, even when plagued by speech errors, do not do this. Normal speech errors are momentary deviations from a discourse plan that is immediately resumed.

3.4. Loss of voluntary control

In later work, Chaika argues that schizophrenic language disorder is fundamentally a loss of voluntary control over the speech generation process. Indeed, according to Chapman, patients sometimes say in retrospect that this is exactly what happened: they couldn’t control their speech. This echoes the main theme of the Schneiderian first-rank symptoms, which is loss of control over the train of thought. Note, however, that Chaika’s original patient apparently lacked such insight.

Chaika argues that loss of voluntary control ties together a wide range of observed phenomena, depending on which part of language production goes out of control: most often discourse organization, but often lexical retrieval, and sometimes pronunciation or syntax. It fits well into a more general conception of schizophrenia as degradation of communication between mental subsystems.

3.5. Schizophrenic language disorders vs. aphasia

How much do the language disturbances of schizophrenia resemble the aphasia caused by stroke, traumatic brain injury, or neurological conditions such as epilepsy? Researchers agree that there are important differences, but beyond that, discussion of the issue has been complicated by the heterogeneity of both schizophrenia and aphasia.
Aphasia-like symptoms are episodically observed in only a small proportion of subjects considered to be schizophrenic, whereas the aphasia produced by stroke or brain injury is in most cases constantly present. Patients with aphasia have normal thoughts and express them with difficulty; those with schizophrenia have unusual thoughts (or disorganized discourse plans) and express them with comparative ease.

Pinard and Lecours compare schizophrenic language to Wernicke’s aphasia (including jargon aphasia), a disorder in which the patient speaks fluently but unintelligibly. Their main findings:

(1) Schizophasic discourse often has a preferred theme or preoccupation; aphasic discourse rarely does.
(2) Speakers with schizophrenia often jump from one subject to another based on the sounds or associations of words they have uttered (association chaining or glossomania). This is seldom observed in jargon aphasia; it requires lexical mastery well beyond that of most aphasics, as well as remarkable control of prosody.
(3) Schizophasic discourse often includes rare words, evidence of a large, intact vocabulary; jargon aphasia, even when very fluent, shows a restricted vocabulary.
(4) Schizophrenic speech can include conscious creation of new words (neologisms) and consciously constrained discourse in which the speaker is well aware that the speech is unusual, whether or not others can understand it. Aphasic speakers who produce fluent unintelligible discourse do not seem to be fully aware of what they are doing, and if they create new words, it is as if by accident.

3.6. Andreasen’s 18-point scale

The standard account of schizophrenic language today is that of Andreasen, whose Thought, Language, and Communication (TLC) scale provided a foundation for subsequent research and clinical practice.
The scale comprises 18 symptoms:

(1) poverty of speech
(2) poverty of content (wordy vagueness)
(3) pressure of speech (excessive speed or emphasis)
(4) distractibility (by stimuli in the environment)
(5) tangentiality (partly irrelevant replies)
(6) loss of goal
(7) derailment (loss of goal in gradual steps)
(8) circumstantiality (numerous digressions on the way to the goal)
(9) illogicality
(10) incoherence (“word salad”, severely disrupted structure)
(11) neologisms (novel made-up words)
(12) word approximations (coined substitutes for existing words)
(13) stilted speech (pompous or overly formal style)
(14) clanging
(15) perseveration
(16) echolalia
(17) blocking (sudden stoppage)
(18) self-reference (talking about oneself excessively)

3.7. Liddle’s TLI

Liddle et al. simplified Andreasen’s TLC index into a Thought and Language Index comprising 8 symptoms, which factor analysis divided into 3 groups with strong, cleanly separate factor loadings. They found that the disorganization and impoverishment symptom groups were approximately orthogonal rather than bipolar (neither one is the opposite of the other; they can coexist in any combination). Perseveration and distractibility correlated with each other but were independent of both impoverishment and disorganization. Another noteworthy result was that the scores in the healthy control group were not negligible, i.e., the abnormalities were found, in mild form, in non-patients.

The triad of Liddle et al. resembles other classifications of symptoms of schizophrenia going back all the way to the 19th-century division into catatonia (negative symptoms), paranoia (odd thoughts), and hebephrenia (disorganization).

3.8. Chen’s CLANG scale

In a paper that deserves to be better known, Chen et al.
present an alternative to Andreasen’s TLC scale and its derivatives. Their CLANG (Clinical Language) scale comprises 17 symptoms classified according to levels of linguistic structure. Ranging beyond Andreasen’s and Liddle’s scales, CLANG includes disturbances of fluency, voice quality, and articulation. It is a fuller evaluation of speech, not just “thought” or discourse.

The factor analysis by Chen et al. found three major kinds of language dysfunction in schizophrenic patients: “syntactic,” “semantic,” and “production.” “Syntactic” dysfunction affects the structure of language on all levels, including lexical access. “Semantic” dysfunction affects the ability to map thoughts onto language and pursue a communicative goal; it corresponds closely to the traditional definition of thought disorder. “Production” dysfunction comprises poverty of speech, lack of details, and lack of intonation; it is associated with the negative symptoms of schizophrenia. There are also factors for “pressure” and “prosody” (each cleanly self-contained, even though “prosody” is only one symptom) and two symptoms, dysarthria and excessive details, that are unclassified because they have minor loadings in several factors.

Ceccherini-Nelli and Crow obtained a different factor analysis in which the “syntactic” and “semantic” factors fell together into one, perhaps because of different scoring criteria. Their second factor, “poverty,” corresponds closely to the “production” factor of Chen et al. Their third factor, “excess,” comprises syntactic perseveration (excess syntactic constraints) and excessive detail (interpretable as semantic perseveration). [3]

3.9. Acoustic Speech Characteristics

Predominantly negative-symptom schizophrenia (NSZ) can be characterized by a number of atypical reductions in observed behavior, including communication behavior. Classically observed outward manifestations in spoken (e.g., linguistic production) and pragmatic/paralinguistic (e.g., body language, intonation, and prosody) aspects of communication are related to affective flattening, anhedonia, alogia, and avolition (the characterization of NSZ summarized in the Diagnostic and Statistical Manual of Mental Disorders), implying an effect on both affect and cognition. Atypical communication can have an effect on clinical assessment, as it may influence rater scoring on a number of items from subjective behavioral rating scales (e.g., PANSS), including items such as blunted affect, emotional withdrawal, poor rapport, social withdrawal, reduced flow of conversation, and motor retardation.

Although these rating scales are invaluable in helping clinicians assess symptomatology, the addition of objective and quantifiable measures of disease severity and therapeutic treatment response is desirable and possible through speech and voice acoustic measurement. In the most basic terms, physical quantitative measurements of communication behavior, using aspects of frequency, intensity, and time, support and extend the clinical impressions of atypical communication. This adds clinical value by providing repeatable quantification of observed symptomatology.

The literature surrounding acoustic investigations in persons with schizophrenia has revealed a number of consistent themes.
Flat affect, alogia, and asociality (as measured by the PANSS scale) are strongly related to restricted speech output, monotonous speech, pauses in speech, energy variation, utterance duration, and inflection. In addition, acoustic measures have shown great promise in identifying treatment response by demonstrating a larger treatment effect than that seen with traditional rating scales. It has also been demonstrated that specific measures of acoustic inflection are sensitive enough to differentiate between antipsychotic compounds (Olanzapine vs. Haloperidol; Remoxipride vs. Haloperidol), whereas rating scales were not able to detect this difference. Bidirectional changes in the speech acoustic characteristics between drug conditions have also led researchers to conclude that different mechanisms of drug action may be at work, though rating scales were not able to make this distinction. Acoustic measures were able to separate the different drug groups at outcome while the rating scales failed to show a difference.

A recent acoustic investigation by Wisniecki et al. demonstrated measurable differences in cognitive behavior and motor slowing by comparing persons with NSZ to a control group. Speech pause behavior in a simple counting task and a picture description task indicated that average pause length was indicative of motor retardation, whereas global measures of pause were indicative of the increased cognitive linguistic demands of the picture description task. [4]

4. ACOUSTIC INVESTIGATIONS ON SCHIZOPHRENIA

Based on a sample of 42 chronic schizophrenic patients and 42 carefully matched controls, Stassen et al.
investigated potential relationships between acoustic variables on the one hand, and negative syndromes, positive syndromes and affective disturbances, on the other. A set of 12 acoustic variables automatically assessed in a standardized experimental setting allowed an almost perfect discrimination between schizophrenic patients and normal subjects. Acute side-effects of medication did not explain this finding. However, the question of whether the observed changes in speaking behavior and voice sound characteristics were caused by long-term neuroleptic treatment, for example, as a consequence of tardive dyskinesia, could not be answered by the investigation. In view of a biological validation of the negative-positive model of schizophrenia, the reliability of various psychopathological subscales was tested through repeated assessments at 14 day intervals. Stassen et al. found most psychopathology scores to be sufficiently stable and reproducible over time, thus representing a suitable basis for the estimation of severity with respect to the negative and positive component of schizophrenia. Using the first measurements as training samples and the second measurements 14 days later as test samples, discriminant analysis yielded conclusive proof of a close relationship between acoustic variables and the severity of the negative and positive component of schizophrenia. In particular, by means of "objective" acoustic variables and under the constraint of reproducibility, 75.9% of patients were correctly classified as low or high scorers with respect to the negative syndrome, 71.9% of patients with respect to the positive syndrome, and 79.4% of patients with respect to their depressive symptomatology. 
[1]

To assess speaking behavior and voice sound characteristics, the following acoustic variables were used:

(1) mean pause duration (only pauses >250 ms are included)
(2) number of pauses (only pauses >250 ms are included)
(3) mean pause duration per second
(4) mean utterance duration
(5) total recording time
(6) total length of pauses (only pauses >250 ms are included)
(7) total length of utterances
(8) mean energy per second
(9) variation of energy per second (dynamics)
(10) mean energy per syllable
(11) variation of energy per syllable (dynamics)
(12) mean vocal pitch
(13) variation of mean vocal pitch (intonation)
(14) F0-amplitude (intensity of mean vocal pitch relative to the intensity of overtones (timbre))
(15) F0-6dB-bandwidth (intonation)
(16) F0-contour (timbre)

All these variables have turned out to be highly stable over time and sensitive enough to distinguish between emotionally neutral and emotionally charged texts read out loud by the same individuals. All speech signals were inspected visually and marked with an artifact code if necessary, so that disturbed intervals could be removed prior to data analysis. [5]

This chapter and the previous one reviewed the literature on the speaking behavior of schizophrenic patients and on acoustic investigations of schizophrenia. The following chapter describes the data and methods used to develop a system capable of discriminating between schizophrenic and non-schizophrenic subjects.

5. DATA AND METHODS

The system implementation comprises three main parts: data acquisition, feature extraction and classification.
Fig. 5.1 – System implementation (block diagram: Audio Data Acquisition, Feature extraction, Classifier, Decision)

5.1. Audio data

The audio sample included 33 subjects. Of these, 15 were schizophrenic patients under various medications and 18 belonged to the control group. The sample comprised 19 male subjects and 14 female subjects with a mean age of 39.6 years (SD 13.2 years, range 24-80 years). The average illness duration since diagnosis was 5.6 years (SD 5.4 years, range 1-17 years). All subjects in the patient and control groups underwent EEG testing to confirm their condition or mental health. See Appendix A for details of the patients and controls.

All audio files were recorded in a quiet room on a minidisc recorder with direct digital 16-bit sampling, at a sampling rate of 22 kHz. Although every effort was made to keep the microphone distance constant during acquisition, this was not achieved in all recordings. An emotionally neutral text of around 3 min length from a children’s book was selected for the recording, chosen especially for its verbal and semantic simplicity. [5] (See Appendix B)

5.2. Acquisition

All audio files were processed in GoldWave v.4.26 to reduce ambient noise: a silent part of each file was selected and copied to the clipboard, and the Noise Reduction function was then applied to the whole file based on the data in the clipboard. Pauses longer than 250 ms were removed at the beginning and the end of the file so that they would not affect the results. Further, all speech signals were inspected visually, so that disturbed parts could be removed before data analysis. Finally, normalization was performed on all speech signals to ensure equal conditions for energy measurement. This was necessary due to variations in the distance between the speaker and the recording microphone.

5.3. Feature extraction

Feature extraction was carried out using processing algorithms executed in MATLAB.

5.3.1. Time statistics

The recordings were analyzed so that only signal segments above a certain threshold were classified as voice segments. Only pauses longer than 250 ms were counted; segments below the threshold lasting less than 250 ms were added to the voiced parts of the signal. The following acoustic variables were extracted to obtain time statistics:

(1) Mean Pause Duration
(2) Number of Pauses
(3) Mean Pause Duration per Second
(4) Mean Utterance Duration
(5) Total Recording Time
(6) Total Length of Pauses
(7) Total Length of Utterances

5.3.2. Energy measures

The energy of a discrete time signal is defined as

\[ E_x \overset{\mathrm{def}}{=} \sum_{n=-\infty}^{\infty} |x(n)|^2 \tag{5.1} \]

A signal is called an energy signal if

\[ 0 < E_x < \infty \tag{5.2} \]

“Mean Energy per Second” and “Standard Deviation of Mean Energy per Second” were estimated for each speech signal. [6]

5.3.3. Vocal Pitch Estimation

For the vocal pitch estimation, the Variable Length Average Magnitude Difference Function (VLAMDF) [7] was used. This function provides a fast and efficient pitch estimation algorithm which can be implemented in real time. The main principle of AMDF is to find the pitch by comparing the similarity between the original signal and its shifted version.
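The comparison between a signal and its shifted version can be sketched in a few lines of Python. This is a minimal illustration of the basic fixed-length AMDF only, not the variable-length variant of [7] and not the MATLAB code used in this study. The function names, the 50-400 Hz search range and the tolerance rule (take the smallest lag whose AMDF value is close to the minimum, so that multiples of the true period are not picked) are assumptions introduced here for illustration.

```python
import math

def amdf(frame, lag):
    """Mean absolute difference between a frame and its copy shifted by `lag` samples."""
    n_terms = len(frame) - lag
    return sum(abs(frame[n] - frame[n + lag]) for n in range(n_terms)) / n_terms

def estimate_pitch_hz(frame, sample_rate, f_min=50.0, f_max=400.0, tolerance=0.05):
    """Estimate pitch as sample_rate / lag for the smallest lag near the AMDF minimum.

    f_min, f_max and tolerance are illustrative assumptions, not values from the text.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    if len(frame) <= lag_max:
        raise ValueError("frame is too short for the requested pitch range")
    lags = range(lag_min, lag_max + 1)
    values = [amdf(frame, lag) for lag in lags]
    lowest, highest = min(values), max(values)
    # Accept the first (shortest) lag whose AMDF is within a small margin of the minimum.
    threshold = lowest + tolerance * (highest - lowest)
    for lag, value in zip(lags, values):
        if value <= threshold:
            return sample_rate / lag

# A 40 ms frame of a synthetic 200 Hz tone sampled at 8 kHz (320 samples).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(320)]
pitch = estimate_pitch_hz(frame, sample_rate)
```

For real speech the AMDF minimum is rarely as clean as for a synthetic tone, so voiced/unvoiced screening of frames would still be needed before trusting the estimate.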
The VLAMDF algorithm uses variable-length speech samples and is defined as

\[ E_{VLAMDF}(\tau) = \frac{1}{\tau} \sum_{n=0}^{\tau-1} |s(n) - s(n+\tau)| \tag{5.3} \]

where \( \tau = \tau_{\min}, \tau_{\min}+1, \ldots, \tau_{\max} \).

The pitch is estimated according to the minimum of the waveform and is obtained by the equation

\[ T_p^{VLAMDF} = \arg\min_{\tau = \tau_{\min}, \ldots, \tau_{\max}} E_{VLAMDF}(\tau) \tag{5.4} \]

The speech recording was divided into 40 ms frames. Then, based on certain conditions to eliminate unvoiced frames and speech onset or offset frames, appropriate frames were chosen and pitch was estimated in these frames. Further, “Mean Vocal Pitch” and “Standard Deviation of Mean Vocal Pitch” were estimated.

Fig. 5.2 – Waveform of VLAMDF: (a) the original waveform, (b) the waveform of VLAMDF

5.4. Classification

Histograms of the features extracted from all speech signals of the schizophrenic and control groups show an approximately Gaussian distribution, so a classifier based on Linear Discriminant Analysis [8] was chosen to differentiate between control and schizophrenia audio samples. Cross-fold validation was used to estimate classification accuracies.

5.4.1. Linear Discriminant Analysis

Training of the LD classifiers proceeded as follows. Let x be a column vector containing d feature values which is to be assigned to one of two classes. Assume there are N_1 feature vectors available for training the classifier from class 1 and N_2 feature vectors from class 2. The nth feature vector for training in class k is designated as \( x_n^{(k)} \).
Training involves determining the class-conditional mean vectors \( \mu_1 \) and \( \mu_2 \) using

\[ \mu_1 = \frac{1}{N_1} \sum_{n=1}^{N_1} x_n^{(1)} \tag{5.5} \]

\[ \mu_2 = \frac{1}{N_2} \sum_{n=1}^{N_2} x_n^{(2)} \tag{5.6} \]

and the common covariance matrix \( \Sigma \) using

\[ \Sigma = \frac{1}{N_1 + N_2 - 2} \sum_{k=1}^{2} \sum_{n=1}^{N_k} (x_n^{(k)} - \mu_k)(x_n^{(k)} - \mu_k)^T \tag{5.7} \]

The common covariance matrix is employed since the within-class covariances are similar and it offers a better representation for the limited normal training set. After determining the \( \mu_k \)’s and \( \Sigma \) from the training data, a feature vector x is classified by assuming values for the prior probability of class 1, \( \pi_1 \) (note that \( \pi_2 = 1 - \pi_1 \)), and calculating the discriminant value, y, using

\[ y = (\mu_1 - \mu_2)^T \Sigma^{-1} x - \frac{1}{2} (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 + \mu_2) + \log\left( \frac{\pi_1}{1 - \pi_1} \right) \tag{5.8} \]

The posterior probabilities for classes 1 and 2 are then calculated using

\[ P(1 \mid x) = \frac{\exp(y)}{\exp(y) + 1} \tag{5.9} \]

\[ P(2 \mid x) = 1 - P(1 \mid x) \tag{5.10} \]

The final classification is obtained by choosing the class with the highest posterior probability estimate from (5.9) and (5.10).

5.4.2. Assessing classifier performance

The cross-validation scheme [9] was used for estimating the classifier performance. The variance of the performance estimates was decreased by averaging results from multiple runs of cross-validation, where a different random split of the training data into folds is used for each run. In this study ten repetitions of fifteen-fold cross-validation were used to estimate classifier performance. For each run of cross-fold validation the numbers of normal and abnormal cases were equal. Classifier performance was measured using sensitivity, specificity, positive predictivity, negative predictivity and the overall accuracy.
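For a single feature (d = 1), the training and classification steps of Section 5.4.1 reduce to scalar arithmetic, which makes them easy to check by hand. The sketch below is an illustration under that assumption: the function names and the toy feature values are hypothetical, the prior defaults to 0.5, and it is not the classifier code used in this study.

```python
import math

def train_lda_1d(class1, class2):
    """Return the two class means and the pooled (common) variance for one feature."""
    n1, n2 = len(class1), len(class2)
    mu1 = sum(class1) / n1
    mu2 = sum(class2) / n2
    # Scalar form of the common covariance: squared deviations pooled over both classes.
    pooled = (sum((x - mu1) ** 2 for x in class1)
              + sum((x - mu2) ** 2 for x in class2)) / (n1 + n2 - 2)
    return mu1, mu2, pooled

def posterior_class1(x, mu1, mu2, variance, prior1=0.5):
    """Posterior probability of class 1 from the linear discriminant value y."""
    y = ((mu1 - mu2) * x - 0.5 * (mu1 - mu2) * (mu1 + mu2)) / variance \
        + math.log(prior1 / (1.0 - prior1))
    # exp(y) / (exp(y) + 1), written in a form that avoids overflow for large |y|.
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

# Hypothetical toy values for one feature (not data from this study).
mu1, mu2, variance = train_lda_1d([1.0, 2.0, 3.0], [5.0, 6.0, 7.0])
p_mid = posterior_class1(4.0, mu1, mu2, variance)   # halfway between the class means
p_near1 = posterior_class1(2.0, mu1, mu2, variance)  # at the class 1 mean
```

With equal priors the decision boundary lies midway between the two class means; changing `prior1` shifts it toward the less probable class.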
These measures were calculated as per the definition of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) presented in Table 5.1.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{5.11}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{5.12}$$

$$\text{Positive Predictivity} = \frac{TP}{TP + FP} \tag{5.13}$$

$$\text{Negative Predictivity} = \frac{TN}{TN + FN} \tag{5.14}$$

$$\text{Overall Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5.15}$$

TABLE 5.1
DEFINITIONS OF TRUE POSITIVES, TRUE NEGATIVES, FALSE POSITIVES AND FALSE NEGATIVES.

                         PREDICTED CLASSIFICATION
True Classification      Pathology               Normal
Pathology                True Positive (TP)      False Negative (FN)
Normal                   False Positive (FP)     True Negative (TN)

5.4.3. Receiver Operator Characteristics (ROC)

Receiver operator characteristic curves (ROC curves) provide valuable information on a test’s ability to discriminate between two classes over the complete spectrum of decision thresholds. The ROC curve is a graph of sensitivity vs. (100% - specificity) as the a-priori probabilities of the two classes are swept between zero and one. It provides information on clinical usefulness since it presents a trade-off in costs between false positives and false negatives and can be used to decide the threshold for different clinical requirements, e.g. screening vs. pre-surgical diagnosis. The area enclosed by the ROC plot is a metric against which other similar tests can be compared [10]. The area was computed using the trapezoidal rule.

Fig. 5.3 – Interpretation of ROC curve

This Chapter presented some of the voice features that can be automatically extracted from audio voice files, along with a pattern classification scheme.
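Equations (5.11) to (5.15) and the trapezoidal area computation can be sketched in Python as follows. The function names are hypothetical. The counts in the usage lines are not reported in this work: they are chosen only to be consistent with the percentages reported for the feature “Number of Pauses”, under the assumption of 15 patients and 15 controls, and the three-point ROC curve is likewise a made-up example.

```python
def performance_measures(tp, tn, fp, fn):
    """Equations (5.11)-(5.15), returned as fractions between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictivity": tp / (tp + fp),
        "negative_predictivity": tn / (tn + fn),
        "overall_accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def roc_area(points):
    """Trapezoidal area under ROC points given as (1 - specificity, sensitivity)."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Assumed counts (15 patients, 15 controls); not taken from the study's data.
measures = performance_measures(tp=10, tn=7, fp=8, fn=5)
for name, value in measures.items():
    print(f"{name}: {100 * value:.2f}%")

# Made-up ROC curve: the two end points plus the single operating point above.
print(roc_area([(0.0, 0.0), (8 / 15, 10 / 15), (1.0, 1.0)]))
```

A curve built from one operating point gives only a coarse area; an area such as those reported in the results would require one point per decision threshold.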
The next chapter will use these features to discriminate between patients and controls.

6. RESULTS

The feature values of all patients and controls were input to the subject-independent LDA classifier to assess how well each feature distinguishes between schizophrenic and non-schizophrenic subjects. Alongside each ROC graph below, the Sensitivity, the Specificity, the Positive Predictivity, the Negative Predictivity, the Overall Accuracy and the Area Under the ROC Curve are given for each feature individually and for the combination of all features. The reader is reminded that classifier performance should not be judged on Overall Accuracy alone but on a combination of all these classifier metrics.

6.1. Number of Pauses

Classifier performance:
1) Sensitivity = 66.67%
2) Specificity = 46.67%
3) Positive Predictivity = 55.56%
4) Negative Predictivity = 58.33%
5) Overall Accuracy = 56.67%
6) Area Under the ROC Curve = 0.591

Fig. 6.1 – ROC curve for feature “Number of Pauses”

6.2. Mean Pause Duration

Classifier performance:
1) Sensitivity = 60.00%
2) Specificity = 53.33%
3) Positive Predictivity = 56.25%
4) Negative Predictivity = 57.14%
5) Overall Accuracy = 56.67%
6) Area Under the ROC Curve = 0.596

Fig. 6.2 – ROC curve for feature “Mean Pause Duration”

6.3. Mean Pause Duration per Second

Classifier performance:
1) Sensitivity = 73.33%
2) Specificity = 60.00%
3) Positive Predictivity = 64.71%
4) Negative Predictivity = 69.23%
5) Overall Accuracy = 66.67%
6) Area Under the ROC Curve = 0.622

Fig. 6.3 – ROC curve for feature “Mean Pause Duration per Second”

6.4. Mean Utterance Duration

Classifier performance:
1) Sensitivity = 40.00%
2) Specificity = 80.00%
3) Positive Predictivity = 66.67%
4) Negative Predictivity = 57.14%
5) Overall Accuracy = 60.00%
6) Area Under the ROC Curve = 0.631

Fig. 6.4 – ROC curve for feature “Mean Utterance Duration”

6.5. Total Recording Time

Classifier performance:
1) Sensitivity = 60.00%
2) Specificity = 86.67%
3) Positive Predictivity = 81.82%
4) Negative Predictivity = 68.42%
5) Overall Accuracy = 73.33%
6) Area Under the ROC Curve = 0.764

Fig. 6.5 – ROC curve for feature “Total Recording Time”

6.6. Total Length of Pauses

Classifier performance:
1) Sensitivity = 66.67%
2) Specificity = 53.33%
3) Positive Predictivity = 58.82%
4) Negative Predictivity = 61.54%
5) Overall Accuracy = 60.00%
6) Area Under the ROC Curve = 0.533

Fig. 6.6 – ROC curve for feature “Total Length of Pauses”

6.7. Total Length of Utterances

Classifier performance:
1) Sensitivity = 80.00%
2) Specificity = 80.00%
3) Positive Predictivity = 80.00%
4) Negative Predictivity = 80.00%
5) Overall Accuracy = 80.00%
6) Area Under the ROC Curve = 0.836

Fig. 6.7 – ROC curve for feature “Total Length of Utterances”

6.8. Mean Energy per Second

Classifier performance:
1) Sensitivity = 53.33%
2) Specificity = 100.00%
3) Positive Predictivity = 100.00%
4) Negative Predictivity = 68.18%
5) Overall Accuracy = 76.67%
6) Area Under the ROC Curve = 0.871

Fig. 6.8 – ROC curve for feature “Mean Energy per Second”

6.9. Standard Deviation of Mean Energy per Second

Classifier performance:
1) Sensitivity = 53.33%
2) Specificity = 100.00%
3) Positive Predictivity = 100.00%
4) Negative Predictivity = 68.18%
5) Overall Accuracy = 76.67%
6) Area Under the ROC Curve = 0.876

Fig. 6.9 – ROC curve for feature “Standard Deviation of Mean Energy per Second”

6.10. Mean Vocal Pitch

Classifier performance:
1) Sensitivity = 40.00%
2) Specificity = 46.67%
3) Positive Predictivity = 42.86%
4) Negative Predictivity = 43.75%
5) Overall Accuracy = 43.33%
6) Area Under the ROC Curve = 0.453

Fig. 6.10 – ROC curve for feature “Mean Vocal Pitch”

6.11. Standard Deviation of Mean Vocal Pitch

Classifier performance:
1) Sensitivity = 66.67%
2) Specificity = 53.33%
3) Positive Predictivity = 58.82%
4) Negative Predictivity = 61.54%
5) Overall Accuracy = 60.00%
6) Area Under the ROC Curve = 0.622

Fig. 6.11 – ROC curve for feature “Standard Deviation of Mean Vocal Pitch”

6.12. Performance of classifier for all features combined

Classifier performance:
1) Sensitivity = 86.67%
2) Specificity = 93.33%
3) Positive Predictivity = 92.86%
4) Negative Predictivity = 87.50%
5) Overall Accuracy = 90.00%
6) Area Under the ROC Curve = 0.920

Fig. 6.12 – ROC curve for all features combined

Among the time statistics features, the best performance was demonstrated by the feature “Total Length of Utterances”, with an Overall Accuracy of 80% and an Area Under the ROC Curve of 0.836. This was followed by “Total Recording Time”, with an Overall Accuracy of 73.3% and an Area Under the ROC Curve of 0.764. For the remaining time statistics features, the Overall Accuracy ranged from 56.7% to 66.7% and the Area Under the ROC Curve ranged from 0.53 to 0.63.

This Chapter presented the classifier performance for each extracted feature individually and for all features combined. The next Chapter discusses possible improvements to the system presented here.

7. DISCUSSION

This work presented a system for discrimination between schizophrenic and non-schizophrenic subjects based on audio analysis of speech signals.
Based on the results, it can be seen that schizophrenic patients speak slowly and produce longer words in comparison to healthy subjects. This explains why the features “Total Length of Utterances” and “Total Recording Time” were among the best-performing features in discriminating between schizophrenic and non-schizophrenic subjects.

The feature “Mean Energy per Second”, which represents the average value of energy in each second of the speech signal, and the feature “Standard Deviation of Mean Energy per Second”, which is the standard deviation of the former over the whole speech signal and reflects the dynamics of the speech, both performed well, with an Overall Accuracy of 76.7% and an Area Under the ROC Curve of 0.871 and 0.876 respectively. The increase of energy in patients’ voices could be a side-effect of medication, as it has been reported that patients under sedation tend to speak louder and to produce utterances with a greater variation of loudness. [1]

The feature “Mean Vocal Pitch”, representing the average value of vocal pitch measured in each second of the recording, did not perform satisfactorily, with an Overall Accuracy of 43.3% and an Area Under the ROC Curve of 0.453. “Standard Deviation of Mean Vocal Pitch”, which reflects subjects’ intonation, performed better, with an Overall Accuracy of 60% and an Area Under the ROC Curve of 0.622.

With all features combined, the classifier reached an Overall Accuracy of 90% and an Area Under the ROC Curve of 0.920, which allowed very good discrimination between schizophrenic patients and the control group.

7.1. New Features

New features should be explored which, when combined with the currently measured time statistics, energy and pitch features, may increase system performance. In the study of Stassen et al. [1], the “F0-amplitude” and “F0-contour” features made the highest contribution to the overall performance of their discriminant function and demonstrated that the patients’ voice sound characteristics were primarily distinctive in comparison with healthy subjects.

7.2. A larger database

A larger database of schizophrenic patients’ and healthy controls’ speech signals, covering various types of schizophrenia and containing a greater number of samples, would contribute to a more robust classification system.

7.3. Full automation and remote diagnostics

A remote and fully automated diagnostic system, similar to the one presented for the assessment of vocal fold pathologies in Moran et al. [11], could readily be developed. In this way, high quality assessments could be carried out remotely, making efficient and effective use of available medical resources and providing psychiatrists with up-to-date results with minimal processing.

7.4. Real-time implementation

The run time of the feature extraction algorithm was assessed for the calculation of the time statistics features “Number of Pauses”, “Mean Pause Duration”, “Mean Pause Duration per Second”, “Mean Utterance Duration”, “Total Recording Time”, “Total Length of Pauses” and “Total Length of Utterances”. The analysis of an audio file of 3 minutes in duration, with a sampling frequency of 11 kHz, was found to take 30 minutes on a computer with a 2 GHz AMD Turion64 CPU and 1 GB of RAM. This may not be satisfactory for clinical use. The same timing measurement was also carried out on resampled, downsampled and decimated versions of the original file. Of these, downsampling the original file gave the best results, with negligible differences in the feature values compared to the original.
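To make the downsampling step concrete, a minimal Python sketch is given here. It assumes that “downsampling” means keeping every M-th sample without any low-pass filtering beforehand, and that the nominal 11 kHz sampling frequency is exactly 11025 Hz. Neither assumption is stated in this work, but integer factors of 3, 4 and 5 applied to 11025 Hz do reproduce the downsampled rates reported in this section. The function and constant names are hypothetical.

```python
def downsample(samples, factor):
    """Keep every `factor`-th sample; no low-pass filter is applied first."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return samples[::factor]


def downsampled_rate(rate_hz, factor):
    """Sampling rate that results from keeping every `factor`-th sample."""
    return rate_hz / factor


ASSUMED_ORIGINAL_RATE_HZ = 11025  # assumption: the text only states "11 kHz"

for factor in (3, 4, 5):
    print(factor, downsampled_rate(ASSUMED_ORIGINAL_RATE_HZ, factor))

# One second of placeholder sample indices, standing in for real audio.
one_second = list(range(ASSUMED_ORIGINAL_RATE_HZ))
print(len(downsample(one_second, 5)))
```

A decimating variant would low-pass filter the signal before discarding samples; that step is deliberately left out of this sketch.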
Following downsampling to 3675 Hz, the feature extraction time was reduced to approximately 8 minutes. Further downsampling to 2756 Hz reduced the processing time to 6 minutes, and downsampling to 2205 Hz reduced it to 4.5 minutes, again with negligible differences in the feature values. With the use of a more powerful computer, the time needed for analysis may be shortened further and real-time analysis may become possible. The processes of resampling and decimating yielded poor quality data with respect to pattern classification and resulted in poor discrimination performance. See Appendix C for examples of this processing.

7.5. Longitudinal study

Regular (monthly) voice recording could be established to investigate whether the features extracted in this study are stable over time. Later, once the effects of medication on schizophrenic patients’ voices have been investigated, regular voice recording could serve as a check on prescribed medication.

8. CONCLUSION

In this study, the possibility of using audio analysis as a basis for objective diagnosis in psychiatry was demonstrated by building a system capable of discriminating between schizophrenic patients and healthy control subjects. Very good discrimination, with an accuracy of 90%, was achieved. Increasing the number of specific extracted features may yield even better results. Further optimization of feature extraction, and implementation of a remote and fully automated system capable of real-time analysis, may reduce the cost currently associated with the assessment of schizophrenia and also provide psychiatrists with objective measures of the effectiveness of the medication they prescribe to their patients.
BIBLIOGRAPHY

[1] Stassen H. H., Alber M., Püschel J., Scharfetter Ch., Tewesmeier M. & Woggon B. (1995). Speaking behavior and voice sound characteristics associated with negative schizophrenia. Journal of Psychiatric Research 1995 Jul-Aug; 29(4):277-96

[2] P. R. Lott, S. Guggenbuhl, A. Schneeberger, A. E. Pulvar, H. H. Stassen, “Linguistic Analysis of the Speech Output of Schizophrenic, Bipolar and Depressive Patients”, Psychopathology, No. 35, pp. 220-227, 2002

[3] Covington M. A., He C., Brown C., Naci L., McClain J. T., Fjordbak B. S., Semple J., Brown J. (2005). Schizophrenia and the structure of language: the linguist’s view. Schizophrenia Research 2005 Sep 1; 77(1):85-98

[4] Wisniecki A., Cannizzaro M., Cohen H., & Snyder P. (2005). Speech Impairments in Neurodegenerative Diseases/Psychiatric Illnesses. In K. Brown (Ed.), Encyclopedia of Language and Linguistics, 2nd edition. Oxford: Elsevier. pp. 358-364

[5] Stassen H. H., Kuny S., Hell D. (1998). The speech analysis approach to determining onset of improvement under antidepressants. European Neuropsychopharmacology 1998 Dec; 8(4):303-10

[6] Deller J. R., Proakis J. G., Hansen J. H. L., Discrete-Time Processing of Speech Signals. Prentice-Hall, Upper Saddle River, New Jersey 07458, 1993

[7] X. D. Mei, J. Pan, S. H. Sun (2001). Efficient algorithms for speech pitch estimation. Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing. Hong Kong, pp. 421-424.

[8] Duda R. O., Hart P. E., Stork D. G., Pattern Classification. New York: Wiley-Interscience, 2000

[9] R. Kohavi, “A study of cross validation and bootstrap for accuracy estimation and model selection,” Proc. 14th Int. Conf. on Art. Intel., pp. 1137-1143, 1995.

[10] M. H. Zweig, G. Campbell, “Receiver Operating Characteristic (ROC) Plots: A Fundamental Evaluation Tool in Clinical Medicine”, J. Clinical Chemistry, Vol. 39, No. 4, 1993.

[11] Moran R. J., Reilly R. B., de Chazal P., Lacy P. D. (2006). Telephony-Based Voice Pathology Assessment Using Automated Speech Analysis. IEEE Transactions on Biomedical Engineering, Vol. 53, No. 3, March 2006
