Biomedical engineering
1
________________________________________________________________________
1. INTRODUCTION
Schizophrenia is a chronic, severe, and disabling brain disease. Approximately 1%
of the population develops schizophrenia during their lifetime. Schizophrenia is found
worldwide; in the United States alone, more than 2 million people suffer from the illness in a given year.
The severity of the symptoms and long-lasting, chronic pattern of schizophrenia often
cause a high degree of disability. Although schizophrenia affects men and women with
equal frequency, the disorder often appears earlier in men, usually in the late teens or
early twenties, than in women, who are generally affected in the twenties to early thirties.
People with schizophrenia often suffer a range of symptoms. Their speech and behavior
can become so disorganized that they may be incomprehensible to others. Available
treatments can relieve many symptoms, but most people with schizophrenia continue to
suffer some symptoms throughout their lives; it has been estimated that no more than one
in five individuals recovers completely. Research is gradually leading to new and safer
medications through better understanding of the complex causes of the disease.
Approaches include the study of molecular genetics, while new methods for imaging the
brain’s structure and function hold the promise of new insights into the disorder.
The first signs of schizophrenia often appear as confusing changes in behavior. The
sudden onset of severe psychotic symptoms is referred to as an “acute” phase of
schizophrenia. Less obvious symptoms, such as social isolation or withdrawal, or unusual
speech, thinking, or behavior, may precede, be seen along with, or follow the more
obvious psychotic symptoms.
CURRENT METHOD OF DIAGNOSIS
In 2004 the European Brain Council estimated that in Ireland the cost of schizophrenia was
€37 million out of a total cost of brain disorders of €260 million. Currently there are no objective
biomarkers for the diagnosis of schizophrenia. Mental health professionals currently use
the presence of specific symptom clusters during clinical interviews as a means of making
a diagnosis of psychiatric disorders. People with schizophrenia often show “blunted” or
“flat” affect. This refers to a severe reduction in emotional expressiveness. A person with
schizophrenia may not show the signs of normal emotion, perhaps may speak in a
monotonous voice, have diminished facial expressions, and appear extremely apathetic.
The person may withdraw socially, avoiding contact with others; and when forced to
interact, he or she may have nothing to say, reflecting “impoverished thought.”
Motivation can be greatly decreased, as can interest in or enjoyment of life. These
problems with emotional expression and motivation are symptoms of schizophrenia.
2. AIM
The aim is to develop a new approach to the diagnosis and disease management of serious
mental illnesses (SMI), such as schizophrenia, using the acoustic analysis of speech. The
spontaneous speech of schizophrenia patients with negative symptoms (e.g. flat affect,
loss of interest) contains many hesitations and pauses. Acoustic analysis demonstrates
that the length of pauses, independent of other linguistic or paralinguistic parameters, is
strongly associated with the clinician's opinion of the patient's negative symptoms. [1, 2]
Employing automatic acoustic analysis is expected to lead to the development of specific
and sensitive acoustic diagnostic biomarkers for schizophrenia.
The ability to provide acoustic diagnosis would enable significant advances in the
following areas:

• Advanced Medical Diagnostics. Automating patient diagnosis would reduce the need to
wait for appointments with clinic or hospital staff. With acoustic signal processing of
easily acquired speech signals, high-quality assessments could be carried out remotely,
providing diagnostic pre-screening of patients with a variety of psychiatric disorders.
This would allow those in need of the most immediate attention to be prioritized, so that
available medical resources are used efficiently and effectively.

• Advanced Personalized Care through Disease Management. Comparing a client’s
physiological data, generated automatically by biomedical signal processing, with local
and international databases would increase the disease management capacity of primary
care physicians and specialists. Such information processing could also improve
treatment adherence and allow continuous remote monitoring of a variety of health
variables.
Anticipated benefits of this acoustic diagnosis approach include:

• Provide critical diagnostic information objectively, efficiently, cost-effectively
and non-invasively

• Reduce the time and the number of medical personnel needed to make diagnoses

• Manage mental health professionals’ time efficiently

• Reduce travel times and costs for patients

• Provide medical specialists with patient-specific audio signal interpretations

• Empower patients to become proactive participants in their health management
In this thesis, Chapter 3 reviews the literature on voice characteristics and their
importance in psychiatric diagnosis. Chapter 4 gives a brief overview of prior acoustic
investigations of schizophrenia. Chapter 5 presents the audio database and discusses the
methods used to develop the classifier. The classifier performance is presented in
Chapter 6, and Chapter 7 discusses the findings. Chapter 8 draws conclusions from this study.
3. SPEAKING BEHAVIOR AND VOICE SOUND
CHARACTERISTICS OF SCHIZOPHRENIC PATIENTS
Symptoms such as affective flattening, blunted affect, the inability to experience or
express a normal range of affective responses, emotional dullness, pervasive apathy,
poverty of speech and psychomotor retardation have been constituents of clinical
definitions of schizophrenia throughout the twentieth century and have led to the negative-positive
model of schizophrenia. In this model, delusions, hallucinations and florid formal thought
disorders are conceptualized as pathological excesses, whereas negative symptoms are
conceptualized as pathological deficits.
Based on this heuristic division, negative symptoms have been theorized to be
unresponsive to antipsychotic medication, to be associated with a deteriorating course,
and to reflect some structural brain abnormalities that would mediate the hypothesized
association between negative symptoms and poor prognosis.
Recent findings, however, have indicated that positive symptoms are not uniformly
responsive to treatment, and that negative symptoms are not universally immutable.
Specifically, during recovery from an acute episode of schizophrenia, the key symptoms
tend to disappear first while symptoms at lower levels persist longer. Non-specific and
affective symptoms tend to recur as prodromal symptoms of relapse. Furthermore, several
authors have reported a diagnostic non-specificity of negative symptoms, particularly
with regard to various clinical forms of depression which occur relatively commonly in
schizophrenic patients, and there is also some overlap between the specific phenomena of
depressive illness and negative symptoms in schizophrenia. [1]
3.1. Formal thought disorder
In the psychiatric literature, many of the abnormalities of language in schizophrenia
are lumped together as formal thought disorder (a disorder in the form of thought, not the
content).
Perhaps most commonly it is the moment-to-moment, logical sequencing of ideas
which is at fault. At other times, the mechanisms of language production themselves
appear to be disturbed, so that the meaning of individual words and phrases is obscured.
At still other times, the fault seems to be at the level of discourse: individual words,
sentences, and sequences of thought make sense, but there is no discernible thread to
longer verbal productions.
Florid formal thought disorder is a relatively uncommon finding in acute
schizophrenia, though it is somewhat more common in chronic cases. Manifestations of
formal thought disorder include poverty of content (failure to express sufficient
information), loss of goal (slippage away from the intended topic), clanging (chaining
together similar sounding words as if distracted by them), and other kinds of incoherence
and unintelligibility.
3.2. First study of linguistic structure
The study of schizophrenic language disorder by linguists began with Chaika
(1974), who studied a single patient who spoke normally for weeks at a time, her deviant
language coinciding with what her psychiatrists term psychotic episodes. Stripped of
some mid-1970s theoretical terminology, and condensed somewhat, the abnormalities
that Chaika observed were:
(1) Failure to utter the intended lexical item;
(2) Distraction by the sounds or sense of words, so that a discourse becomes a
string of word associations rather than a presentation of previously intended
information;
(3) Breakdown of syntax and/or discourse;
(4) Lack of awareness that the utterances are abnormal.
Of these, (2) is most characteristic of schizophrenia; (1) and (3) resemble ordinary
speech errors, and (4) resembles some forms of aphasia.
3.3. Schizophrenic language vs. normal speech errors
Fromkin responded that, “except for the disruption of discourse which can be
attributed to non-linguistic factors, all the features [of schizophrenic language] are
prevalent in normal speech as exemplified by speech errors and ‘slips of the tongue’.”
Mistaken lexical choices and minor scramblings of syntax are common in everyday
speech. Indeed, speech errors are often triggered by the sounds or senses of recently
uttered words, and speakers are commonly unaware of their fumbles.
Thus, (1), (2), (3), and (4) of Chaika’s core abnormalities are all disposed of, except
for derailment of discourse, which Fromkin considers extralinguistic. This claim has not
held up. Although there are obvious similarities between Chaika’s samples of
schizophrenic language and Fromkin’s corpus of speech errors, there are also obvious
differences. Normal speakers make occasional errors like those seen in schizophrenia, but
not whole strings of errors. A representative patch of gibberish from Chaika’s patient
comprised 9 syllables, and uncorrected speech errors of such length and unintelligibility
do not occur in normal speech. What’s more, normal speakers, when an error is pointed
out, immediately correct it; speakers with schizophrenia do not.
Moreover, Chaika’s patient would commonly string together 10 or 20 sentences
connected, as far as one can see, only by word associations; ordinary people, even when
plagued by speech errors, do not do this. Normal speech errors are momentary deviations
from a discourse plan that is immediately resumed.
3.4. Loss of voluntary control
In later work, Chaika argues that schizophrenic language disorder is fundamentally
a loss of voluntary control over the speech generation process. Indeed, according to
Chapman, patients sometimes say in retrospect that this is exactly what happened—they
couldn’t control their speech. This echoes the main theme of the Schneiderian first-rank
symptoms, namely loss of control over the train of thought. Note, however, that Chaika’s
original patient apparently lacked such insight.
Chaika argues that loss of voluntary control ties together a wide range of observed
phenomena, depending on which part of language production goes out of control—most
often discourse organization, but often lexical retrieval, and sometimes pronunciation or
syntax. It fits well into a more general conception of schizophrenia as degradation of
communication between mental subsystems.
3.5. Schizophrenic language disorders vs. aphasia
How much do the language disturbances of schizophrenia resemble the aphasia
caused by stroke, traumatic brain injury, or neurological conditions such as epilepsy?
Researchers agree that there are important differences, but beyond that, discussion
of the issue has been complicated by the heterogeneity of both schizophrenia and aphasia.
Aphasia-like symptoms are observed, episodically, in only a small proportion of subjects
diagnosed with schizophrenia, whereas the aphasia produced by stroke or brain injury is
in most cases constantly present. Patients with aphasia have normal thoughts and express
them with difficulty; those with schizophrenia have unusual thoughts (or disorganized
discourse plans) and express them with comparative ease.
Pinard and Lecours compare schizophrenic language to Wernicke’s aphasia
(including jargon aphasia), a disorder in which the patient speaks fluently but
unintelligibly. Their main findings:
(1) Schizophasic discourse often has a preferred theme or preoccupation; aphasic
discourse rarely does.
(2) Speakers with schizophrenia often jump from one subject to another based on
the sounds or associations of words they have uttered (association chaining or
glossomania). This is seldom observed in jargon aphasia; it requires lexical
mastery well beyond that of most aphasics, as well as remarkable control of
prosody.
(3) Schizophasic discourse often includes rare words, evidence of a large, intact
vocabulary; jargon aphasia, even when very fluent, shows a restricted
vocabulary.
(4) Schizophrenic speech can include conscious creation of new words (neologisms)
and consciously constrained discourse in which the speaker is well aware that
the speech is unusual, whether or not others can understand it. Aphasic speakers
who produce fluent unintelligible discourse do not seem to be fully aware of
what they are doing, and if they create new words, it is as if by accident.
3.6. Andreasen’s 18-point scale
The standard account of schizophrenic language today is that of Andreasen, whose
Thought, Language, and Communication (TLC) scale provided a foundation for
subsequent research and clinical practice.
The scale comprises 18 symptoms:
(1) poverty of speech
(2) poverty of content (wordy vagueness)
(3) pressure of speech (excessive speed or emphasis)
(4) distractibility (by stimuli in the environment)
(5) tangentiality (partly irrelevant replies)
(6) loss of goal
(7) derailment (loss of goal in gradual steps)
(8) circumstantiality (numerous digressions on the way to the goal)
(9) illogicality
(10) incoherence (“word salad”, severely disrupted structure)
(11) neologisms (novel made-up words)
(12) word approximations (coined substitutes for existing words)
(13) stilted speech (pompous or overly formal style)
(14) clanging
(15) perseveration
(16) echolalia
(17) blocking (sudden stoppage)
(18) self-reference (talking about oneself excessively)
3.7. Liddle’s TLI
Liddle et al. simplified Andreasen’s TLC scale into a Thought and Language Index (TLI)
comprising 8 symptoms, which factor analysis divided into 3 groups with strong, cleanly
separate factor loadings.
They found that the disorganization and impoverishment symptom groups were
approximately orthogonal rather than bipolar (neither one is the opposite of the other;
they can coexist in any combination). Perseveration and distractibility correlated with
each other but were independent of both impoverishment and disorganization.
Another noteworthy result was that the scores in the healthy control group were not
negligible, i.e., the abnormalities were found, in mild form, in non-patients.
The triad of Liddle et al. resembles other classifications of symptoms of
schizophrenia going back all the way to the 19th-century division into catatonia (negative
symptoms), paranoia (odd thoughts), and hebephrenia (disorganization).
3.8. Chen’s CLANG scale
In a paper that deserves to be better known, Chen et al. present an alternative to
Andreasen’s TLC scale and its derivatives. Their CLANG (Clinical Language) scale
comprises 17 symptoms classified according to levels of linguistic structure. Ranging
beyond Andreasen’s and Liddle’s scales, CLANG includes disturbances of fluency, voice
quality, and articulation. It is a fuller evaluation of speech, not just “thought” or
discourse.
The factor analysis by Chen et al. found three major kinds of language dysfunction
in schizophrenic patients, “syntactic,” “semantic,” and “production.”
“Syntactic” dysfunction affects the structure of language on all levels, including
lexical access.
“Semantic” dysfunction affects the ability to map thoughts onto language and
pursue a communicative goal; it corresponds closely to the traditional definition of
thought disorder.
“Production” dysfunction comprises poverty of speech, lack of details, and lack of
intonation; it is associated with the negative symptoms of schizophrenia.
There are also factors for “pressure” and “prosody” (each cleanly self-contained,
even though “prosody” is only one symptom) and two symptoms, dysarthria and
excessive details, that are unclassified because they have minor loadings in several
factors.
Ceccherini-Nelli and Crow obtained a different factor solution in which the “syntactic”
and “semantic” factors fell together into one, perhaps because of different scoring criteria.
Their second factor, “poverty,” corresponds closely to the “production” of Chen et al.
Their third factor, “excess,” comprises syntactic perseveration (excess syntactic
constraints) and excessive detail (interpretable as semantic perseveration). [3]
3.9. Acoustic Speech Characteristics
Predominantly negative-symptom schizophrenia (NSZ) can be characterized by a
number of atypical reductions in observed behavior, including communication behavior.
Classically observed outward manifestations in spoken (e.g., linguistic production) and
pragmatic/paralinguistic (e.g., body language, intonation, and prosody) aspects of
communication are related to affective flattening, anhedonia, alogia, and avolition (the
characterization of NSZ summarized in the Diagnostic and Statistical Manual of Mental
Disorders), implying an effect on both affect and cognition. Atypical communication can have
an effect on clinical assessment, as it may influence rater scoring on a number of items
from subjective behavioral rating scales (e.g., PANSS) including items such as blunted
affect, emotional withdrawal, poor rapport, social withdrawal, reduced flow of
conversation, and motor retardation. Although these rating scales are invaluable in their
ability to help clinicians assess symptomatology, objective and quantifiable measures of
disease severity and treatment response are desirable, and speech and voice acoustic
measurement can provide them.
In the most basic terms, physical quantitative measurements of communication
behavior, using aspects of frequency, intensity, and time, support and extend the clinical
impressions of atypical communication. This adds clinical value by
providing repeatable quantification of observed symptomatology.
The literature surrounding acoustic investigations in persons with schizophrenia has
revealed a number of consistent themes. Flat affect, alogia, and asociality (as measured by
the PANSS scale) are strongly related to restricted speech output, monotonous speech,
pauses in speech, energy variation, utterance duration, and inflection. In addition, acoustic
measures have shown great promise in identifying treatment response, demonstrating a
larger treatment effect than that seen with traditional rating scales. It has also been
demonstrated that specific measures of acoustic inflection are sensitive enough to
differentiate between antipsychotic compounds (Olanzapine vs. Haloperidol; Remoxipride
vs. Haloperidol), whereas rating scales were not able to detect this difference.
Bidirectional changes in speech acoustic characteristics between drug conditions have
also led researchers to conclude that different mechanisms of drug action may be at work,
a distinction the rating scales could not make. Acoustic measures were able to separate
the different drug groups at outcome, while the rating scales failed to show a difference.
A recent acoustic investigation by Wisniecki et al. successfully demonstrated
measurable differences in cognitive behavior and motor slowing between persons with NSZ
and a control group. Speech pause behavior in a simple counting task and a picture
description task indicated that average pause length was indicative of motor retardation,
whereas global measures of pause were indicative of the increased cognitive-linguistic
demands of the picture description task. [4]
4. ACOUSTIC INVESTIGATIONS ON SCHIZOPHRENIA
Based on a sample of 42 chronic schizophrenic patients and 42 carefully matched
controls, Stassen et al. investigated potential relationships between acoustic variables on
the one hand, and negative syndromes, positive syndromes and affective disturbances, on
the other. A set of 12 acoustic variables automatically assessed in a standardized
experimental setting allowed an almost perfect discrimination between schizophrenic
patients and normal subjects. Acute side-effects of medication did not explain this
finding. However, the question of whether the observed changes in speaking behavior and
voice sound characteristics were caused by long-term neuroleptic treatment, for example,
as a consequence of tardive dyskinesia, could not be answered by the investigation. With a
view to a biological validation of the negative-positive model of schizophrenia, the
reliability of various psychopathological subscales was tested through repeated
assessments at 14-day intervals.
Stassen et al. found most psychopathology scores to be sufficiently stable and
reproducible over time, thus representing a suitable basis for the estimation of severity
with respect to the negative and positive component of schizophrenia. Using the first
measurements as training samples and the second measurements 14 days later as test
samples, discriminant analysis yielded conclusive proof of a close relationship between
acoustic variables and the severity of the negative and positive component of
schizophrenia. In particular, by means of "objective" acoustic variables and under the
constraint of reproducibility, 75.9% of patients were correctly classified as low or high
scorers with respect to the negative syndrome, 71.9% of patients with respect to the
positive syndrome, and 79.4% of patients with respect to their depressive
symptomatology. [1]
To assess speaking behavior and voice sound characteristics, the following acoustic
variables were used:
(1) mean pause duration (only pauses >250ms are included)
(2) number of pauses (only pauses >250ms are included)
(3) mean pause duration per second
(4) mean utterance duration
(5) total recording time
(6) total length of pauses (only pauses >250ms are included)
(7) total length of utterances
(8) mean energy per second
(9) variation of energy per second (dynamics)
(10) mean energy per syllable
(11) variation of energy per syllable (dynamics)
(12) mean vocal pitch
(13) variation of mean vocal pitch (intonation)
(14) F0-amplitude (intensity of mean vocal pitch relative to the intensity of
overtones (timbre))
(15) F0-6dB-bandwidth (intonation)
(16) F0-contour (timbre)
All these variables have turned out to be highly stable over time and to be sensitive
enough to distinguish between emotionally neutral and emotionally charged texts read out
loud by the same individuals.
All speech signals were inspected visually and marked with an artifact code if
necessary so that disturbed intervals could be removed prior to data analysis. [5]
This and the previous chapter reviewed the literature on the speaking behavior of
schizophrenic patients and on acoustic investigations of schizophrenia. The following
chapter describes the data and methods used to develop a system capable of
discriminating between schizophrenic and non-schizophrenic subjects.
5. DATA AND METHODS
The system implementation comprises three main parts: data acquisition, feature
extraction and classification.
Audio Data Acquisition → Feature extraction → Classifier → Decision
Fig. 5.1 – System implementation
5.1. Audio data
The audio sample included 33 subjects. Of these, 15 were schizophrenic patients receiving
various medications and 18 belonged to the control group. The sample comprised 19 male
and 14 female subjects, with a mean age of 39.6 years (SD 13.2 years, range 24-80 years).
The average illness duration since diagnosis was 5.6 years (SD 5.4 years, range 1-17
years). All subjects in the patient and control groups underwent EEG testing to confirm
their condition or mental health. See Appendix A for details of patients and controls.
All audio files were recorded in a quiet room on a minidisc recorder with direct
digital 16-bit sampling at a sampling rate of 22 kHz. Although every effort was made to
keep the microphone distance constant during acquisition, this was not achieved in all
recordings.
An emotionally neutral text of about 3 min, taken from a children’s book, was selected
for the recording; it was chosen especially for its verbal and semantic simplicity. [5]
(See Appendix B)
5.2. Acquisition
All audio files were processed in GoldWave v.4.26 to reduce ambient noise: a silent part of
each file was selected and copied to the clipboard, and the Noise Reduction function was
then applied to the whole file using the clipboard contents as the noise sample. Pauses
longer than 250 ms were removed at the beginning and end of each file so that they do not
affect the results. Further, all speech signals were inspected visually so that disturbed
parts could be removed before data analysis.
Furthermore, normalization was performed on all speech signals so as to ensure
equal conditions for energy measurement. This was necessary because the distance between
the speaker and the recording microphone varied between recordings.
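The trimming and normalization steps were carried out interactively in this study. As a rough, hypothetical sketch of what they do, the Python below removes leading and trailing pauses longer than 250 ms and then peak-normalizes the signal. The amplitude `threshold` that defines silence, the choice of peak (rather than RMS) normalization, and the function names are assumptions, not the procedure actually used.

```python
def trim_edge_pauses(signal, fs, threshold, min_pause_s=0.25):
    """Remove leading/trailing pauses longer than min_pause_s seconds.

    Assumption: a sample is 'silent' when |x| < threshold (hypothetical
    amplitude criterion); shorter edge pauses are left untouched.
    """
    n = len(signal)
    start = 0
    while start < n and abs(signal[start]) < threshold:
        start += 1
    end = n
    while end > start and abs(signal[end - 1]) < threshold:
        end -= 1
    min_len = min_pause_s * fs
    if start <= min_len:      # leading pause not longer than 250 ms: keep it
        start = 0
    if n - end <= min_len:    # trailing pause not longer than 250 ms: keep it
        end = n
    return signal[start:end]


def peak_normalize(signal, peak=1.0):
    """Scale the signal so that its largest absolute sample equals `peak`.

    Assumption: peak normalization; the study does not state which type was used.
    """
    largest = max((abs(v) for v in signal), default=0.0)
    if largest == 0:
        return list(signal)
    return [v * peak / largest for v in signal]


# Example: 0.4 s of leading silence is removed, 0.1 s of trailing silence is kept.
fs = 1000
sig = [0.0] * 400 + [0.5, -0.5] * 50 + [0.0] * 100
clean = peak_normalize(trim_edge_pauses(sig, fs, threshold=0.01))
print(len(clean), max(abs(v) for v in clean))  # 200 1.0
```

Normalizing after trimming means the scale factor depends only on the retained speech, which is what matters for the energy features extracted later.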
5.3. Feature extraction
Feature extraction was carried out using processing algorithms implemented in MATLAB.
5.3.1. Time statistics
The recordings were analyzed so that only signal portions above a certain threshold
were classified as voiced segments. Only pauses longer than 250 ms were counted; any
below-threshold segment lasting less than 250 ms was added to the voiced part of the signal.
The following acoustic variables were extracted to obtain time statistics:
(1) Mean Pause Duration
(2) Number of Pauses
(3) Mean Pause Duration per Second
(4) Mean Utterance Duration
(5) Total Recording Time
(6) Total Length of Pauses
(7) Total Length of Utterances
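The segmentation rule above (threshold the signal, then absorb any sub-threshold gap of 250 ms or less into the surrounding speech) and the seven time statistics can be sketched as follows. This is a hypothetical Python illustration, not the MATLAB code used in the study. The per-sample amplitude `threshold`, the function names, and the reading of "Mean Pause Duration per Second" as total pause time divided by recording time are all assumptions.

```python
def segment_speech(signal, fs, threshold, min_pause_s=0.25):
    """Split a signal into pause and utterance durations (in seconds).

    A sample counts as voiced when |x| >= threshold (hypothetical criterion).
    Below-threshold runs lasting min_pause_s or less are merged into the
    surrounding speech, so only pauses longer than 250 ms are counted.
    """
    runs = []  # [is_voiced, length_in_samples], run-length encoded
    for x in signal:
        voiced = abs(x) >= threshold
        if runs and runs[-1][0] == voiced:
            runs[-1][1] += 1
        else:
            runs.append([voiced, 1])
    merged = []
    for voiced, length in runs:
        if not voiced and length <= min_pause_s * fs:
            voiced = True  # short gap: treat as part of the utterance
        if merged and merged[-1][0] == voiced:
            merged[-1][1] += length
        else:
            merged.append([voiced, length])
    pauses = [length / fs for voiced, length in merged if not voiced]
    utterances = [length / fs for voiced, length in merged if voiced]
    return pauses, utterances


def time_statistics(signal, fs, threshold):
    """Compute the seven time statistics listed above."""
    pauses, utterances = segment_speech(signal, fs, threshold)
    total_time = len(signal) / fs
    total_pause = sum(pauses)
    total_utt = sum(utterances)
    return {
        "mean_pause_duration": total_pause / len(pauses) if pauses else 0.0,
        "number_of_pauses": len(pauses),
        # Assumption: total pause time per second of recording.
        "mean_pause_duration_per_second": total_pause / total_time,
        "mean_utterance_duration": total_utt / len(utterances) if utterances else 0.0,
        "total_recording_time": total_time,
        "total_length_of_pauses": total_pause,
        "total_length_of_utterances": total_utt,
    }


# 1 s speech, 0.5 s pause, 1 s speech, 0.1 s gap (merged), 0.4 s speech
fs = 100
sig = [1.0] * 100 + [0.0] * 50 + [1.0] * 100 + [0.0] * 10 + [1.0] * 40
stats = time_statistics(sig, fs, threshold=0.5)
print(stats["number_of_pauses"], stats["mean_utterance_duration"])  # 1 1.25
```

The 0.1 s gap is shorter than 250 ms, so it is absorbed into the second utterance, leaving one pause and two utterances of 1.0 s and 1.5 s.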
5.3.2. Energy measures
The energy of a discrete-time signal $x(n)$ is defined as
$$E_x \overset{\text{def}}{=} \sum_{n=-\infty}^{\infty} |x(n)|^2 \tag{5.1}$$
A signal is called an energy signal if
$$0 < E_x < \infty \tag{5.2}$$
“Mean Energy per Second” and “Standard Deviation of Mean Energy per Second”
were estimated for each speech signal. [6]
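Applying Eq. (5.1) to consecutive one-second windows gives the "Mean Energy per Second" and its standard deviation. The sketch below assumes non-overlapping windows, discards a trailing partial window, and uses the sample standard deviation. These choices and the function names are illustrative assumptions, not the study's documented settings.

```python
import statistics


def signal_energy(x):
    """Energy of a finite-length discrete-time signal, Eq. (5.1)."""
    return sum(abs(v) ** 2 for v in x)


def energy_per_second(signal, fs):
    """Mean and standard deviation of the energy of consecutive 1-s windows.

    Assumptions: non-overlapping 1-s windows, a trailing partial window is
    discarded, and the sample standard deviation is reported.
    """
    fs = int(fs)
    n_windows = len(signal) // fs
    if n_windows == 0:
        raise ValueError("signal is shorter than one second")
    energies = [signal_energy(signal[i * fs:(i + 1) * fs]) for i in range(n_windows)]
    std = statistics.stdev(energies) if n_windows > 1 else 0.0
    return statistics.mean(energies), std


# Two 1-s windows (fs = 4) with energies 4 and 16
mean_e, std_e = energy_per_second([1, 1, 1, 1, 2, 2, 2, 2], fs=4)
print(mean_e, round(std_e, 3))  # mean 10, std about 8.485
```

Because the signals were normalized beforehand, these per-second energies are comparable across speakers recorded at different microphone distances.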
5.3.3. Vocal Pitch Estimation
For vocal pitch estimation, the Variable Length Average Magnitude Difference Function
(VLAMDF) [7] was used. It provides a fast and efficient pitch estimation algorithm that
can be implemented in real time. The main principle of the AMDF is to
find the pitch by comparing the similarity between the original signal and its shifted
version. The VLAMDF algorithm uses variable-length speech samples and is defined as
$$E_{\mathrm{VLAMDF}}(\tau) = \frac{1}{\tau} \sum_{n=0}^{\tau-1} \left| s(n) - s(n+\tau) \right| \tag{5.3}$$
where $\tau = \tau_{\min}, \tau_{\min}+1, \ldots, \tau_{\max}$.
The pitch period is estimated as the lag at which the VLAMDF waveform reaches its minimum:
$$T_p^{\mathrm{VLAMDF}} = \arg\min_{\tau_{\min} \le \tau \le \tau_{\max}} E_{\mathrm{VLAMDF}}(\tau) \tag{5.4}$$
The speech recording was divided into 40 ms frames. Then, using conditions designed to
eliminate unvoiced frames and frames at speech onset or offset, appropriate frames were
chosen and the pitch was estimated in these frames.
Further, “Mean Vocal Pitch” and “Standard Deviation of Mean Vocal Pitch” were
estimated.
Fig. 5.2 – Waveform of VLAMDF:
(a) the original waveform, (b) the waveform of VLAMDF
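The frame-wise VLAMDF procedure can be sketched as below. It evaluates Eq. (5.3) for each candidate lag and uses the arg-min of Eq. (5.4) as the pitch period. The 60-400 Hz search range, the RMS-based frame selection (standing in for the unspecified voicing, onset and offset conditions) and all names are assumptions. As with any AMDF-type method, a plain arg-min can land on a multiple of the true period, so practical implementations usually add checks against such octave errors.

```python
import math
import statistics


def vlamdf(frame, tau):
    """E_VLAMDF(tau) of Eq. (5.3): mean |s(n) - s(n + tau)| for n = 0..tau-1."""
    return sum(abs(frame[n] - frame[n + tau]) for n in range(tau)) / tau


def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Pitch in Hz of one voiced frame, using the arg-min of Eq. (5.4).

    f_min and f_max (hypothetical search range) set tau_min and tau_max;
    tau_max is capped so that n + tau stays inside the frame.
    """
    tau_min = max(1, int(fs / f_max))
    tau_max = min(int(fs / f_min), len(frame) // 2)
    best_tau = min(range(tau_min, tau_max + 1), key=lambda t: vlamdf(frame, t))
    return fs / best_tau


def pitch_statistics(signal, fs, rms_threshold, frame_s=0.040):
    """Mean and standard deviation of pitch over 40 ms frames.

    Assumption: a frame is kept when its RMS reaches rms_threshold; this
    stands in for the unspecified voicing/onset/offset conditions.
    """
    frame_len = round(frame_s * fs)
    pitches = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(v * v for v in frame) / frame_len)
        if rms >= rms_threshold:
            pitches.append(estimate_pitch(frame, fs))
    if not pitches:
        return None
    std = statistics.stdev(pitches) if len(pitches) > 1 else 0.0
    return statistics.mean(pitches), std


fs = 8000
# A slightly decaying 200 Hz sinusoid (period of exactly 40 samples), one 40 ms frame.
frame = [math.exp(-0.001 * n) * math.sin(2 * math.pi * 200 * n / fs) for n in range(320)]
print(estimate_pitch(frame, fs))  # 200.0
print(pitch_statistics(frame * 3 + [0.0] * 320, fs, rms_threshold=0.1))  # (200.0, 0.0)
```

The silent fourth frame falls below the RMS threshold and is skipped, so only the three voiced frames contribute to the mean and standard deviation.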
5.4. Classification
Histograms of the features extracted from all speech signals of the schizophrenic and
control groups showed an approximately Gaussian distribution, so a classifier based on
Linear Discriminant Analysis [8] was chosen to differentiate between control and
schizophrenia audio samples. Cross-validation was used to estimate classification
accuracies.
5.4.1. Linear Discriminant Analysis
Training of the LD classifiers proceeded as follows. Let $x$ be a column vector
containing $d$ feature values which is to be assigned to one of two classes. Assume there
are $N_1$ feature vectors available for training the classifier from class 1 and $N_2$ feature
vectors from class 2. The $n$th feature vector for training in class $k$ is designated as $x_n^{(k)}$.
Training involves determining the class-conditional mean vectors 1 and  2 using
1 
1 N1 (1)
 xn
N1 n1
2 
1
N2
N2
x
n 1
( 2)
n
(5.5)
(5.6)
and the common covariance matrix ∑ using

2 Nk
1
( xn( k )   k )( xn( k )   k )T

N1  N 2  2 k 1 n1
(5.7)
A common covariance matrix is used because the within-class covariances are similar, and a single pooled estimate offers a better representation given the limited normal training set. After determining the $\mu_k$'s and $\Sigma$ from the training data, a feature vector $x$ is classified by assuming a value for the prior probability of class 1, $\pi_1$ (note that $\pi_2 = 1 - \pi_1$), and calculating the discriminant value $y$ using

$$y = (\mu_1 - \mu_2)^T \Sigma^{-1} x - \frac{1}{2}(\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 + \mu_2) + \log\left(\frac{\pi_1}{1 - \pi_1}\right) \qquad (5.8)$$
The posterior probabilities for classes 1 and 2 are then calculated using

$$P(1 \mid x) = \frac{\exp(y)}{\exp(y) + 1} \qquad (5.9)$$

$$P(2 \mid x) = 1 - P(1 \mid x) \qquad (5.10)$$

The final classification is obtained by choosing the class with the highest posterior probability estimate from (5.9) and (5.10).
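Given $\mu_1$, $\mu_2$ and $\Sigma$, equations (5.8) to (5.10) can be sketched as below. The function names are hypothetical. `np.linalg.solve` is used instead of forming $\Sigma^{-1}$ explicitly; for a symmetric, invertible $\Sigma$ this gives the same discriminant but is numerically more stable. The logistic form $1/(1+e^{-y})$ is algebraically identical to (5.9).

```python
import numpy as np


def lda_posterior(x, mu1, mu2, sigma, pi1=0.5):
    """Discriminant value y and posteriors P(1|x), P(2|x), eqs. (5.8)-(5.10)."""
    x, mu1, mu2 = (np.asarray(v, dtype=float) for v in (x, mu1, mu2))
    # w = Sigma^{-1} (mu1 - mu2); since Sigma is symmetric, w @ x equals
    # (mu1 - mu2)^T Sigma^{-1} x.
    w = np.linalg.solve(np.asarray(sigma, dtype=float), mu1 - mu2)
    y = w @ x - 0.5 * (w @ (mu1 + mu2)) + np.log(pi1 / (1.0 - pi1))   # eq. (5.8)
    p1 = 1.0 / (1.0 + np.exp(-y))                                     # eq. (5.9)
    return float(y), float(p1), float(1.0 - p1)                       # eq. (5.10)


def lda_classify(x, mu1, mu2, sigma, pi1=0.5):
    """Return 1 or 2, whichever class has the higher posterior probability."""
    _, p1, p2 = lda_posterior(x, mu1, mu2, sigma, pi1)
    return 1 if p1 >= p2 else 2
```

For example, with $\mu_1 = 1$, $\mu_2 = 5$, $\Sigma = 2$ and $\pi_1 = 0.5$, a feature value of 3 lies midway between the class means and gives $y = 0$ and $P(1 \mid x) = 0.5$. Changing `pi1` only adds the constant $\log(\pi_1/(1-\pi_1))$ to $y$, which is what moves the decision threshold when the priors are swept.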
5.4.2. Assessing classifier performance
A cross-validation scheme [9] was used to estimate classifier performance. The variance of the performance estimates was reduced by averaging the results of multiple runs of cross-validation, with a different random split of the training data into folds in each run. In this study, ten repetitions of fifteen-fold cross-validation were used. In each run of cross-validation, the numbers of normal and abnormal cases were equal.
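The repeated, class-balanced cross-validation described above might look as follows. This is a sketch under assumptions: fold membership is drawn separately within each class so that every test fold contains the same number of patients and controls (which requires equal, evenly divisible class sizes), accuracy is averaged over all folds of all runs, and `fit_predict` is a hypothetical callback standing in for training and applying the LDA classifier.

```python
import numpy as np


def balanced_folds(labels, k, rng):
    """Split sample indices into k test folds with an equal share of each class."""
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        # Shuffle this class's indices, then deal them out across the k folds.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for j, part in enumerate(np.array_split(idx, k)):
            folds[j].extend(part.tolist())
    return [np.array(sorted(f), dtype=int) for f in folds]


def repeated_cv_accuracy(x, labels, fit_predict, k=15, repeats=10, seed=0):
    """Mean test accuracy over `repeats` runs of k-fold CV, each with a new split.

    fit_predict(x_train, y_train, x_test) -> predicted labels (hypothetical
    callback, e.g. wrapping an LDA classifier).
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        for test in balanced_folds(labels, k, rng):
            train = np.setdiff1d(np.arange(len(labels)), test)
            pred = fit_predict(x[train], labels[train], x[test])
            scores.append(np.mean(pred == labels[test]))
    return float(np.mean(scores))
```

As an illustration, with 15 patients and 15 controls, `k=15` places exactly one patient and one control in each test fold.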
Classifier performance was measured using sensitivity, specificity, positive predictivity, negative predictivity and overall accuracy. These measures were calculated from the definitions of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) given in Table 5.1.
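$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (5.11)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (5.12)$$

$$\text{Positive Predictivity} = \frac{TP}{TP + FP} \qquad (5.13)$$

$$\text{Negative Predictivity} = \frac{TN}{TN + FN} \qquad (5.14)$$

$$\text{Overall Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5.15)$$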
TABLE 5.1
DEFINITIONS OF TRUE POSITIVES, TRUE NEGATIVES, FALSE POSITIVES AND FALSE NEGATIVES

True Classification | Predicted: Pathology | Predicted: Normal
Pathology | True Positive (TP) | False Negative (FN)
Normal | False Positive (FP) | True Negative (TN)
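As a worked check of (5.11) to (5.15), the confusion counts below (TP = 13, FN = 2, FP = 1, TN = 14) reproduce the combined-feature results reported in Section 6.12, assuming 15 patients and 15 controls. The class sizes are an inference from the reported percentages, not a figure stated here, and the `Confusion` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # pathology classified as pathology
    fn: int  # pathology classified as normal
    fp: int  # normal classified as pathology
    tn: int  # normal classified as normal

    def metrics(self) -> dict:
        """Eqs. (5.11)-(5.15), returned as percentages (no zero-division guard)."""
        total = self.tp + self.tn + self.fp + self.fn
        return {
            "sensitivity": 100.0 * self.tp / (self.tp + self.fn),
            "specificity": 100.0 * self.tn / (self.tn + self.fp),
            "positive_predictivity": 100.0 * self.tp / (self.tp + self.fp),
            "negative_predictivity": 100.0 * self.tn / (self.tn + self.fn),
            "overall_accuracy": 100.0 * (self.tp + self.tn) / total,
        }


if __name__ == "__main__":
    for name, value in Confusion(tp=13, fn=2, fp=1, tn=14).metrics().items():
        print(f"{name}: {value:.2f}%")
    # sensitivity: 86.67%
    # specificity: 93.33%
    # positive_predictivity: 92.86%
    # negative_predictivity: 87.50%
    # overall_accuracy: 90.00%
```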
5.4.3. Receiver Operating Characteristic (ROC)
Receiver operating characteristic curves (ROC curves) provide valuable information on a test’s ability to discriminate between two classes over the complete spectrum of decision thresholds. The ROC curve is a graph of sensitivity vs. (100% - specificity) as the a-priori probabilities of the two classes are swept between zero and one. It indicates clinical usefulness because it presents the trade-off in costs between false positives and false negatives, and it can be used to choose the threshold for different clinical requirements, e.g. screening vs. pre-surgical diagnosis. The area under the ROC curve is a metric against which other similar tests can be compared [10]. The area was computed using the trapezoidal rule.
Fig. 5.3 – Interpretation of the ROC curve
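A minimal sketch of the ROC construction and the trapezoidal area follows; the function names are hypothetical. Instead of sweeping the prior $\pi_1$, it sweeps a threshold over the classifier scores. Because $\pi_1$ enters (5.8) only through the additive term $\log(\pi_1/(1-\pi_1))$, which takes every real value as $\pi_1$ goes from zero to one, the two sweeps trace the same curve when the discriminant $y$ (without the prior term) is used as the score.

```python
import numpy as np


def roc_curve(scores, is_patient):
    """(1 - specificity, sensitivity) pairs for every threshold on the scores.

    scores: higher means "more likely a patient" (e.g. the discriminant y).
    is_patient: boolean array of true labels.
    """
    scores = np.asarray(scores, dtype=float)
    is_patient = np.asarray(is_patient, dtype=bool)
    # Start above the largest score (nothing positive), end at the smallest
    # score (everything positive), so the curve runs from (0, 0) to (1, 1).
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    fpr, tpr = [], []
    for t in thresholds:
        predicted = scores >= t
        tpr.append(np.mean(predicted[is_patient]))     # sensitivity
        fpr.append(np.mean(predicted[~is_patient]))    # 1 - specificity
    return np.array(fpr), np.array(tpr)


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

For example, patient scores {1, 3} and control scores {2, 4} give an area of 0.25, the fraction of patient/control pairs in which the patient scores higher; perfectly separated scores give 1.0.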
This Chapter presented some of the voice features that can be automatically
extracted from audio voice files, along with a pattern classification scheme. The next
chapter will use these features to discriminate between patients and controls.
6. RESULTS
The feature values of all patients and controls were input to the subject-independent LDA classifier to assess how well each feature distinguishes between schizophrenic and non-schizophrenic subjects.
In the sections below, the Sensitivity, the Specificity, the Positive Predictivity, the Negative Predictivity, the Overall Accuracy and the Area Under the ROC Curve are reported, together with the ROC curve, for each feature individually and for the combination of all features.
The reader is reminded that classifier performance should not be judged on Overall Accuracy alone but on all of these metrics together.
6.1. Number of Pauses
Classifier performance:
1) Sensitivity = 66.67%
2) Specificity = 46.67%
3) Positive Predictivity = 55.56%
4) Negative Predictivity = 58.33%
5) Overall Accuracy = 56.67%
6) Area Under the ROC Curve = 0.591
Fig. 6.1 – ROC curve for feature “Number of Pauses”
6.2. Mean Pause Duration
Classifier performance:
1) Sensitivity = 60.00%
2) Specificity = 53.33%
3) Positive Predictivity = 56.25%
4) Negative Predictivity = 57.14%
5) Overall Accuracy = 56.67%
6) Area Under the ROC Curve = 0.596
Fig. 6.2 – ROC curve for feature “Mean Pause Duration”
6.3. Mean Pause Duration per Second
Classifier performance:
1) Sensitivity = 73.33%
2) Specificity = 60.00%
3) Positive Predictivity = 64.71%
4) Negative Predictivity = 69.23%
5) Overall Accuracy = 66.67%
6) Area Under the ROC Curve = 0.622
Fig. 6.3 – ROC curve for feature “Mean Pause Duration per Second”
6.4. Mean Utterance Duration
Classifier performance:
1) Sensitivity = 40.00%
2) Specificity = 80.00%
3) Positive Predictivity = 66.67%
4) Negative Predictivity = 57.14%
5) Overall Accuracy = 60.00%
6) Area Under the ROC Curve = 0.631
Fig. 6.4 – ROC curve for feature “Mean Utterance Duration”
6.5. Total Recording Time
Classifier performance:
1) Sensitivity = 60.00%
2) Specificity = 86.67%
3) Positive Predictivity = 81.82%
4) Negative Predictivity = 68.42%
5) Overall Accuracy = 73.33%
6) Area Under the ROC Curve = 0.764
Fig. 6.5 – ROC curve for feature “Total Recording Time”
6.6. Total Length of Pauses
Classifier performance:
1) Sensitivity = 66.67%
2) Specificity = 53.33%
3) Positive Predictivity = 58.82%
4) Negative Predictivity = 61.54%
5) Overall Accuracy = 60.00%
6) Area Under the ROC Curve = 0.533
Fig. 6.6 – ROC curve for feature “Total Length of Pauses”
6.7. Total Length of Utterances
Classifier performance:
1) Sensitivity = 80.00%
2) Specificity = 80.00%
3) Positive Predictivity = 80.00%
4) Negative Predictivity = 80.00%
5) Overall Accuracy = 80.00%
6) Area Under the ROC Curve = 0.836
Fig. 6.7 – ROC curve for feature “Total Length of Utterances”
6.8. Mean Energy per Second
Classifier performance:
1) Sensitivity = 53.33%
2) Specificity = 100.00%
3) Positive Predictivity = 100.00%
4) Negative Predictivity = 68.18%
5) Overall Accuracy = 76.67%
6) Area Under the ROC Curve = 0.871
Fig. 6.8 – ROC curve for feature “Mean Energy per Second”
6.9. Standard Deviation of Mean Energy per Second
Classifier performance:
1) Sensitivity = 53.33%
2) Specificity = 100.00%
3) Positive Predictivity = 100.00%
4) Negative Predictivity = 68.18%
5) Overall Accuracy = 76.67%
6) Area Under the ROC Curve = 0.876
Fig. 6.9 – ROC curve for feature “Standard Deviation of Mean Energy per Second”
6.10. Mean Vocal Pitch
Classifier performance:
1) Sensitivity = 40.00%
2) Specificity = 46.67%
3) Positive Predictivity = 42.86%
4) Negative Predictivity = 43.75%
5) Overall Accuracy = 43.33%
6) Area Under the ROC Curve = 0.453
Fig. 6.10 – ROC curve for feature “Mean Vocal Pitch”
6.11. Standard Deviation of Mean Vocal Pitch
Classifier performance:
1) Sensitivity = 66.67%
2) Specificity = 53.33%
3) Positive Predictivity = 58.82%
4) Negative Predictivity = 61.54%
5) Overall Accuracy = 60.00%
6) Area Under the ROC Curve = 0.622
Fig. 6.11 – ROC curve for feature “Standard Deviation of Mean Vocal Pitch”
6.12. Performance of classifier for all features combined
Classifier performance:
1) Sensitivity = 86.67%
2) Specificity = 93.33%
3) Positive Predictivity = 92.86%
4) Negative Predictivity = 87.50%
5) Overall Accuracy = 90.00%
6) Area Under the ROC Curve = 0.920
Fig. 6.12 – ROC curve for all features combined
Among the time statistics features, the best performance was achieved by “Total Length of Utterances”, with an Overall Accuracy of 80% and an Area Under the ROC Curve of 0.836, followed by “Total Recording Time”, with an Overall Accuracy of 73.3% and an Area Under the ROC Curve of 0.764. For all other time statistics features, the Overall Accuracy was approximately 60% (56.7% to 66.7%) and the Area Under the ROC Curve ranged from 0.53 to 0.63.
This Chapter presented the classifier performance for each extracted feature individually and for all features combined. The next Chapter discusses possible improvements to the system presented here.
7. DISCUSSION
This work presented a system for discrimination between schizophrenic and non-schizophrenic subjects based on audio analysis of speech signals.
The results suggest that schizophrenic patients speak more slowly and produce longer words than healthy subjects. This would explain why the features “Total Length of Utterances” and “Total Recording Time” were among the best-performing features in discriminating between schizophrenic and non-schizophrenic subjects.
The features “Mean Energy per Second”, which represents the average energy in each second of the speech signal, and “Standard Deviation of Mean Energy per Second”, which represents the standard deviation of that feature over the whole speech signal and thus the dynamics of the speech, both performed well, with an Overall Accuracy of 76.7% and an Area Under the ROC Curve of 0.871 and 0.876, respectively. The increased energy in patients’ voices could be a side-effect of medication, as it has been reported that patients under sedation tend to speak louder and to produce utterances with a greater variation of loudness [1].
The feature “Mean Vocal Pitch”, representing the average vocal pitch measured in each second of the recording, did not perform satisfactorily, with an Overall Accuracy of 43.3% and an Area Under the ROC Curve of 0.453, i.e. below the chance level of 0.5. “Standard Deviation of Mean Vocal Pitch”, which reflects the subjects’ intonation, performed better, with an Overall Accuracy of 60% and an Area Under the ROC Curve of 0.622.
With all features combined, the classifier achieved an Overall Accuracy of 90% and an Area Under the ROC Curve of 0.920, giving very good discrimination between schizophrenic patients and the control group.
7.1. New Features
New features should be explored which, combined with the time statistics, energy and pitch features measured here, may increase system performance.
In the study of Stassen et al. [1], the “F0-amplitude” and “F0-contour” features contributed most to the overall performance of their discriminant function, showing that the patients’ voice sound characteristics were what primarily distinguished them from healthy subjects.
7.2. A larger database
A larger database of speech signals from schizophrenic patients and healthy controls, covering various types of schizophrenia and containing a greater number of samples, would contribute to a more robust classification system.
7.3. Full automation and remote diagnosis
A remote and fully automated assessment system could be developed along the lines of the telephony-based system for assessing vocal fold pathologies presented by Moran et al. [11]. In this way, high-quality assessments could be carried out remotely, making efficient and effective use of available medical resources and providing psychiatrists with up-to-date results with minimal processing.
7.4. Real-time implementation
The computational performance of the feature extraction algorithm was assessed by timing the calculation of the time statistics features “Number of Pauses”, “Mean Pause Duration”, “Mean Pause Duration per Second”, “Mean Utterance Duration”, “Total Recording Time”, “Total Length of Pauses” and “Total Length of Utterances”. Analysing an audio file 3 minutes in duration, sampled at 11 kHz, took 30 minutes on a computer with a 2 GHz AMD Turion64 CPU and 1 GB of RAM. This may not be satisfactory for clinical use.
The same timing measurement was repeated on resampled, downsampled and decimated versions of the original file. Downsampling gave the best results, with negligible differences in the feature values compared with the original. After downsampling to 3675 Hz, the feature extraction time fell to approximately 8 minutes; further downsampling to 2756 Hz reduced it to 6 minutes, and downsampling to 2205 Hz reduced it to 4.5 minutes, again with negligible differences in the feature values. With a more powerful computer, the analysis time may be shortened further, possibly allowing real-time analysis.
Resampling and decimation yielded data of poor quality for pattern classification and resulted in poor discrimination performance.
See Appendix C for examples of this processing.
7.5. Longitudinal study
Regular (monthly) voice recordings could be made to investigate whether the features extracted in this study are stable over time. Later, once the effects of medication on schizophrenic patients’ voices have been investigated, regular voice recordings could serve as a means of monitoring prescribed medication.
8. CONCLUSION
In this study, the possibility of using audio analysis as a basis for objective diagnosis in psychiatry was explored by building a system capable of discriminating between schizophrenic patients and healthy control subjects. Very good discrimination, with an accuracy of 90%, was achieved. Increasing the number of specific extracted features may yield even better results. Further optimization of feature extraction and the implementation of a remote, fully automated system capable of real-time analysis may reduce the cost currently associated with the assessment of schizophrenia, and may also provide psychiatrists with objective measures of the effectiveness of the medication they prescribe to their patients.
BIBLIOGRAPHY
[1] Stassen H. H., Alber M., Püschel J., Scharfetter Ch., Tewesmeier M., Woggon B. (1995). Speaking behavior and voice sound characteristics associated with negative schizophrenia. Journal of Psychiatric Research, 29(4), 277-296.
[2] Lott P. R., Guggenbuhl S., Schneeberger A., Pulver A. E., Stassen H. H. (2002). Linguistic analysis of the speech output of schizophrenic, bipolar and depressive patients. Psychopathology, 35, 220-227.
[3] Covington M. A., He C., Brown C., Naci L., McClain J. T., Fjordbak B. S., Semple J., Brown J. (2005). Schizophrenia and the structure of language: the linguist’s view. Schizophrenia Research, 77(1), 85-98.
[4] Wisniecki A., Cannizzaro M., Cohen H., Snyder P. (2005). Speech impairments in neurodegenerative diseases/psychiatric illnesses. In K. Brown (Ed.), Encyclopedia of Language and Linguistics, 2nd edition, pp. 358-364. Oxford: Elsevier.
[5] Stassen H. H., Kuny S., Hell D. (1998). The speech analysis approach to determining onset of improvement under antidepressants. European Neuropsychopharmacology, 8(4), 303-310.
[6] Deller J. R., Proakis J. G., Hansen J. H. L. (1993). Discrete-Time Processing of Speech Signals. Upper Saddle River, NJ: Prentice-Hall.
[7] Mei X. D., Pan J., Sun S. H. (2001). Efficient algorithms for speech pitch estimation. In Proceedings of the 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, pp. 421-424.
[8] Duda R. O., Hart P. E., Stork D. G. (2000). Pattern Classification. New York: Wiley-Interscience.
[9] Kohavi R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 1137-1143.
[10] Zweig M. H., Campbell G. (1993). Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39(4).
[11] Moran R. J., Reilly R. B., de Chazal P., Lacy P. D. (2006). Telephony-based voice pathology assessment using automated speech analysis. IEEE Transactions on Biomedical Engineering, 53(3).