Localization of Brain Activity Using
Permutation Analysis
by
Hooman Alikhanian
A thesis submitted to the
Department of Mathematics and Statistics
in conformity with the requirements for
the degree of Master of Science
Queen’s University
Kingston, Ontario, Canada
June 2014
Copyright © Hooman Alikhanian, 2014

Abstract
In this report we study bootstrap theory and permutation analysis as a hypothesis testing method based on the bootstrap procedure. We investigate asymptotic properties of the bootstrap procedure as well as the accuracy of bootstrap estimates using Edgeworth and Cornish-Fisher expansions. We show that resampling with replacement from the data provides a theoretically sound method that outperforms the Normal approximation of the data distribution in terms of convergence error and accuracy of estimates. We conclude the report by applying permutation analysis to magnetoencephalography (MEG) brain signals to localize human brain activity in pointing/reaching tasks and to find regions that are significantly active.
Acknowledgements
I would like to thank my supervisor Gunnar Blohm for his support throughout the years of my research assistantship in the Computational Neuroscience Laboratory, for keeping me going when times were tough, for insightful discussions, and for offering invaluable advice.
Contents

Abstract
Contents
List of Figures

Chapter 1: Introduction

Chapter 2: Bootstrap Theory
    2.1 Bootstrap Confidence Interval
    2.2 Iterated Bootstrap

Chapter 3: Hypothesis Testing and Permutation Analysis
    3.1 Hypothesis Testing
        3.1.1 The Neyman-Pearson Lemma
    3.2 P-Values
    3.3 Permutation Analysis

Chapter 4: Asymptotic Properties of the Mean

Chapter 5: Bootstrap Accuracy and Edgeworth Expansion
    5.1 Edgeworth Expansion
    5.2 Bootstrap Edgeworth Expansion
    5.3 Bootstrap Confidence Interval Accuracy

Chapter 6: Results
    6.1 Methods
        6.1.1 Experimental Paradigm
        6.1.2 Data Processing
    6.2 Permutation Analysis Results

Chapter 7: Conclusion

Bibliography
List of Figures

6.1  The MEG experiment setup. (a) Time course of the experiment. (b) Three postures of the hand were used in different recording blocks. (c) The fixation cross in the middle with two possible target locations on its left and right. (d) Subjects sat upright under the MEG machine, performing the pointing task with the wrist only. (e) Task: the target (cue) appears in either green or red to inform the subject of the pro or anti nature of the pointing trial. Dimming of the central fixation cross was the movement instruction for subjects.

6.2  Diagram of the event-related beamformer [8]: the data consist of T trials, each with M channels and N time samples. The covariance matrix of the data is given to the beamformer, together with the forward solution for a dipole at each location. Average source activity is then estimated at each voxel, and the dipole orientation is adjusted to maximize power at the corresponding voxel.

6.3  Average brain activation for the pro condition/left target around movement onset (-0.45-0 s) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.4  Average brain activation for the pro condition/left target around cue onset (0-0.5 s) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.5  Average brain activation for the anti condition/right target around movement onset (-0.45-0 s) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.6  Average brain activation for the anti condition/right target around cue onset (0-0.5 s) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.7  Permutation analysis for the pro condition/left target around movement onset in three planes with 95% p-values. Right panel: positive activity (synchronization); left panel: negative activity (desynchronization).

6.8  Permutation analysis for the pro condition/left target around cue onset in three planes with 95% p-values. The null hypothesis is not rejected for positive activation. Negative 95% significant activation is shown.

6.9  Permutation analysis for the anti condition/right target around movement onset in three planes with 95% p-values. Right panel: positive activity (synchronization); left panel: negative activity (desynchronization).

6.10 Permutation analysis for the anti condition/right target around cue onset in three planes with 95% p-values. The null hypothesis is not rejected for positive activation. Negative 95% significant activation is shown.
Chapter 1
Introduction
Bootstrap and permutation tests are resampling methods. The main idea of resampling, as the name suggests, is to estimate properties of a population (such as its variance, distribution, or confidence intervals) by resampling from the original data. In practice, access to the whole population is often impossible or uneconomical, and instead a sample drawn from the population is available. The bootstrap procedure provides researchers with a tool to infer population properties by resampling from the data. In this manuscript we study the mathematical framework of the bootstrap to evaluate the validity and accuracy of such an inference procedure.

The bootstrap is not a new idea. When we do not have information about the distribution of the population under study and we wish to infer or estimate some property of the population from data, we consider the same functional of the sample (or empirical) distribution. Instead of drawing new samples from the population, we resample with replacement from the data.
The idea of using Monte Carlo resampling to estimate bootstrap statistics was proposed by Efron (1979). The approximation improves as the number of resamples increases. Often the number of resamples is on the order of thousands, making resampling methods computationally intensive.

Modern computers and software make it possible to use these computationally intensive methods to estimate statistical properties in cases where classical methods are analytically intractable or unusable because their assumptions are not satisfied.

In practice, resampling methods have an advantage over classical inference methods such as Bayesian inference in that they do not require any assumption about the population distribution. Resampling methods also work for statistics whose distributions have no available analytical solution. Moreover, they provide concrete analogies to theoretical concepts [18].
This report is organized as follows. In Chapter 2 we study the mathematical formulation of the bootstrap procedure. In Chapter 3 we study hypothesis testing, the Neyman-Pearson lemma, and permutation analysis as a method of solving hypothesis testing problems using the bootstrap procedure. In Chapter 4 we study asymptotic properties of the bootstrap estimate of the mean. Chapter 5 investigates the accuracy of bootstrap estimates and confidence intervals. Finally, we conclude the report by applying permutation analysis to a database of brain magnetic signals to localize brain activity in a reaching task.
Chapter 2
Bootstrap Theory
We have a sample data set of size n that is drawn randomly from a population,
that is, we have a set χ = {X1 , X2 , ..., Xn } of independent identically distributed
random variables drawn from an unknown population distribution function F . We
are interested in some functional θ(F ) of the population, e.g., population mean. For
instance, for the population mean θ is given as:
$\theta(F) = \int x \, dF(x).$  (2.1)
We do not know F, so we cannot solve for θ directly. We estimate θ with θ̂ by estimating the distribution function F. One unbiased estimator that can be used for this purpose is the empirical distribution function, F̂, computed from the sample χ:

$\hat F(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x),$  (2.2)
where I(.) is the indicator function.
The problem is to study the statistical properties of the estimator θ̂, e.g., its variance or a confidence interval. To this end, we need the distribution of θ̂ − θ. The bootstrap procedure gives us a tool to estimate this distribution via resampling from the data χ.
The bootstrap procedure generally involves three steps [18]:

Step 1 Perform resampling with replacement on the data. In resampling, all data points are given the same chance of being chosen. The resampled data set has the same size as the original sample.
We want to count the number of resamples that can be drawn from the sample
set χ with replacement. There is a one-to-one correspondence between the
number of resamples and the number of ways of placing n indistinguishable
objects into n numbered boxes. The number of objects that end up in the ith
box is essentially the number of times that we choose data Xi . It follows that
the number of resamples, N(n), is given by:

$N(n) = \binom{2n-1}{n}.$  (2.3)

Using Stirling's formula $n! \sim \sqrt{2\pi n}\,(n/e)^n$, we have:

$N(n) \sim (n\pi)^{-1/2}\, 2^{2n-1}.$  (2.4)
Thus, the number of resamples increases exponentially with n.
Step 2 For each resample set χ* = {X1*, X2*, ..., Xn*} calculate the estimate of θ. The distribution of these statistics is referred to as the bootstrap distribution [18].

For example, if θ = θ(F) = µ is the population mean and F̂ is the empirical distribution function, which assigns weight 1/n to each data point Xi, then:

$\hat\theta = \frac{1}{n}\sum_{i=1}^{n} X_i.$  (2.5)

Thus, θ̂ is the sample mean, and the sampling distribution of this estimator of µ is estimated by calculating the sample mean of each resample χ*.
Step 3 Use the bootstrap distribution to construct confidence intervals for θ.
Classical inference theory tells us a great deal about the sample mean. For a Normal population, the sample mean is Normally distributed for any sample size. For large sample sizes the sample mean is approximately Normally distributed for a broad range of population distributions, as long as the conditions of the central limit theorem hold. Moreover, the sample standard deviation is:

$\hat s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2},$  (2.6)
where X̄ is the sample mean.
However, for many statistics other than the sample mean, e.g., quantiles, calculating the standard deviation is analytically intractable, let alone the full distribution. Of course, one way to get around this problem is to assume a Normal distribution for the desired statistic and move forward. This method, however, may not work in situations where the distribution is heavily skewed to one side or has heavy tails. We will see that the bootstrap gives a more accurate estimate of the distribution than the Normal approximation.

In order to estimate the distribution of a statistic, e.g., the sample mean, one might think that instead of resampling from the data, one could draw other sample sets from the population, estimate the statistic from each sample set, and thereby obtain an estimate of the statistic's distribution. In practice such an approach is difficult to implement, because sampling from the population repeatedly often requires resources, e.g., financial resources, that might not be available. In some cases the population may not be easily accessible. The idea of the bootstrap is that instead of returning to the population to draw other sample sets, resampling is performed from the data at hand. To the extent that the data are representative of the population distribution, which is a valid assumption if the sampling methodology is sound and the sample size is large enough, resampling from the data is justified. Even when resources are available, it might be a better idea to take one larger sample from the population and resample from it, instead of referring to the population multiple times to draw smaller sample sets [18].
In the case of estimating the standard deviation of θ, SD(θ(X)), the bootstrap principle can be summarized as follows:

1. Resample with replacement from χ = {X1, X2, ..., Xn} to get the bootstrap sample χ* = {X1*, X2*, ..., Xn*}, and calculate the bootstrap estimate θ*(X).

2. Get B independent bootstrap replicates θ1*(X), θ2*(X), ..., θB*(X).

3. Estimate SD(θ(X)) by the empirical standard deviation of θ1*(X), θ2*(X), ..., θB*(X).
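The three steps above translate directly into code. The following is a minimal sketch in Python/NumPy, not taken from the thesis; the function name bootstrap_sd, the default of B = 2000 replicates, and the example statistic are illustrative assumptions.

```python
import numpy as np

def bootstrap_sd(data, statistic, n_boot=2000, seed=None):
    """Estimate SD(theta(X)) by the empirical standard deviation of
    bootstrap replicates, following the three steps above."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Steps 1-2: draw B resamples with replacement and evaluate the statistic.
    replicates = np.array([
        statistic(rng.choice(data, size=n, replace=True))
        for _ in range(n_boot)
    ])
    # Step 3: the empirical standard deviation of the replicates.
    return replicates.std(ddof=1)

# Example usage: bootstrap standard error of the sample median of skewed data.
x = np.random.default_rng(0).exponential(size=50)
print(bootstrap_sd(x, np.median, seed=1))
```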
2.1 Bootstrap Confidence Interval

As was mentioned above, the idea of the bootstrap is to give an estimate of the distribution of θ̂ − θ. To construct confidence intervals in order to evaluate the accuracy of the estimator θ̂, we need this distribution. In this section we borrow terminology and ideas from [15, 16].
To find the distribution of θ̂ − θ, we need F0, the population distribution, and the empirical estimate of it, F1 = F̂0. Since we do not know F0, the bootstrap procedure suggests that we use F1 instead of F0, i.e., take our sample as a representative of the population, and take the bootstrap distribution F2, derived from resampling with replacement from the sample, as an estimate of F1 [15, 16].
Constructing a two-sided α-confidence interval consists of finding a t that solves:
E{ft (F0 , F1 )|F0 } = 0,
(2.7)
where ft is a functional from a class {ft : t ∈ T } for some set T , and is defined as:
ft (F0 , F1 ) = I{θ(F1 ) − t ≤ θ(F0 ) ≤ θ(F1 ) + t} − α,
(2.8)
where I(.) is the indicator function.
According to the bootstrap principle, instead of finding t in equation (2.7), we find t̂ that solves:
E{ft (F1 , F2 )|F1 } = 0.
(2.9)
Many statistical problems can be formulated as equation (2.7) with different functional classes. Equation (2.8) gives one example of such classes that are used to construct confidence intervals.
Finding an estimate of t, t̂, that solves the approximate equation (2.9) instead of
the original one, equation (2.7), is the essence of the bootstrap idea.
A number of methods have been proposed in the literature to construct confidence intervals [15]. Equation (2.8) is one such method, which we refer to as the Bootstrap Interval. The Bootstrap Percentile-t Interval is another method, in which ft is defined as:
ft (F0 , F1 ) = I{θ(F1 ) − tτ (F1 ) ≤ θ(F0 ) ≤ θ(F1 ) + tτ (F1 )} − α.
(2.10)
Bootstrap Percentile-t Interval involves the introduction of a scaling factor τ (F1 )
to equation (2.8). The difference between the two confidence interval methods lies in
the idea of pivotalness. A function of both the data and an unknown parameter is
said to be pivotal if it has the same distribution for all values of the unknowns [16].
For example, for a population with a Normal distribution N(µ, σ²), (X̄, σ̂²) is the maximum likelihood estimator of (µ, σ²). The distribution of the sample mean is also Normal, N(µ, σ²/n). Thus, $Z = \sqrt{n}(\bar X - \mu)$ is N(0, σ²). We can immediately see that Z is non-pivotal, because its distribution depends on the unknown σ. The α-confidence interval of the sample mean X̄ can be constructed as:

$\left(\bar X - n^{-1/2} x_\alpha \hat\sigma,\; \bar X + n^{-1/2} x_\alpha \hat\sigma\right),$  (2.11)
where xα is defined as:
P (|N| ≤ xα ) = α,
(2.12)
for Standard Normal random variable N.
Since the distribution of $T = \sqrt{n}(\bar X - \mu)/\hat\sigma$ is not Normal but Student's t with n − 1 degrees of freedom, the coverage error of the interval in equation (2.11) stems from approximating Student's t distribution by a Normal distribution, which is of order O(n⁻¹). The distribution of T does not depend on any unknowns; therefore, T is pivotal. An accurate α-confidence interval for the sample mean is achieved by substituting tα for xα in equation (2.11), such that
P (|V | ≤ tα ) = α,
(2.13)
for a Student's t random variable V. The scaling factor τ in the above example is σ̂, the maximum likelihood estimator of the standard deviation.
The α-confidence interval of a statistic θ(F0) is called accurate when t is an exact solution of equation (2.7) with the functional ft(F0, F1) from equation (2.8), that is:
P (θ(F1 ) − t ≤ θ(F0 ) ≤ θ(F1 ) + t| F1 ) = α.
(2.14)
If t̂ is an approximate solution of equation (2.7), as in the bootstrap confidence interval, the probability that θ(F0) lies in the confidence interval will not be exactly α. The difference

$P\left(\theta(F_1) - \hat t \le \theta(F_0) \le \theta(F_1) + \hat t \mid F_1\right) - \alpha,$  (2.15)
is referred to as the coverage error of the interval.
The bootstrap percentile-t interval can be estimated for any functional θ(F0). According to the bootstrap procedure, we construct the bootstrap distribution by resampling with replacement from the sample. The bootstrap estimate of θ, θ(F2), and the scaling factor τ(F2), e.g., the standard deviation σ̂*, are estimated from the bootstrap distribution F2. The α-confidence interval is then calculated as:
(θ(F2 ) − tα τ (F2 ), θ(F2 ) + tα τ (F2 )) ,
(2.16)
where tα is defined as:
P (|T | ≤ tα ) = α,
(2.17)
for random variable T with Student’s t distribution with n − 1 degrees of freedom for
a sample of size n.
In equation (2.16) it is justified to use tα from the t-table as long as the distribution of θ(F1 ) is approximately Normal. If the distribution has heavy tails or is highly
skewed, the confidence interval in equation (2.16) will not be a good approximation
of the true confidence interval.
In general, the distribution of θ(F1 ) is not known. One special case is when θ
is the sample mean. In this case if the sample size is large enough, the distribution
of the sample mean can be approximated as Normal according to the Central Limit
Theorem. What can we do for other statistics?
The bootstrap distribution of the statistic can be constructed. If the bootstrap distribution is approximately Normal and not heavily skewed, the confidence interval of equation (2.16) can be used. To check this, we can estimate t*α from the bootstrap distribution of θ(F1), that is, find t*α such that:
P (θ(F2 ) − t∗α τ (F2 ) ≤ θ(F1 ) ≤ θ(F2 ) + t∗α τ (F2 )|F1 ) = α,
(2.18)
which can be solved as:

$t^*_\alpha = \inf_t \left\{\, t : P\!\left(\theta(F_2) - t\,\tau(F_2) \le \theta(F_1) \le \theta(F_2) + t\,\tau(F_2) \mid F_1\right) \ge \alpha \,\right\}.$  (2.19)
To solve equation (2.18) using Monte Carlo approximation, we choose integers B ≥ 1 and 1 ≤ ν ≤ B such that ν/(B + 1) = α for a rational number α. For instance, if α = 0.95, we can take (ν, B) = (95, 99). According to the bootstrap procedure, we draw B independent resamples from χ with replacement, namely {χ1*, χ2*, ..., χB*}, and for each resample we calculate the corresponding empirical distribution F2,b, b = 1, 2, ..., B. Define:

$T_b^* = |\theta(F_{2,b}) - \theta(F_1)| / \tau(F_{2,b}).$  (2.20)

We pick the νth smallest value of the Tb* as the Monte Carlo estimate of t*α. As B → ∞, this estimate converges to t*α with probability one.
Now that we have estimated t∗α using the bootstrap distribution, we can construct
the bootstrap-t confidence interval as:
(θ(F2 ) − t∗α τ (F2 ), θ(F2 ) + t∗α τ (F2 )) .
(2.21)
If this confidence interval closely matches the interval from equation (2.16), in which tα was taken from the t-table, the distribution of θ(F1) is approximately Normal. Otherwise, equation (2.21) provides a better approximation of the confidence interval in the sense of a smaller coverage error.
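As an illustration, the Monte Carlo recipe of equations (2.18)-(2.21) can be written out for the special case where θ is the sample mean and τ is the usual standard error. This is a sketch in Python/NumPy, not the thesis's code; the function name, the choice B = 1999, and the use of the standard error as the scaling factor are illustrative assumptions.

```python
import numpy as np

def percentile_t_interval(data, alpha=0.95, n_boot=1999, seed=None):
    """Bootstrap percentile-t interval for the mean: resample, form the
    pivotal statistic T_b* = |mean* - mean| / se*, and take the nu-th
    smallest value (nu/(B+1) = alpha) as the Monte Carlo estimate of t_alpha*."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    mean_hat = data.mean()
    se_hat = data.std(ddof=1) / np.sqrt(n)          # tau(F2) analogue for the mean
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(data, size=n, replace=True)
        se_b = resample.std(ddof=1) / np.sqrt(n)
        t_star[b] = abs(resample.mean() - mean_hat) / se_b
    nu = int(alpha * (n_boot + 1))                   # e.g. nu = 1900 for alpha = 0.95
    t_alpha = np.sort(t_star)[nu - 1]                # nu-th smallest replicate
    return mean_hat - t_alpha * se_hat, mean_hat + t_alpha * se_hat

# Example usage on a skewed sample.
x = np.random.default_rng(0).lognormal(size=40)
print(percentile_t_interval(x, seed=1))
```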
2.2 Iterated Bootstrap
To develop the bootstrap idea, we started by finding the t that solves equation (2.7). A lack of knowledge of the population distribution F0 led us to substitute F1 for it in equation (2.7), and to substitute the bootstrap distribution F2 for F1, solving for t̂1 in equation (2.9) as an approximation of t.
empirical distribution F1 instead of the population distribution as long as the sample
can be considered representative of the population.
This idea can be developed one step further by resampling with replacement from each resampled data set χ*, and solving for t̂2 as an approximation of t in:
E{ft (F2 , F3 )|F2 } = 0,
(2.22)
where F2 is the bootstrap distribution from resampling of data χ, and F3 is the
bootstrap distribution from resampling of resampled data χ∗ .
In theory we can continue this process ad infinitum. In fact, it can be shown that each iteration improves the coverage error by the order of O(n⁻¹). However, we showed that the number of distinct resamples grows exponentially with the sample size n. This growth makes iterated resampling computationally intractable in practice. In most practical problems, resampling is therefore performed at only one level, with 1000 to 5000 resamples.
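One practical reading of equation (2.22) is coverage calibration: resamples of resamples estimate how often the single-level interval actually covers θ(F1), and the nominal level can then be adjusted. The sketch below (Python/NumPy, restricted to the sample mean, with illustrative resample counts) is an assumption-laden illustration of this idea rather than the thesis's procedure.

```python
import numpy as np

def estimated_coverage(data, alpha=0.95, n_outer=200, n_inner=200, seed=None):
    """Estimate the true coverage of the single-level alpha-level bootstrap
    interval for the mean using one round of iterated resampling.
    Each outer resample chi* (distribution F2) plays the role of the sample;
    inner resamples chi** (distribution F3) play the role of its bootstrap
    resamples.  The interval built from chi* covers theta(F1) exactly when
    |theta(F2) - theta(F1)| <= t_hat2(chi*)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    theta_1 = data.mean()                                    # theta(F1)
    covered = 0
    for _ in range(n_outer):
        outer = rng.choice(data, size=n, replace=True)       # chi*
        theta_2 = outer.mean()                               # theta(F2)
        dev = [abs(rng.choice(outer, size=n, replace=True).mean() - theta_2)
               for _ in range(n_inner)]
        t_hat2 = np.quantile(dev, alpha)                     # half-width from F3
        covered += abs(theta_2 - theta_1) <= t_hat2
    return covered / n_outer   # adjust the nominal level until this equals alpha

x = np.random.default_rng(0).exponential(size=30)
print(estimated_coverage(x, seed=1))
```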
Chapter 3
Hypothesis Testing and Permutation Analysis
3.1 Hypothesis Testing
Hypothesis testing is a statistical decision-making procedure to test whether or not a hypothesis that has been formulated about a population statistic is correct. The decision leads to either accepting or rejecting the hypothesis in question. The hypothesis is based on some property of a statistic of the population. For instance, to test the hypothesis that the mean of a population is equal to µ0, we can formulate a hypothesis testing problem with the null hypothesis defined as H : µ = µ0 and the alternative hypothesis as K : µ ≠ µ0.
A hypothesis testing procedure uses inferential statistics to learn more about a population that is too large or otherwise inaccessible. Often, instead of the population, we have access to a sample drawn randomly from it, and we need to estimate the statistic from the sample at hand. For instance, to solve a hypothesis testing problem about the mean of a population, we can use the sample mean, which is an unbiased estimator of the mean.
To state the problem, let us assume that we want to form a decision about a random variable X with distribution Pθ that belongs to a class P = {Pθ, θ ∈ Ω}. We want to formulate some hypothesis about θ. The set Ω can be partitioned into values for which the hypothesis is true and those for which it is false. The resulting two mutually exclusive classes are ΩH and ΩK, respectively, such that ΩH ∪ ΩK = Ω.
Four possible outcomes can occur as a result of testing a hypothesis: (1) The null
hypothesis is true and we accept it, (2) the null hypothesis is true and we reject it,
(3) the null hypothesis is false and we accept it, and (4) the null hypothesis is false
and we reject it. In cases (2) and (3) we are making an error in our decision making,
that is, we form a perception about a property of the population that is in fact not
true. Thus, two types of error can happen in the decision making process: Type I
error occurs when (2) is the case, and type II error occurs when (3) is the case. Let
us denote the probabilities of type I error and type II error by α and β respectively.
Ideally, hypothesis testing should be performed in a manner that keeps the probabilities of the two types of error α and β to a minimum. However, when the number
of observations is given, both probabilities cannot be controlled simultaneously. In
hypothesis testing, researchers collect evidence to reject the null hypothesis. In the
process, they assume that the null hypothesis is true unless they can prove otherwise.
Thus, it is customary to control the probability of committing type I error [21].
The goal in hypothesis testing is to partition the sample space S into two mutually
exclusive sets S0 and S1 . If X falls in S0 , the null hypothesis is accepted, and if it
falls in S1 we reject the null hypothesis. S0 and S1 are referred to as acceptance and
critical regions respectively.
In order to control the probability of type I error, we put a significance level α, a number between 0 and 1, on the probability of S1 under the assumption that the null hypothesis is true. That is:
Pθ (X ∈ S1 ) ≤ α for all θ ∈ ΩH .
(3.1)
We are in effect limiting the probability of type I error to α, which can be chosen as an arbitrarily small number such as 0.05. We then find S1 such that Pθ(X ∈ S1) is maximized for all θ ∈ ΩK subject to the condition in equation (3.1). That is, we maximize the probability of rejecting the null hypothesis when it is in fact false. This probability is referred to as the power of the critical region.
So far we have considered a case where every outcome x of the random variable X is either a member of S0 or of S1. We can generalize this idea and assume that x can belong to the rejection region with probability φ(x), and to the acceptance region with probability 1 − φ(x). Then, after observing X = x, the hypothesis test involves a random experiment with two possible outcomes R and R̄, with probabilities φ(x) and 1 − φ(x) respectively. If R is the outcome of the experiment we reject the hypothesis; otherwise we accept it.
If the distribution of X is Pθ, then the probability of rejection will be:

$E_\theta\,\phi(X) = \int \phi(x)\, dP_\theta(x).$  (3.2)

The problem is to find φ(x) that maximizes the test power βθ, defined as:

$\beta_\theta = E_\theta\,\phi(X) = \int \phi(x)\, dP_\theta(x)$ for all θ ∈ ΩK,  (3.3)

under the condition:

$E_\theta\,\phi(X) \le \alpha$ for all θ ∈ ΩH.  (3.4)

3.1.1 The Neyman-Pearson Lemma
The Neyman-Pearson Lemma provides us with a way of finding the best critical
region [21].
Theorem 3.1.1. Let P0 and P1 be probability distributions with densities p0 and p1
respectively with respect to a measure µ.
(1) Existence: For testing H : p0 against the alternative K : p1 , there exists a test φ
and a constant k such that:
$E_0\,\phi(X) = \alpha,$  (3.5)

and

$\phi(x) = \begin{cases} 1 & \text{when } p_1(x) > k\,p_0(x) \\ 0 & \text{when } p_1(x) < k\,p_0(x) \end{cases}$  (3.6)
(2) Sufficient condition for the most powerful test: if a test satisfies equation (3.5)
and equation (3.6) for some k, then it is most powerful for testing p0 against p1 at
level α.
(3) Necessary condition for the most powerful test: If φ is most powerful for testing p0 against p1 at level α, then for some k it satisfies (3.6) almost everywhere (µ). It also satisfies (3.5) unless there exists a test of size less than α with power 1.
Proof. If we define 0 × ∞ := 0 and allow k to become ∞, the theorem is true for
α = 0 and α = 1. So, let us assume that 0 < α < 1.
(1) Let α(c) = P0{p1(X) > c p0(X)}. Because the probability is computed under P0, we only need to consider the inequality on the set where p0(x) > 0. Therefore, α(c) is the probability that the random variable p1(X)/p0(X) is greater than c, and 1 − α(c) is then a cumulative distribution function. Thus, α(c) is nonincreasing and continuous on the right, that is: α(c − 0) − α(c) = P0{p1(X)/p0(X) = c}, α(−∞) = 1, and α(∞) = 0. For a given 0 < α < 1, let c0 be such that α(c0) ≤ α ≤ α(c0 − 0), and consider the test φ defined by:
$\phi(x) = \begin{cases} 1 & \text{when } p_1(x) > c_0\,p_0(x) \\ \dfrac{\alpha - \alpha(c_0)}{\alpha(c_0 - 0) - \alpha(c_0)} & \text{when } p_1(x) = c_0\,p_0(x) \\ 0 & \text{when } p_1(x) < c_0\,p_0(x). \end{cases}$
The middle expression is defined unless α(c0 ) = α(c0 − 0). Under that condition
P0 {p1 (X) = c0 p0 (X)} = 0, and φ is defined almost everywhere.
The size of φ is:

$E_0\,\phi(X) = P_0\!\left\{\frac{p_1(X)}{p_0(X)} > c_0\right\} + \frac{\alpha - \alpha(c_0)}{\alpha(c_0-0) - \alpha(c_0)}\, P_0\!\left\{\frac{p_1(X)}{p_0(X)} = c_0\right\} = \alpha.$  (3.7)
Comparing the size in equation (3.7) with equation (3.5), we see that c0 is the k of the theorem.

(2) To prove sufficiency, let φ* be any other test that satisfies the condition E0 φ*(X) ≤ α. Denote by S⁺ and S⁻ the subsets of the sample space on which φ(x) − φ*(x) > 0 and φ(x) − φ*(x) < 0, respectively. For all x in S⁺ and S⁻ we have p1(x) ≥ k p0(x) and p1(x) ≤ k p0(x), respectively. Thus we have:
$\int (\phi - \phi^*)(p_1 - k p_0)\, d\mu = \int_{S^+ \cup S^-} (\phi - \phi^*)(p_1 - k p_0)\, d\mu \ge 0.$  (3.8)
The difference in power is then:
$\int (\phi - \phi^*)\, p_1\, d\mu \ge k \int (\phi - \phi^*)\, p_0\, d\mu \ge 0.$  (3.9)
Therefore, φ is at least as powerful as φ*.
(3) To prove the necessary condition, let us assume that φ* is most powerful for testing p0 against p1 at level α, and that it is not equal to φ. Take S as the intersection of S⁺ ∪ S⁻ with {x : p1(x) ≠ k p0(x)} and suppose that µ(S) > 0. Since (φ − φ*)(p1 − k p0) is positive on S, we have:
$\int_{S^+ \cup S^-} (\phi - \phi^*)(p_1 - k p_0)\, d\mu = \int_{S} (\phi - \phi^*)(p_1 - k p_0)\, d\mu > 0.$  (3.10)
Therefore φ is more powerful against p1 than φ*, which is a contradiction unless µ(S) = 0. This completes the proof [21, 20].
The proof shows that equations (3.5) and (3.6) give necessary and sufficient conditions for a most powerful test, and determine the test uniquely up to sets of measure zero whenever the set {x : p1(x) = k p0(x)} has µ-measure zero. Note that the theorem applies to discrete distributions as well.
To summarize the idea behind the Neyman-Pearson lemma, let us suppose that X1, X2, ..., Xn is an independent identically distributed (i.i.d.) random sample with joint density function f(x; θ). In testing the null hypothesis H : θ = θ0 against the alternative K : θ = θ1, the critical region

CK = {x : f(x, θ0)/f(x, θ1) < K},  (3.11)

is most powerful for K > 0 according to the Neyman-Pearson lemma.
As an example, let us suppose that X represents a single observation from the probability density function f(x, θ) = θx^(θ−1) for 0 < x < 1, and that we want to test the null hypothesis H : θ = 1 against K : θ = 2 at significance level α = 0.05. We have:

$\frac{f(x, \theta_0)}{f(x, \theta_1)} = \frac{1}{2x}.$

Thus, the rejection region is R = {x : x > k′}, where k′ = 1/(2k) and k > 0. To determine the value of k′, we calculate the size of the test as a function of k′ and solve for k′ to get the desired test size 0.05:

P{x ∈ R | H} = P{x > k′ | H} = 1 − k′ = 0.05.

Thus, k′ = 0.95, and the rejection region is R = {x : x > 0.95}. By the lemma, among all tests of the null hypothesis H : θ = 1 against K : θ = 2 at level 0.05, the rejection region R has the smallest type II error probability.
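To make the size and power of this test concrete, the short sketch below (Python/NumPy; purely illustrative, not part of the thesis) evaluates both in closed form under the two densities and checks the power by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Size under H (theta = 1): X ~ Uniform(0, 1), so P(X > 0.95) = 0.05.
size_exact = 1 - 0.95
# Power under K (theta = 2): density 2x on (0, 1), so P(X > 0.95) = 1 - 0.95**2.
power_exact = 1 - 0.95**2                 # = 0.0975; type II error = 0.9025

# Simulation check: draw from f(x, 2) = 2x via the inverse CDF x = sqrt(U).
x_alt = np.sqrt(rng.uniform(size=200_000))
power_sim = np.mean(x_alt > 0.95)

print(size_exact, power_exact, power_sim)
```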
Our treatment of the problem so far involves simple hypotheses, where each distribution class contains a single distribution. This enables us to solve hypothesis testing problems with a null hypothesis of the form H : θ = θ0 and an alternative K : θ = θ1. In practical applications, however, we might be interested in solving a hypothesis testing problem of the form H : θ ≤ θ0 against K : θ > θ0, which involves a composite distribution class rather than a simple one.

If there exists a real-valued function T(x) such that for any θ < θ′ the distributions Pθ and Pθ′ are distinct, and the ratio pθ′(x)/pθ(x) is a nondecreasing function of T(x), then pθ(x) is said to have the monotone likelihood ratio property [21].
Theorem 3.1.2. Let the random variable X have probability density pθ (x) with monotone likelihood ratio property in a real-valued function T (x), and θ a real parameter.
(1) For testing H : θ ≤ θ0 against K : θ > θ0 the most powerful test is given by:

$\phi(x) = \begin{cases} 1 & \text{when } T(x) > C \\ \gamma & \text{when } T(x) = C \\ 0 & \text{when } T(x) < C, \end{cases}$  (3.12)
where C and γ are determined by:
Eθ0 φ(X) = α.
(3.13)
(2) The power function of the test,

$\beta(\theta) = E_\theta\,\phi(X),$  (3.14)

is strictly increasing for all θ for which 0 < β(θ) < 1.
(3) The test from equations (3.12) and (3.13) is the most powerful test for testing
H ′ : θ ≤ θ′ against K ′ : θ > θ′ at level α′ = β(θ′ ) for all θ′ .
(4) The test minimizes β(θ), the probability of type I error, among all tests that satisfy
(3.13) for θ < θ0 .
The one-parameter exponential family is an important class of distributions with the monotone likelihood ratio property with respect to a real-valued function T(x) that satisfies the assumptions of the theorem, as the following corollary shows [21]:

Corollary 3.1.3. Let X have a probability density function with respect to some measure µ, and let θ be a real number, with

$p_\theta(x) = C(\theta)\, e^{Q(\theta) T(x)}\, h(x),$  (3.15)

where Q(θ) is strictly monotone. Then φ(x) from equation (3.12) is the most powerful test for testing H : θ ≤ θ0 against K : θ > θ0 for increasing Q at level α, where C and γ are determined from equation (3.13). For decreasing Q the inequalities in equation (3.12) are reversed.
3.2 P-Values
So far we have studied hypothesis testing for a fixed significance level α. In an alternative standard non-Bayesian approach, α is not fixed. For varying α, we want to determine the smallest significance level at which the null hypothesis would be rejected for a given observation. This significance level is referred to as the p-value of the test.

For a random variable X, let us suppose that the distribution of p1(X)/p0(X) is continuous. Then the most powerful test specifies the rejection region Sα as {x : p1(x)/p0(x) > k} for k = k(α), a function of α, where k is determined from the size equation (3.5). Performing the test for varying α creates nested rejection regions, that is:

Sα ⊂ Sα′  if α < α′.  (3.16)
The p-value can now be determined as:
p̂ = p̂(X) = inf{α : X ∈ Sα }.
(3.17)
For example, let us suppose that X is a Normal random variable N(µ, σ²), with σ² known. We want to formulate a hypothesis testing problem on µ with H : µ = 0 as the null hypothesis against K : µ = µ1 as the alternative, for some µ1 > 0. The likelihood ratio can be written as:

$\frac{p_1(x)}{p_0(x)} = \frac{\exp\!\left[-\frac{(x-\mu_1)^2}{2\sigma^2}\right]}{\exp\!\left[-\frac{x^2}{2\sigma^2}\right]} = \exp\!\left(\frac{\mu_1 x}{\sigma^2} - \frac{\mu_1^2}{2\sigma^2}\right).$

Thus, in order to have p1(x)/p0(x) > k, x should be greater than some k′ > 0, which can be determined from the constraint P0{X > k′} = α. Thus, the rejection region can be written as Sα = {X : X > σ z1−α}, where z1−α is the 1 − α percentile of the standard Normal distribution. From the definition of the percentile, we have:

$S_\alpha = \{X : 1 - \Phi(X/\sigma) < \alpha\}.$

For a given observed value of X, the infimum of the α for which X ∈ Sα can be written as:

$\hat p = 1 - \Phi(X/\sigma),$

which, under the null hypothesis, is uniformly distributed on (0, 1).
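A quick numerical check of this example (a sketch assuming Python with NumPy/SciPy; not from the thesis): under the null hypothesis the p-value p̂ = 1 − Φ(X/σ) should indeed be uniform on (0, 1).

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)
sigma = 2.0

# Observations generated under the null hypothesis H: mu = 0.
x = rng.normal(loc=0.0, scale=sigma, size=100_000)
p_hat = 1 - norm.cdf(x / sigma)

# Under H, p_hat should be uniform on (0, 1); the KS test should not reject.
print(kstest(p_hat, "uniform"))
```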
3.3 Permutation Analysis
The Neyman-Pearson lemma determines the most powerful test for simple hypotheses as well as for composite ones with the monotone likelihood ratio property. As we studied in the previous section, to determine the rejection region one also needs to know the distribution of the test statistic under the null hypothesis. In practical applications, the distribution of the test statistic under the null hypothesis often cannot be found analytically. Permutation tests address this problem by providing researchers with a simple way of estimating the test statistic distribution using the bootstrap idea.
A permutation test is essentially hypothesis testing through bootstrapping. The idea of permutation analysis is to estimate a test statistic's distribution by resampling with replacement under the assumption that the null hypothesis is true. For instance, suppose that we perform an experiment in which the brain's magnetic signal is recorded from a brain region while the subject performs a reaching task. The signal is recorded for 2 seconds. During the first 0.5 seconds a baseline signal is recorded while the subject sits still doing nothing, and the last 1.5 seconds are recorded while the subject performs the reaching task. We want to investigate whether the brain region is active during the experiment. To be more specific, suppose that the data are collected at a rate of 600 samples per second. Thus, we have 300 samples of baseline and 900 samples from the task.
One way to approach the problem is to compare the mean of the signal during the task, µtask, with that of the baseline, µbaseline. To this end, we formulate a hypothesis testing problem with the null hypothesis H : µtask − µbaseline = 0 against the alternative K : µtask − µbaseline > 0. As a test statistic we take the difference between the sample means, x̄task − x̄baseline. The null hypothesis assumes no difference between the baseline and the task. Thus, resampling with replacement under the null hypothesis means that out of the 1200 total samples we pick 300 and 900 samples to assign to the baseline and the task, respectively. Then we calculate the difference between the sample means for each resample. If we take 2048 resamples, we will have 2048 such mean differences. We can calculate the empirical distribution of these differences to build the bootstrap distribution. To calculate the p-value, we locate the observed sample mean difference in the bootstrap distribution.
Permutation tests scramble data randomly between the groups. Therefore, for the test to be valid, the distribution of the two groups must be the same under the null hypothesis. To account for differences in standard deviation, it is more accurate to consider a pivotal test statistic and normalize by the unbiased standard deviation estimates of the groups.
We can use permutation tests when the design of the problem and the null hypothesis allow us to resample under the null hypothesis. Permutation tests are suitable for three groups of problems: two-sample problems, where the null hypothesis assumes that the test statistic is the same for both groups; matched-pair designs, where the null hypothesis assumes that there are only random differences within pairs; and problems that study a relationship, e.g., the correlation coefficient between two variables, where the null hypothesis assumes no relationship between them [18].

In the Results chapter we study the above example in more depth.
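The baseline/task example above can also be written out as a short script. This is a minimal sketch assuming Python/NumPy and synthetic data in place of real MEG recordings; the 300/900 split, the 2048 resamples, the resampling-with-replacement scheme, and the one-sided alternative follow the description in the text, while the signal values themselves are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the recorded signal (600 Hz sampling rate):
# 300 baseline samples and 900 task samples.
baseline = rng.normal(0.0, 1.0, size=300)
task = rng.normal(0.3, 1.0, size=900)          # task mean shifted for illustration

observed = task.mean() - baseline.mean()
pooled = np.concatenate([baseline, task])       # the null hypothesis pools all 1200 samples

n_resamples = 2048
diffs = np.empty(n_resamples)
for b in range(n_resamples):
    # Resample with replacement under H: reassign 300/900 samples at random.
    base_star = rng.choice(pooled, size=300, replace=True)
    task_star = rng.choice(pooled, size=900, replace=True)
    diffs[b] = task_star.mean() - base_star.mean()

# One-sided p-value for K: mu_task - mu_baseline > 0.
p_value = (1 + np.sum(diffs >= observed)) / (n_resamples + 1)
print(observed, p_value)
```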
Chapter 4
Asymptotic Properties of the Mean
In Chapter 2 we reviewed bootstrap theory. In practice, the bootstrap procedure is used when the population and/or statistic distribution is not known. In this chapter we study the validity of the bootstrap procedure for the sample mean, which is one of the statistics that can be handled analytically.
Let X1, X2, ..., Xn be n independent identically distributed random variables with common distribution F, with mean µ and variance σ², both unknown. The sample mean $\bar\mu_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the unbiased estimator of the mean µ. If we take $\hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar\mu_n)^2$ as an estimator of σ², then from the Central Limit Theorem the pivotal statistic $Q_n = \sqrt{n}(\bar\mu_n - \mu)/\hat\sigma_n$ tends to N(0, 1) in distribution.

It is interesting to study the asymptotic behaviour of the bootstrap distribution. We pick n resamples X1*, X2*, ..., Xn* with replacement from the sample set. With each data point having the same chance of being picked, we assign probability mass 1/n to each of the n sample points. The bootstrap sample mean is given as:

$\bar\mu_n^* = \frac{1}{n}\sum_{i=1}^{n} X_i^*,$  (4.1)
and the bootstrap sample variance is given as:

$\hat\sigma_n^{*2} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i^* - \bar\mu_n^*)^2.$  (4.2)

We are interested in the asymptotic behaviour of the pivotal bootstrap statistic $Q_n^* = \sqrt{n}(\bar\mu_n^* - \bar\mu_n)/\hat\sigma_n^*$. As we discussed in Chapter 2, we construct the pivotal statistic by replacing the sample mean by the bootstrap mean, and the population mean µ by the sample mean µ̄n. Essentially we are taking the bootstrap distribution F*, which assigns the same probability mass to all Xi, i = 1, ..., n, instead of the population distribution F.
Theorem 4.0.1. Let X1, X2, ... be an independent identically distributed random sequence with positive variance σ². For almost all sample sequences X1, X2, ..., conditional on (X1, X2, ..., Xn), as n tends to ∞:

(1) The conditional distribution of $\sqrt{n}(\bar\mu_n^* - \bar\mu_n)$ converges to N(0, σ²).

(2) $\hat\sigma_n^* \to \sigma$ in probability.

Parts (1) and (2) of Theorem 4.0.1 and Slutsky's theorem imply that the pivotal bootstrap statistic Qn* converges to N(0, 1) in distribution. In this report we prove part (1) of the theorem using ideas and lemmas from Angus (1989) [2]. For the complete proof of part (2) using the law of large numbers, we refer to Politis and Romano (1994) [24].
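Before the proof, Theorem 4.0.1 can be illustrated numerically: for a fixed sample, the bootstrap distribution of the pivotal statistic Qn* should be close to N(0, 1) when n is large. The sketch below (Python with NumPy/SciPy, deliberately using a skewed parent distribution; all choices are illustrative assumptions) compares a few quantiles of the simulated Qn* with the standard Normal quantiles.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

n = 500
sample = rng.exponential(scale=1.0, size=n)     # skewed parent distribution
mu_n = sample.mean()

n_boot = 5000
q_star = np.empty(n_boot)
for b in range(n_boot):
    res = rng.choice(sample, size=n, replace=True)
    # Pivotal bootstrap statistic Q_n* = sqrt(n)(mean* - mean)/sd*.
    q_star[b] = np.sqrt(n) * (res.mean() - mu_n) / res.std(ddof=1)

for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(p, np.quantile(q_star, p), norm.ppf(p))
```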
Lemma 4.0.2 (Borel-Cantelli). Let {An, n ≥ 1} be a sequence of events in a probability space. If

$\sum_{n=1}^{\infty} P(A_n) < \infty,$  (4.3)

then P(A i.o.) = 0, where i.o. stands for infinitely often, that is:

$A(\text{i.o.}) = \bigcap_{i=1}^{\infty} \bigcup_{n=i}^{\infty} A_n.$  (4.4)
Proof. We want to prove that, with probability one, only a finite number of the events occur. Let In = I{An} be the indicator function of An. The number of events that occur is $N = \sum_{k=1}^{\infty} I_k$. Then P(A i.o.) = 0 if and only if P(N < ∞) = 1. By Fubini's theorem, $E(N) = \sum_{n=1}^{\infty} P(A_n)$, which is finite by assumption. E(N) < ∞ implies that P(N < ∞) = 1, which completes the proof.
Lemma 4.0.3. Let the sequence X1, X2, ... consist of independent identically distributed random variables with E|X1| < ∞. Then for every ε > 0, P{|Xn| > εn i.o.} = 0.

Proof. It is sufficient to show that the Borel-Cantelli assumption holds. Fix ε > 0:

$\sum_{n=1}^{\infty} P(|X_n| \ge \varepsilon n) = \sum_{n=1}^{\infty}\sum_{k=n}^{\infty} P\{\varepsilon k \le |X_1| \le \varepsilon(k+1)\} \stackrel{\text{Fubini}}{=} \sum_{k=1}^{\infty}\sum_{n=1}^{k} P\{\varepsilon k \le |X_1| \le \varepsilon(k+1)\} = \sum_{k=1}^{\infty} k\, P\{\varepsilon k \le |X_1| \le \varepsilon(k+1)\} \le E|X_1|/\varepsilon < \infty.$

The Borel-Cantelli lemma completes the proof.
Lemma 4.0.4. Let the sequence X1, X2, ... consist of independent identically distributed random variables with E|X1|² < ∞. Then:

$\limsup_{n\to\infty}\, n^{-3/2} \sum_{k=1}^{n} |X_k|^3 = 0$ almost surely.

Proof. Fix ε > 0:

$n^{-3/2}\sum_{k=1}^{n}|X_k|^3 = n^{-3/2}\sum_{k=1}^{n}|X_k|^3\, I\{|X_k| \ge \varepsilon\sqrt{k}\} + n^{-3/2}\sum_{k=1}^{n}|X_k|^3\, I\{|X_k| < \varepsilon\sqrt{k}\}$
$\le n^{-3/2}\sum_{k=1}^{n}|X_k|^3\, I\{|X_k| \ge \varepsilon\sqrt{k}\} + \varepsilon\, n^{-3/2}\sum_{k=1}^{n}|X_k|^2 \sqrt{k}$
$\le n^{-3/2}\sum_{k=1}^{n}|X_k|^3\, I\{|X_k| \ge \varepsilon\sqrt{k}\} + \varepsilon\, n^{-1}\sum_{k=1}^{n}|X_k|^2.$  (4.5)

By Lemma 4.0.3:

$P\left\{|X_k|^3\, I\{|X_k| \ge \varepsilon\sqrt{k}\} \ne 0 \ \text{i.o.}\right\} = P\left\{|X_k|^2 \ge \varepsilon^2 k \ \text{i.o.}\right\} = 0.$

Thus, $|X_k|^3\, I\{|X_k| \ge \varepsilon\sqrt{k}\} = 0$ almost surely for all but finitely many k. Therefore $n^{-3/2}\sum_{k=1}^{n}|X_k|^3\, I\{|X_k| \ge \varepsilon\sqrt{k}\} \to 0$ almost surely as n → ∞. By the law of large numbers, the second term in equation (4.5), $\varepsilon\, n^{-1}\sum_{k=1}^{n}|X_k|^2$, converges almost surely to $\varepsilon\, E[X_1^2]$ as n → ∞. Since ε > 0 was arbitrary, $\limsup_{n\to\infty}\, n^{-3/2}\sum_{k=1}^{n}|X_k|^3 = 0$ almost surely.
Now we are ready to prove Theorem (4.0.1).
Proof. Define $T_n^* = \sqrt{n}(\bar\mu_n^* - \bar\mu_n)$. $T_n^*$ can be written as the sum of n independent identically distributed random variables $n^{-1/2}(X_k^* - \bar\mu_n)$, k = 1, ..., n. The resampled random variable $X_k^*$ takes values from the sample points X1, ..., Xn with equal probability mass 1/n. Thus, the characteristic function of $T_n^*$ (conditional on the sample) can be written as:

$E[\exp(itT_n^*)] = \left[\frac{1}{n}\sum_{j=1}^{n} \exp\!\left(\frac{it(X_j - \bar\mu_n)}{\sqrt{n}}\right)\right]^n.$  (4.6)

By repeated integration by parts, exp(ix) can be written as:

$\exp(ix) = 1 + ix - \frac{x^2}{2} + \frac{x^3}{6}\,\theta(x),$  (4.7)

where $\theta(x) := \frac{3}{x^3}\int_0^x i^3 (x-t)^2 e^{it}\, dt$ is a continuous function of x, and |θ(x)| ≤ 1. Thus, equation (4.6) can be written as:

$\left[1 + \frac{1}{n}\sum_{j=1}^{n} \frac{it(X_j - \bar\mu_n)}{\sqrt{n}} - \frac{1}{n}\sum_{j=1}^{n} \frac{t^2 (X_j - \bar\mu_n)^2}{2n} + \frac{1}{n}\sum_{j=1}^{n} \frac{t^3 (X_j - \bar\mu_n)^3}{6 n^{3/2}}\, \theta\!\left(\frac{t(X_j - \bar\mu_n)}{\sqrt{n}}\right)\right]^n.$  (4.8)

The second term in brackets is zero, since the deviations from the sample mean sum to zero. If we refer to the last term as Qn, we get:

$E[\exp(itT_n^*)] = \left(1 - \frac{t^2 \hat\sigma_n^2}{2n} + Q_n\right)^n.$  (4.9)

From |θ(x)| ≤ 1, $n|Q_n| \le \frac{|t|^3}{6}\, n^{-3/2} \sum_{j=1}^{n} |X_j - \bar\mu_n|^3$. Thus, from Lemma 4.0.4, nQn → 0 almost surely as n → ∞. By the law of large numbers, σ̂n² → σ² almost surely as n → ∞. Hence, as n → ∞ we have:

$E[\exp(itT_n^*) \mid X_1, X_2, \ldots, X_n] = \left(1 - \frac{t^2\sigma^2}{2n}\right)^n \to \exp\!\left(\frac{-t^2\sigma^2}{2}\right),$  (4.10)
which is the characteristic function of N(0, σ 2 ). This completes the proof of part
(1).
Thus far, we have shown that the bootstrap procedure works asymptotically for the mean. The delta method guarantees the validity of the procedure for any function with continuous derivatives in a neighbourhood of the mean as well.

Another problem of interest is to study the order of accuracy of the bootstrap estimate. In particular, is it more accurate to use a Normal approximation for the population instead of using the bootstrap estimate of the distribution? This is the subject of the next chapter, where we show that the answer is no.
Chapter 5
Bootstrap Accuracy and Edgeworth Expansion
5.1 Edgeworth Expansion
The characteristic function of a random variable X is defined as φ(t) = E(e^{itX}). Moments of the random variable can be recovered from it: the jth moment of X, defined as µj = E(X^j), is obtained from the jth derivative of φ at t = 0,

$\mu_j = i^{-j}\, \frac{d^j \phi(t)}{dt^j}(0).$  (5.1)
The Taylor series expansion of the characteristic function at t = 0 can be written as:

$\phi(t) = E(e^{itX}) = E\!\left(\sum_{n=0}^{\infty} \frac{(itX)^n}{n!}\right) = \sum_{n=0}^{\infty} \frac{\mu_n (it)^n}{n!},$  (5.2)
where µ0 = 1 and 0! = 1.
The cumulant generating function of the random variable X is defined as log(φ(t)), the natural logarithm of the characteristic function. Cumulants are found from the power series expansion of the cumulant generating function:

$\log(\phi(t)) = \sum_{n=1}^{\infty} \frac{\kappa_n (it)^n}{n!},$  (5.3)
where κn is the nth cumulant of the random variable X.
To find the relationship between cumulants and moments of the random variable X, we can write log(φ(t)) as:

$\log(\phi(t)) = -\sum_{n=1}^{\infty} \frac{1}{n}\left(1 - E(e^{itX})\right)^n = -\sum_{n=1}^{\infty} \frac{1}{n}\left(-\sum_{m=1}^{\infty} \frac{\mu_m (it)^m}{m!}\right)^n.$  (5.4)
Comparing the expansions (5.3) and (5.4), it can be shown that the cumulants are homogeneous polynomials in the moments and vice versa [16]. In particular, we have the following relationships for the first four cumulants:

κ1 = µ1  (5.5)
κ2 = µ2 − µ1² = var(X)  (5.6)
κ3 = µ3 − 3µ2µ1 + 2µ1³  (5.7)
κ4 = µ4 − 4µ3µ1 − 3µ2² + 12µ2µ1² − 6µ1⁴.  (5.8)

The third and fourth cumulants, κ3 and κ4, are referred to as the skewness and kurtosis respectively.
Let X1, X2, ... be independent identically distributed random variables with mean µ and variance σ². As we mentioned in the previous chapter, by the Central Limit Theorem $Q_n = \sqrt{n}(\bar\mu_n - \mu)$ converges to N(0, σ²) in distribution as n → ∞, where µ̄n is the sample mean for a sample of size n. Let us assume that µ1 = µ = 0 and σ² = 1. The problem of interest is to find the cumulative distribution of $S_n = \sqrt{n}\,\bar\mu_n = n^{-1/2}\sum_{j=1}^{n} X_j$. In particular, we are interested in a power series expansion of P(Sn ≤ x). Such an expansion is referred to as an Edgeworth expansion.
To this end, we start from the characteristic function of Sn :
φn (t) = E{exp(itSn )}.
(5.9)
From the independence assumption, φn(t) can be written as:

$\phi_n(t) = E\{\exp(itn^{-1/2}X_1)\}\, E\{\exp(itn^{-1/2}X_2)\} \cdots E\{\exp(itn^{-1/2}X_n)\} = \left[\phi(tn^{-1/2})\right]^n.$  (5.10)
From equation (5.3), we have:

$\phi_n(t) = \left[\phi(tn^{-1/2})\right]^n = \left[\exp\!\left(\sum_{j=1}^{\infty} \frac{\kappa_j (itn^{-1/2})^j}{j!}\right)\right]^n.$  (5.11)
Since κ1 = 0 and κ2 = 1,

$\phi_n(t) = \exp\!\left(-\frac{t^2}{2} + \frac{\kappa_3 n^{-1/2} (it)^3}{6} + \cdots + \frac{\kappa_j n^{-(j-2)/2} (it)^j}{j!} + \cdots\right).$  (5.12)
By expanding the exponential we get:

$\phi_n(t) = e^{-t^2/2} + n^{-1/2} r_1(it)\, e^{-t^2/2} + \cdots + n^{-j/2} r_j(it)\, e^{-t^2/2} + \cdots,$  (5.13)
where the rj(it) are polynomials of degree 3j and parity j, with coefficients depending on κ3, ..., κj+2 [16]. Moreover, the rj(it) are independent of n. In particular,

$r_1(u) = \frac{1}{6}\kappa_3 u^3,$

$r_2(u) = \frac{1}{72}\kappa_3^2 u^6 + \frac{1}{24}\kappa_4 u^4.$
By definition, the characteristic function of Sn can be written as

$\phi_n(t) = \int_{-\infty}^{\infty} e^{itx}\, dP(S_n \le x).$  (5.14)

Moreover, the fact that $e^{-t^2/2}$ is the characteristic function of the standard Normal suggests that it is possible to invert the expansion of φn(t) term by term to get the cumulative distribution function (CDF) of Sn,

$P(S_n \le x) = \Phi(x) + n^{-1/2} R_1(x) + \cdots + n^{-j/2} R_j(x) + \cdots,$  (5.15)
where Φ(x) is the CDF of the standard Normal, and

$\int_{-\infty}^{\infty} e^{itx}\, dR_j(x) = r_j(it)\, e^{-t^2/2}.$  (5.16)
The next step is to calculate Rj(x). We have

$e^{-t^2/2} = \int_{-\infty}^{\infty} e^{itx}\, d\Phi(x).$  (5.17)
Integration by parts on equation (5.17) gives

$e^{-t^2/2} = (-it)^{-1}\int_{-\infty}^{\infty} e^{itx}\, d\Phi^{(1)}(x) = \cdots = (-it)^{-j}\int_{-\infty}^{\infty} e^{itx}\, d\Phi^{(j)}(x),$

where $\Phi^{(j)}(x) = D^j \Phi(x)$ and D is the differentiation operator. Therefore,

$\int_{-\infty}^{\infty} e^{itx}\, d\{r_j(-D)\Phi(x)\} = r_j(it)\, e^{-t^2/2}.$  (5.18)
From equations (5.16) and (5.18), and the uniqueness of the Fourier transform, we deduce that

$R_j(x) = r_j(-D)\Phi(x).$  (5.19)

For j ≥ 1,

$(-D)^j \Phi(x) = -He_{j-1}(x)\,\phi(x),$  (5.20)
where Hen(x) is the standardized Hermite polynomial of degree n, with the same parity as n, defined as

$He_n(x) = (-1)^n e^{x^2/2}\, \frac{d^n}{dx^n} e^{-x^2/2}.$  (5.21)
Therefore, Rj(x) can be written as Rj(x) = Pj(x)φ(x), where Pj(x) is a polynomial of degree 3j − 1 with opposite parity to j, and coefficients that depend on the moments of X up to order j + 2. In particular,

$P_1(x) = -\frac{1}{6}\kappa_3 (x^2 - 1),$  (5.22)

$P_2(x) = -x\left[\frac{1}{24}\kappa_4 (x^2 - 3) + \frac{1}{72}\kappa_3^2 (x^4 - 10x^2 + 15)\right].$  (5.23)
The Edgeworth expansion of the cumulative distribution function, P (Sn ≤ x) can be
written as,
P (Sn ≤ x) = Φ(x)+n−1/2 P1 (x)φ(x)+n−1 P2 (x)φ(x)+...+n−j/2 Pj (x)φ(x)+.... (5.24)
For the CDF of a random variable Y, the Edgeworth expansion converges as an infinite series if E{exp(¼Y⁴)} < ∞ [10]. This is a restrictive condition on the tails of the distribution. However, the expansion shows that if the series is stopped after j terms, the remainder will be of order n^{−j/2}:

$P(S_n \le x) = \Phi(x) + n^{-1/2} P_1(x)\phi(x) + n^{-1} P_2(x)\phi(x) + \cdots + n^{-j/2} P_j(x)\phi(x) + o(n^{-j/2}),$  (5.25)

which is a valid expansion for fixed j as n → ∞. Cramér [10] gives sufficient regularity conditions for the expansion as

$E(|X|^{j+2}) < \infty$ and $\limsup_{|t|\to\infty} |\phi(t)| < 1.$  (5.26)
Bhattacharya and Ghosh [5] show that for statistics with continuous derivatives in a neighbourhood of the mean, equation (5.25) under the regularity conditions (5.26) converges uniformly in x as n → ∞. As is the case for Sn, the polynomials Pj(x) are of degree 3j − 1 with opposite parity to j (even polynomials for odd j and vice versa), and have coefficients that depend on the moments of X up to order j + 2 and on the derivatives of the statistic up to order j + 2.
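The practical content of equation (5.25) can be seen numerically. The sketch below (Python with NumPy/SciPy; an illustration under assumed exponential summands, not an example from the thesis) compares the exact CDF of Sn, the Normal approximation Φ(x), and the one-term Edgeworth correction Φ(x) + n^{−1/2}P1(x)φ(x) with P1 taken from equation (5.22).

```python
import numpy as np
from scipy.stats import norm, gamma

n = 20
kappa3 = 2.0          # skewness of the standardized Exp(1) summands X_j = E_j - 1

x = np.linspace(-2.5, 2.5, 6)
# Exact CDF: S_n = (sum E_i - n)/sqrt(n) with sum E_i ~ Gamma(n, 1).
exact = gamma.cdf(n + x * np.sqrt(n), a=n)
normal_approx = norm.cdf(x)
p1 = -kappa3 * (x**2 - 1) / 6.0                     # P_1(x) from equation (5.22)
edgeworth = norm.cdf(x) + p1 * norm.pdf(x) / np.sqrt(n)

for xi, e, na, ed in zip(x, exact, normal_approx, edgeworth):
    print(f"x={xi:5.2f}  exact={e:.4f}  normal={na:.4f}  edgeworth={ed:.4f}")
```

For moderate n the corrected values track the exact CDF noticeably better than Φ(x), consistent with the n^{−1/2} versus n^{−1} error orders discussed in this chapter.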
5.2 Bootstrap Edgeworth Expansion
Let X1, X2, ..., Xn be independent identically distributed random samples with common distribution F, and let θ̂ be the estimator of the statistic θ computed from the dataset χ = {X1, X2, ..., Xn} with empirical distribution F̂. Let us further assume that S = n^{1/2}(θ̂ − θ) is asymptotically N(0, σ²), where σ² = σ²(F) is the asymptotic variance of S. Then the pivotal statistic

T = n^{1/2}(θ̂ − θ)/σ̂,  (5.27)

where σ̂² = σ²(F̂), is asymptotically N(0, 1).

From [5], the Edgeworth expansions of the CDFs of S and T are

P(S ≤ x) = Φ(x/σ) + n^{−1/2} P1(x/σ)φ(x/σ) + n^{−1} P2(x/σ)φ(x/σ) + ...,  (5.28)
P(T ≤ x) = Φ(x) + n^{−1/2} Q1(x)φ(x) + n^{−1} Q2(x)φ(x) + ...,  (5.29)

where Pj(x) and Qj(x) are polynomials of degree 3j − 1 of opposite parity to j. Thus, the Normal approximations of the CDFs of S and T are P(S ≤ x) ≃ Φ(x/σ) and P(T ≤ x) ≃ Φ(x) respectively, which are in error by order n^{−1/2}.
To study bootstrap accuracy, let us assume that the bootstrap estimate θ̂* of θ̂ is computed from the resample dataset χ* = {X1*, X2*, ..., Xn*} with bootstrap distribution F̂*. Then the bootstrap analogues of S and T are written as

S* = n^{1/2}(θ̂* − θ̂),  (5.30)
T* = n^{1/2}(θ̂* − θ̂)/σ̂*,  (5.31)

where σ̂* is the bootstrap estimate of σ̂. The Edgeworth expansions of the CDFs of S* and T* are

P(S* ≤ x | χ) = Φ(x/σ̂) + n^{−1/2} P̂1(x/σ̂)φ(x/σ̂) + n^{−1} P̂2(x/σ̂)φ(x/σ̂) + ...,  (5.32)
P(T* ≤ x | χ) = Φ(x) + n^{−1/2} Q̂1(x)φ(x) + n^{−1} Q̂2(x)φ(x) + ...,  (5.33)
where P̂j(x) and Q̂j(x) are obtained by replacing the unknowns in Pj(x) and Qj(x) by their bootstrap estimates, respectively. The estimated coefficients in P̂j(x) and Q̂j(x) differ from their corresponding values in Pj(x) and Qj(x) by the order n^{−1/2}. Therefore, the accuracy of the bootstrap CDFs of S and T is

P(S* ≤ x | χ) − P(S ≤ x) = Φ(x/σ̂) − Φ(x/σ) + O(n^{−1}),  (5.34)
P(T* ≤ x | χ) − P(T ≤ x) = O(n^{−1}).  (5.35)

In equation (5.34) the standard deviation estimate σ̂ differs from the standard deviation σ by the order n^{−1/2}, i.e., σ̂ − σ = O(n^{−1/2}). Thus,

P(S* ≤ x | χ) − P(S ≤ x) = O(n^{−1/2}).  (5.36)
Equations (5.36) and (5.35) outline the accuracy of the bootstrap CDFs of S and T respectively. Therefore the bootstrap estimate of the distribution of S has the same order of accuracy as the Normal approximation, whereas the bootstrap estimate for T is more accurate than the Normal approximation by the order n^{−1/2}. This brings us to the advantage of pivotal statistics. T is a pivotal statistic while S is not. The distribution of T does not depend on any unknown values, so the power of the bootstrap is directed toward estimating the skewness of the distribution, while in the case of the non-pivotal statistic S the bootstrap's power is "wasted" on estimating the standard deviation.

At this stage the problem of interest is to study the accuracy of bootstrap confidence intervals.
5.3 Bootstrap Confidence Interval Accuracy
We recall that the Edgeworth expansion of the CDF of the statistic Sn can be written as in equation (5.25). Denote the α-level percentiles of Sn and of the standard Normal distribution by ξα and zα respectively, with

ξα = inf{x : P(Sn ≤ x) ≥ α}.  (5.37)

By inverting equation (5.25), we can write the series expansion of ξα in terms of zα as

ξα = zα + n^{−1/2} P1^{cf}(zα) + ... + n^{−j/2} Pj^{cf}(zα) + o(n^{−j/2}).  (5.38)

This expansion is referred to as the Cornish-Fisher expansion. Cornish and Fisher [9] showed that the expansion holds uniformly in ε < α < 1 − ε for 0 < ε < 1/2.
In the expansion, the Pj^{cf}(x) are polynomials of degree at most j + 1 with opposite parity to j, with coefficients that depend on cumulants up to order j + 2. In particular,

$P_1^{cf}(x) = -P_1(x),$  (5.39)

$P_2^{cf}(x) = P_1(x)P_1'(x) - \tfrac{1}{2}\, x\, P_1(x)^2 - P_2(x).$  (5.40)
In this section we consider two types of α-confidence intervals: one-sided and
two-sided confidence intervals that we denote by I1 and I2 respectively,
I1 = (−∞, θ̂ + n−1/2 σ̂zα )
(5.41)
I2 = (θ̂ − n−1/2 σ̂xα , θ̂ + n−1/2 σ̂xα ),
(5.42)
where zα and xα are defined as,
P (N ≤ zα ) = α,
(5.43)
P (|N| ≤ xα ) = α,
(5.44)
for a standard Normal random variable N. Essentially, I1 and I2 are constructed under the assumption of a Normal distribution for the population. From the Edgeworth expansion of the statistic T in equation (5.29) we can evaluate the accuracy of such an assumption, that is, we can calculate the coverage error of I1:

P(θ ∈ I1) = P(T > −zα)
  = 1 − {Φ(−zα) + n^{−1/2} Q1(−zα)φ(−zα) + O(n^{−1})}
  = α − n^{−1/2} Q1(−zα)φ(−zα) + O(n^{−1}).

Therefore, the coverage error of the one-sided interval is of order n^{−1/2}. Similarly, by noting that Q1(x) and Q2(x) are even and odd polynomials respectively, we calculate the coverage error of I2:

P(θ ∈ I2) = P(T ≤ xα) − P(T < −xα)
  = α + 2n^{−1} Q2(xα)φ(xα) + O(n^{−2}),

which indicates that the coverage error of the two-sided interval is of order n^{−1}.
We now study the accuracy of bootstrap percentile estimates for the pivotal statistic T. Let us define the α-percentile of T as ηα:

P(T ≤ ηα) = α.  (5.45)

Similarly, for the bootstrap analogue of T, denoted by T* in equation (5.31), the α-percentile η̂α is defined as

P(T* ≤ η̂α | χ) = α.  (5.46)

From the Cornish-Fisher expansion, ηα is written as

ηα = zα + n^{−1/2} Q1^{cf}(zα) + n^{−1} Q2^{cf}(zα) + o(n^{−1}),  (5.47)
where zα is the α-percentile of the standard Normal distribution, and Q1^{cf} and Q2^{cf} are defined as

$Q_1^{cf}(x) = -Q_1(x),$  (5.48)

$Q_2^{cf}(x) = Q_1(x)Q_1'(x) - \tfrac{1}{2}\, x\, Q_1(x)^2 - Q_2(x),$  (5.49)

where Q1(x) and Q2(x) are the polynomials in the Edgeworth expansion of the CDF of T in equation (5.29). By substituting the coefficients in Q1^{cf}(x) and Q2^{cf}(x) with their respective bootstrap estimates, we obtain the Cornish-Fisher expansion of η̂α:

η̂α = zα + n^{−1/2} Q̂1^{cf}(zα) + n^{−1} Q̂2^{cf}(zα) + o(n^{−1}).  (5.50)

The estimated coefficients in Q̂j^{cf}(x) differ from their corresponding values in Qj^{cf}(x) by order n^{−1/2}. Therefore,

η̂α = ηα + O(n^{−1}),  (5.51)

that is, the bootstrap quantile estimate is accurate to order n^{−1}, which outperforms the order-n^{−1/2} accuracy of the Normal approximation ηα ≈ zα in equation (5.47).
So far we have studied bootstrap theory and investigated permutation analysis as a hypothesis testing method based on the bootstrap principle. Moreover, we have studied the asymptotic behaviour of the bootstrap as well as the accuracy of bootstrap estimates, and showed that bootstrap resampling from data outperforms the Normal approximation of the data distribution both in accuracy and in asymptotic convergence. In the next chapter we follow the example outlined briefly in Section 3.3 and use permutation analysis to localize statistically significant brain activity in reaching/pointing tasks.
Chapter 6
Results
Recent developments in brain imaging techniques such as electroencephalography (EEG), magnetoencephalography (MEG), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI) have made it possible for researchers to localize brain activity more accurately. Ideal localization of activity is spatially accurate and preserves the timing of the activity. However, in choosing an appropriate neuroimaging technique there has always been a trade-off between temporal and spatial resolution. For instance, fMRI works by detecting variations in blood oxygenation levels that occur when blood flow increases in active brain areas due to greater oxygen demand. Because of the time lag between neural activity and detectable changes in blood oxygenation, fMRI temporal resolution is low (on the order of seconds). Therefore, while fMRI spatial accuracy is on the millimetre scale, which is sufficient for most studies, the temporal resolution of the activity is compromised, making it unsuitable for studies in which timing plays a crucial role. On the other hand, EEG captures the timing of the activity more accurately by measuring the natural electrical current flow of active neurons. However, spatial localization of activity cannot be performed accurately because of artifacts of non-cerebral origin such as eye movements and cardiac
artifacts that interfere with EEG data [22].
The neuroimaging method of choice in this study is MEG. The MEG machine uses more than a hundred highly sensitive superconducting magnetic sensors, referred to as superconducting quantum interference devices (SQUIDs), placed around the scalp to measure the radial component of the magnetic fields produced by neuronal electric currents. MEG temporal resolution is on the order of milliseconds, making the method appropriate for studying brain function in real time. Since the magnetic permeability of the scalp and the underlying tissues is approximately the same, a neuronal magnetic signal can be measured without much distortion. This is an advantage over EEG, where the widely varying conductances of these tissues distort measurements; reliable estimates of such conductances must be available in EEG experiments to compensate for the distortion [17].
Localization from the recorded magnetic signals in MEG poses an inverse problem, because the number of sources in the brain from which activity is recorded exceeds the number of measuring sensors. A number of post-processing techniques such as spatial filtering (beamforming) have been proposed in the literature to solve this problem [23, 27, 3, 25, 26, 7, 8]. Moreover, classical and adaptive clustering algorithms have been proposed to improve beamformer spatial resolution [12, 1].
In this report we take the following localization approach: after appropriate preprocessing steps to prepare the data, we discretize the brain into 3 mm³ voxels and, for each voxel, compute the source activity in the 7-35 Hz frequency band using an event-related beamformer [8]. For each participant in the MEG experiment we construct a 3D brain image by registering voxel locations from his/her MEG system coordinates to the standardized Talairach coordinates [17] using an affine transformation (SPM2; more detail in [1]). An average activation pattern can then be calculated across all the resulting images (one image per subject). However, if the brain activation patterns vary greatly across images, the average activation pattern will not be informative enough to find brain areas that are consistently active across subjects. Therefore, we propose permutation analysis to find significantly active brain areas in the resulting images.
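As an aside on the registration step, the sketch below shows how a 4x4 affine matrix could be applied to voxel locations. The matrix entries here are placeholders: in practice the transform is estimated per subject by the SPM2 normalization rather than hard-coded.

    import numpy as np

    # Hypothetical 4x4 affine (rotation/scaling plus translation). In practice the
    # transform is estimated per subject by the SPM2 normalization, not hard-coded.
    affine = np.array([[ 0.98,  0.02, 0.00, -1.5],
                       [-0.02,  0.97, 0.05,  2.0],
                       [ 0.00, -0.05, 0.99,  0.7],
                       [ 0.00,  0.00, 0.00,  1.0]])

    def to_talairach(voxels_mm, A):
        """Map an N x 3 array of voxel locations (MEG coordinates, mm) to Talairach space."""
        homogeneous = np.c_[voxels_mm, np.ones(len(voxels_mm))]   # N x 4 homogeneous coordinates
        return (homogeneous @ A.T)[:, :3]

    voxels = np.array([[10.0, -20.0, 35.0],
                       [ 0.0,   0.0, 50.0]])
    print(to_talairach(voxels, affine))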
This chapter is organized as follows: in Section 6.1 we describe the experimental setup and methodology, as well as the post-processing steps that prepare the data for permutation analysis. Section 6.2 then presents the permutation analysis methodology and results.
6.1 Methods
Participants
Ten healthy adult participants (8 males, 2 females), aged 22-45 years, with no history of neurological dysfunction or injury took part in this study. The study was approved by both the York University and the Hospital for Sick Children Ethics Boards. All participants gave informed consent.
6.1.1 Experimental Paradigm
Figure 6.1 shows the experimental setup. Participants sat upright with their head under the dewar of the MEG machine in an electromagnetically shielded room during the experiment (Figure 6.1(d)). In each trial, subjects performed a memory-guided reaching task while they remained fixated on a central white cross. After a 500 ms fixation period, a green or red dot (the target) was briefly presented for 200 ms, randomly either to the right or to the left of the centre cross (Figure 6.1(a,c)). We refer to the 500 ms interval before target onset as the baseline period. The centre cross dimmed after 1500 ms as the instruction for subjects to start pointing toward the target (pro) or toward its mirror-opposite location (anti) while the eyes remained fixated. The direction of pointing depended on the colour of the target: green and red represented pro and anti trials respectively. Pointing movements were wrist-only, with three different wrist/forearm postures for the right hand (pronation, upright, and down) and one posture for the left hand (pronation), performed in separate blocks of trials (Figure 6.1(b)). Each pointing trial lasted approximately 3 seconds with a 500 ms inter-trial interval (ITI). One hundred trials for each condition (left hand versus right hand, pro versus anti, and the hand postures) amount to 1200 trials for each subject. Movement onset for each subject was measured using bipolar differential electromyography (EMG). For more details of the experiment and the MEG data acquisition procedure, please refer to [1].
6.1.2 Data Processing
Data were collected at a rate of 600 samples per second with a 150 Hz low-pass filter, using synthetic third-order gradiometer noise cancellation. After manual inspection for artifacts, including eye movements, blinks, and premature hand movements, and removal of the corresponding trials from the analysis, on average 98 reaching trials per condition were retained for each subject for subsequent processing.
Brain source activity is estimated from the sensor data using event-related beamforming [8]; the idea of beamforming is depicted in Figure 6.2. The brain is discretized into 3 mm³ voxels for each subject, and the beamformer assumes a dipole direction at each voxel location.
Figure 6.1: The MEG experiment setup. (a) Time course of the experiment. (b)
Three postures of the hand were used in different recording blocks. (c)
The fixation cross in the middle with two possible target locations in its
left and right hand side. (d) Subjects sit upright under MEG machine
performing the pointing task with the wrist only. (e) Task: target (cue)
appears in either green or red to inform the subject of the pro or anti
nature of the pointing trials. Dimming of the central fixation cross was
the movement instruction for subjects.
Using the dipole forward solution and the sensor covariance matrix, the beamformer then solves for a dipole direction that minimizes the power variance at the voxel. The dipole direction at each voxel can be regarded as a spatial filter weight that, when applied to the sensor data, reconstructs the instantaneous power at the corresponding voxel location while rejecting interfering power from adjacent voxels.
[Figure 6.2 consists of two panels: (A) calculation of source activity over time ("virtual sensors"), and (B) imaging of instantaneous source amplitude with the "event-related" beamformer, where thresholded source images (in pseudo-Z units) are superimposed on MRI.]
Figure 6.2: The diagram of the event-related beamformer [8]: The data consist of T trials, each with M channels and N time samples. The covariance matrix of the data is given to the beamformer, as well as the forward solution for the dipole at each location. Average source activity is then estimated at each voxel, and the dipole orientation is adjusted accordingly to maximize power at the corresponding voxel.
6.2 Permutation Analysis Results
Brain activity is studied in separate frequency bands. In neuroscience terminology, 7-15 Hz, 15-35 Hz, 35-55 Hz, and 55-120 Hz are referred to as the alpha, beta, lower-gamma, and higher-gamma bands respectively [17]. In this study the frequency band of interest is 7-35 Hz, that is, the alpha and beta bands combined. Thus, the beamformer-estimated power time series at each voxel location is band-pass filtered to retain the power in the 7-35 Hz band.
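A minimal sketch of this band-pass step is given below, assuming the 600 samples-per-second rate mentioned in section 6.1.2; the choice of a 4th-order Butterworth filter applied with filtfilt is an illustrative assumption, not a detail specified in this report.

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 600.0                                   # sampling rate (samples per second)
    # 4th-order Butterworth band-pass; the order is an illustrative assumption.
    b, a = butter(4, [7.0, 35.0], btype="bandpass", fs=fs)

    def bandpass_7_35(x):
        """Zero-phase band-pass of a voxel time series to the 7-35 Hz band."""
        return filtfilt(b, a, x)

    t = np.arange(0, 2.0, 1.0 / fs)
    demo = np.sin(2 * np.pi * 20 * t) + np.sin(2 * np.pi * 2 * t)   # 20 Hz kept, 2 Hz attenuated
    filtered = bandpass_7_35(demo)
    print(filtered.shape)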
Data at each voxel are aligned both at the cue onset, when the target appears, and around the movement onset, when the subject starts the movement according to the EMG measurement. Moreover, the data samples are transformed into Z-scores by subtracting the baseline mean from each sample and normalizing by the baseline standard deviation.
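The baseline normalization can be sketched as follows, assuming the 500 ms interval before target onset as the baseline; the array shapes and trial counts here are illustrative stand-ins.

    import numpy as np

    def baseline_zscore(trials, baseline_idx):
        """Z-score one voxel's trials against the pre-target baseline.
        trials       : array of shape (n_trials, n_samples)
        baseline_idx : indices of the baseline samples (500 ms before target onset)
        """
        base = trials[:, baseline_idx]
        return (trials - base.mean()) / base.std(ddof=1)

    rng = np.random.default_rng(3)
    fs = 600                                         # samples per second
    trials = rng.standard_normal((98, 3 * fs))       # ~98 retained trials, ~3 s epochs
    baseline = np.arange(int(0.5 * fs))              # first 500 ms of each epoch as baseline
    z = baseline_zscore(trials, baseline)
    print(z.shape)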
In [1] a two-level adaptive cluster analysis is proposed to find the active brain areas in this experiment. It is shown that a large network of brain areas is involved in the reaching task, starting from visual areas in the occipital lobe, continuing to parietal areas that are presumably responsible for sensorimotor transformations, and ending with movement planning and execution in primary motor cortex.
Figure 6.3 shows average brain z-score power activity from 0.45 seconds before movement onset to movement onset, averaged across all subjects, for right-hand movement toward left targets (pro-left condition). Activity is shown in three planes: (a) transverse, (b) sagittal, and (c) coronal. As the figure shows, a network of brain areas is active, with positive activations (neuronal synchronization) in occipital visual areas (e.g. V3) and parietal areas, and negative activity (neuronal desynchronization) in the contralateral primary motor cortex (M1), which executes the movement.
Figure 6.4 shows average brain z-score power around cue onset, from target appearance to 0.5 seconds after, averaged across all subjects for movement toward left targets (pro-left). The figure shows contralateral desynchronization in primary visual areas followed by activations in parietal areas such as the mid-posterior intraparietal sulcus (mIPS), angular gyrus (AG), inferior parietal lobule (IPL), and superior parietal lobule (SPL).
Figure 6.5 shows average brain z-score power activity from 0.45 seconds before movement onset to movement onset, averaged across all subjects, for right-hand movement to the mirror-opposite direction of right targets (anti-right condition).
Figure 6.3: Average brain activation for pro condition/left target around movement
onset (-0.45-0 seconds) in three planes (a) transverse (b) sagittal, and (c)
coronal.
This is essentially the same movement as in the pro-left condition except that the target appears on the right. The reason for including such anti conditions in the experiment was to dissociate the movement from the target stimulus, in order to investigate the parietal brain areas responsible for the sensorimotor transformation from retinal to shoulder coordinates needed to execute an accurate movement plan [4]. As can be seen from the figure, the activation pattern around movement onset is the same as that of the pro-left condition in Figure 6.3. The pre-motor ventral (PMV) area is also indicated in the figure.
Figure 6.6 shows average brain z-score power around cue onset, from target appearance to 0.5 seconds after, averaged across all subjects for anti movement/right target (anti-right). The figure shows contralateral desynchronization in mIPS and AG.
So far we have looked at average activation patterns, which we denote by X̄ for the following argument.
Figure 6.4: Average brain activation for pro condition/left target around cue onset
(0-0.5 seconds) in three planes (a) transverse (b) sagittal, and (c) coronal.
As is evident from the figures, the average activation patterns span wide pseudo-Z ranges. Thus, it is important to investigate which areas are significantly active across all subjects, that is, to set pseudo-Z score thresholds above which the average patterns are statistically significant. To this end, we formulate a hypothesis testing problem with the null hypothesis

H : X̄ = 0
K : X̄ ≠ 0
(6.1)

where H and K are the null and alternative hypotheses respectively, and X̄ is the average activation pattern.
We limit the probability of a type I error to α = 0.05 to calculate 95% confidence intervals and p-values, and we solve the hypothesis testing problem at each brain voxel individually.
Figure 6.5: Average brain activation for anti condition/right target around movement
onset (-0.45-0 seconds) in three planes (a) transverse (b) sagittal, and (c)
coronal.
We do not know the distribution of the data or of the sample mean, so the hypothesis testing problem cannot be solved analytically. Therefore, we estimate the distribution of the mean using the bootstrap procedure and use permutation to solve the problem.
The bootstrap procedure involves resampling with replacement from the data and calculating the mean of each resample in order to estimate the cumulative distribution function of the mean. As mentioned in section 3.3, in permutation analysis the resampling is performed in a manner consistent with the null hypothesis. If we assume that the null hypothesis H is true, then the mean of the power signal at each voxel is zero, which equals the mean of its inverted (sign-flipped) version. Therefore, for each condition, say pro-left around movement onset, for every subject at each brain voxel we have 3 signals corresponding to the 3 wrist postures.
Figure 6.6: Average brain activation for anti condition/right target around cue onset
(0-0.5 seconds) in three planes (a) transverse (b) sagittal, and (c) coronal.
Together with their inverted versions, these signals form a resampling dataset of size 60 across all subjects at each brain voxel. Notice that the original sample mean is computed from the 30 signals, without their inverted counterparts. Moreover, by including all the postures in the resampling dataset, as well as in the original mean, we are ignoring posture effects in the signals. The empirical cumulative distribution of the mean is estimated by taking 2048 resamples with replacement from the resampling dataset. If the sample mean is greater than the 95% bootstrap percentile, then we reject the null hypothesis.
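A minimal sketch of this voxel-wise test is given below, assuming the 30 per-subject/posture activations at one voxel have already been reduced to scalars. The sign-flipped pool of 60 values, the 2048 resamples, and the 95% percentile follow the description above, while the input data here are random stand-ins rather than the MEG measurements.

    import numpy as np

    def sign_flip_threshold(values, n_resamples=2048, alpha=0.05, seed=0):
        """Test H: mean activation = 0 at one voxel by resampling with replacement
        from the values and their sign-flipped copies (consistent with H).
        values : per-subject/posture activations at the voxel (30 values here).
        """
        rng = np.random.default_rng(seed)
        null_pool = np.concatenate([values, -values])          # resampling dataset of size 60
        boot_means = np.array([
            rng.choice(null_pool, size=values.size, replace=True).mean()
            for _ in range(n_resamples)
        ])
        threshold = np.quantile(boot_means, 1.0 - alpha)       # 95% bootstrap percentile
        observed = values.mean()
        return observed, threshold, observed > threshold

    # Illustrative stand-in: 10 subjects x 3 postures of z-scored power at one voxel.
    rng = np.random.default_rng(4)
    vals = rng.normal(0.4, 1.0, size=30)
    obs, thr, reject = sign_flip_threshold(vals)
    print(f"mean = {obs:.2f}, 95% threshold = {thr:.2f}, reject H: {reject}")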
Figure 6.7 shows the permutation analysis result for the pro-left condition around movement onset. The left and right panels show negative and positive activity respectively. The negative and positive thresholds for 95% p-values are shown in the figure as well.
Figure 6.7: Permutation analysis for pro condition/left target around movement onset in three planes with 95% p-values. Right panel: positive activity (synchronization; maximum value 3.37, absolute threshold 2.26). Left panel: negative activity (desynchronization; maximum value 2.66, absolute threshold 2.23).
As is evident from the figure, the activity in M1 and V3 is statistically significant (Figure 6.3).
Figure 6.8 shows the permutation analysis result for the pro-left condition around cue onset. The null hypothesis is not rejected for positive activation; the significant negative activity is shown in the figure together with the corresponding threshold. As the figure shows, only the mean negative activation in Figure 6.4 is significant.
Figure 6.9 shows the permutation analysis result for the anti-right condition around movement onset. The left and right panels show negative and positive activity respectively. The negative and positive thresholds for 95% p-values are shown in the figure as well.
Figure 6.10 shows the permutation analysis result for the anti-right condition around cue onset. The null hypothesis is not rejected for positive activation; the significant negative activity is shown in the figure together with the corresponding threshold. As the figure shows, only the negative activation in mIPS is significant.
Figure 6.8: Permutation analysis for pro condition/left target around cue onset in three planes with 95% p-values. The null hypothesis is not rejected for positive activation. Negative 95% significant activation is shown (maximum value 2.19, absolute threshold 1.36).
Figure 6.9: Permutation analysis for anti condition/right target around movement onset in three planes with 95% p-values. Right panel: positive activity (synchronization; maximum value 3.05, absolute threshold 2.13). Left panel: negative activity (desynchronization; maximum value 2.90, absolute threshold 2.09).
Figure 6.10: Permutation analysis for anti condition/right target around cue onset in three planes with 95% p-values. The null hypothesis is not rejected for positive activation. Negative 95% significant activation is shown (maximum value 2.23, absolute threshold 1.62).
Chapter 7
Conclusion
In this report we have studied the theory of the bootstrap. The bootstrap procedure has become widely used in recent years owing to computational advances that have made its implementation practical.
We studied the mathematical theory of the bootstrap procedure and the construction of bootstrap confidence intervals. We covered hypothesis testing as well as the Neyman-Pearson lemma, an important theorem in hypothesis testing that provides a necessary and sufficient condition for the uniformly most powerful test in a wide range of hypothesis testing problems. Permutation analysis was also investigated as a method that uses the resampling idea of the bootstrap procedure to solve hypothesis testing problems.
We investigated the asymptotic properties of the bootstrap estimate of the sample-mean distribution and showed that it is asymptotically Normal.
Next, we studied the order of accuracy of bootstrap estimates using tools from the Edgeworth and Cornish-Fisher expansions. The idea of resampling with replacement from the data to derive an estimate of the statistic under study might look like a stretch at first sight. Interestingly, the method not only converges asymptotically to the true value of the statistic but also provides a more accurate estimate than the Normal approximation. Furthermore, the accuracy of confidence interval estimates is improved considerably.
Finally, we applied permutation analysis to a database of brain magnetic signals collected in an MEG reaching experiment in order to locate brain areas involved in reaching. We showed that permutation analysis provides a statistically sound framework for deriving a significance threshold in brain images, especially when there is large variability in the data that makes the average activity difficult to interpret. Further investigation into the role of these areas is called for. One idea is to look at time-frequency responses at each voxel, in which the frequency axis covers the 7-120 Hz range and the time axis covers the duration of the experiment. The time-frequency response in a brain region allows us to study the activation time series in different frequency bands around cue onset and movement onset. Furthermore, comparing such patterns between right- and left-hemisphere regions helps us find specific time points at which activity flips from positive to negative, and vice versa, between the two corresponding regions, which might point more specifically to the role of such regions during the experiment.
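If that follow-up is pursued, a time-frequency response at a single voxel could be sketched along the following lines; the use of scipy's spectrogram and the window parameters are assumptions for illustration, not choices made in this report.

    import numpy as np
    from scipy.signal import spectrogram

    fs = 600.0                                      # sampling rate of the MEG recordings
    rng = np.random.default_rng(5)
    voxel_ts = rng.standard_normal(int(4 * fs))     # stand-in for a 4 s voxel time course

    # Time-frequency power map restricted to the 7-120 Hz range discussed above.
    f, t, Sxx = spectrogram(voxel_ts, fs=fs, nperseg=256, noverlap=192)
    band = (f >= 7) & (f <= 120)
    tfr = Sxx[band]                                 # frequency x time power
    print(tfr.shape)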
Bibliography
[1] H. Alikhanian, J. D. Crawford, J. F. Desouza, D. Cheyne, and G. Blohm.
Adaptive cluster analysis approach for functional localization using magnetoencephalography. Front Neurosci., 7:73, 2013.
[2] J. E. Angus. A note on the central limit theorem for the bootstrap mean. Communications in Statistics-Theory and Methods, 18(5):1979–1982, 1989.
[3] G. Barbati, C. Porcaro, F. Zappasodi, P. M. Rossini, and F. Tecchio. Optimization of an independent component analysis approach for artifact identification and removal in magnetoencephalographic signals. Clinical Neurophysiology,
115(5):1220–1232, 2004.
[4] S. M. Beurze, I. Toni, L. Pisella, and W. P. Medendorp. Reference frames
for reach planning in human parietofrontal cortex. Journal of Neurophysiology,
104:1736–1745, 2010.
[5] R. N. Bhattacharya and J. K. Ghosh. On the validity of the formal Edgeworth
expansion. Ann. Statist., 6(2):239–472, 1978.
[6] P. J. Bickel and D. A. Freedman. Some asymptotic theory for the bootstrap.
Ann. Statist., 9(6):1196–1217, 1981.
[7] D. Cheyne, L. Bakhtazad, and W. Gaetz. Spatiotemporal mapping of cortical
activity accompanying voluntary movements using an event-related beamforming
approach. Hum. Brain Mapp., 48:213–229, 2006.
[8] D. Cheyne, A. C. Bostan, W. Gaetz, and E. W. Pang. Event-related beamforming: A robust method for presurgical functional mapping using MEG. Clinical
Neurophysiology, 118(8):1691–1704, 2007.
[9] E. A. Cornish and R. A. Fisher. Moments and cumulants in the specification of
distributions. Revue de l'Institut International de Statistique, 5:307–322, 1938.
[10] H. Cramér. On the composition of elementary errors. Skand. Aktuarietidskr.,
(1):141–180, 1928.
[11] B. Efron. Bootstrap methods: Another look at the jackknife. Ann. Statist.,
7(1):1–26, 1979.
[12] J. R. Gilbert, L. R. Shapiro, and G. R. Barnes. A peak-clustering method for MEG
group analysis to minimise artefacts due to smoothness. PLoS ONE, 7(9):e45084,
2012.
[13] P. Golland and B. Fischl. Permutation tests for classification: Towards statistical
significance in image-based studies. Inf Process Med Imaging, 18:330–341, 2003.
[14] P. Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer,
2005.
[15] P. Hall. On the bootstrap and confidence intervals. Ann. Statist., 14(4):1431–
1452, 1986.
[16] P. Hall. The Bootstrap and Edgeworth Expansion. Springer, 1995.
[17] P. Hansen, M. Kringelbach, and R. Salmelin. MEG: An Introduction to Methods.
Oxford University Press, 2010.
[18] T. Hesterberg, D. S. Moore, S. Monaghan, A. Clipson, and R. Epstein. Bootstrap Methods and Permutation Tests. Introduction to the Practice of Statistics,
Freeman, New York, 2005.
[19] J. Shao and D. Tu. The Jackknife and Bootstrap. Springer, 1995.
[20] E. L. Lehmann. Some principles of the theory of testing hypotheses. The Annals
of Mathematical Statistics, 21(1):1–26, 1950.
[21] E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, 2005.
[22] O. G. Lins, T. W. Picton, P. Berg, and M. Scherg. Ocular artifacts in EEG
and event-related potentials I: Scalp topography. Brain Topography, 6(1):51–63,
1993.
[23] W. S. Merrifield, P. G. Simos, A. C. Papanicolaou, L. M. Philpott, and W. W.
Sutherling. Hemispheric language dominance in magnetoencephalography: Sensitivity, specificity, and data reduction techniques. Epilepsy & Behavior,
10(1):120–128, 2007.
[24] D. N. Politis and J. P. Romano. Limit theorems for weakly dependent Hilbert
space valued random variables with application to the stationary bootstrap. Statistica Sinica, 14:461–476, 1994.
[25] F. Rong and J. L. Contreras-Vidal. Magnetoencephalographic artifact identification and automatic removal based on independent component analysis and
categorization approaches. Journal of Neuroscience Methods, 157(2):337–354,
2006.
[26] K. Sekihara, S. S. Nagarajan, D. Poeppel, A. Marantz, and Y. Miyashita. Reconstructing spatio-temporal activities of neural sources using an MEG vector
beamformer technique. IEEE Trans Biomed Eng., 48(7):760–771, 2001.
[27] S. Taulu and J. Simola. Spatiotemporal signal space separation method for
rejecting nearby interference in MEG measurements. Physics in Medicine and
Biology, 51(7):1759–1768, 2006.