Localization of Brain Activity Using Permutation Analysis

by Hooman Alikhanian

A thesis submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree of Master of Science. Queen's University, Kingston, Ontario, Canada, June 2014. Copyright © Hooman Alikhanian, 2014.

Abstract

In this report we study bootstrap theory and permutation analysis as a hypothesis testing method based on the bootstrap procedure. We investigate asymptotic properties of the bootstrap procedure as well as the accuracy of bootstrap estimates using Edgeworth and Cornish-Fisher expansions. We show that resampling with replacement from the data provides a theoretically sound method that outperforms the Normal approximation of the data distribution in terms of convergence error and accuracy of estimates. We conclude the report by applying permutation analysis to magnetoencephalography (MEG) brain signals to localize human brain activity in pointing/reaching tasks and to find regions that are significantly active.

Acknowledgements

I would like to thank my supervisor Gunnar Blohm for his support throughout the years of my research assistantship in the Computational Neuroscience laboratory, for keeping me going when times were tough, for insightful discussions, and for offering invaluable advice.

Contents

Abstract
Contents
List of Figures
Chapter 1: Introduction
Chapter 2: Bootstrap Theory
  2.1 Bootstrap Confidence Interval
  2.2 Iterated Bootstrap
Chapter 3: Hypothesis Testing and Permutation Analysis
  3.1 Hypothesis Testing
    3.1.1 The Neyman-Pearson Lemma
  3.2 P-Values
  3.3 Permutation Analysis
Chapter 4: Asymptotic Properties of the Mean
Chapter 5: Bootstrap Accuracy and Edgeworth Expansion
  5.1 Edgeworth Expansion
  5.2 Bootstrap Edgeworth Expansion
  5.3 Bootstrap Confidence Interval Accuracy
Chapter 6: Results
  6.1 Methods
    6.1.1 Experimental Paradigm
    6.1.2 Data Processing
  6.2 Permutation Analysis Results
Chapter 7: Conclusion
Bibliography

List of Figures

6.1 The MEG experiment setup. (a) Time course of the experiment. (b) Three postures of the hand were used in different recording blocks. (c) The fixation cross in the middle with two possible target locations to its left and right. (d) Subjects sit upright under the MEG machine performing the pointing task with the wrist only. (e) Task: the target (cue) appears in either green or red to inform the subject of the pro or anti nature of the pointing trial. Dimming of the central fixation cross was the movement instruction for subjects.

6.2 The diagram of the event-related beamformer [8]: the data consist of T trials, each with M channels and N time samples. The covariance matrix of the data is given to the beamformer together with the forward solution for a dipole at each location.
Average source activity is then estimated at each voxel, and the dipole orientation is adjusted to maximize power at the corresponding voxel.

6.3 Average brain activation for the pro condition/left target around movement onset (−0.45 to 0 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.4 Average brain activation for the pro condition/left target around cue onset (0 to 0.5 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.5 Average brain activation for the anti condition/right target around movement onset (−0.45 to 0 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.6 Average brain activation for the anti condition/right target around cue onset (0 to 0.5 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.7 Permutation analysis for the pro condition/left target around movement onset in three planes with 95% p-values. Right panel: positive activity (synchronization); left panel: negative activity (desynchronization).

6.8 Permutation analysis for the pro condition/left target around cue onset in three planes with 95% p-values. The null hypothesis is not rejected for positive activation; negative 95% significant activation is shown.

6.9 Permutation analysis for the anti condition/right target around movement onset in three planes with 95% p-values. Right panel: positive activity (synchronization); left panel: negative activity (desynchronization).

6.10 Permutation analysis for the anti condition/right target around cue onset in three planes with 95% p-values. The null hypothesis is not rejected for positive activation; negative 95% significant activation is shown.

Chapter 1
Introduction

Bootstrap and permutation tests are resampling methods. The main idea of resampling, as the name suggests, is to estimate properties of a population (such as its variance, distribution, or confidence intervals) by resampling from the original data. Often in practice access to the whole population is impossible or uneconomical, and instead a sample drawn from the population is available. The bootstrap procedure provides researchers with a tool to infer population properties by resampling from the data. In this manuscript we study the mathematical framework of the bootstrap to evaluate the validity and accuracy of such an inference procedure.

The bootstrap is not a new idea. When we do not have information about the density of the population under study and we wish to infer or estimate some property of the population from data, we consider the same functional of the sample (or empirical) distribution. Instead of taking new samples from the population, we perform resampling with replacement from the data. The idea of using Monte Carlo resampling to estimate bootstrap statistics was proposed by Efron (1979). The approximation improves as the number of resamples increases. Often the number of resamples is on the order of thousands, making resampling methods computationally intensive. Modern computers and software make it possible to use these computationally intensive methods to estimate statistical properties in cases where classical methods are analytically intractable or unusable because the required assumptions are not satisfied.
Compared with classical inference methods such as Bayesian inference, resampling methods have the practical advantage that they do not require any assumption on the population distribution. Resampling methods also work in practice for statistics whose distributions have no analytical solution. Moreover, they provide concrete analogies to theoretical concepts [18].

This report is organized as follows. In chapter 2 we study the mathematical formulation of the bootstrap procedure. In chapter 3 we study hypothesis testing, the Neyman-Pearson lemma, and permutation analysis as a method of solving hypothesis testing problems using the bootstrap procedure. In chapter 4 we study asymptotic properties of the bootstrap mean estimate. Chapter 5 investigates the accuracy of bootstrap estimates and confidence intervals. Finally, we conclude the report by applying permutation analysis to a brain magnetic signal database to localize brain activity in a reaching task.

Chapter 2
Bootstrap Theory

We have a sample data set of size n drawn randomly from a population, that is, a set χ = {X1, X2, ..., Xn} of independent identically distributed random variables drawn from an unknown population distribution function F. We are interested in some functional θ(F) of the population, e.g., the population mean. For instance, for the population mean θ is given as:

θ(F) = ∫ x dF(x).    (2.1)

We do not know F, so we cannot solve for θ directly. We estimate θ with θ̂ by estimating the distribution function F. One unbiased estimator that can be used for this purpose is the empirical distribution function F̂ computed from the sample χ:

F̂(x) = (1/n) Σ_{i=1}^{n} I(Xi ≤ x),    (2.2)

where I(·) is the indicator function.

The problem is to study the statistical properties of the estimator θ̂, e.g., its variance and confidence interval. To this end, we need the distribution of θ̂ − θ. The bootstrap procedure gives us a tool to estimate this distribution via resampling from the data χ. The bootstrap procedure generally involves three steps [18]:

Step 1. Perform resampling with replacement on the data. In resampling, all data points are given the same chance of being chosen, and the resampled data set has the same size as the original sample. We can count the number of distinct resamples that can be drawn from the sample set χ with replacement. There is a one-to-one correspondence between the number of resamples and the number of ways of placing n indistinguishable objects into n numbered boxes: the number of objects that end up in the ith box is the number of times that data point Xi is chosen. It follows that the number of resamples N(n) is given by the binomial coefficient:

N(n) = C(2n − 1, n).    (2.3)

Using Stirling's formula n! ∼ (n/e)^n √(2πn), we have:

N(n) ∼ (nπ)^{−1/2} 2^{2n−1}.    (2.4)

Thus, the number of resamples increases exponentially with n.

Step 2. For each resample set χ* = {X1*, X2*, ..., Xn*} calculate θ. The distribution of these statistics is referred to as the bootstrap distribution [18]. For example, if θ = θ(F) = µ is the population mean and F̂ is the empirical distribution function, which assigns weight 1/n to each data point Xi, then:

θ̂ = (1/n) Σ_{i=1}^{n} Xi.    (2.5)

Thus θ̂ is the sample mean, and the distribution of the sample mean is estimated by calculating the sample mean of each resample χ*.

Step 3. Use the bootstrap distribution to construct confidence intervals for θ. A short code sketch of these three steps is given below.
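To make the three steps concrete, the following is a minimal sketch in Python. It is not from the thesis: the use of NumPy, the choice of the median as the statistic, and the number of resamples are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_statistic(data, statistic, n_resamples=2000, rng=rng):
    """Steps 1-2: resample with replacement and evaluate the statistic
    on each resample, returning the bootstrap distribution."""
    data = np.asarray(data)
    n = data.size
    boot = np.empty(n_resamples)
    for b in range(n_resamples):
        resample = rng.choice(data, size=n, replace=True)  # Step 1
        boot[b] = statistic(resample)                      # Step 2
    return boot

# Example: bootstrap distribution of the sample median
sample = rng.standard_exponential(50)        # stand-in for the observed data
boot_medians = bootstrap_statistic(sample, np.median)

# Step 3: use the bootstrap distribution, e.g. its standard deviation
# and a simple percentile interval for the median.
print("bootstrap SD of the median:", boot_medians.std(ddof=1))
print("95% percentile interval:", np.percentile(boot_medians, [2.5, 97.5]))
```

With a few thousand resamples this is inexpensive for moderate sample sizes, in line with the computational remarks of chapter 1.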
Classical inference theory tells us a great deal about the sample mean. For a Normal population, the sample mean is exactly Normally distributed for any sample size, and for large sample sizes it is approximately Normally distributed for a broad range of population distributions, as long as the central limit theorem holds. Moreover, the sample standard deviation is:

ŝ = √[ (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)² ],    (2.6)

where X̄ is the sample mean. However, for many statistics other than the sample mean, e.g., quantiles, calculating the standard deviation is analytically intractable, let alone the distribution. Of course, one way to get around this problem is to assume a Normal distribution for the desired statistic and move forward. This method, however, may not work in situations where the distribution is heavily skewed to one side or has heavy tails. We will see that the bootstrap gives a more accurate estimate of the distribution than the Normal approximation.

In order to estimate the distribution of a statistic, e.g., the sample mean, one might think that instead of resampling from the data, one could draw further sample sets from the population, estimate the statistic from each sample set, and arrive at an estimate of the statistic's distribution. In practice, such an approach is difficult to implement, because sampling from the population again often requires resources, e.g., financial resources, that may not be available, and in some cases the population may not be easily accessible. The idea of the bootstrap is that instead of returning to the population to draw other sample sets, resampling is performed from the data at hand. To the extent that the data are representative of the population distribution, which is a valid assumption if the sampling methodology is sound and the sample size is large enough, resampling from the data is justified. Even when resources are available, it may be a better idea to take one larger sample from the population and resample from it instead of returning to the population multiple times to draw smaller sample sets [18].

In the case of estimating the standard deviation of θ, SD(θ(X)), the bootstrap principle can be summarized as follows:

1. Resample with replacement from χ = {X1, X2, ..., Xn} to get the bootstrap sample χ* = {X1*, X2*, ..., Xn*}, and calculate the bootstrap estimate θ*(X).
2. Obtain B independent bootstrap replicates θ1*(X), θ2*(X), ..., θB*(X).
3. Estimate SD(θ(X)) by the empirical standard deviation of θ1*(X), θ2*(X), ..., θB*(X).

2.1 Bootstrap Confidence Interval

As mentioned above, the idea of the bootstrap is to give an estimate of the distribution of θ̂ − θ. To construct confidence intervals in order to evaluate the accuracy of the estimator θ̂, we need this distribution. In this section we borrow terminology and ideas from [15, 16].

To find the distribution of θ̂ − θ, we need F0, the population distribution, and its empirical estimate F1 = F̂0. Since we do not know F0, the bootstrap procedure suggests that we use F1 in place of F0, i.e., take our sample as a representative of the population, and take the bootstrap distribution F2, derived from resampling with replacement from the sample, as an estimate of F1 [15, 16]. Constructing a two-sided α-confidence interval consists of finding a t that solves:

E{ft(F0, F1) | F0} = 0,    (2.7)

where ft is a functional from a class {ft : t ∈ T} for some set T, and is defined as:

ft(F0, F1) = I{θ(F1) − t ≤ θ(F0) ≤ θ(F1) + t} − α,    (2.8)

where I(·) is the indicator function.
According to the bootstrap principle, instead of finding t in equation (2.7), we find t̂ that solves:

E{ft(F1, F2) | F1} = 0.    (2.9)

Many statistical problems can be formulated as equation (2.7) with different functional classes. Equation (2.8) gives one example of such a class, used to construct confidence intervals. Finding an estimate t̂ that solves the approximate equation (2.9) instead of the original equation (2.7) is the essence of the bootstrap idea.

A number of methods have been proposed in the literature to construct confidence intervals [15]. Equation (2.8) is one such method, which we refer to as the Bootstrap Interval. The Bootstrap Percentile-t Interval is another method, in which ft is defined as:

ft(F0, F1) = I{θ(F1) − tτ(F1) ≤ θ(F0) ≤ θ(F1) + tτ(F1)} − α.    (2.10)

The Bootstrap Percentile-t Interval introduces a scaling factor τ(F1) into equation (2.8). The difference between the two confidence interval methods lies in the idea of pivotalness. A function of both the data and an unknown parameter is said to be pivotal if it has the same distribution for all values of the unknowns [16]. For example, for a population with a Normal distribution N(µ, σ²), (X̄, σ̂²) is the maximum likelihood estimator of (µ, σ²). The sample mean is also Normally distributed, N(µ, σ²/n). Thus Z = √n (X̄ − µ) is N(0, σ²). We can immediately see that Z is non-pivotal, because its distribution depends on the unknown σ. The α-confidence interval of the sample mean X̄ can be constructed as:

(X̄ − n^{−1/2} xα σ̂, X̄ + n^{−1/2} xα σ̂),    (2.11)

where xα is defined as:

P(|N| ≤ xα) = α,    (2.12)

for a standard Normal random variable N. Since the distribution of T = √n (X̄ − µ)/σ̂ is not Normal but Student's t with n − 1 degrees of freedom, the coverage error of the interval in equation (2.11) stems from approximating Student's t distribution by a Normal distribution, and is of order O(n^{−1}). The distribution of T does not depend on any unknowns; therefore, T is pivotal. An accurate α-confidence interval for the sample mean is obtained by substituting tα for xα in equation (2.11), where

P(|V| ≤ tα) = α,    (2.13)

for a Student's t random variable V. The scaling factor τ in the above example is σ̂, the maximum likelihood estimator of the standard deviation.

The α-confidence interval of a statistic θ(F0) is called accurate when t is an exact solution of equation (2.7) with the functional ft(F0, F1) from equation (2.8), that is:

P(θ(F1) − t ≤ θ(F0) ≤ θ(F1) + t | F0) = α.    (2.14)

If t̂ is only an approximate solution of equation (2.7), as in the bootstrap confidence interval, the probability that θ(F0) lies in the confidence interval will not be exactly α. The difference:

P(θ(F1) − t̂ ≤ θ(F0) ≤ θ(F1) + t̂ | F0) − α,    (2.15)

is referred to as the coverage error of the interval.

The bootstrap percentile-t interval can be estimated for any functional θ(F0). According to the bootstrap procedure, we construct the bootstrap distribution by resampling with replacement from the sample. The bootstrap estimate of θ, θ(F2), and the scaling factor τ(F2), e.g., the standard deviation σ̂*, are estimated from the bootstrap distribution F2. The α-confidence interval is then calculated as:

(θ(F2) − tα τ(F2), θ(F2) + tα τ(F2)),    (2.16)
where tα is defined as:

P(|T| ≤ tα) = α,    (2.17)

for a random variable T with Student's t distribution with n − 1 degrees of freedom, for a sample of size n. In equation (2.16) it is justified to use tα from the t-table as long as the distribution of θ(F1) is approximately Normal. If the distribution has heavy tails or is highly skewed, the confidence interval in equation (2.16) will not be a good approximation of the true confidence interval.

In general, the distribution of θ(F1) is not known. One special case is when θ is the sample mean: if the sample size is large enough, the distribution of the sample mean can be approximated as Normal according to the Central Limit Theorem. What can we do for other statistics? The bootstrap distribution of the statistic can be constructed. If the bootstrap distribution is approximately Normal and not heavily skewed, the confidence interval of equation (2.16) can be used. To see this, we can estimate tα* from the bootstrap distribution of θ(F1), that is, find tα* such that:

P(θ(F2) − tα* τ(F2) ≤ θ(F1) ≤ θ(F2) + tα* τ(F2) | F1) = α,    (2.18)

which can be solved as:

tα* = inf_t {t : P(θ(F2) − t τ(F2) ≤ θ(F1) ≤ θ(F2) + t τ(F2) | F1) ≥ α}.    (2.19)

To solve equation (2.18) using Monte Carlo approximation, we choose integers B ≥ 1 and 1 ≤ ν ≤ B such that ν/(B + 1) = α for a rational number α. For instance, if α = 0.95, we can take (ν, B) = (95, 99). According to the bootstrap procedure, we draw B independent resamples from χ with replacement, namely {χ1*, χ2*, ..., χB*}, and for each resample we calculate the corresponding empirical distribution F2,b, b = 1, 2, ..., B. Define:

Tb* = |θ(F2,b) − θ(F1)| / τ(F2,b).    (2.20)

We pick the νth smallest value of Tb* (equivalently, the (B + 1 − ν)th largest) as the Monte Carlo estimate of tα*. As B → ∞, this estimate converges to tα* with probability one. Now that we have estimated tα* using the bootstrap distribution, we can construct the bootstrap-t confidence interval as:

(θ(F2) − tα* τ(F2), θ(F2) + tα* τ(F2)).    (2.21)

If this confidence interval closely matches the interval from equation (2.16), in which t came from the t-table, the distribution of θ is approximately Normal. Otherwise, equation (2.21) provides a better approximation of the confidence interval in the sense of a smaller coverage error.

2.2 Iterated Bootstrap

To develop the bootstrap idea, we started from finding t that solves equation (2.7). A lack of knowledge of the population distribution F0 led us to substitute F1 for it in equation (2.7), and to substitute the bootstrap distribution F2 for F1 to solve for t̂1 in equation (2.9) as an approximation of t. We argued that we can use the empirical distribution F1 instead of the population distribution as long as the sample can be considered representative of the population. This idea can be developed one step further by resampling with replacement from each resampled data set χ*, and solving for t̂2 as an approximation of t in:

E{ft(F2, F3) | F2} = 0,    (2.22)

where F2 is the bootstrap distribution from resampling the data χ, and F3 is the bootstrap distribution from resampling the resampled data χ*. In theory we can continue this process ad infinitum. In fact, it can be shown that each iteration improves the coverage error by a factor of order O(n^{−1}). However, we showed that the number of distinct resamples at the first level already grows exponentially with the sample size n.
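Before turning to that cost, the single-level Monte Carlo construction of equations (2.19)-(2.21) can be sketched in Python. This is not the thesis's code: NumPy, the choice of the mean as the statistic, the value of B, and centring the interval at the sample estimate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_t_interval(data, alpha=0.95, B=1999, rng=rng):
    """Monte Carlo bootstrap-t interval for the mean, in the spirit of
    equations (2.19)-(2.21): studentize each resample, take the order
    statistic of |T*_b| at level alpha as the critical value, and centre
    the interval at the sample estimate."""
    data = np.asarray(data)
    n = data.size
    theta_hat = data.mean()                    # theta(F1)
    tau_hat = data.std(ddof=1) / np.sqrt(n)    # scaling factor tau(F1)

    t_star = np.empty(B)
    for b in range(B):
        res = rng.choice(data, size=n, replace=True)
        theta_b = res.mean()                          # theta(F_{2,b})
        tau_b = res.std(ddof=1) / np.sqrt(n)          # tau(F_{2,b})
        t_star[b] = abs(theta_b - theta_hat) / tau_b  # equation (2.20)

    nu = int(np.ceil(alpha * (B + 1)))         # nu / (B + 1) ~ alpha
    t_crit = np.sort(t_star)[nu - 1]           # nu-th smallest of the B values
    return theta_hat - t_crit * tau_hat, theta_hat + t_crit * tau_hat

skewed = rng.standard_exponential(40)          # a skewed toy sample
print("bootstrap-t 95% interval for the mean:", bootstrap_t_interval(skewed))
```

For a heavily skewed sample such as this one, the resulting interval typically differs noticeably from the Normal-theory interval, which is the diagnostic suggested above. Repeating the entire loop inside a second level of resampling is what the iterated bootstrap would require, which is why it is rarely taken beyond one level in practice.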
This rapid growth in the number of resamples makes iterated resampling computationally intractable in practice. In most practical problems, resampling is performed at only one level, with 1000 to 5000 resamples.

Chapter 3
Hypothesis Testing and Permutation Analysis

3.1 Hypothesis Testing

Hypothesis testing is a statistical decision-making procedure to test whether or not a hypothesis that has been formulated about a population statistic is correct. The decision leads to either accepting or rejecting the hypothesis in question. The hypothesis is based on some property of a statistic of a population. For instance, to test the hypothesis that the mean of a population is equal to µ0, we can formulate a hypothesis testing problem with the null hypothesis defined as H : µ = µ0 and the alternative hypothesis as K : µ ≠ µ0.

A hypothesis testing procedure uses inferential statistics to learn more about a population that is too large or inaccessible. Often, instead of the population, we have access to a sample drawn randomly from it, and we need to estimate the statistic from the sample at hand. For instance, to solve a hypothesis testing problem about the mean of a population, we can use the sample mean, which is the unbiased estimator of the mean.

To state the problem, let us assume that we want to form a decision about a random variable X with distribution Pθ that belongs to a class P = {Pθ, θ ∈ Ω}. We want to formulate some hypothesis about θ. The set Ω can be partitioned into the values for which the hypothesis is true and those for which it is false. The resulting two mutually exclusive classes are ΩH and ΩK respectively, such that ΩH ∪ ΩK = Ω.

Four possible outcomes can occur as a result of testing a hypothesis: (1) the null hypothesis is true and we accept it, (2) the null hypothesis is true and we reject it, (3) the null hypothesis is false and we accept it, and (4) the null hypothesis is false and we reject it. In cases (2) and (3) we are making an error in our decision, that is, we form a perception about a property of the population that is in fact not true. Thus, two types of error can occur in the decision-making process: a type I error occurs in case (2), and a type II error occurs in case (3). Let us denote the probabilities of type I and type II error by α and β respectively.

Ideally, hypothesis testing should be performed in a manner that keeps the probabilities of the two types of error, α and β, to a minimum. However, when the number of observations is fixed, both probabilities cannot be controlled simultaneously. In hypothesis testing, researchers collect evidence to reject the null hypothesis; in the process, they assume that the null hypothesis is true unless they can show otherwise. Thus, it is customary to control the probability of committing a type I error [21].

The goal in hypothesis testing is to partition the sample space S into two mutually exclusive sets S0 and S1. If X falls in S0, the null hypothesis is accepted, and if it falls in S1, we reject the null hypothesis. S0 and S1 are referred to as the acceptance and critical regions respectively. In order to control the probability of type I error, we impose a significance level α, a number between 0 and 1, on the probability of S1 under the assumption that the null hypothesis is true. That is:

Pθ(X ∈ S1) ≤ α for all θ ∈ ΩH.
(3.1) We are in effect limiting the probability of type I error to α which can be chosen as an arbitrary small number such as 0.05. We then find S1 such that Pθ (X ∈ S1 ) is maximized for all θ ∈ ΩK under equation (3.1) condition. We maximize the probability of rejecting the null hypothesis when it is in fact false. This probability is referred to as the power of the critical region. So far we have considered a case where we allow every outcome x of random variable X to be either a member of S0 or S1 . We can generalize this idea and assume that x can belong to the rejection region with probability φ(x), and to the acceptance region by probability 1 − φ(x). Then the hypothesis testing experiment will involve drawing from random variable X with two possible outcomes R and R̄ with probabilities φ(x) and 1−φ(x) respectively. If R is the outcome of the experiment we reject the hypothesis, and otherwise accept it. If the distribution of X is Pθ , then the probability of rejection will be: Eθ φ(X) = Z φ(x)dPθ (x). (3.2) The problem is to find φ(x) that maximizes test power βθ which is defined as: Eθ φ(X) = Z φ(x)dPθ (x) for all θ ∈ ΩK , (3.3) 3.1. HYPOTHESIS TESTING 16 under the condition: Eθ φ(X) ≤ α for all θ ∈ ΩH . 3.1.1 (3.4) The Neyman-Pearson Lemma The Neyman-Pearson Lemma provides us with a way of finding the best critical region [21]. Theorem 3.1.1. Let P0 and P1 be probability distributions with densities p0 and p1 respectively with respect to a measure µ. (1) Existence: For testing H : p0 against the alternative K : p1 , there exists a test φ and a constant k such that: E0 φ(X) = α, and φ(x) = 1 0 when p1 (x) > kp0 (x) (3.5) (3.6) when p1 (x) < kp0 (x) (2) Sufficient condition for the most powerful test: if a test satisfies equation (3.5) and equation (3.6) for some k, then it is most powerful for testing p0 against p1 at level α. (3) Necessary condition for the most powerful test: If φ is the most powerful for testing p0 against p1 at level α, then for some k it satisfies (3.6) all most everywhere µ. It also satisfies (3.5) unless there exists a test of size < α and with power 1. Proof. If we define 0 × ∞ := 0 and allow k to become ∞, the theorem is true for α = 0 and α = 1. So, let us assume that 0 < α < 1. (1) Let α(c) = P0 {p1 (X) > cp0 (X)}. Because the probability is computed under P0 , we just need to consider the inequality for the set where p0 (x) > 0. Therefore, α(c) will 3.1. HYPOTHESIS TESTING 17 be the probability that the random variable p1 (X)/p0 (X) is greater than c. 1 − α(c) will then be a cumulative distribution function. Thus, α(c) is nonincreasing, and continuous on the right, that is: α(c − 0) − α(c) = P0 {p1 (x)/p0 (x) = c}, α(−∞) = 1, and α(∞) = 0. for given 0 < α < 1 let c0 be such that α(c0 ) < α < α(c0 − 0), and consider the test φ defined by: φ(x) = 1 when p1 (x) > c0 p0 (x) α−α(c0 ) when p1 (x) = c0 p0 (x) α(c0 −0)−α(c0 ) 0 when p1 (x) < c0 p0 (x). The middle expression is defined unless α(c0 ) = α(c0 − 0). Under that condition P0 {p1 (X) = c0 p0 (X)} = 0, and φ is defined almost everywhere. The size of φ is: E0 φ(X) = P0 p1 (X) > c0 p0 (X) α − α(c0 ) + P0 α(c0 − 0) − α(c0 ) p1 (X) = c0 p0 (X) = α. (3.7) Comparing the size in equation (3.7) with equation (3.5) we see that c0 is the k of the theorem. (2) To prove sufficiency, let us take φ∗ as any other test that satisfies condition E0 φ∗ (X) ≤ α. Denote S + and S − as sample space subsets for which φ(x) − φ∗ (x) > 0 and φ(x) − φ∗ (x) < 0 respectively. 
For all x in S + and S − , we have p1 (x) ≥ kp0 (x) and p1 (x) ≤ kp0 (x) respectively. Thus we have: Z ∗ (φ − φ )(p1 − kp0 )dµ = Z S + ∪S − (φ − φ∗ )(p1 − kp0 )dµ ≥ 0. (3.8) 3.1. HYPOTHESIS TESTING 18 The difference in power is then: Z ∗ (φ − φ )p1 dµ ≥ k Z (φ − φ∗ )p0 dµ ≥ 0. (3.9) Therefore, φ is more powerful than φ∗ . (3) To prove the necessary condition, let us assume that φ∗ is the most powerful to test p1 against p0 at level α, and it is not equal φ. Take S as the intersection of S + ∪ S − with {x : p1 (x) 6= p0 (x)} and suppose that µ(S) > 0. Since (φ − φ∗ )(p1 − kp0 ) is positive on S, we have: Z S + ∪S − ∗ (φ − φ )(p1 − kp0 )dµ = Z S (φ − φ∗ )(p1 − kp0 )dµ > 0. (3.10) Therefore φ is more powerful against p1 than φ∗ which is a contradiction unless µ(S) = 0 which completes the proof [21, 20]. The proof shows that equations (3.5) and (3.6) give necessary and sufficient conditions for a most powerful test up to sets of measure zero, that is whenever the set {x : p1 (x) = kp0 (x)} is µ-measure zero. Note that the theorem applies for discrete distributions as well. To summarize the idea behind the Neyman-Pearson lemma, let us suppose that X1 , X2 , ..., Xn is an independent identically distributed (i.i.d) random sample with joint density function f (X; θ). In testing null hypothesis H : θ = θ0 against the alternative K : θ = θ1 the critical region: CK = {x : f (x, θ0 )/f (x, θ1 ) < K}, (3.11) 3.1. HYPOTHESIS TESTING 19 is most powerful for K > 0 according to the Neyman-Pearson lemma. As an example, let us suppose that X represents a single observation from probability density function f (x, θ) = θxθ−1 for 0 < x < 1. To test null hypothesis H : θ0 = 1 against K : θ1 = 2 with significance level α = 0.95. We have: f (x, θ0 ) 1 = . f (x, θ1 ) 2x Thus, the rejection region is R = {x : x > k ′ }, where k ′ = 1/2k, and k > 0. To determine the value of k ′ , we calculate the size of the test with respect to k ′ and solve for k ′ to get the desired test size 0.05: P {x ∈ R|H} = P {x > k ′ |H} = 1 − k ′ = 0.05. Thus, k ′ = 0.95, and the rejection region is R = {x : x > 0.95}. From the lemma, among all tests of null hypothesis H : θ = 1 against K : θ = 2 with level 0.05, rejection region R has the smallest type II error probability. Our treatment of the problem so far involves simple distributions where the distribution class contains a single distribution. This enables us to solve hypothesis testing problem for the null hypothesis of the form H : θ = θ0 and the alternative K : θ = θ1 . In practical applications, however, we might be interested in solving a hypothesis testing problem of the form H : θ ≤ θ0 and K : θ > θ0 which involves a composite distribution class rather than a simple one. If there exists a real-valued function T (x) such that for any θ < θ′ , the distributions Pθ and Pθ′ are distinct, and the ratio pθ′ (x)/pθ (x) is a nondecreasing function of T (x) then pθ (x) is said to have monotone likelihood ratio property [21]. 3.1. HYPOTHESIS TESTING 20 Theorem 3.1.2. Let the random variable X have probability density pθ (x) with monotone likelihood ratio property in a real-valued function T (x), and θ a real parameter. (1) For testing H : θ ≤ θ0 against K : θ > θ0 the most powerful test is given by: 1 φ(x) = γ 0 when T (x) > C when T (x) = C (3.12) when T (x) < C, where C and γ are determined by: Eθ0 φ(X) = α. (3.13) β(θ) = Eθ φ(X) (3.14) (2) The power function of the test: is strictly increasing for all θ for which 0 < β(θ) < 1. 
(3) The test from equations (3.12) and (3.13) is the most powerful test for testing H ′ : θ ≤ θ′ against K ′ : θ > θ′ at level α′ = β(θ′ ) for all θ′ . (4) The test minimizes β(θ), the probability of type I error, among all tests that satisfy (3.13) for θ < θ0 . The one-parameter exponential family is an important class of distributions with monotone likelihood property with respect to real-valued function T (x) that satisfy the assumptions of the theorem from the following corollary [21]: 3.2. P-VALUES 21 Corollary 3.1.3. Let X have probability density function with respect to some measure µ, and θ be a real number pθ (x) = C(θ)eQ(θ)T (x) h(x), (3.15) where Q(θ) is strictly monotone. Then φ(x) from equation (3.12) is the most powerful test for testing H : θ ≤ θ0 against K : θ > θ0 for increasing Q with level α, where C and γ are determined from equation (3.13). For decreasing θ the inequalities in equation (3.12) are reversed. 3.2 P-Values So far we have studied hypothesis testing for a fixed significance level α. In alternative standard non-Bayesian approach α is not fixed. For varying α, we want to determine the smallest significance level at which the null hypothesis would be rejected for a given observation. This significance level is referred to as the p-value of the test. For random variable X let us suppose that the distribution of p1 (X)/p0 (X) is continuous. Then the most powerful test can specify rejection region Sα as {x : p1 (x)/p0 (x) > k} for k = k(α) as a function of α, where k is determined from the size equation (3.5). Performing the test for varying α creates nested rejection regions, that is: Sα ⊂ Sα′ if α < α′ . (3.16) The p-value can now be determined as: p̂ = p̂(X) = inf{α : X ∈ Sα }. (3.17) 3.3. PERMUTATION ANALYSIS 22 For example, let us suppose that X is a Normal random variable N(µ, σ 2 ), and σ 2 is known. We want to formulate a hypothesis testing problem on µ with H : µ = 0 as the null hypothesis against K : µ = µ1 as the alternative for some µ1 > 0. The likelihood ratio can be written as: p1 (x) = p0 (x) exp h −(x−µ1 )2 2σ2 exp −x2 2σ2 i µ21 µ1 x − 2 . = exp σ2 2σ Thus, in order to have p1 (x)/p0 (x) > k, x should be greater than k ′ > 0 which can be determined from the constraint P0 {X > k ′ } = α. Thus, the rejection region can be written as Sα = {X : X > σz1−α }, where z1−α is the 1 − α percentile of the standard Normal distribution. From the definition of percentile, we have: Sα = {X : 1 − Φ( X ) < α}. σ For a given observed value of X, the infimum of Sα over all α can be written as: p̂ = 1 − Φ( X ), σ which is uniformly distributed on (0, 1). 3.3 Permutation Analysis The Neyman-Pearson lemma determines the most powerful test for simple tests as well as for composite ones with a monotone likelihood property. As we studied in the previous section, to determine the rejection region one also needs to know the 3.3. PERMUTATION ANALYSIS 23 distribution of the test statistic under the null hypothesis. Often in practical applications finding the test statistic distribution under the null hypothesis cannot be done analytically. Permutation tests address this problem by providing researchers with a simple way of estimating test statistic distribution using the bootstrap idea. A permutation test is essentially hypothesis testing through bootstrapping. The idea of permutation analysis is to estimate a test statistic distribution by resampling with replacement under the assumption that the null hypothesis is true. 
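As a generic illustration of this idea, separate from the MEG example that follows, here is a minimal two-sample permutation test in Python. It is not from the thesis: NumPy, the group sizes, and the fact that the pooled samples are scrambled without replacement (the standard permutation variant of the resampling described here) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def permutation_pvalue(baseline, task, n_resamples=2048, rng=rng):
    """One-sided test of H: mu_task - mu_baseline = 0 against K: > 0.
    Under H the group labels are exchangeable, so we pool the samples,
    reassign them at random, and rebuild the distribution of the mean
    difference under the null hypothesis."""
    baseline = np.asarray(baseline)
    task = np.asarray(task)
    observed = task.mean() - baseline.mean()

    pooled = np.concatenate([baseline, task])
    n_base = baseline.size
    null_diffs = np.empty(n_resamples)
    for b in range(n_resamples):
        perm = rng.permutation(pooled)                 # scramble labels under H
        null_diffs[b] = perm[n_base:].mean() - perm[:n_base].mean()

    # p-value: where the observed difference falls in the null distribution
    return (np.sum(null_diffs >= observed) + 1) / (n_resamples + 1)

# Toy data standing in for 300 baseline and 900 task samples
baseline = rng.normal(0.0, 1.0, 300)
task = rng.normal(0.2, 1.0, 900)
print("permutation p-value:", permutation_pvalue(baseline, task))
```

The +1 terms in the p-value are a common finite-sample correction; with about two thousand resamples they change the result only marginally. The MEG example that follows applies the same logic to baseline and task samples.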
For instance, suppose that we perform an experiment in which brain magnetic signal is recorded from a brain region while the subject is performing a reaching task. The signal is recorded for 2 seconds. During the first 0.5 seconds a baseline signal is recorded while the subject is sitting still doing nothing, and the last 1.5 seconds is recorded while the subject is performing the reaching task. We want to investigate whether the brain region is active during the experiment. To be more specific, suppose that the data is collected at a rate of 600 samples per second. Thus, we have 300 samples of baseline, and 900 samples from the task. One way to approach the problem is to compare the mean of the signal during the task, µtask with that of the baseline, µbaseline . To this end, we formulate a hypothesis testing problem to test null hypothesis H : µtask − µbaseline = 0 against K : µtask − µbaseline > 0 as alternative. As a test statistic we take the difference between sample means x̄task − x̄baseline . The null hypothesis assumes no difference between the baseline and the task. Thus, resampling with replacement under the null hypothesis would mean that out of 1200 total samples we pick 300 and 900 samples to assign to the baseline and the task respectively. Then, we calculate the difference between sample means for each resample. If we take 2048 resamples, we will have 3.3. PERMUTATION ANALYSIS 24 2048 of such mean differences. We can calculate the empirical distribution of the differences to build the bootstrap distribution. To calculate the p-value we locate the sample mean difference in the bootstrap distribution. Permutation tests scramble data randomly between the groups. Therefore, for the test to be valid the distribution of the two groups must be the same under the null hypothesis. To account for the differences in standard deviation, it is more accurate to consider pivotal test statistic and normalize by the unbiased standard deviation estimator of the groups. We can use permutation tests when problem design and the null hypothesis allow us to resample under the null hypothesis. Permutation tests are suitable for three groups of problems: Two-sample problems when the null hypothesis assumes that test statistic is the same for both groups, matched pair designs when the null hypothesis assumes that there are only random differences within pairs, and for problems that study a relationship, i.e. correlation coefficient between two variables when the null hypothesis assumes no relationship between them [18]. In the Result section we study the above example in more depth. 25 Chapter 4 Asymptotic Properties of the Mean In chapter 2 we reviewed bootstrap theory. In practice, bootstrap procedure is used when population and/or statistic distribution(s) is not known. In this section we study the validity of the bootstrap procedure for sample mean which is one of the statistics that can be handled analytically. Let X1 , X2 , ..., Xn be n independent identically distributed random variables with the common distribution F with mean µ, and variance σ 2 both unknown. The sample P mean µ̄n = n1 ni=1 Xi is the unbiased estimator of the mean µ. If we take the Pn 1 2 2 estimator σ̂n2 = n−1 i=1 (Xi − µ̄n ) as an estimator of σ then from the Central Limit √ Theorem the pivotal statistic Qn = n(µ̄n − µ)/σ̂n tends to N(0, 1) in distribution. It is interesting to study the asymptotic behaviour of the bootstrap distribution. We pick n resamples X1∗ , X2∗ , ..., Xn∗ with replacement from the sample set. 
With each resample having the same chance of getting picked, we can assign 1/n probability mass to each of the n resamples. The bootstrap sample mean is given as: n µ̄∗n 1X ∗ X , = n i=1 i (4.1) 26 and the sample variance is given as: n σ̂n∗2 1 X ∗ = (Xi − µ̄∗n )2 . n − 1 i=1 (4.2) We are interested in the asymptotic behaviour of the pivotal bootstrap statistic Q∗n = √ n(µ̄∗n − µ̄n )/σ̂n∗ . As we discussed in chapter 2, we construct the pivotal statistic by replacing the sample mean by the bootstrap mean, and the population mean µ by the sample mean µ̄n . Essentially we are taking the bootstrap distribution F ∗ that assigns the same probability mass to all Xi , i = 1, ..., n instead of the population distribution F. Theorem 4.0.1. Let X1 , X2 , ... be an independent identically distributed random sequence with positive variance σ 2 . For almost all sample sequences X1 , X2 , ... conditional on (X1 , X2 , ..., Xn ), as n tends to ∞: √ (1) Conditional distribution n(µ̄∗n − µ̄n ) converges to N(0, σ 2 ) in distribution. (2) σ̂n∗ → σ in probability. Parts (1) and (2) of Theorem (4.0.1) and Slutski theorem imply that the pivotal bootstrap statistic Q∗n converges to N(0, 1) in distribution. In this report we prove part (1) of the theorem using the ideas and lemmas from Angus (1989) [2]. For the complete proof of part (2) using the law of large numbers, we refer to Politis and Romano (1994) [24]. Lemma 4.0.2 (Borel-Cantelli). Let {An , n ≥ 1} be a sequence of events in a probability space. If ∞ X n=1 P (An ) < ∞, (4.3) 27 then P (Ai.o.) = 0, where i.o. stands for infinity often, that is: ∞ A(i.o.) = ∩∞ i=1 ∪n=i An . (4.4) Proof. We want to prove that only a finite number of the events can occur. Let In = I{An } be the indicator function of An . The number of events that can occur P can be determined as N = ∞ k=1 Ik . P (Ai.o.) = 0 if and only if P (N < ∞) = 1. By P∞ Fubini’s theorem E(N) = n=1 P (An ) which is finite by assumption. E(N) < ∞ implies that P (N < ∞) = 1 which completes the proof. Lemma 4.0.3. Let the sequence X1 , X2 , ... consist of identical independent distribution random variables with E|X1 | < ∞, then for every ǫ > 0 P {|Xn | > ǫn i.o.} = 0. Proof. It is sufficient to show that the Borel-Cantelli assumption holds. Fix ǫ > 0: ∞ X n=1 P (|Xn | ≥ ǫn) = ∞ X ∞ X P {ǫk ≤ |X1 | ≤ ǫ(k + 1)} n=1 k=n ∞ X k X F ubini = k=1 n=1 = ∞ X k=1 P {ǫk ≤ |X1 | ≤ ǫ(k + 1)} kP {ǫk ≤ |X1 | ≤ ǫ(k + 1)} ≤ E|X1 |/ǫ < ∞. Borel-Cantelli lemma completes the proof. Lemma 4.0.4. Let the sequence X1 , X2 , ... consist of identical independent distributed 28 random variables with E|X1 |2 < ∞, then: −3/2 lim sup n n→∞ n X k=1 |Xk |3 → 0 almost surely. Proof. Fix ǫ > 0: −3/2 n n X k=1 3 −3/2 |Xk | = n ≤ n−3/2 ≤ n−3/2 n X k=1 n X k=1 n X k=1 n X √ √ −3/2 |Xk | I{|Xk | ≥ ǫ k} + n |Xk |3 I{|Xk | < ǫ k} 3 k=1 n X −3/2 √ |Xk |3 I{|Xk | ≥ ǫ k} + ǫn √ |Xk |3 I{|Xk | ≥ ǫ k} + ǫn−1 k=1 n X k=1 √ |Xk |2 k |Xk |2 (4.5) By Lemma (4.0.3): n o √ 3 P |Xk | I{|Xk | ≥ ǫ k} = 6 0 i.o. = P |Xk |2 ≥ ǫ2 k i.o. = 0. √ Thus, |Xk |3 I{|Xk | ≥ ǫ k} = 0 almost surely for all but finitely many k values. √ P Therefore, n−3/2 nk=1 |Xk |3 I{|Xk | ≥ ǫ k} → 0 almost surely as n → ∞. By P the law of large numbers, the second term in equation (4.5), ǫn−1 nk=1 |Xk |2 , conP verges almost surely to ǫE[X12 ] as n → ∞. Thus, lim supn→∞ n−3/2 nk=1 |Xk |3 → 0 almost surely. Now we are ready to prove Theorem (4.0.1). Proof. Define Tn∗ = √ n(µ̄∗n − µ̄n ). 
Tn∗ can be written as the sum of n independent identically distributed random variables n−1/2 (Xk∗ − µ̄n ), k = 1, ..., n. Resampled 29 random variable Xk∗ can take values from the sample space X1 , ..., Xn with equal probability mass 1/n. Thus, the characteristic function of Tn∗ can be written as: " n 1X E [exp(itTn∗ )] = exp n j=1 it(Xj − µ̄n ) √ n #n . (4.6) By repeated integration by parts exp(ix) can be written as: exp(ix) = 1 + ix − where θ(x) := 3 x3 Rx 0 x2 x3 + θ(x), 2 6 (4.7) i3 (x − t)2 eit dt is a continuous function of x, and |θ(x)| ≤ 1. Thus, equation (4.6) can be written as: n = 1+ 1 X it(Xj − µ̄n ) √ n j=1 n n n 1 X t3 (Xj − µ̄n )3 1 X t2 (Xj − µ̄n )2 + θ − n j=1 2n n j=1 6n3/2 t(Xj − µ̄n ) √ n n . (4.8) As n → ∞ by the law of large numbers the second term in brackets goes to zero almost surely. If we refer to the last term as Qn , we get: E From |θ(x)| ≤ 1, n|Qn | ≤ [exp(itTn∗ )] |t|3 −3/2 n 6 t2 σ̂n2 + Qn = 1− 2n Pn j=1 |Xj n . (4.9) − µ̄n |3 . Thus, from Lemma (4.0.4) as n → ∞, nQn → 0 almost surely. By the law of large numbers σ̂n2 → σ 2 almost surely as n → ∞. Hence, as n → ∞ we have: E [exp(itTn∗ )|X1 , X2 , ..., Xn ] n 2 2 −t σ t2 σ 2 , → exp = 1− 2n 2 (4.10) 30 which is the characteristic function of N(0, σ 2 ). This completes the proof of part (1). Thus far, we have shown that the bootstrap procedure works asymptotically for the mean. Delta method guarantees the validity of the procedure for any function with continuous derivatives in the neighbourhood of the mean as well. Another problem of interest is to study the order of accuracy of the bootstrap estimate. In particular, is it more accurate to use a Normal approximation for the population instead of using the bootstrap estimate of the distribution? This is the subject of the next section where we show that the answer is no! 31 Chapter 5 Bootstrap Accuracy and Edgeworth Expansion 5.1 Edgeworth Expansion Moment generating function of random variable X is defined as φ(t) = E(eitX ). As the name suggests, moments of the random variable can be found from the moment generating function. The jth moment of the random variable X which is defined as µj = E(X j ) is the jth derivative of the moment generating function at t = 0: µj = dj φ(t) (0). dtj (5.1) Taylor series expansion of the moment generating function at t = 0 can be written as: φ(t) = E(eitX ) = E ∞ X (itX)n n=0 = n! ∞ X µn (it)n n=0 n! , ! (5.2) 5.1. EDGEWORTH EXPANSION 32 where µ0 = 1 and 0! = 1. Cumulant generating function of the random variable X is defined as log(φ(t)), the natural logarithm of moment generating function. Cumulants are found from the power series expansion of the cumulant generating function. log(φ(t)) = ∞ X κn (it)n n=1 n! , (5.3) where κn is the nth cumulant of the random variable X. To find the relationship between cumulants and moments of random variable X, we can write log(φ(t)) as: ∞ ∞ X X 1 1 itX n log(φ(t)) = − (1 − E(e )) = − n n n=1 n=1 ∞ X µm (it)m − m! m=1 !n . (5.4) Comparing (5.3) and (5.4) expansions, it can be shown that the cumulants are homogeneous polynomials of moments and vice versa [16]. In particular, we have the following relationships for the first four cumulants: κ1 = µ1 (5.5) κ2 = µ2 − µ21 = var(X) (5.6) κ3 = µ3 − 3µ2 µ1 + 2µ31 (5.7) κ4 = µ4 − 4µ3 µ1 − 3µ22 + 12µ2 µ21 − 6µ41 . (5.8) The third and forth cumulants, κ3 and κ4 , are referred to as skewness and kurtosis respectively. Let X1 , X2 , ... be independent identically distributed random variables with mean 5.1. EDGEWORTH EXPANSION 33 µ, and variance σ 2 . 
As we mentioned in the previous section, by the Central Limit √ Theorem Qn = n(µ̄n − µ) converges to N(0, σ 2 ) in distribution as n → ∞, where µ̄n is the sample mean for a sample of size n. Let us assume that µ1 = µ = 0 and √ σ 2 = 1. The problem of interest is to find the cumulative distribution of Sn = nµ̄n = P n−1/2 nj=1 Xj . In particular, we are interested in the power series of P (Sn ≤ x). Such an expansion is referred to as Edgeworth expansion. To this end, we start from the characteristic function of Sn : φn (t) = E{exp(itSn )}. (5.9) From the independence assumption φn (t) can be written as: φn (t) = E{exp(itn−1/2 X1 )}E{exp(itn−1/2 X2 )}...E{exp(itn−1/2 Xn )} = φ(tn−1/2 )...φ(tn−1/2 ) = φn (tn−1/2 ). (5.10) From equation (5.3), we have: " φn (t) = φn (tn−1/2 ) = exp ∞ X κj (itn−1/2 )j j=1 j! !#n . (5.11) Since κ1 = 0 and κ2 = 1, κj n−(j−2)/2 (it)j t2 κ3 n−1/2 (it)3 + ... + + ... . φn (t) = exp − + 2 6 j! (5.12) By expanding the exponent we get: 2 /2 φn (t) = e−t 2 /2 + n−1/2 r1 (it)e−t 2 /2 + ... + n−j/2 rj (it)e−t + ..., (5.13) 5.1. EDGEWORTH EXPANSION 34 where rj (it)’s are polynomials of degree 3j and parity j with coefficients κ3 , ..., κj+1 [16]. Moreover, rj (it)’s are independent of n. In particular, 1 r1 (u) = κ3 u3 6 1 1 r2 (u) = κ23 u6 + κ4 u4 . 72 24 By definition, the characteristic function of Sn can be written as, φn (t) = Z ∞ −∞ 2 /2 Moreover, the fact that e−t eitx dP (Sn ≤ x). (5.14) is the characteristic function of standard Normal sug- gests that it is possible to take the inverse expansion of φn (t) to get the Cumulative Distribution Function (CDF) of Sn , P (Sn ≤ x) = Φ(x) + n−1/2 R1 (x) + ... + n−j/2 Rj (x) + ...., (5.15) where Φ(x) is the CDF of standard Normal, and, Z ∞ 2 /2 eitx dRj (x) = rj (it)e−t . (5.16) −∞ The next step is to calculate Rj (x), −t2 /2 e = Z ∞ eitx dΦ(x). −∞ (5.17) 5.1. EDGEWORTH EXPANSION 35 Integration by parts on equation (5.17) gives, −t2 /2 e = (−it) −1 = (−it)−j Z ∞ Z−∞ ∞ eitx dΦ(1) (x) = ... eitx dΦ(j) (x), −∞ where Φ(j) (x) = D j Φ(x) and D is the differentiation operator. Therefore, Z ∞ 2 /2 eitx d{rj (−D)Φ(x)} = rj (it)e−t . (5.18) −∞ From equations (5.16) and (5.18), and the uniqueness of Fourier transform, we deduce that, Rj (x) = rj (−D)Φ(x). (5.19) (−D)j Φ(x) = −Hej−1 (x)φ(x), (5.20) For j ≥ 1, where Hen (x) is the standardized Hermite polynomial of degree n with the same parity as n, and is defined as, Hen (x) = (−1)n ex 2 /2 dn −x2 /2 e . dxn (5.21) Therefore, Rj (x) can be written as Rj (x) = Pj (x)φ(x), where Pj (x) is a polynomial of degree 3j − 1 with opposite parity to j, and coefficients that depend on moments 5.1. EDGEWORTH EXPANSION 36 of X up to order j + 2. In particular, 1 P1 (x) = − κ3 (x2 − 1), 6 1 2 4 1 2 2 κ4 (x − 3) + κ3 (x − 10x + 15) . P2 (x) = −x 24 72 (5.22) (5.23) The Edgeworth expansion of the cumulative distribution function, P (Sn ≤ x) can be written as, P (Sn ≤ x) = Φ(x)+n−1/2 P1 (x)φ(x)+n−1 P2 (x)φ(x)+...+n−j/2 Pj (x)φ(x)+.... (5.24) For the CDF of random variable Y the Edgeworth expansion converges as an infinite series if E{exp(1/4Y 4 )} < ∞ [10]. This is a restrictive condition on the tails of the distribution. However, the expansion shows that if the series is stopped after j terms, the remainder will be of order n−j/2 , P (Sn ≤ x) = Φ(x) + n−1/2 P1 (x)φ(x) + n−1 P2 (x)φ(x) + ... + n−j/2 Pj (x)φ(x) + o(n−j/2 ), (5.25) which is a valid expansion for fixed j as n → ∞. Cramer [10] gives a sufficient regularity condition for the expansion as, E(|X|j+2) < ∞ lim sup |φ(x)| < 1. 
(5.26) |t|→∞ Bhattacharya and Ghosh [5] show that for statistics with continuous derivatives in the neighbourhood of the mean equation (5.25) with regularity conditions (5.26) 5.2. BOOTSTRAP EDGEWORTH EXPANSION 37 converges uniformly in x as n → ∞. As is the case for Sn , polynomials Pj (x) are of degrees 3j − 1 with opposite parities to j (even polynomial for odd j and vice versa), and coefficients that depend on moments of X up to order j + 2 and their derivatives up to order k + 2. 5.2 Bootstrap Edgeworth Expansion Let X1 , X2 , ..., Xn be independent identically distributed random samples with common distribution F , and θ̂ be the estimator of the statistic θ that is computed from the dataset χ = {X1 , X2 , ..., Xn } with empirical distribution Fb. Let us further assume that S = n1/2 (θ̂ − θ) is asymptotically N(0, σ 2 ) where σ 2 = σ 2 (F ) is the variance of S. Then pivotal statistic T is, T = n1/2 (θ̂ − θ)/σ̂, (5.27) where σ̂ 2 = σ 2 (Fb) is asymptotically N(0, 1). From [5], the Edgeworth expansions of S and T CFDs are, P (S ≤ x) = Φ(x/σ) + n−1/2 P1 (x/σ)φ(x/σ) + n−1 P2 (x/σ)φ(x/σ) + ..., (5.28) P (T ≤ x) = Φ(x) + n−1/2 Q1 (x)φ(x) + n−1 Q2 (x)φ(x) + ..., (5.29) where Pj (x) and Qj (x) are polynomials of degree 3j − 1 of opposite parity to j. Thus, the Normal approximation of the CDF of S and T is P (S ≤ x) ≃ Φ(x/σ) and P (T ≤ x) ≃ Φ(x) respectively which are in error by order n−1/2 . To study bootstrap accuracy, let us assume that the bootstrap estimate θ̂∗ of θ̂ is 5.2. BOOTSTRAP EDGEWORTH EXPANSION 38 computed from the resample dataset χ∗ = {X1∗ , X2∗ , ..., Xn∗ } with bootstrap distribution Fb ∗ . Then bootstrap estimates of S and T are written as, S ∗ = n1/2 (θ̂∗ − θ̂), (5.30) T ∗ = n1/2 (θ̂∗ − θ̂)/σ̂ ∗ , (5.31) where σ̂ ∗ is the bootstrap estimate of σ̂. Edgeworth expansions of S ∗ and T ∗ CDFs are, P (S ∗ ≤ x|χ) = Φ(x/σ̂) + n−1/2 Pb1 (x/σ̂)φ(x/σ̂) + n−1 Pb2 (x/σ̂)φ(x/σ̂) + ..., b1 (x)φ(x) + n−1 Q b2 (x)φ(x) + ..., P (T ∗ ≤ x|χ) = Φ(x) + n−1/2 Q (5.32) (5.33) bj (x) are obtained by replacing unknowns in Pj (x) and Qj (x) by where Pbj (x) and Q their bootstrap estimates respectively. The estimates of coefficients in Pbj (x) and bj (x) differ from their corresponding values in Pj (x) and Qj (x) by the order n−1/2 . Q Therefore, the accuracy of bootstrap CDFs of S and T is, P (S ∗ ≤ x) − P (S ≤ x) = Φ(x/σ̂) − Φ(x/σ) + O(n−1), (5.34) P (T ∗ ≤ x) − P (T ≤ x) = O(n−1). (5.35) In equation (5.34) standard deviation estimate σ̂ differs from the standard deviation σ by the order n−1/2 , σ̂ − σ = O(n−1/2 ). Thus, P (S ∗ ≤ x) − P (S ≤ x) = O(n−1/2 ). (5.36) 5.3. BOOTSTRAP CONFIDENCE INTERVAL ACCURACY 39 Equations (5.35) and (5.36) outline bootstrap CDF accuracy for S and T respectively. Therefore the bootstrap estimate of S has the same order of accuracy as Normal approximation whereas bootstrap estimate of T is more accurate than the Normal approximation by the order n−1/2 . This brings us to the advantage of pivotal statistics. T is a pivotal statistic while S is not. The distribution of T does not depend on any unknown values, and the bootstrap power is directed toward estimating distribution skewness while in the case of non-pivotal statistic S, bootstrap power is “wasted” toward estimating standard deviation. At this stage the problem of interest is to study the accuracy of bootstrap confidence interval. 5.3 Bootstrap Confidence Interval Accuracy We recall that the Edgeworth expansion of the CDF of statistic Sn can be written as in equation (5.25). 
Denote the α-level percentile of Sn and standard Normal distribution by ξα , and zα respectively, ξα = inf{x : P (Sn ≤ x) ≥ α}. (5.37) By taking the inverse from both sides of equation (5.25), we can write the series expansion of ξα in terms of zα as, ξα = zα + n−1/2 P1cf (zα ) + ... + n−j/2 Pjcf (zα ) + O(n−j/2 ). (5.38) This expansion is referred to as Cornish-Fisher expansion. Cornish and Fisher [9] proved that the asymptotic series converges uniformly in ǫ < α < 1−ǫ for 0 < ǫ < 1/2. 5.3. BOOTSTRAP CONFIDENCE INTERVAL ACCURACY 40 In the expansion, Pjcf (x)’s are polynomials of degree at most j +1 with opposite parity to j with coefficients that depend on cumulants up to order j + 2. In particular, P1cf (x) = −P1 (x), 1 P2cf (x) = P1 (x)P1′ (x) − xP12 (x) − P2 (x). 2 (5.39) (5.40) In this section we consider two types of α-confidence intervals: one-sided and two-sided confidence intervals that we denote by I1 and I2 respectively, I1 = (−∞, θ̂ + n−1/2 σ̂zα ) (5.41) I2 = (θ̂ − n−1/2 σ̂xα , θ̂ + n−1/2 σ̂xα ), (5.42) where zα and xα are defined as, P (N ≤ zα ) = α, (5.43) P (|N| ≤ xα ) = α, (5.44) for a standard Normal random variable N. Essentially, I1 and I2 are constructed under the assumption of a Normal distribution for the population. From the Edgeworth expansion of statistic T in equation (5.29) we evaluate the accuracy of such an assumption, that is we calculate the coverage error of I1 , P (θ ∈ I1 ) = P (T > −zα ) = 1 − {Φ(−zα ) + n−1/2 Q1 (−zα )φ(−zα ) + O(n−1 )} = α − n−1/2 Q1 (−zα )φ(−zα ) + O(n−1 ). 5.3. BOOTSTRAP CONFIDENCE INTERVAL ACCURACY 41 Therefore, coverage error of the one-sided interval is in order n−1/2 . Similarly, by noting that Q1 (x) and Q2 (x) are even and odd polynomials respectively, we calculate the coverage error of I2 , P (θ ∈ I2 ) = P (T > −xα ) − P (T < −xα ) = α + 2n−1 Q2 (xα )φ(xα ) + O(n−2), which indicates that the coverage error of the two-sided interval is in order n−1 . We study the accuracy of bootstrap percentile estimates for pivotal statistic T . Let us define the α-percentile of T as ηα , P (T ≤ ηα ) = α. (5.45) Similarly, for the bootstrap estimate of T which is denoted by T ∗ in equation (5.31) α-percentile η̂α is defined as, P (T ∗ ≤ η̂α |χ) = α. (5.46) From Cornish-Fisher expansion ηα is written as, −1 cf −1 ηα = zα + n−1/2 Qcf 1 (zα ) + n Q2 (zα ) + O(n ), (5.47) 5.3. BOOTSTRAP CONFIDENCE INTERVAL ACCURACY 42 cf where zα is the α-percentile of the standard Normal distribution, and Qcf 1 and Q2 are defined as, Qcf 1 (x) = −Q1 (x), 1 2 ′ Qcf 2 (x) = Q1 (x)Q1 (x) − xQ1 (x) − Q2 (x), 2 (5.48) (5.49) where Q1 (x) and Q2 (x) are the polynomials in Edgeworth expansion of CDF of T in cf equation (5.29). By substituting coefficients in Qcf 1 (x) and Q2 (x) with their respec- tive bootstrap estimates we obtain Cornish-Fisher expansion of η̂α , bcf (zα ) + n−1 Q bcf (zα ) + O(n−1 ). η̂α = zα + n−1/2 Q 1 2 (5.50) bcf (x) differ from their corresponding values in Qcf (x) The estimates of coefficients in Q j j by order n−1/2 . Therefore, η̂α = ηα + O(n−1 ), (5.51) that is the order of accuracy of the bootstrap quantile estimate is n−1 which outperforms the order n−1/2 of Normal approximation in equation (5.47). So far we have studied bootstrap theory and investigated permutation analysis as a hypothesis testing method based on bootstrap principle. 
Moreover, we studied the asymptotic behaviour of the bootstrap as well as the accuracy of bootstrap estimates, and showed that bootstrap resampling from the data outperforms the Normal approximation of the data distribution both in accuracy and in asymptotic convergence. In the next chapter we follow the example sketched briefly in section 3.3 and use permutation analysis to localize statistically significant brain activity in reaching/pointing tasks.

Chapter 6
Results

Recent developments in brain imaging such as electroencephalography (EEG), magnetoencephalography (MEG), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI) have made it possible for researchers to localize brain activity more accurately. Ideal localization of activity is spatially accurate and preserves the timing of the activity. However, in choosing an appropriate neuroimaging technique there has always been a trade-off between temporal and spatial resolution. For instance, fMRI works by detecting variations in levels of blood oxygenation that occur when blood flow increases in active brain areas due to greater oxygen demand. Because of the time lag between neural activity and detectable changes in blood oxygenation, fMRI temporal resolution is low (on the order of seconds). Therefore, while fMRI spatial accuracy is at the millimetre scale, which is high enough resolution for most studies, the timing of the activity is compromised, making it unsuitable for studies in which timing plays a crucial role. On the other hand, EEG captures the timing of activity more accurately by measuring the natural electrical current flow of active neurons. However, spatial localization of activity cannot be performed accurately because artifacts of non-cerebral origin, such as eye movements and cardiac artifacts, interfere with EEG data [22].

The neuroimaging method of choice in this study is MEG. The MEG machine uses more than a hundred highly sensitive superconducting magnetic sensors, referred to as superconducting quantum interference devices (SQUIDs), on the scalp to measure magnetic fields that are produced radially by neuronal electric currents. MEG temporal resolution is on the order of milliseconds, making the method appropriate for real-time brain functional studies. Since the magnetic permeability of the scalp and the underlying tissues is approximately the same, the neuronal magnetic signal can be measured without much distortion, which is an advantage over EEG, where the widely varying conductances of these tissues distort measurements. Reliable estimates of such conductances must be available in EEG experiments to compensate for the distortion [17].

Localization from the recorded magnetic signals in MEG poses an inverse problem, because the number of sources in the brain from which activity is recorded exceeds the number of measuring sensors. A number of post-processing techniques such as spatial filtering (beamforming) have been proposed in the literature to solve the problem [23, 27, 3, 25, 26, 7, 8]. Moreover, classical and adaptive clustering algorithms have been proposed to improve beamformer spatial resolution [12, 1].

In this report we take the following localization approach: after appropriate preprocessing steps to prepare the data, we discretize the brain into 3 mm³ voxels, and for each voxel compute the source activity in the 7-35 Hz frequency band using an event-related beamformer [8].
For each participant in the MEG experiment we construct a 3D brain image by registering the voxel locations from that participant's MEG system coordinates to standardized Talairach coordinates [17] using an affine transformation (SPM2); see [1] for more detail. An average activation pattern can then be calculated across the resulting images (one image per subject). However, if the activation patterns vary widely across images, the average pattern will not be informative enough to find brain areas that are consistently active across subjects. Therefore, we propose permutation analysis to find significantly active brain areas in the resulting images.

This chapter is organized as follows. Section 6.1 describes the experimental setup and methodology, as well as the processing applied to prepare the data for permutation analysis. Section 6.2 presents the permutation analysis methodology and results.

6.1 Methods

Participants

Ten healthy adult participants (8 males, 2 females), aged 22-45 years, with no history of neurological dysfunction or injury took part in this study. The study was approved by both the York University and the Hospital for Sick Children ethics boards. All participants gave informed consent.

6.1.1 Experimental Paradigm

Figure 6.1 shows the experiment setup.

Figure 6.1: The MEG experiment setup. (a) Time course of the experiment. (b) Three postures of the hand were used in different recording blocks. (c) The fixation cross in the middle, with the two possible target locations to its left and right. (d) Subjects sat upright under the MEG machine, performing the pointing task with the wrist only. (e) Task: the target (cue) appeared in either green or red to inform the subject of the pro or anti nature of the pointing trial; dimming of the central fixation cross was the movement instruction.

Participants sat upright with their head under the dewar of the MEG machine in an electromagnetically shielded room (Figure 6.1(d)). In each trial, subjects performed a memory-guided reaching task while remaining fixated on a central white cross. After 500 ms of fixation, a green or red dot (the target) was briefly presented for 200 ms, randomly either to the right or to the left of the central cross (Figure 6.1(a,c)). We refer to the 500 ms interval before target onset as the baseline period. The central cross dimmed after 1500 ms, instructing subjects to start pointing either toward the target (pro) or to its mirror-opposite location (anti) while keeping their eyes fixated. The direction of pointing depended on the colour of the target: green indicated pro trials and red indicated anti trials. The pointing movement was wrist-only, with three different wrist/forearm postures for the right hand (pronation, upright, and down) and one posture for the left hand (pronation), in separate blocks of trials (Figure 6.1(b)). Each pointing trial lasted approximately 3 seconds, with a 500 ms inter-trial interval (ITI). One hundred trials for each condition (left hand versus right hand, pro versus anti, and the three hand postures) amount to 1200 trials per subject. Movement onset was measured for each subject using bipolar differential electromyography (EMG). For more detail on the experiment and the MEG data acquisition procedure, please refer to [1].

6.1.2 Data Processing

Data were collected at a rate of 600 samples per second with a 150 Hz low-pass filter, using synthetic third-order gradiometer noise cancellation. Trials containing artifacts, eye movements, blinks, or premature hand movements were identified by manual inspection and removed from the analysis; on average, 98 reaching trials per condition were retained for each subject for subsequent processing.

Brain source activity is estimated from the sensor data using event-related beamforming [8]; the idea of beamforming is depicted in Figure 6.2. The brain is discretized into voxels of volume 3 mm^3 for each subject, and the beamformer assumes a dipole at each voxel location.
Using the dipole forward solution and the sensor covariance matrix, the beamformer then solves for a dipole direction that minimizes the power variance at the voxel. The dipole direction at each voxel can be regarded as defining a spatial filter whose weights, when applied to the sensor data, reconstruct the instantaneous power at the corresponding voxel location while rejecting interfering power from adjacent voxels.

Figure 6.2: The diagram of the event-related beamformer [8]. (A) Calculation of source activity over time ("virtual sensors"). (B) Imaging instantaneous source amplitude ("event-related" beamformer). The data consist of T trials, each with M channels and N time samples. The covariance matrix of the data and the forward solution for a dipole at each location are given to the beamformer. Average source activity is then estimated at each voxel, and the dipole orientation is adjusted accordingly to maximize power at the corresponding voxel; the thresholded source image is superimposed on the MRI.

6.2 Permutation Analysis Results

Brain activity is studied in separate frequency bands. In neuroscience terminology, the 7-15 Hz, 15-35 Hz, 35-55 Hz, and 55-120 Hz bands are referred to as the alpha, beta, lower-gamma, and higher-gamma bands respectively [17]. In this study the frequency band of focus is 7-35 Hz, that is, the alpha and beta bands taken together. Thus, the beamformer-estimated power time series at each voxel location is band-pass filtered to retain the power in the 7-35 Hz band. Data at each voxel are aligned both to cue onset, when the target appears, and to movement onset, when the subject starts the movement according to the EMG measurement. Moreover, data samples are transformed into z-scores by subtracting the baseline mean from each sample and normalizing by the baseline standard deviation (a minimal sketch of this band-limiting and normalization is given at the end of this passage).

In [1] a two-level adaptive cluster analysis is proposed to find the active brain areas in this experiment. It shows that a large network of brain areas is involved in the reaching task, starting from visual areas in the occipital lobe, continuing to parietal areas that are presumably responsible for sensorimotor transformations, and ending with movement planning and execution in primary motor cortex.
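The band-pass filtering and baseline z-scoring described above can be sketched as follows. The 600 Hz sampling rate, the 7-35 Hz band, and the 500 ms baseline come from the text; the filter order, the per-trial normalization, and the array shapes are illustrative assumptions.

\begin{verbatim}
import numpy as np
from scipy.signal import butter, filtfilt

fs = 600.0                # sampling rate (samples per second)
baseline = slice(0, 300)  # assumed 500 ms pre-cue baseline = 300 samples

def bandpass_7_35(x, fs=fs, order=4):
    """Zero-phase Butterworth band-pass filter retaining the 7-35 Hz band."""
    b, a = butter(order, [7.0, 35.0], btype="bandpass", fs=fs)
    return filtfilt(b, a, x, axis=-1)

def baseline_zscore(epochs, baseline=baseline):
    """Z-score each epoch against its pre-cue baseline.

    epochs : (n_trials, n_samples) voxel power time series aligned to cue onset
    """
    mu = epochs[:, baseline].mean(axis=1, keepdims=True)
    sd = epochs[:, baseline].std(axis=1, keepdims=True)
    return (epochs - mu) / sd

# Hypothetical voxel data: 98 retained trials, 2 s of samples each.
rng = np.random.default_rng(2)
epochs = rng.standard_normal((98, 1200))
z_scored = baseline_zscore(bandpass_7_35(epochs))
\end{verbatim}

Whether the normalization is applied per trial or to the trial-averaged time series is not specified in the text; the sketch normalizes each trial.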
Figure 6.3 shows the average brain z-score power activity from 0.45 seconds before movement onset to movement onset, averaged across all subjects, for right-hand movements toward left targets (the pro-left condition). Activity is shown in three planes: (a) transverse, (b) sagittal, and (c) coronal. As the figure shows, a network of brain areas is active, with positive activations (neuronal synchronization) in occipital visual areas (e.g. V3) and in parietal areas, and negative activity (neuronal desynchronization) in the contralateral primary motor cortex (M1), which executes the movement.

Figure 6.3: Average brain activation for the pro condition/left target around movement onset (-0.45 to 0 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal. M1 and V3 are indicated; the colour scale spans pseudo-Z values from -2.66 to 3.37.

Figure 6.4 shows the average brain z-score power around cue onset, from target appearance to 0.5 seconds after, averaged across all subjects for movements toward left targets (pro-left). The figure shows contralateral desynchronization in primary visual areas, followed by activations in parietal areas such as the mid-posterior intraparietal sulcus (mIPS), the angular gyrus (AG), the inferior parietal lobule (IPL), and the superior parietal lobule (SPL).

Figure 6.4: Average brain activation for the pro condition/left target around cue onset (0-0.5 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal. SPL, mIPS, IPL, and AG are indicated; the colour scale spans pseudo-Z values from -2.19 to 0.91.

Figure 6.5 shows the average brain z-score power activity from 0.45 seconds before movement onset to movement onset, averaged across all subjects, for right-hand movements to the mirror-opposite direction of right targets (the anti-right condition). This is essentially the same movement as in the pro-left condition, except that the target appears on the right. The anti conditions were included in the experiment to dissociate the movement from the target stimulus, in order to investigate the parietal brain areas responsible for the sensorimotor transformation from retinal to shoulder coordinates that is needed to execute an accurate movement plan [4]. As can be seen from the figure, the activation pattern around movement onset is the same as that of the pro-left condition in Figure 6.3. The pre-motor ventral (PMV) area is also shown in the figure.

Figure 6.5: Average brain activation for the anti condition/right target around movement onset (-0.45 to 0 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal. PMV is indicated; the colour scale spans pseudo-Z values from -2.90 to 3.05.

Figure 6.6 shows the average brain z-score power around cue onset, from target appearance to 0.5 seconds after, averaged across all subjects for the anti movement/right target condition (anti-right). The figure shows contralateral desynchronization in mIPS and AG.

Figure 6.6: Average brain activation for the anti condition/right target around cue onset (0-0.5 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal. mIPS and AG are indicated; the colour scale spans pseudo-Z values from -2.23 to 1.18.

So far we have looked at average activation patterns, which we denote by $\bar{X}$ in the following argument. As is evident from the figures, the average activation patterns span a wide range of pseudo-Z values. It is therefore important to investigate which areas are significantly active across all subjects, that is, to set pseudo-Z thresholds above which the average patterns are statistically significant. To this end, we formulate a hypothesis testing problem with null and alternative hypotheses

\[
H : \bar{X} = 0, \qquad K : \bar{X} \neq 0, \tag{6.1}
\]

where $H$ and $K$ are the null and alternative hypotheses respectively, and $\bar{X}$ is the average activation pattern. We limit the probability of a type I error to α = 0.05 to calculate 95% confidence intervals and p-values, and we solve the hypothesis testing problem at each brain voxel individually. We do not know the distribution of the data or of the sample mean, so the hypothesis testing problem cannot be solved analytically.
Therefore, we estimate the distribution of the mean using the bootstrap procedure and use permutation to solve the problem. The bootstrap procedure involves resampling with replacement from the data and calculating the mean of each resample in order to estimate the cumulative distribution function of the mean. As mentioned in Section 3.3, in permutation analysis the resampling is performed in a manner consistent with the null hypothesis. If we assume that the null hypothesis H is true, then the mean of the power signal at each voxel is zero, which equals the mean of its sign-inverted version. Therefore, for each condition, say pro-left around movement onset, every subject contributes at each brain voxel three signals corresponding to the three wrist postures, together with their inverted versions, which amounts to a resampling dataset of size 60 across all subjects at each brain voxel. Note that the original sample mean is computed from the 30 signals, without their inverted counterparts. Moreover, by including all the postures in the resampling dataset as well as in the original mean, we ignore posture effects in the signals. The empirical cumulative distribution of the mean is estimated by taking 2048 resamples with replacement from the resampling dataset. If the sample mean is greater than the 95% bootstrap percentile, we reject the null hypothesis. (A schematic sketch of this per-voxel test is given after the figure descriptions below.)

Figure 6.7 shows the permutation analysis result for the pro-left condition around movement onset. The left and right panels show negative and positive activity respectively, and the negative and positive thresholds for 95% significance are shown in the figure as well. As is evident from the figure, the activity in M1 and V3 is statistically significant (compare Figure 6.3).

Figure 6.7: Permutation analysis for the pro condition/left target around movement onset in three planes with 95% significance thresholds. Right panel: positive activity (synchronization), maximum value 3.37, absolute threshold 2.26. Left panel: negative activity (desynchronization), maximum value 2.66, absolute threshold 2.23.

Figure 6.8 shows the permutation analysis result for the pro-left condition around cue onset. The null hypothesis is not rejected for positive activation; the significant negative activity and the corresponding threshold are shown in the figure. As the figure shows, only the mean negative activation in Figure 6.4 is significant.

Figure 6.8: Permutation analysis for the pro condition/left target around cue onset in three planes with 95% significance thresholds. The null hypothesis is not rejected for positive activation; the negative significant activation is shown, with maximum value 2.19 and absolute threshold 1.36.

Figure 6.9 shows the permutation analysis result for the anti-right condition around movement onset. The left and right panels show negative and positive activity respectively, and the negative and positive thresholds for 95% significance are shown in the figure as well.

Figure 6.10 shows the permutation analysis result for the anti-right condition around cue onset. The null hypothesis is not rejected for positive activation; the significant negative activity and the corresponding threshold are shown in the figure. As the figure shows, only the negative activation in mIPS is significant.

Figure 6.9: Permutation analysis for the anti condition/right target around movement onset in three planes with 95% significance thresholds. Right panel: positive activity (synchronization), maximum value 3.05, absolute threshold 2.13. Left panel: negative activity (desynchronization), maximum value 2.90, absolute threshold 2.09.

Figure 6.10: Permutation analysis for the anti condition/right target around cue onset in three planes with 95% significance thresholds. The null hypothesis is not rejected for positive activation; the negative significant activation is shown, with maximum value 2.23 and absolute threshold 1.62.
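The per-voxel test just described can be summarized in a short sketch. The sign-flipped resampling pool of 60 signals, the 2048 resamples, and the 95% percentile threshold follow the description above; treating each signal as a single scalar summary per voxel, the size of each resample, and the example numbers are assumptions made for the illustration (in the thesis each signal is a power time series, and the negative tail is tested analogously).

\begin{verbatim}
import numpy as np

def permutation_test_voxel(signals, n_resamples=2048, alpha=0.05, rng=None):
    """Permutation test for one voxel via sign-flipped bootstrap resampling.

    signals : (30,) per-subject, per-posture values at this voxel
              (10 subjects x 3 wrist postures).
    Returns (reject, threshold) for the positive tail.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed_mean = signals.mean()

    # Resampling dataset of size 60: the signals and their sign-inverted
    # copies, whose common mean is zero under the null hypothesis.
    pool = np.concatenate([signals, -signals])

    resample_means = np.empty(n_resamples)
    for b in range(n_resamples):
        resample = rng.choice(pool, size=signals.size, replace=True)
        resample_means[b] = resample.mean()

    threshold = np.quantile(resample_means, 1.0 - alpha)
    return observed_mean > threshold, threshold

# Hypothetical data for a single voxel: 10 subjects x 3 postures = 30 values.
rng = np.random.default_rng(3)
voxel_signals = rng.normal(loc=0.4, scale=1.0, size=30)
reject, thr = permutation_test_voxel(voxel_signals, rng=rng)
print(f"95% threshold = {thr:.3f}, reject H: {reject}")
\end{verbatim}

Run independently at every voxel and for both tails, this is the type of thresholding that underlies the significance maps in Figures 6.7-6.10.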
Chapter 7: Conclusion

In this report we have studied the theory of the bootstrap. The bootstrap procedure has become widely used in recent years thanks to computational advances that have made its implementation practical. We studied the mathematical theory of the bootstrap procedure and the construction of bootstrap confidence intervals. We covered hypothesis testing as well as the Neyman-Pearson lemma, an important theorem in hypothesis testing that provides a necessary and sufficient condition for the uniformly most powerful test in a wide range of hypothesis testing problems. Permutation analysis was also investigated as a method that uses the resampling idea of the bootstrap procedure to solve hypothesis testing problems.

We investigated the asymptotic properties of the bootstrap estimate of the sample-mean distribution and showed that it is asymptotically Normal. Next, we studied the order of accuracy of bootstrap estimates using tools from the Edgeworth and Cornish-Fisher expansions. The idea of resampling with replacement from the data to derive an estimate of the statistic under study might look like a stretch at first sight. Interestingly, the method not only converges asymptotically to the true value of the statistic, but also provides a more accurate estimate than the Normal approximation. Furthermore, the accuracy of confidence interval estimates is improved considerably.

Finally, we applied permutation analysis to a database of brain magnetic signals collected in an MEG reaching experiment in order to locate the brain areas involved in reaching. We showed that permutation analysis provides a statistically sound framework for deriving significance thresholds in brain images, especially when there is large variability in the data that makes the average activity more difficult to interpret.

Further investigation into the role of these areas is called for. One idea is to look at time-frequency responses at each voxel, in which the frequency axis covers the 7-120 Hz range and the time axis covers the duration of the experiment. The time-frequency response in a brain region lets us study the activation time series in different frequency bands around cue onset and movement onset. Furthermore, comparing such patterns between corresponding right and left regions helps us find the specific time points at which activity flips from positive to negative, and vice versa, between the two regions, which might point more specifically to the roles of those regions during the experiment.
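As a pointer toward this future direction, the sketch below computes a simple time-frequency response for a single voxel time series using a short-time Fourier transform. The 600 Hz sampling rate and the 7-120 Hz range come from the text; the spectrogram parameters and the synthetic input are illustrative assumptions, and a Morlet-wavelet decomposition would be an equally reasonable choice.

\begin{verbatim}
import numpy as np
from scipy.signal import spectrogram

fs = 600.0                                    # MEG sampling rate from the experiment
rng = np.random.default_rng(4)
voxel_ts = rng.standard_normal(int(3 * fs))   # hypothetical 3 s voxel time series

# Time-frequency response: power as a function of time and frequency.
freqs, times, power = spectrogram(voxel_ts, fs=fs, nperseg=256, noverlap=192)

# Restrict to the 7-120 Hz range discussed above.
band = (freqs >= 7.0) & (freqs <= 120.0)
tfr = power[band]   # shape: (n_frequencies_in_band, n_time_bins)
\end{verbatim}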
Bibliography

[1] H. Alikhanian, J. D. Crawford, J. F. Desouza, D. Cheyne, and G. Blohm. Adaptive cluster analysis approach for functional localization using magnetoencephalography. Front. Neurosci., 7:73, 2013.

[2] J. E. Angus. A note on the central limit theorem for the bootstrap mean. Communications in Statistics - Theory and Methods, 18(5):1979–1982, 1989.

[3] G. Barbati, C. Porcaro, F. Zappasodi, P. M. Rossini, and F. Tecchio. Optimization of an independent component analysis approach for artifact identification and removal in magnetoencephalographic signals. Clinical Neurophysiology, 115(5):1220–1232, 2004.

[4] S. M. Beurze, I. Toni, L. Pisella, and W. P. Medendorp. Reference frames for reach planning in human parietofrontal cortex. Journal of Neurophysiology, 104:1736–1745, 2010.

[5] R. N. Bhattacharya and J. K. Ghosh. On the validity of the formal Edgeworth expansion. Ann. Statist., 6(2):239–472, 1978.

[6] P. J. Bickel and D. A. Freedman. Some asymptotic theory for the bootstrap. Ann. Statist., 9(6):1196–1217, 1981.

[7] D. Cheyne, L. Bakhtazad, and W. Gaetz. Spatiotemporal mapping of cortical activity accompanying voluntary movements using an event-related beamforming approach. Hum. Brain Mapp., 48:213–229, 2006.

[8] D. Cheyne, A. C. Bostan, W. Gaetz, and E. W. Pang. Event-related beamforming: A robust method for presurgical functional mapping using MEG. Clinical Neurophysiology, 118(8):1691–1704, 2007.

[9] E. A. Cornish and R. A. Fisher. Moments and cumulants in the specification of distributions. Revue de l'Institut Internat. de Statistique, 5:307–322, 1938.

[10] H. Cramer. On the composition of elementary errors. Skand. Aktuarietidskr., (1):141–180, 1928.

[11] B. Efron. Bootstrap methods: Another look at the jackknife. Ann. Statist., 7(1):1–26, 1979.

[12] J. R. Gilbert, L. R. Shapiro, and G. R. Barnes. A peak-clustering method for MEG group analysis to minimise artefacts due to smoothness. PLoS ONE, 7(9):e45084, 2012.

[13] P. Golland and B. Fischl. Permutation tests for classification: Towards statistical significance in image-based studies. Inf. Process. Med. Imaging, 18:330–341, 2003.

[14] P. Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer, 2005.

[15] P. Hall. On the bootstrap and confidence intervals. Ann. Statist., 14(4):1431–1452, 1986.

[16] P. Hall. The Bootstrap and Edgeworth Expansion. Springer, 1995.

[17] P. Hansen, M. Kringelbach, and R. Salmelin. MEG: An Introduction to Methods. Oxford University Press, 2010.

[18] T. Hesterberg, D. S. Moore, S. Monaghan, A. Clipson, and R. Epstein. Bootstrap Methods and Permutation Tests. In Introduction to the Practice of Statistics. Freeman, New York, 2005.

[19] J. Shao and D. Tu. The Jackknife and Bootstrap. Springer, 1995.

[20] E. L. Lehmann. Some principles of the theory of testing hypotheses. The Annals of Mathematical Statistics, 21(1):1–26, 1950.

[21] E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, 2005.

[22] O. G. Lins, T. W. Picton, P. Berg, and M. Scherg. Ocular artifacts in EEG and event-related potentials I: Scalp topography. Brain Topography, 6(1):51–63, 1993.

[23] W. S. Merrifield, P. G. Simos, A. C. Papanicolaou, L. M. Philpott, and W. W. Sutherling. Hemispheric language dominance in magnetoencephalography: Sensitivity, specificity, and data reduction techniques. Epilepsy & Behavior, 10(1):120–128, 2007.

[24] D. N. Politis and J. P. Romano. Limit theorems for weakly dependent Hilbert space valued random variables with application to the stationary bootstrap. Statistica Sinica, 14:461–476, 1994.

[25] F. Rong and J. L. Contreras-Vidal. Magnetoencephalographic artifact identification and automatic removal based on independent component analysis and categorization approaches. Journal of Neuroscience Methods, 157(2):337–354, 2006.

[26] K. Sekihara, S. S. Nagarajan, D. Poeppel, A. Marantz, and Y. Miyashita. Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Trans. Biomed. Eng., 48(7):760–771, 2001.
[27] S. Taulu and J. Simola. Spatiotemporal signal space separation method for rejecting nearby interference in MEG measurements. Physics in Medicine and Biology, 51(7):1759–1768, 2006.