Sample Location Problems

Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/~bchu

(Note that this is a lecture note. Please refer to the textbooks suggested in the course outline for details. Examples will be given and explained in the class.)

1 Objectives

In the last lecture, I introduced methods for drawing statistical inferences about a single population mean. These methods were explicitly based on the assumption that the sample was large enough to justify using the CLT for normal approximations. However, when the sample size is not large enough or the data are not symmetrically distributed, statisticians are interested not only in the population mean but also in other measures of centrality, such as the population median. For this reason, the purpose of this lecture is to introduce methods for drawing inferences about population medians, which we call sample location problems. Specifically, I will explicate (1) the 1-sample location problem and (2) the 2-sample location problem.

2 1-Sample Location Problem

2.1 Motivating example

Suppose it is claimed that the grades of ECON 4002 have a population median of 85. To determine whether this is true, a sample of 40 students is drawn, and the grade of each student is recorded. Let $X_i$ denote the grade of student $i$. Then $X_1, \ldots, X_{40}$ are a sample drawn from an asymmetric distribution, and we are interested in testing whether or not the population median is equal to 85, i.e.,
$$H_0: q_2(X) = 85 \quad \text{vs.} \quad H_1: q_2(X) \neq 85.$$
Before studying general cases, we start with a simple case in which the data are assumed to be normally distributed. In this case, the population mean and the population median are the same.

2.2 Hypothesis testing with normally distributed data

Suppose $X_i \sim N(\mu, \sigma^2)$, where $\sigma^2$ may be unknown. We are interested in the hypotheses $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$. The intuition underlying testing $H_0$ vs. $H_1$ was discussed in the last lecture:
• If $H_0$ is true, then we would expect the sample mean to be close to the population mean $\mu_0$.
• Hence, if the sample average $\bar{x}_n$ is far from $\mu_0$, then we are inclined to reject $H_0$.
More precisely, we reject $H_0$ if and only if the significance prob.
$$p = P_{H_0}\left(|\bar{X}_n - \mu_0| \geq |\bar{x}_n - \mu_0|\right) \leq \alpha,$$
where $\alpha$ is any real number in $(0, 1)$ chosen by the statistician; $\alpha$ is usually chosen to be 0.05. We have learnt to approximate $p$ using the CLT for large samples when the distribution of $X_i$ is unknown. However, if we know that $X_1, \ldots, X_n$ are normally distributed, then it turns out that we can compute $p$ exactly, even when $n$ is small.

2.2.1 The population variance $\sigma^2$ is known

Under the null hypothesis $\mu = \mu_0$, we have $X_i \sim N(\mu_0, \sigma^2)$ and $\bar{X}_n \sim N(\mu_0, \sigma^2/n)$. Hence,
$$Z = \frac{\bar{X}_n - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1),$$
and the observed value of $Z$ is $z = (\bar{x}_n - \mu_0)/(\sigma/\sqrt{n})$. The significance prob. is
$$p = P_{H_0}(|Z| \geq |z|) = 2\left(1 - \Phi(|z|)\right),$$
where $\Phi$ is the cdf of $N(0, 1)$. The test that rejects $H_0$ if $p \leq \alpha$ is called the 1-sample z-test.

2.2.2 The population variance $\sigma^2$ is unknown

We can replace $\sigma^2$ with an estimator, say $S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$. In this case, we obtain the test statistic
$$T_n = \frac{\bar{X}_n - \mu_0}{S_n/\sqrt{n}},$$
and we let $t_n = (\bar{x}_n - \mu_0)/(s_n/\sqrt{n})$ denote its observed value. It can be proved that, under the null hypothesis $\mu = \mu_0$, $T_n \sim t(n-1)$. The significance prob. is now
$$p = P_{H_0}(|T_n| \geq |t_n|) = 2P_{H_0}(T_n \geq |t_n|) = 2\left(1 - F_t(|t_n|)\right),$$
where $F_t$ is the cdf of the $t(n-1)$ distribution.

Example 1. Suppose that, to test $H_0: \mu = 0$ vs. $H_1: \mu \neq 0$ (a 2-sided alternative), we draw a sample of size $n = 25$ and observe $\bar{x} = 1$ and $s = 3$. Then $t = 1/(3/\sqrt{25}) = 5/3$, and the 2-tailed significance prob. is computed using both tails of the $t(24)$ distribution (following this link: http://www.danielsoper.com/statcalc/calc08.aspx), i.e., $p = 0.1086$.

Example 2. Suppose that, to test $H_0: \mu \leq 0$ vs. $H_1: \mu > 0$ (a 1-sided alternative), we draw a sample of size $n = 25$ and observe $\bar{x}_n = 2$ and $s = 5$.
Then $t = 2/(5/\sqrt{25}) = 2$, and the 1-tailed significance prob. is computed using one tail of the $t(24)$ distribution, i.e., $p = P_{H_0}(T_n \geq 2) = 0.0285$.

2.3 Hypothesis testing with asymmetrically distributed data

Now we assume that $X_i$ has an asymmetric probability distribution $P$. In this case, we shall focus on the population median $q_2(X)$. The population median, denoted by $\theta$, is a more robust measure of centrality than the population mean because it always exists and is not sensitive to the influence of outliers. Consider the 2-sided alternative $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$. We will explain one testing procedure, namely the sign test.

The intuition underlying the sign test is straightforward. Under $H_0$ the population median is $\theta = \theta_0$, so when we sample from $P$ we should observe roughly half of the $x_i$ above $\theta_0$ and half below $\theta_0$. Hence, if the proportions of $x_i$ above/below $\theta_0$ are very different from one half, then we are inclined to reject $H_0$. More formally, let $p_+ = P_{H_0}(X_i > \theta_0)$ and $p_- = P_{H_0}(X_i < \theta_0)$. Under $H_0$, $p_+ = p_- = 0.5$. The sign test is implemented as follows:
1. Let $Y$ denote the number of $X_i$ greater than $\theta_0$ (in math. notation, $Y = \#(X_i - \theta_0 > 0)$). Under $H_0$, $Y \sim \text{Binomial}(n, 0.5)$. The observed value of $Y$ is $y = \#(x_i - \theta_0 > 0)$.
2. The test statistic is $Y - E[Y]$, where $E[Y] = n/2$, and its significance prob. is $p = P_{H_0}\left(|Y - n/2| \geq |y - n/2|\right)$. We reject $H_0$ if $p \leq \alpha$.
3. The significance prob. is computed as $p = 2F_{Bin}(c)$, where $F_{Bin}$ is the cdf of the $\text{Binomial}(n, 0.5)$ random variable and $c = \min(y, n - y)$.

Example 3. Now suppose that we want to test $H_0: \theta = 100$ vs. $H_1: \theta \neq 100$ at the significance level $\alpha = 0.05$, having observed the sample {98.73, 97.17, 100.17, 101.26, 94.47, 96.39, 99.67, 97.77, 97.46, 97.41}. Here, $n = 10$, $y = \#(x_i > 100) = 2$, and $c = \min(2, 10 - 2) = 2$, so $p = 2F_{Bin}(2) = 2 \cdot 56/1024 = 0.109375 > 0.05$ (for a Binomial prob. calculator, follow the link: http://joemath.com/binomial). We do not reject $H_0$.
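The computations in Examples 1 to 3 can be checked without the online calculators. The following is a minimal Python sketch using only the standard library. As an implementation choice not taken from these notes, the t cdf $F_t$ is obtained by integrating the t density numerically with Simpson's rule, while the binomial cdf $F_{Bin}$ is computed exactly with `math.comb`. The helper names (`t_pdf`, `t_cdf`, `one_sample_t`, `sign_test_p`) are hypothetical, and dropping observations exactly equal to $\theta_0$ in the sign test is an assumption.

```python
import math


def t_pdf(u, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def t_cdf(t, df, steps=2000):
    """F_t(t) = 1/2 +/- (integral of the density over [0, |t|]), by Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t(xbar, s, n, mu0):
    """Observed t_n = (xbar - mu0) / (s / sqrt(n)) and its n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


def sign_test_p(data, theta0):
    """Two-sided sign test: p = 2 * F_Bin(c) with c = min(y, n - y), capped at 1."""
    # Assumption: drop observations equal to theta0 (the notes treat P as
    # continuous, so such ties should not occur).
    untied = [x for x in data if x != theta0]
    n = len(untied)
    y = sum(1 for x in untied if x > theta0)  # y = #(x_i - theta0 > 0)
    c = min(y, n - y)
    # F_Bin(c) for Binomial(n, 0.5), computed exactly from binomial coefficients.
    cdf_c = sum(math.comb(n, k) for k in range(c + 1)) / 2 ** n
    return min(1.0, 2 * cdf_c)


# Example 1: two-sided test of H0: mu = 0 with n = 25, xbar = 1, s = 3.
t1, df1 = one_sample_t(1.0, 3.0, 25, 0.0)
print(t1, 2 * (1 - t_cdf(abs(t1), df1)))  # compare with p = 0.1086

# Example 2: one-sided test of H0: mu <= 0 with n = 25, xbar = 2, s = 5.
t2, df2 = one_sample_t(2.0, 5.0, 25, 0.0)
print(t2, 1 - t_cdf(t2, df2))  # compare with p = 0.0285

# Example 3: sign test of H0: theta = 100.
sample = [98.73, 97.17, 100.17, 101.26, 94.47, 96.39, 99.67, 97.77, 97.46, 97.41]
print(sign_test_p(sample, 100.0))  # prints 0.109375
```

The p-values printed for Examples 1 and 2 should agree with the calculator values to about three decimal places. The cap at 1 in `sign_test_p` covers the case $y = n/2$, where $2F_{Bin}(c)$ would otherwise exceed 1.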
3 2-Sample Location Problem

This section is concerned with comparing two populations with respect to some measure of centrality, typically the population mean or the population median. We make the following assumptions:
• There are two mutually independent samples, say $X_1, \ldots, X_{n_1} \sim P_1$ and $Y_1, \ldots, Y_{n_2} \sim P_2$, where $P_1$ and $P_2$ are continuous probability distributions.
• $P_1$ has location parameter $\theta_1$, and $P_2$ has location parameter $\theta_2$. We shall compare either population means, $\theta_1 = E[X_i]$ and $\theta_2 = E[Y_i]$, or population medians, $\theta_1 = q_2(X)$ and $\theta_2 = q_2(Y)$. The shift parameter $\Delta = \theta_1 - \theta_2$ measures the difference in population location.
• We observe random samples $\{x_1, \ldots, x_{n_1}\}$ and $\{y_1, \ldots, y_{n_2}\}$, from which we attempt to draw inferences about $\Delta$. Notice that we do not assume that $n_1 = n_2$.
We begin this section by considering an example.

3.1 Motivating example

Roundabouts (or traffic circles) are supposed to increase traffic flow. To determine whether they do, $n_1 + n_2$ drivers are recruited to participate in a hypothetical double-blind study. The drivers are randomly assigned either to a group of $n_1$ drivers who drive from a location, A, to another location, B, via a roundabout, or to a group of $n_2$ drivers who drive from A to B via a cross junction with STOP signs. Assume that the two groups of drivers face similar traffic conditions and drive the same number of miles. All the drivers depart at the same time, and each driver's arrival time at B is recorded. Neither the driver nor the traffic controller knows to which group the driver was assigned. For this experiment, let $X_i$ denote the driving time (in minutes) of driver $i$ in the first group, and let $Y_j$ denote the driving time of driver $j$ in the second group. Then $(X_1, \ldots, X_{n_1}) \sim P_1$, $(Y_1, \ldots, Y_{n_2}) \sim P_2$, and we are interested in drawing inferences about $\Delta = \theta_1 - \theta_2$. Note that $\Delta > 0$ if the driving time is greater for the first group than for the second group.
Thus, to produce compelling evidence of the roundabouts' efficiency (shorter driving times for the first group), we may test $H_0: \Delta \geq 0$ vs. $H_1: \Delta < 0$.

3.2 The Normal 2-sample location problem

We assume that $P_1 = N(\mu_1, \sigma_1^2)$ and $P_2 = N(\mu_2, \sigma_2^2)$. Let $\Delta = \mu_1 - \mu_2$ and $\hat{\Delta} = \bar{X}_{n_1} - \bar{Y}_{n_2}$. Recall that hypothesis tests for a single population mean were based on knowing the distribution of the standardized natural estimator. In summary, for $\sigma^2$ known, we learnt that
$$Z = \frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} \begin{cases} \sim N(0, 1) & \text{if } X_1, \ldots, X_n \sim N(\mu_0, \sigma^2), \\ \approx N(0, 1) & \text{if } n \text{ is large}. \end{cases}$$
For $\sigma^2$ unknown, we learnt that
$$T = \frac{\bar{X} - \mu_0}{\sqrt{S^2/n}} \begin{cases} \sim t(n-1) & \text{if } X_1, \ldots, X_n \sim N(\mu_0, \sigma^2), \\ \approx N(0, 1) & \text{if } n \text{ is large}. \end{cases}$$
The logic for drawing inferences about two population means is identical to the logic for drawing inferences about one population mean: we can base our inference about $\Delta$ on the distribution of
$$\frac{\hat{\Delta} - \Delta}{\text{standard error}}.$$
Because $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_j \sim N(\mu_2, \sigma_2^2)$, we have $\bar{X} \sim N(\mu_1, \sigma_1^2/n_1)$ and $\bar{Y} \sim N(\mu_2, \sigma_2^2/n_2)$. Since the two samples are independent, it follows that
$$\hat{\Delta} = \bar{X} - \bar{Y} \sim N\left(\underbrace{\mu_1 - \mu_2}_{\Delta}, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$

3.2.1 The population variances are known

If $\Delta = \Delta_0$, then
$$Z = \frac{\hat{\Delta} - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1).$$
We can immediately construct the 95% confidence interval for $\Delta$:
$$\left(\hat{\Delta} - q_{0.975}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \Delta < \hat{\Delta} + q_{0.975}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right),$$
where $q_{0.975} = 1.96$ is the 0.975 quantile of the standard normal distribution.

Example 4. For the first population, we know that $\sigma_1 = 5$, and we observe a sample of size $n_1 = 60$ with sample average $\bar{x} = 7.6$. For the second population, we know that $\sigma_2 = 2.5$, and we observe a sample of size $n_2 = 15$ with sample average $\bar{y} = 5.2$. Then $\hat{\Delta} = 2.4$, the standard error is $\sqrt{5^2/60 + 2.5^2/15} \approx 0.913$, and the 0.95 confidence interval for $\Delta$ is $2.4 \pm 1.96 \times 0.913$, i.e., (0.61, 4.19).

To test $H_0: \Delta = \Delta_0$ vs. $H_1: \Delta \neq \Delta_0$, we use the fact that $Z \sim N(0, 1)$ under $H_0$. Let $z$ denote the observed value of $Z$. We reject $H_0$ if the significance prob. $p = P_{H_0}(|Z| \geq |z|) = 2(1 - \Phi(|z|))$ is less than or equal to $\alpha$ (e.g., $\alpha = 0.05$).

Example 5. Continued from Example 4, to test $H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$, we compute
$$z = \frac{(7.6 - 5.2) - 0}{\sqrt{5^2/60 + 2.5^2/15}} = 2.629.$$
Since $p = P(|Z| \geq 2.629) = 0.00856 < 0.05$, we reject $H_0$.

3.2.2 The population variances are unknown

When $\sigma_1^2$ and $\sigma_2^2$ are unknown, we replace them with their estimators, $S_1^2$ and $S_2^2$ respectively. We now rely on
$$T = \frac{\hat{\Delta} - \Delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} \approx t(\hat{\nu}), \qquad \hat{\nu} = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}.$$
The 0.95 confidence interval for $\Delta$ is given by
$$\left(\hat{\Delta} - q_{t,\hat{\nu}}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} < \Delta < \hat{\Delta} + q_{t,\hat{\nu}}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\right),$$
where $q_{t,\hat{\nu}}$ is the 0.975 quantile of the $t(\hat{\nu})$ distribution.

Example 6. Continued from Example 4, we now estimate the unknown population variances separately, $\sigma_1^2$ by $s_1^2 = 5^2$ and $\sigma_2^2$ by $s_2^2 = 2.5^2$. The estimated degrees of freedom are $\hat{\nu} = 45.26$. With $q_{t,45.26} = 2.015$, the 0.95 confidence interval for $\Delta$ is (0.56, 4.24).

To test $H_0: \Delta = \Delta_0$ vs. $H_1: \Delta \neq \Delta_0$, we use $T \approx t(\hat{\nu})$. Let $t$ denote the observed value of $T$. We reject $H_0$ if and only if
$$p = P_{H_0}(|T| \geq |t|) = 2\left(1 - F_{t,\hat{\nu}}(|t|)\right)$$
is less than or equal to $\alpha$, where $F_{t,\hat{\nu}}$ is the cdf of the $t$ distribution with $\hat{\nu}$ degrees of freedom.

Example 7. Continued from Example 4, to test $H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$, we compute
$$t = \frac{(7.6 - 5.2) - 0}{\sqrt{5^2/60 + 2.5^2/15}} = 2.629.$$
The significance prob. is $p = P_{H_0}(|T| \geq 2.629) = 0.011655 < 0.05$; thus we reject $H_0$ at the significance level $\alpha = 0.05$.

4 Exercises

1. The intelligence quotients (IQs) of 16 students from one area of a city showed a mean of 107 and a standard deviation of 10, while the IQs of 14 students from another area of the city showed a mean of 112 and a standard deviation of 8. Is there a significant difference between the IQs of the two groups at significance levels ($\alpha$) of (a) 0.01 and (b) 0.05?
2. If a variable $X$ has a $t$ distribution with $\nu = 10$, find (a) $P(X > 1.25)$, (b) $P(-1.25 < X < 1.25)$, (c) $P(X < 1.30)$, and (d) $P(X \geq 2.10)$. (Hint: use the prob. calculator: http://www.danielsoper.com/statcalc/calc08.aspx.)
3. On an examination in Statistics, 12 students in one class had a mean grade of 78 with a standard deviation of 6, while 15 students in another class had a mean grade of 74 with a standard deviation of 8. Using a significance level of 0.05, determine whether the first group is superior to the second group.
4. Mecocci et al. (2004) (link: http://www.ncbi.nlm.nih.gov/pubmed/15462460) theorize that living with a dog diminishes depression in the elderly, here defined as people more than 70 years old. To investigate the theory, they recruit 15 single elderly men who own dogs and 15 single elderly men who do not own any pets. The Hamilton instrument for measuring depressive tendency is administered to each subject; high scores indicate depression. How might Mecocci et al. (2004) use the resulting data to test their theory? Explicate a testing procedure.
5. The breaking strengths of cables produced by a manufacturer have a mean of 1800 pounds and a standard deviation of 100 pounds. It is claimed that a new technique in the manufacturing process can increase the breaking strength. To test this claim, a sample of 50 cables is tested, and the mean breaking strength is found to be 1850 pounds. Can we support this claim at the 0.05 significance level?
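The two-sample calculations of Section 3.2 can be sketched the same way. The Python code below (standard library only) reproduces Examples 4 to 7: it uses `statistics.NormalDist` for $\Phi$ and $q_{0.975}$, computes $\hat{\nu}$ from the formula in Section 3.2.2, and, as an implementation choice not taken from the notes, evaluates $F_{t,\hat{\nu}}$ by Simpson's rule on the t density and finds $q_{t,\hat{\nu}}$ by bisection. All function names (`t_pdf`, `t_cdf`, `t_quantile`, `two_sample_stat`, `welch_df`) are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(u, df):
    """Density of the t distribution with (possibly non-integer) df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def t_cdf(t, df, steps=2000):
    """F_{t,df}(t) = 1/2 +/- (integral of the density over [0, |t|]), by Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(prob, df, hi=50.0):
    """q with F_{t,df}(q) = prob, for 0.5 <= prob < 1, by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sample_stat(xbar, ybar, var1, var2, n1, n2, delta0=0.0):
    """Return (delta_hat, standard error, observed Z or T) for H0: Delta = delta0.

    var1, var2 are sigma_1^2, sigma_2^2 (known) or s_1^2, s_2^2 (estimated).
    """
    delta_hat = xbar - ybar
    se = math.sqrt(var1 / n1 + var2 / n2)
    return delta_hat, se, (delta_hat - delta0) / se


def welch_df(s1_sq, s2_sq, n1, n2):
    """Estimated degrees of freedom nu_hat from Section 3.2.2."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Examples 4 and 5: sigma_1 = 5, n1 = 60, xbar = 7.6; sigma_2 = 2.5, n2 = 15, ybar = 5.2.
std_normal = NormalDist()
d_hat, se, z = two_sample_stat(7.6, 5.2, 5.0 ** 2, 2.5 ** 2, 60, 15)
q = std_normal.inv_cdf(0.975)  # about 1.96
ci_z = (d_hat - q * se, d_hat + q * se)
p_z = 2 * (1 - std_normal.cdf(abs(z)))
print(ci_z, z, p_z)

# Examples 6 and 7: the same numbers treated as estimates s_1^2 = 5^2, s_2^2 = 2.5^2,
# so the observed t equals the observed z above.
nu_hat = welch_df(5.0 ** 2, 2.5 ** 2, 60, 15)
q_t = t_quantile(0.975, nu_hat)
ci_t = (d_hat - q_t * se, d_hat + q_t * se)
p_t = 2 * (1 - t_cdf(abs(z), nu_hat))
print(nu_hat, q_t, ci_t, p_t)
```

The printed values should match Examples 4 to 7 up to the rounding used there. The same helpers could, for instance, be reused to check hand computations for the two-sample exercises above.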