Download Lecture 18

Hypothesis Testing

To define a statistical test we:
1. Choose a statistic (called the test statistic).
2. Divide the range of possible values for the test statistic into two parts:
• The Acceptance Region
• The Critical Region

To perform a statistical test we:
1. Collect the data.
2. Compute the value of the test statistic.
3. Make the decision:
• If the value of the test statistic is in the Acceptance Region, we decide to accept H0.
• If the value of the test statistic is in the Critical Region, we decide to reject H0.

The z-test for Proportions
Testing the probability of success in a binomial experiment

Situation
• A success-failure experiment has been repeated n times.
• The probability of success p is unknown. We want to test
– $H_0: p = p_0$ (some specified value of p)
against
– $H_A: p \ne p_0$

The Test Statistic
$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$

The Acceptance and Critical Region
• Accept H0 if: $-z_{\alpha/2} \le z \le z_{\alpha/2}$
• Reject H0 if: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
[Figure: two-tailed critical region]

One-tailed critical regions
These are used when the alternative hypothesis (HA) is one-sided, i.e.
$H_0: p \le p_0$ and $H_A: p > p_0$
• Accept H0 if: $z \le z_\alpha$
• Reject H0 if: $z > z_\alpha$
or if
$H_0: p \ge p_0$ and $H_A: p < p_0$
• Accept H0 if: $z \ge -z_\alpha$
• Reject H0 if: $z < -z_\alpha$
[Figure: one-tailed critical region for $H_A: p > p_0$ (upper tail)]
[Figure: one-tailed critical region for $H_A: p < p_0$ (lower tail)]

Comments
• Whether you use a one-tailed or a two-tailed test is determined by the choice of the alternative hypothesis HA.
• The alternative hypothesis, HA, is usually the research hypothesis: the hypothesis that the researcher is trying to "prove".

Examples
1. A person wants to determine if a coin should be accepted as being fair. Let p be the probability that a head is tossed.
$H_0: p = \frac{1}{2}$ vs $H_A: p \ne \frac{1}{2}$
One is trying to determine if there is a difference (positive or negative) from the fair value of p.
2. A researcher is interested in determining if a new procedure is an improvement over the old procedure. The probability of success for the old procedure is p0 (known). The probability of success for the new procedure is p (unknown).
$H_0: p \le p_0$ vs $H_A: p > p_0$
One is trying to determine if the new procedure is better (i.e. p > p0).
3. A researcher is interested in determining if a new procedure is no longer worth considering. The probability of success for the old procedure is p0 (known). The probability of success for the new procedure is p (unknown).
$H_0: p \ge p_0$ vs $H_A: p < p_0$
One is trying to determine if the new procedure is definitely worse than the one presently being used (i.e. p < p0).

The z-test for the Mean of a Normal Population
Let $\mu$ denote the mean of a normal population. We want to test hypotheses about $\mu$.

The Situation
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from a normal population with mean $\mu$ and standard deviation $\sigma$.
• Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = the sample mean.
• We want to test if the mean, $\mu$, is equal to some given value $\mu_0$.
• If the sample mean is close to $\mu_0$, the null hypothesis should be accepted; otherwise the null hypothesis should be rejected.

The Test Statistic
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sigma} \approx \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s}$

The Acceptance and Critical Region
This depends on H0 and HA.

Two-tailed critical region
$H_0: \mu = \mu_0$ and $H_A: \mu \ne \mu_0$
• Accept H0 if: $-z_{\alpha/2} \le z \le z_{\alpha/2}$
• Reject H0 if: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

One-tailed critical regions
$H_0: \mu \le \mu_0$ and $H_A: \mu > \mu_0$
• Accept H0 if: $z \le z_\alpha$
• Reject H0 if: $z > z_\alpha$
$H_0: \mu \ge \mu_0$ and $H_A: \mu < \mu_0$
• Accept H0 if: $z \ge -z_\alpha$
• Reject H0 if: $z < -z_\alpha$

Example
A manufacturer of glucosamine capsules claims that each capsule contains on average:
• 500 mg of glucosamine
To test this claim, n = 40 capsules were selected and the amount of glucosamine (X) was measured in each capsule.
Summary statistics: $\bar{x} = 496.3$ and $s = 8.5$
We want to test:
$H_0: \mu = 500$ (the manufacturer's claim is correct)
against
$H_A: \mu \ne 500$ (the manufacturer's claim is not correct)

The Test Statistic
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sigma} \approx \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s} = \frac{\sqrt{40}\,(496.3 - 500)}{8.5} = -2.75$

The Critical Region and Acceptance Region
Using $\alpha = 0.05$: $z_{\alpha/2} = z_{0.025} = 1.960$
We accept H0 if $-1.960 \le z \le 1.960$
and reject H0 if $z < -1.960$ or $z > 1.960$.

The Decision
Since $z = -2.75 < -1.960$, we reject H0.
Conclude: the manufacturer's claim is incorrect.

"Student's" t-test

Recall: The z-test for means
The Test Statistic
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Comments
• The sampling distribution of this statistic is the standard normal distribution.
• The replacement of $\sigma$ by s leaves this distribution unchanged only if the sample size n is large.

For small sample sizes:
The sampling distribution of
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
is called Student's t distribution with n - 1 degrees of freedom.

Properties of Student's t distribution
• Similar to the standard normal distribution
– Symmetric
– Unimodal
– Centred at zero
• Larger spread about zero.
– The reason for this is the increased variability introduced by replacing $\sigma$ by s.
• As the sample size increases (degrees of freedom increase), the t distribution approaches the standard normal distribution.
[Figure: density curves of the t distribution and the standard normal distribution]

The Situation
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Both $\mu$ and $\sigma$ are unknown.
• Let
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = the sample mean
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ = the sample standard deviation
• We want to test if the mean, $\mu$, is equal to some given value $\mu_0$.
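The sample mean and sample standard deviation defined in the Situation above can be computed directly from their formulas. The following is a minimal Python sketch; the sample values are hypothetical and serve only to illustrate the n - 1 divisor in s.

```python
import math
import statistics

# Hypothetical sample values, used only for illustration (not from the lecture).
sample = [2.0, 4.0, 4.0, 6.0]
n = len(sample)

# Sample mean: sum of the observations divided by n.
x_bar = sum(sample) / n

# Sample standard deviation: squared deviations summed and divided by n - 1.
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(s, statistics.stdev(sample))

print(f"n = {n}, x_bar = {x_bar:.4f}, s = {s:.4f}")
```

Dividing by n - 1 rather than n is what ties s to the n - 1 degrees of freedom of the t distribution.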
The Test Statistic
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
The sampling distribution of the test statistic is the t distribution with n - 1 degrees of freedom.

The Alternative Hypothesis HA      The Critical Region
$H_A: \mu \ne \mu_0$               $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu > \mu_0$                 $t > t_\alpha$
$H_A: \mu < \mu_0$                 $t < -t_\alpha$
$t_\alpha$ and $t_{\alpha/2}$ are critical values under the t distribution with n - 1 degrees of freedom.

Critical values for the t-distribution
[Figure: t density with upper-tail area $\alpha$ (or $\alpha/2$) to the right of the critical value $t_\alpha$ (or $t_{\alpha/2}$)]
Critical values for the t-distribution are provided in tables. A link to these tables is given with today's lecture. To find a critical value, look up $\alpha$ and the degrees of freedom (df).
Note: the values tabled for df = ∞ are the same as the values for the standard normal distribution.

Example
• Let $x_1, x_2, x_3, x_4, x_5, x_6$ denote weight loss from a new diet for n = 6 cases.
• Assume that $x_1, x_2, x_3, x_4, x_5, x_6$ is a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Both $\mu$ and $\sigma$ are unknown.
• We want to test:
$H_0: \mu \le 0$ (new diet is not effective)
versus
$H_A: \mu > 0$ (new diet is effective)

The Test Statistic
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
The Critical Region: Reject if $t > t_\alpha$

The Data
Case:         1     2     3     4      5     6
Weight loss:  2.0   1.0   1.4   -1.8   0.9   2.3
The summary statistics: $\bar{x} = 0.96667$ and $s = 1.462418$

The Test Statistic
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.96667 - 0}{1.462418/\sqrt{6}} = 1.619$
The Critical Region (using $\alpha = 0.05$): Reject if $t > t_{0.05} = 2.015$ for 5 d.f.
Conclusion: Since 1.619 < 2.015, we accept H0.

Confidence Intervals

Confidence intervals for the mean of a normal population, $\mu$, using the standard normal distribution:
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Confidence intervals for the mean of a normal population, $\mu$, using the t distribution:
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$

Example
• Let $x_1, x_2, x_3, x_4, x_5, x_6$ denote weight loss from a new diet for n = 6 cases.
The Data:
Case:         1     2     3     4      5     6
Weight loss:  2.0   1.0   1.4   -1.8   0.9   2.3
The summary statistics: $\bar{x} = 0.96667$ and $s = 1.462418$

Confidence Intervals (use $\alpha = 0.05$)
$\bar{x} \pm t_{0.025} \frac{s}{\sqrt{n}} = 0.96667 \pm 2.571 \cdot \frac{1.462418}{\sqrt{6}} = 0.96667 \pm 1.535$
that is, -0.57 to 2.50.

Comparing Populations
Proportions and means

Sums, Differences, Combinations of R.V.'s
A linear combination of random variables X, Y, ... is a combination of the form:
L = aX + bY + …
where a, b, etc. are numbers, positive or negative.
Most common:
Sum = X + Y
Difference = X – Y
Simple linear combination of X: bX + a

Means of Linear Combinations
If L = aX + bY + …
the mean of L is:
Mean(L) = a Mean(X) + b Mean(Y) + …
Most common:
Mean(X + Y) = Mean(X) + Mean(Y)
Mean(X – Y) = Mean(X) – Mean(Y)
Mean(bX + a) = b Mean(X) + a

Variances of Linear Combinations
If X, Y, ... are independent random variables and L = aX + bY + … then
Variance(L) = $a^2$ Variance(X) + $b^2$ Variance(Y) + …
Most common:
Variance(X + Y) = Variance(X) + Variance(Y)
Variance(X – Y) = Variance(X) + Variance(Y)
Variance(bX + a) = $b^2$ Variance(X)

Combining Independent Normal Random Variables
If X, Y, ... are independent normal random variables, then L = aX + bY + … is normally distributed.
In particular:
X + Y is normal with mean $\mu_X + \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$
X – Y is normal with mean $\mu_X - \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$

Comparing proportions
Situation
• We have two populations (1 and 2).
• Let p1 denote the probability (proportion) of "success" in population 1.
• Let p2 denote the probability (proportion) of "success" in population 2.
• Objective is to compare the two population proportions.

We want to test either:
1. $H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$
or
2. $H_0: p_1 \le p_2$ vs $H_A: p_1 > p_2$
or
3.
$H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$

The test statistic:
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sigma_{\hat{p}_1 - \hat{p}_2}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Where:
A sample of n1 is selected from population 1, resulting in x1 successes.
A sample of n2 is selected from population 2, resulting in x2 successes.
$\hat{p}_1 = \frac{x_1}{n_1}$, $\hat{p}_2 = \frac{x_2}{n_2}$ and $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$

Logic:
$\sigma_{\hat{p}_1} = \sqrt{\frac{p_1(1 - p_1)}{n_1}}$ and $\sigma_{\hat{p}_2} = \sqrt{\frac{p_2(1 - p_2)}{n_2}}$
$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
$= \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ if $p_1 = p_2 = p$
$\approx \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

The Alternative Hypothesis HA      The Critical Region
$H_A: p_1 \ne p_2$                 $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: p_1 > p_2$                   $z > z_\alpha$
$H_A: p_1 < p_2$                   $z < -z_\alpha$

Example
• In a national study to determine if there was an increase in mortality due to pipe smoking, a random sample of n1 = 1067 male nonsmoking pensioners were observed for a five-year period.
• In addition, a sample of n2 = 402 male pensioners who had smoked a pipe for more than six years were observed for the same five-year period.
• At the end of the five-year period, x1 = 117 of the nonsmoking pensioners had died while x2 = 54 of the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for non-smokers?

We want to test:
$H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$
The test statistic:
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sigma_{\hat{p}_1 - \hat{p}_2}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Note:
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{117}{1067} = 0.1097$
$\hat{p}_2 = \frac{x_2}{n_2} = \frac{54}{402} = 0.1343$
$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{117 + 54}{1067 + 402} = \frac{171}{1469} = 0.1164$
The test statistic:
$z = \frac{0.1097 - 0.1343}{\sqrt{0.1164(1 - 0.1164)\left(\frac{1}{1067} + \frac{1}{402}\right)}} = -1.315$
We reject H0 if: $z < -z_\alpha = -z_{0.05} = -1.645$
Not true, hence we accept H0.
Conclusion: There is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.

Estimating a difference in proportions using confidence intervals
Situation
• We have two populations (1 and 2).
• Let p1 denote the probability (proportion) of "success" in population 1.
• Let p2 denote the probability (proportion) of "success" in population 2.
• Objective is to estimate the difference in the two population proportions $\delta = p_1 - p_2$.

Confidence Interval for $\delta = p_1 - p_2$
100P% = 100(1 – $\alpha$)%:
$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\, \sigma_{\hat{p}_1 - \hat{p}_2}$
that is,
$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

Example
• Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, $\delta = p_2 - p_1$:
$\hat{p}_2 - \hat{p}_1 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
$= 0.1343 - 0.1097 \pm 1.960 \sqrt{\frac{0.1097(1 - 0.1097)}{1067} + \frac{0.1343(1 - 0.1343)}{402}}$
$= 0.0247 \pm 0.0382$
that is, -0.0136 to 0.0629, or -1.36% to 6.29%.

Comparing Means
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• Objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$
or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$

Consider the test statistic:
$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$
If $H_0: \mu_1 = \mu_2$ is true:
• $z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}}$ will have a standard normal distribution.
• This will also be true for the approximation $\frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$ (obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$) if the sample sizes n and m are large (greater than 30).
Note:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i$, $s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}}$

The Alternative Hypothesis HA      The Critical Region
$H_A: \mu_1 \ne \mu_2$             $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: \mu_1 > \mu_2$               $z > z_\alpha$
$H_A: \mu_1 < \mu_2$               $z < -z_\alpha$

Example
• A study was interested in determining if an exercise program had some effect on reduction of blood pressure in subjects with abnormally high blood pressure.
• For this purpose a sample of n = 500 patients with abnormally high blood pressure were required to adhere to the exercise regime.
• A second sample of m = 400 patients with abnormally high blood pressure were not required to adhere to the exercise regime.
• After a period of one year the reduction in blood pressure was measured for each patient in the study.

We want to test:
$H_0: \mu_1 \le \mu_2$ (the exercise group did not have a higher average reduction in blood pressure)
vs
$H_A: \mu_1 > \mu_2$ (the exercise group did have a higher average reduction in blood pressure)
The test statistic:
$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$
Suppose the data have been collected and:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 10.67$, $s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = 3.895$
$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i = 7.83$, $s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}} = 4.224$
The test statistic:
$z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}} = \frac{10.67 - 7.83}{\sqrt{\frac{3.895^2}{500} + \frac{4.224^2}{400}}} = \frac{2.84}{0.273765} = 10.4$
We reject H0 if: $z > z_\alpha = z_{0.05} = 1.645$
True, hence we reject H0.
Conclusion: There is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $\mu_1$ denote the mean of population 1.
• Let $\mu_2$ denote the mean of population 2.
• Objective is to estimate the difference in the two population means $\delta = \mu_1 - \mu_2$.

Confidence Interval for $\delta = \mu_1 - \mu_2$
$\hat{\mu}_1 - \hat{\mu}_2 \pm z_{\alpha/2}\, \sigma_{\hat{\mu}_1 - \hat{\mu}_2}$
that is,
$\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$

Example
• Estimating the increase in the average reduction in blood pressure due to the exercise regime, $\delta = \mu_1 - \mu_2$:
$\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}} = 10.67 - 7.83 \pm 1.960 \sqrt{\frac{3.895^2}{500} + \frac{4.224^2}{400}}$
$= 2.84 \pm 1.96(0.273765) = 2.84 \pm 0.537$
that is, 2.303 to 3.377.

Comparing Means – small samples
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• Objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$
or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$

Consider the test statistic:
$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$
If the sample sizes (m and n) are large, the statistic
$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$
will have approximately a standard normal distribution.
This will not be the case if the sample sizes (m and n) are small.

The t test – for comparing means – small samples
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same: $\sigma_1 = \sigma_2 = \sigma$.
Let
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i$, $s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}}$

The pooled estimate of $\sigma$
Note: both $s_x$ and $s_y$ are estimators of $\sigma$. These can be combined to form a single estimator of $\sigma$, $s_{Pooled}$:
$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}}$

The test statistic:
$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_{Pooled}^2}{n} + \frac{s_{Pooled}^2}{m}}} = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\frac{1}{n} + \frac{1}{m}}}$
If $\mu_1 = \mu_2$ this statistic has a t distribution with n + m - 2 degrees of freedom.

The Alternative Hypothesis HA      The Critical Region
$H_A: \mu_1 \ne \mu_2$             $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu_1 > \mu_2$               $t > t_\alpha$
$H_A: \mu_1 < \mu_2$               $t < -t_\alpha$
$t_{\alpha/2}$ and $t_\alpha$ are critical points under the t distribution with n + m - 2 degrees of freedom.

Example
• A study was interested in determining if administration of a drug reduces cancerous tumour size.
• For this purpose n + m = 9 test animals are implanted with a cancerous tumour.
• n = 3 are selected at random and administered the drug.
• The remaining m = 6 are left untreated.
• Final tumour sizes are measured at the end of the test period.

We want to test:
$H_0: \mu_1 \ge \mu_2$ (the treated group did not have a lower average final tumour size)
vs
$H_A: \mu_1 < \mu_2$ (the treated group did have a lower average final tumour size)
The test statistic:
$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\frac{1}{n} + \frac{1}{m}}}$
Suppose the data have been collected and:
drug treated:  1.89  1.79  1.29
untreated:     2.08  1.28  1.75  1.90  2.32  2.16
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 1.657$, $s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = 0.3215$
$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i = 1.915$, $s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}} = 0.3693$
The test statistic:
$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}} = \sqrt{\frac{2(0.3215)^2 + 5(0.3693)^2}{7}} = 0.3563$
$t = \frac{1.657 - 1.915}{0.3563\sqrt{\frac{1}{3} + \frac{1}{6}}} = \frac{-0.258}{0.252} = -1.025$
We reject H0 if: $t < -t_\alpha = -t_{0.05} = -1.895$ with d.f. = n + m – 2 = 7
Since -1.025 > -1.895, we accept H0.
Conclusion: The drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.
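The pooled t statistic for the tumour example above can be checked with a few lines of code. The following is a minimal Python sketch using only the standard library. The function name `pooled_t` is illustrative, and the assignment of the nine measurements to the two groups is an assumption inferred from the reported group means (1.657 and 1.915). The critical value 1.895 is taken from the example rather than computed.

```python
import math
import statistics


def pooled_t(x, y):
    """Return (t, s_pooled, df) for the equal-variance two-sample t test."""
    n, m = len(x), len(y)
    s_x2 = statistics.variance(x)  # sample variance, divisor n - 1
    s_y2 = statistics.variance(y)  # sample variance, divisor m - 1
    # Pooled estimate of the common standard deviation.
    s_pooled = math.sqrt(((n - 1) * s_x2 + (m - 1) * s_y2) / (n + m - 2))
    t = (statistics.mean(x) - statistics.mean(y)) / (s_pooled * math.sqrt(1 / n + 1 / m))
    return t, s_pooled, n + m - 2


# Group membership inferred from the reported means (an assumption).
treated = [1.89, 1.79, 1.29]
untreated = [2.08, 1.28, 1.75, 1.90, 2.32, 2.16]

t, s_pooled, df = pooled_t(treated, untreated)

T_CRIT = 1.895  # t_0.05 with 7 d.f., as quoted in the example
reject_h0 = t < -T_CRIT  # one-tailed test of H_A: mu_1 < mu_2

print(f"s_pooled = {s_pooled:.4f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the sketch gives $s_{Pooled} \approx 0.3563$ and $t \approx -1.025$ with 7 degrees of freedom, which agrees with the example, so H0 is not rejected.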
