Hypothesis Testing

To define a statistical test we
1. Choose a statistic (called the test statistic).
2. Divide the range of possible values for the test statistic into two parts:
• The Acceptance Region
• The Critical Region

To perform a statistical test we
1. Collect the data.
2. Compute the value of the test statistic.
3. Make the decision:
• If the value of the test statistic is in the Acceptance Region we decide to accept H0.
• If the value of the test statistic is in the Critical Region we decide to reject H0.

The z-test for Proportions
Testing the probability of success in a binomial experiment

Situation
• A success-failure experiment has been repeated n times.
• The probability of success p is unknown.
We want to test
– H0: $p = p_0$ (some specified value of p)
against
– HA: $p \ne p_0$

The Test Statistic
$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
where $\hat{p}$ is the observed proportion of successes in the n trials.

The Acceptance and Critical Region (two-tailed critical region)
• Accept H0 if: $-z_{\alpha/2} \le z \le z_{\alpha/2}$
• Reject H0 if: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

One-tailed critical regions
These are used when the alternative hypothesis (HA) is one-sided.
If $H_0: p \le p_0$ and $H_A: p > p_0$:
• Accept H0 if: $z \le z_{\alpha}$
• Reject H0 if: $z > z_{\alpha}$
If $H_0: p \ge p_0$ and $H_A: p < p_0$:
• Accept H0 if: $z \ge -z_{\alpha}$
• Reject H0 if: $z < -z_{\alpha}$

Comments
• Whether you use a one-tailed or a two-tailed test is determined by the choice of the alternative hypothesis HA.
• The alternative hypothesis, HA, is usually the research hypothesis: the hypothesis that the researcher is trying to "prove".

Examples
1. A person wants to determine if a coin should be accepted as being fair. Let p be the probability that a head is tossed.
$H_0: p = \tfrac{1}{2}$ vs $H_A: p \ne \tfrac{1}{2}$
One is trying to determine if there is a difference (positive or negative) from the fair value of p.
2. A researcher is interested in determining if a new procedure is an improvement over the old procedure. The probability of success for the old procedure is $p_0$ (known). The probability of success for the new procedure is p (unknown).
$H_0: p \le p_0$ vs $H_A: p > p_0$
One is trying to determine if the new procedure is better (i.e. $p > p_0$).
3. A researcher is interested in determining if a new procedure is no longer worth considering. The probability of success for the old procedure is $p_0$ (known). The probability of success for the new procedure is p (unknown).
$H_0: p \ge p_0$ vs $H_A: p < p_0$
One is trying to determine if the new procedure is definitely worse than the one presently being used (i.e. $p < p_0$).

The z-test for the Mean of a Normal Population
We want to test a hypothesis about $\mu$, the mean of a normal population.

The Situation
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from a normal population with mean $\mu$ and standard deviation $\sigma$.
• Let $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$ denote the sample mean.
• We want to test if the mean, $\mu$, is equal to some given value $\mu_0$.
• If the sample mean is close to $\mu_0$ the null hypothesis should be accepted; otherwise the null hypothesis should be rejected.

The Test Statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sigma} \approx \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s}$$

The Acceptance and Critical Region
This depends on H0 and HA.

Two-tailed critical region
$H_0: \mu = \mu_0$ and $H_A: \mu \ne \mu_0$
• Accept H0 if: $-z_{\alpha/2} \le z \le z_{\alpha/2}$
• Reject H0 if: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

One-tailed critical regions
$H_0: \mu \le \mu_0$ and $H_A: \mu > \mu_0$
• Accept H0 if: $z \le z_{\alpha}$
• Reject H0 if: $z > z_{\alpha}$
$H_0: \mu \ge \mu_0$ and $H_A: \mu < \mu_0$
• Accept H0 if: $z \ge -z_{\alpha}$
• Reject H0 if: $z < -z_{\alpha}$

Example
A manufacturer of glucosamine capsules claims that each capsule contains on average:
• 500 mg of glucosamine
To test this claim n = 40 capsules were selected and the amount of glucosamine (X) was measured in each capsule.
Summary statistics: $\bar{x} = 496.3$ and $s = 8.5$
We want to test:
$H_0: \mu = 500$ (the manufacturer's claim is correct)
against
$H_A: \mu \ne 500$ (the manufacturer's claim is not correct)

The Test Statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{496.3 - 500}{8.5/\sqrt{40}} = -2.75$$

The Critical Region and Acceptance Region
Using $\alpha = 0.05$: $z_{\alpha/2} = z_{0.025} = 1.960$
We accept H0 if $-1.960 \le z \le 1.960$
and reject H0 if $z < -1.960$ or $z > 1.960$.

The Decision
Since $z = -2.75 < -1.960$ we reject H0.
Conclude: the manufacturer's claim is incorrect.

"Student's" t-test

Recall: The z-test for means
The Test Statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Comments
• The sampling distribution of this statistic is the standard normal distribution.
• The replacement of $\sigma$ by s leaves this distribution unchanged only if the sample size n is large.

For small sample sizes:
The sampling distribution of
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
is called "Student's" t distribution with n – 1 degrees of freedom.

Properties of Student's t distribution
• Similar to the standard normal distribution
– Symmetric
– Unimodal
– Centred at zero
• Larger spread about zero.
– The reason for this is the increased variability introduced by replacing $\sigma$ by s.
• As the sample size increases (degrees of freedom increase) the t distribution approaches the standard normal distribution.

[Figure: density curves of the t distribution and the standard normal distribution]

The Situation
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Both $\mu$ and $\sigma$ are unknown.
• Let $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$ denote the sample mean and $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ the sample standard deviation.
• We want to test if the mean, $\mu$, is equal to some given value $\mu_0$.
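As a rough numerical illustration of the sample mean, sample standard deviation and t statistic defined above, the following Python sketch uses the six weight-loss values from the diet example in these notes, with the hypothesised value $\mu_0 = 0$. The function name `one_sample_t` is an illustrative choice and does not come from the lecture; only the standard library is assumed.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (sample mean, sample standard deviation, t statistic)."""
    n = len(data)
    xbar = statistics.mean(data)
    # statistics.stdev divides by n - 1, matching the definition of s above
    s = statistics.stdev(data)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return xbar, s, t


# Weight-loss data from the diet example (n = 6), tested against mu0 = 0
weight_loss = [2.0, 1.0, 1.4, -1.8, 0.9, 2.3]
xbar, s, t = one_sample_t(weight_loss, mu0=0.0)
df = len(weight_loss) - 1  # degrees of freedom, n - 1
print(f"mean = {xbar:.5f}, s = {s:.6f}, t = {t:.3f}, df = {df}")
```

The printed values can be compared with the summary statistics and the t value quoted in the diet example; the critical value itself still has to be read from a t table.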
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The sampling distribution of the test statistic is the t distribution with n – 1 degrees of freedom.

The Alternative Hypothesis HA and the Critical Region
• $H_A: \mu \ne \mu_0$: reject H0 if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
• $H_A: \mu > \mu_0$: reject H0 if $t > t_{\alpha}$
• $H_A: \mu < \mu_0$: reject H0 if $t < -t_{\alpha}$
$t_{\alpha}$ and $t_{\alpha/2}$ are critical values under the t distribution with n – 1 degrees of freedom.

Critical values for the t-distribution
[Figure: t density with tail area $\alpha$ (or $\alpha/2$) to the right of the critical value $t_{\alpha}$ (or $t_{\alpha/2}$)]
Critical values for the t-distribution are provided in tables. A link to these tables is given with today's lecture.
[Figure: excerpt of the t table; look up the column for $\alpha$ and the row for the degrees of freedom (df)]
Note: the values tabled for df = ∞ are the same as the values for the standard normal distribution.

Example
• Let $x_1, x_2, x_3, x_4, x_5, x_6$ denote weight loss from a new diet for n = 6 cases.
• Assume that $x_1, x_2, x_3, x_4, x_5, x_6$ is a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Both $\mu$ and $\sigma$ are unknown.
• We want to test:
$H_0: \mu \le 0$ (the new diet is not effective)
versus
$H_A: \mu > 0$ (the new diet is effective)

The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The Critical Region: reject H0 if $t > t_{\alpha}$.

The Data
Case          1     2     3     4     5     6
Weight loss   2.0   1.0   1.4   -1.8  0.9   2.3
The summary statistics: $\bar{x} = 0.96667$ and $s = 1.462418$

The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.96667 - 0}{1.462418/\sqrt{6}} = 1.619$$
The Critical Region (using $\alpha = 0.05$): reject H0 if $t > t_{0.05} = 2.015$ for 5 d.f.
Conclusion: since 1.619 < 2.015, accept H0.

Confidence Intervals

Confidence interval for the mean of a normal population, $\mu$, using the standard normal distribution:
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Confidence interval for the mean of a normal population, $\mu$, using the t distribution:
$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

Example
For the weight-loss data above (n = 6, $\bar{x} = 0.96667$, $s = 1.462418$), the confidence interval using $\alpha = 0.05$ is
$$\bar{x} \pm t_{0.025}\,\frac{s}{\sqrt{n}} = 0.96667 \pm 2.571 \times \frac{1.462418}{\sqrt{6}} = 0.96667 \pm 1.535$$
that is, –0.57 to 2.50.

Comparing Populations
Proportions and means

Sums, Differences, Combinations of R.V.'s
A linear combination of random variables, X, Y, . . .
is a combination of the form:
L = aX + bY + …
where a, b, etc. are numbers, positive or negative.
Most common:
• Sum = X + Y
• Difference = X – Y
• Simple linear combination of X: bX + a

Means of Linear Combinations
If L = aX + bY + …, the mean of L is:
Mean(L) = a Mean(X) + b Mean(Y) + …
Most common:
• Mean(X + Y) = Mean(X) + Mean(Y)
• Mean(X – Y) = Mean(X) – Mean(Y)
• Mean(bX + a) = b Mean(X) + a

Variances of Linear Combinations
If X, Y, . . . are independent random variables and L = aX + bY + …, then
Variance(L) = $a^2$ Variance(X) + $b^2$ Variance(Y) + …
Most common:
• Variance(X + Y) = Variance(X) + Variance(Y)
• Variance(X – Y) = Variance(X) + Variance(Y)
• Variance(bX + a) = $b^2$ Variance(X)

Combining Independent Normal Random Variables
If X, Y, . . . are independent normal random variables, then L = aX + bY + … is normally distributed.
In particular:
• X + Y is normal with mean $\mu_X + \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$
• X – Y is normal with mean $\mu_X - \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$

Comparing proportions

Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of "success" in population 1.
• Let $p_2$ denote the probability (proportion) of "success" in population 2.
• The objective is to compare the two population proportions.

We want to test either:
1. $H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$
or
2. $H_0: p_1 \le p_2$ vs $H_A: p_1 > p_2$
or
3.
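$H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$

The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sigma_{\hat{p}_1 - \hat{p}_2}} \approx \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Where:
• A sample of $n_1$ is selected from population 1, resulting in $x_1$ successes.
• A sample of $n_2$ is selected from population 2, resulting in $x_2$ successes.
• $\hat{p}_1 = \dfrac{x_1}{n_1}$, $\hat{p}_2 = \dfrac{x_2}{n_2}$ and the pooled estimate is $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$.

Logic:
$$\sigma_{\hat{p}_1} = \sqrt{\frac{p_1(1 - p_1)}{n_1}}, \qquad \sigma_{\hat{p}_2} = \sqrt{\frac{p_2(1 - p_2)}{n_2}}$$
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
$$= \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \quad \text{if } p_1 = p_2 = p$$
$$\approx \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The Alternative Hypothesis HA and the Critical Region
• $H_A: p_1 \ne p_2$: reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: p_1 > p_2$: reject H0 if $z > z_{\alpha}$
• $H_A: p_1 < p_2$: reject H0 if $z < -z_{\alpha}$

Example
• In a national study to determine if there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners was observed for a five-year period.
• In addition, a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years was observed for the same five-year period.
• At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died while $x_2 = 54$ of the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for non-smokers?

We want to test:
$H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$

The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sigma_{\hat{p}_1 - \hat{p}_2}} \approx \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Note:
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{117}{1067} = 0.1097, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{54}{402} = 0.1343$$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{117 + 54}{1067 + 402} = \frac{171}{1469} = 0.1164$$
The test statistic:
$$z = \frac{0.1097 - 0.1343}{\sqrt{0.1164(1 - 0.1164)\left(\dfrac{1}{1067} + \dfrac{1}{402}\right)}} = -1.315$$
We reject H0 if: $z < -z_{\alpha} = -z_{0.05} = -1.645$
This is not true, hence we accept H0.
Conclusion: There is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.

Estimating a difference in proportions using confidence intervals

Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of "success" in population 1.
• Let $p_2$ denote the probability (proportion) of "success" in population 2.
• The objective is to estimate the difference in the two population proportions, $\delta = p_1 - p_2$.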
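Before turning to the interval estimate, the pooled z-test from the pipe-smoking example above can be checked numerically. This is a minimal sketch using only the Python standard library; the function name `two_proportion_z` is hypothetical, and the critical value 1.645 is simply the $z_{0.05}$ figure quoted in the example rather than something computed here.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for comparing two proportions under H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion, (x1 + x2) / (n1 + n2)
    p_hat = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Pipe-smoking example: population 1 = nonsmokers, population 2 = pipe smokers
z = two_proportion_z(x1=117, n1=1067, x2=54, n2=402)
z_crit = 1.645  # z_0.05, taken from the example
reject_h0 = z < -z_crit  # one-tailed test of HA: p1 < p2
print(f"z = {z:.3f}, reject H0: {reject_h0}")
```

Because the alternative is $p_1 < p_2$, only large negative values of z count as evidence against H0, which is why the comparison is against the negative critical value.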
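Confidence Interval for $\delta = p_1 - p_2$, with confidence level 100P% = 100(1 – $\alpha$)%:
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\,\sigma_{\hat{p}_1 - \hat{p}_2} \approx \hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

Example
• Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, $\delta = p_2 - p_1$:
$$\hat{p}_2 - \hat{p}_1 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
$$= 0.1343 - 0.1097 \pm 1.960\sqrt{\frac{0.1097(1 - 0.1097)}{1067} + \frac{0.1343(1 - 0.1343)}{402}}$$
$$= 0.0247 \pm 0.0382$$
that is, –0.0136 to 0.0629, or –1.36% to 6.29%.

Comparing Means

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$
or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$

Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}} = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
If $H_0: \mu_1 = \mu_2$ is true, then
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
• will have a standard normal distribution.
• This will also be true for the approximation (obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$) if the sample sizes n and m are large (greater than 30).
Note:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}, \quad \bar{y} = \frac{\sum_{i=1}^{m} y_i}{m}, \quad s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}}$$

The Alternative Hypothesis HA and the Critical Region
• $H_A: \mu_1 \ne \mu_2$: reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: reject H0 if $z > z_{\alpha}$
• $H_A: \mu_1 < \mu_2$: reject H0 if $z < -z_{\alpha}$

Example
• A study was interested in determining if an exercise program had some effect on the reduction of blood pressure in subjects with abnormally high blood pressure.
• For this purpose a sample of n = 500 patients with abnormally high blood pressure were required to adhere to the exercise regime.
• A second sample of m = 400 patients with abnormally high blood pressure were not required to adhere to the exercise regime.
• After a period of one year the reduction in blood pressure was measured for each patient in the study.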
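We want to test:
$H_0: \mu_1 \le \mu_2$ (the exercise group did not have a higher average reduction in blood pressure)
vs
$H_A: \mu_1 > \mu_2$ (the exercise group did have a higher average reduction in blood pressure)

The test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}} = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
Suppose the data has been collected and:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 10.67, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = 3.895$$
$$\bar{y} = \frac{\sum_{i=1}^{m} y_i}{m} = 7.83, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}} = 4.224$$
The test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}} = \frac{10.67 - 7.83}{\sqrt{\dfrac{3.895^2}{500} + \dfrac{4.224^2}{400}}} = \frac{2.84}{0.273765} = 10.4$$
We reject H0 if: $z > z_{\alpha} = z_{0.05} = 1.645$
This is true, hence we reject H0.
Conclusion: There is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals

Situation
• We have two populations (1 and 2).
• Let $\mu_1$ denote the mean of population 1.
• Let $\mu_2$ denote the mean of population 2.
• The objective is to estimate the difference in the two population means, $\delta = \mu_1 - \mu_2$.

Confidence Interval for $\delta = \mu_1 - \mu_2$:
$$\hat{\mu}_1 - \hat{\mu}_2 \pm z_{\alpha/2}\,\sigma_{\hat{\mu}_1 - \hat{\mu}_2} \approx \bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

Example
• Estimating the increase in the average reduction in blood pressure due to the exercise regime, $\delta = \mu_1 - \mu_2$:
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}} = 10.67 - 7.83 \pm 1.960\sqrt{\frac{3.895^2}{500} + \frac{4.224^2}{400}}$$
$$= 2.84 \pm 1.96(0.273765) = 2.84 \pm 0.537$$
that is, 2.303 to 3.377.

Comparing Means – small samples

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$
or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
3.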
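$H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$

Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}} = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
If the sample sizes (m and n) are large, the statistic
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
will have approximately a standard normal distribution.
This will not be the case if the sample sizes (m and n) are small.

The t test – for comparing means – small samples

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same: $\sigma_1 = \sigma_2 = \sigma$.
Let
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}, \quad \bar{y} = \frac{\sum_{i=1}^{m} y_i}{m}, \quad s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}}$$

The pooled estimate of $\sigma$
Note: both $s_x$ and $s_y$ are estimators of $\sigma$. These can be combined to form a single estimator of $\sigma$, $s_{Pooled}$:
$$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}}$$

The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_{Pooled}^2}{n} + \dfrac{s_{Pooled}^2}{m}}} = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
If $\mu_1 = \mu_2$ this statistic has a t distribution with n + m – 2 degrees of freedom.

The Alternative Hypothesis HA and the Critical Region
• $H_A: \mu_1 \ne \mu_2$: reject H0 if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: reject H0 if $t > t_{\alpha}$
• $H_A: \mu_1 < \mu_2$: reject H0 if $t < -t_{\alpha}$
$t_{\alpha/2}$ and $t_{\alpha}$ are critical points under the t distribution with n + m – 2 degrees of freedom.

Example
• A study was interested in determining if administration of a drug reduces cancerous tumour size.
• For this purpose n + m = 9 test animals are implanted with a cancerous tumour.
• n = 3 are selected at random and administered the drug.
• The remaining m = 6 are left untreated.
• Final tumour sizes are measured at the end of the test period.

We want to test:
$H_0: \mu_1 \ge \mu_2$ (the treated group did not have a lower average final tumour size)
vs
$H_A: \mu_1 < \mu_2$ (the treated group did have a lower average final tumour size)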
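The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
Suppose the data has been collected and:
drug treated   1.89   1.79   1.29
untreated      2.08   1.28   1.75   1.90   2.32   2.16
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 1.657, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = 0.3215$$
$$\bar{y} = \frac{\sum_{i=1}^{m} y_i}{m} = 1.915, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}} = 0.3693$$
The test statistic:
$$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}} = \sqrt{\frac{2(0.3215)^2 + 5(0.3693)^2}{7}} = 0.3563$$
$$t = \frac{1.657 - 1.915}{0.3563\sqrt{\dfrac{1}{3} + \dfrac{1}{6}}} = \frac{-0.258}{0.252} = -1.025$$
We reject H0 if: $t < -t_{\alpha} = -t_{0.05} = -1.895$ with d.f. = n + m – 2 = 7
This is not true, hence we accept H0.
Conclusion: The drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.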
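The pooled t-test in the tumour-size example can be reproduced with a short Python sketch. It assumes the nine measurements split into groups as treated = 1.89, 1.79, 1.29 and untreated = 2.08, 1.28, 1.75, 1.90, 2.32, 2.16, which is the grouping consistent with the quoted sample means. The function name `pooled_t` is hypothetical, and the critical value 1.895 is the tabled $t_{0.05}$ for 7 d.f. quoted in the example.

```python
import math
import statistics


def pooled_t(x, y):
    """Return (pooled standard deviation, t statistic, degrees of freedom)."""
    n, m = len(x), len(y)
    sx = statistics.stdev(x)  # sample standard deviations (n - 1 denominator)
    sy = statistics.stdev(y)
    # Pooled estimate of the common standard deviation sigma
    s_pooled = math.sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / (n + m - 2))
    diff = statistics.mean(x) - statistics.mean(y)
    t = diff / (s_pooled * math.sqrt(1 / n + 1 / m))
    return s_pooled, t, n + m - 2


# Grouping assumed from the example's data table (see the note above)
treated = [1.89, 1.79, 1.29]
untreated = [2.08, 1.28, 1.75, 1.90, 2.32, 2.16]

s_pooled, t, df = pooled_t(treated, untreated)
t_crit = 1.895  # t_0.05 with 7 d.f., taken from the example
reject_h0 = t < -t_crit  # one-tailed test of HA: mu1 < mu2
print(f"s_pooled = {s_pooled:.4f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

The sketch relies on the equal-variance assumption stated for this test; with clearly unequal sample standard deviations the pooled estimate would not be appropriate.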