Brief Lecture notes STP231 Instructor: Ela Jackiewicz

Chapter 7 Testing Hypothesis: summary

The Nature of Hypothesis Testing

A hypothesis is a statement about population parameters, such as a mean, a proportion, two means or two proportions. For example:
1. We may wonder if the mean age (μ) of people in juvenile detention in Arizona in 2008 was less than 16, the mean age in 2005.
2. Is the proportion of people who approve of the job our governor is doing (p) greater than 60%?
3. Is the mean age of cars driven in Arizona, μ1, different from the mean age of cars driven in California, μ2?

A hypothesis test allows us to make a decision or judgment about the value(s) of a parameter (or parameters). A hypothesis test involves two hypotheses: the null hypothesis and the alternative (research) hypothesis.

• Null hypothesis (H0): A hypothesis to be tested. For the questions above the null hypotheses are, respectively:
ex1 H0: μ = 16
ex2 H0: p = 0.6
ex3 H0: μ1 = μ2
The null hypothesis specifies a value of the parameter of one population, or states that two parameters from two populations are equal. Notice the = sign in each statement.
• Alternative hypothesis (Ha): A hypothesis to be considered as an alternative to the null hypothesis. The choice of the alternative hypothesis should reflect the purpose of the hypothesis test and depends on the question asked in the problem.
ex1 Ha: μ < 16 - left-tailed test
ex2 Ha: p > 0.6 - right-tailed test
ex3 Ha: μ1 ≠ μ2 - two-tailed test
The hypotheses in ex1 and ex2 are called directional, and the one in ex3 nondirectional.

The logic of hypothesis testing
1. Take a random sample from one population (or two samples from two populations).
2. If the data do not provide enough evidence in favor of the alternative hypothesis, do not reject the null hypothesis.
3. If the data provide enough evidence in favor of the alternative, reject the null hypothesis.
Terms, Errors, and Hypothesis

• Test statistic: A value computed from the sample(s) and used as a basis for deciding whether the null hypothesis should be rejected.

The following terminology is used for the rejection region method of testing hypotheses; this method is not presented in our book.
• Rejection region: The set of values of the test statistic that leads to rejection of the null hypothesis.
• Non-rejection region: The set of values of the test statistic that leads to nonrejection of the null hypothesis.
• Critical values: The values of the test statistic that separate the rejection and nonrejection regions. Critical values are part of the rejection region.

Type I Error & Type II Errors

                Do not reject H0     Reject H0
H0 is True      Correct Decision     Type I error
H0 is False     Type II error        Correct Decision

• Type I error: Rejecting the null hypothesis when it is in fact true.
• Type II error: Not rejecting the null hypothesis when it is in fact false.

Probabilities of Type I and Type II Errors
• Significance level α: The probability of making a Type I error (rejecting a true null hypothesis), always selected in advance of seeing data.
• Power of a hypothesis test: Power = 1 - P(Type II error) = 1 - β = the probability of rejecting a false null hypothesis.
Power near 0: the hypothesis test is not good at detecting a false null hypothesis.
Power near 1: the hypothesis test is extremely good at detecting a false null hypothesis.

Relation between Type I and Type II error probabilities
For a fixed sample size, the smaller we specify the significance level, α, the larger will be the probability, β, of not rejecting a false null hypothesis.

Possible conclusions for a hypothesis test
Suppose a hypothesis test is conducted at a small significance level α.
1. If the null hypothesis is rejected, we conclude that there is evidence for the alternative hypothesis and the results are statistically significant at the level α.
2. If the null hypothesis is not rejected, we conclude that the data do not provide sufficient evidence to support the alternative hypothesis and the results are not statistically significant at the level α.

P-Value: To obtain the P-value (P) of a hypothesis test, we compute, assuming the null hypothesis is true, the probability of observing a value of the test statistic as extreme or more extreme than that observed. By extreme we mean far from what we would expect to observe if the null hypothesis were true.
• If the test is left-tailed, extreme means equal to or smaller than the observed test statistic.
• If the test is right-tailed, extreme means equal to or larger than the observed test statistic.
• If the test is two-tailed, extreme means either equal to or larger than the absolute value of the observed test statistic, or equal to or smaller than the negative of its absolute value.
The P-value is also referred to as the observed significance level or probability value.

Two ways to decide if H0 should be rejected or not:
1. If the test statistic falls into the rejection region, H0 should be rejected (again, this method is not presented in our book).
2. If P-value ≤ α, H0 should be rejected.

Guidelines for using the P-value to assess the evidence against H0

P-value              Evidence against H0
P > 0.10             Weak or none
0.05 < P ≤ 0.10      Moderate
0.01 < P ≤ 0.05      Strong
P ≤ 0.01             Very Strong

Steps in Hypotheses testing: P-VALUE APPROACH
1. State H0 and Ha.
2. Decide on α.
3. Compute the test statistic.
4. Determine the P-value.
5. If P ≤ α, reject H0; otherwise, do not reject H0.
6. Interpret the result of the hypothesis test.
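Step 5 of the P-value approach and the guideline table above can be sketched in a few lines of Python. The notes themselves use a TI calculator, so this is only an illustration; the function names `decide` and `evidence_against_h0` are made up for this sketch.

```python
def decide(p_value, alpha):
    """Step 5 of the P-value approach: reject H0 when P <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


def evidence_against_h0(p_value):
    """Translate a P-value into the verbal guideline from the table."""
    if p_value > 0.10:
        return "weak or none"
    if p_value > 0.05:
        return "moderate"
    if p_value > 0.01:
        return "strong"
    return "very strong"


# Illustrative P-value (not taken from any example in the notes).
p = 0.03
print(decide(p, alpha=0.05), "/", evidence_against_h0(p))
```

Note that the same P-value can lead to different decisions at different significance levels: P = 0.03 rejects H0 at α = 0.05 but not at α = 0.01, which is why α is chosen before seeing the data.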
T-tests and Z-tests: If we test a hypothesis about one population mean or two population means, we have a z-test if the population standard deviation(s) are known, and a t-test if they are unknown and replaced by the sample standard deviation(s) s. We concentrate only on t-tests, since population standard deviations are usually unknown.

We have a t-test because the test statistic has exactly or approximately a t distribution if H0 is true. The type of test (z or t) and the type of alternative hypothesis determine the way we compute P-values.

Computing the P-value for a t-test: the observed test statistic is t_s; use the t-curve with the appropriate degrees of freedom. If Ha is
a) two-tailed ( ≠ ), compute the area of two tails: t ≤ -|t_s| and t ≥ |t_s|
b) right-tailed ( > ), compute the area of the right tail: t ≥ t_s
c) left-tailed ( < ), compute the area of the left tail: t ≤ t_s

Assumptions for all tests: samples are simple random samples of independent observations, and populations are normal or samples are large (n ≥ 30). If two parameters are compared, the samples must also be independent.

Tests for Two Population Means

H0: μ1 = μ2 vs Ha: μ1 ≠ μ2 or Ha: μ1 < μ2 or Ha: μ1 > μ2 (independent samples, populations normal or large samples)

Non-pooled t-test, under the assumption that σ1 ≠ σ2. The standard error of ȳ1 - ȳ2 is

$$ SE_{\bar{y}_1-\bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{SE_1^2 + SE_2^2} $$

The test statistic

$$ t_s = \frac{\bar{y}_1-\bar{y}_2}{SE_{\bar{y}_1-\bar{y}_2}} $$

has approximately a t distribution (if the null hypothesis is true). Degrees of freedom are estimated by software:

$$ df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}} $$

Round it down to the nearest integer. If df is not given, the liberal estimate df = n1 + n2 - 2 or the conservative estimate df = min(n1 - 1, n2 - 1) is used.

The following example illustrates a nondirectional hypothesis test:

EX1 Is autism marked by different brain growth patterns in early life? Studies have linked brain size in infants and toddlers to a number of future ailments, including autism. One study looked at the brain size of 30 autistic boys and 12 nonautistic boys (control) who had received an MRI scan as toddlers. The average brain volumes and standard deviations in milliliters are given below. The two samples may be regarded as SRS from two populations. Is there a difference between the means? Test the appropriate hypothesis, use α = 0.05.

Group   Condition   n    ȳ        s      SE
1       Autistic    30   1297.6   88.4   16.14
2       Control     12   1179.3   70.7   20.4

Here μ1 and μ2 denote the true mean brain volumes of the two populations (not pooled t-test).

We test H0: μ1 = μ2 (no difference in mean brain volumes for the 2 populations) vs Ha: μ1 ≠ μ2 (there is a difference in mean brain volumes for the 2 populations).

SE = 26, t = 4.55, df = 25.3, use 25.

P-value method: the P-value is the area under the t curve with 25 df, left of -4.55 and right of 4.55.
From the tables: 4.55 > 3.7251 = t.0005, so (1/2)·pv < .0005, so pv < .001.
Using the calculator: 2*tcdf(4.55, 10^6, 25) = 1.2*10^-4 < .05.
Reject H0; there is evidence for the alternative hypothesis. There is a difference between the mean brain volumes of the two populations.

The Relation Between Hypothesis Tests and Confidence Intervals

If H0: μ1 = μ2 is not rejected against a two-tailed (nondirectional) alternative at a given α level, then the (1 - α)*100% CI for the difference between the two means will contain 0; otherwise it will not contain 0. If the CI contains 0, there is no evidence that the means are different.
In EX1, the 95% CI for μ1 - μ2 is (64.7, 171.9), clearly no 0 inside.

The previous example considered only a nondirectional hypothesis, where Ha: μ1 ≠ μ2. Next we will examine tests with directional alternatives, Ha: μ1 < μ2 or Ha: μ1 > μ2. We use a directional alternative if we have reason to believe that the difference between the means goes in a particular direction.
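As a check on the hand computation in EX1, the non-pooled test can be sketched in Python using only the standard library. This is an illustration, not the method used in the notes (which rely on tables and the calculator's tcdf); the names `t_pdf`, `upper_tail` and `welch_test` are invented here, and the tail area is approximated by Simpson's rule rather than by a library routine.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, steps=4000):
    """Area under the t curve to the right of t.

    Integrates the density over [0, |t|] with Simpson's rule (steps must
    be even) and uses the symmetry of the curve about 0.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1.0 - tail


def welch_test(n1, mean1, s1, n2, mean2, s2, alternative="two-sided"):
    """Non-pooled two-sample t-test from summary statistics.

    Returns (SE, t_s, unrounded df, P-value); the P-value uses df
    rounded down, as the notes recommend.
    """
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t_s = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    df_used = math.floor(df)
    if alternative == "two-sided":
        p_value = 2 * upper_tail(abs(t_s), df_used)
    elif alternative == "greater":      # Ha: mu1 > mu2, right tail
        p_value = upper_tail(t_s, df_used)
    elif alternative == "less":         # Ha: mu1 < mu2, left tail
        p_value = upper_tail(-t_s, df_used)
    else:
        raise ValueError("alternative must be two-sided, greater or less")
    return se, t_s, df, p_value


# Summary statistics from EX1 (autistic vs control brain volumes, ml).
se, t_s, df, p_value = welch_test(30, 1297.6, 88.4, 12, 1179.3, 70.7)
print(f"SE = {se:.2f}, t_s = {t_s:.2f}, df = {df:.1f}, P = {p_value:.1e}")
```

The printed SE, t_s and df should agree with the values SE = 26, t = 4.55 and df = 25.3 reported in EX1, and the P-value should fall well below 0.001. Passing `alternative="greater"` or `alternative="less"` gives the one-tail areas used for directional alternatives.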
We add the following step to the test: check if the directionality is correct.
If we have Ha: μ1 > μ2, then we must have ȳ1 > ȳ2.
If we have Ha: μ1 < μ2, then we must have ȳ1 < ȳ2.
If the directionality is incorrect, then P-value > 0.50 and the null hypothesis is not rejected. If the directionality is correct, we proceed with the test as in the case of a nondirectional alternative, but compute the P-value as a left-tail area or a right-tail area, depending on the alternative hypothesis.

The following examples illustrate directional hypothesis tests:

EX2. Researchers want to know if a niacin supplement in a diet for young lambs is effective in increasing weight (over a standard diet). Let μ1 = mean weight gain of all lambs fed the standard diet plus niacin supplement and μ2 = mean weight gain of all lambs fed the standard diet only.
We test H0: μ1 = μ2 (niacin not effective) vs Ha: μ1 > μ2 (niacin effective). Let us use α = 0.05.
Let us consider the following two situations:
1) Suppose that ȳ1 = 10 lb and ȳ2 = 13 lb. Since ȳ1 < ȳ2, the directionality is incorrect, so P-value > 0.50 and we do not reject the null hypothesis. There is no evidence for the alternative hypothesis, that is, no evidence that niacin is effective.
2) Suppose that ȳ1 = 15 lb and ȳ2 = 10 lb. In that case ȳ1 > ȳ2, so we have correct directionality and we proceed with the test. Suppose our data also give SE = 2.2 lb for ȳ1 - ȳ2 and 9 degrees of freedom. Then t_s = (15 - 10)/2.2 = 2.27 and P-value = tcdf(2.27, 10^6, 9) = 0.025 < .05, so the null hypothesis is rejected. We have evidence for the alternative hypothesis: niacin is effective.

EX3 An experiment was conducted to see if wounding a tomato plant would improve its defense against insects. Researchers grew larvae of the tobacco hornworm on wounded and unwounded plants; weight in mg after 7 days of growth was recorded. A summary of the results is given below:

       Wounded (1)   Control (2)
n      16            18
ȳ      28.66         37.96
s      9.02          11.14
df = 31.8 (use 31)

We test H0: μ1 = μ2 (wounding not effective) vs Ha: μ1 < μ2 (wounding effective). Let us use α = 0.01.
Directionality is correct, since ȳ1 < ȳ2. The standard error of ȳ1 - ȳ2 is SE = 3.46 mg, t_s = (28.66 - 37.96)/3.46 = -2.69, and P-value = tcdf(-10^6, -2.69, 31) = .006 < .01, so we reject the null hypothesis. There is evidence for the alternative hypothesis at the 1% significance level. Yes, wounding appears to increase the plant's defense against insects.

A few remarks about statistical versus practical significance

By declaring the difference between the means "statistically significant" we mean that the difference is too large to be attributed to sampling error alone. If the difference is "not statistically significant", we consider it small enough to be plausibly caused by sampling error. When results are statistically significant (the null hypothesis was rejected), this may be only because we had very large samples, while the difference has no practical significance.

For example: mean test scores for two large schools are 73 and 74, practically identical. But if our statistics are based on 2 samples of size 1000 and the sample SDs are 5 and 6, then for a two-tailed test our test statistic is

$$ t_s = \frac{73-74}{\sqrt{5^2/1000 + 6^2/1000}} = -4.05 $$

with P-value = 0.0001 (df = 999), and we reject the null hypothesis. The results are statistically significant, but have no practical significance.

Using CI-s to assess importance

Consider the following example: a study records the blood pressure change (mmHg) for two types of medications. Suppose we consider the difference between the two population means important if it exceeds 9 mmHg. For each 95% CI for μ1 - μ2, indicate if the difference is statistically significant and/or important.

a) (4.5, 6.7)    Significant (zero not inside)    not important (all numbers < 9)
b) (9.7, 10.2)   Significant (zero not inside)    important (all numbers > 9)
c) (-4.6, 6.8)   Not significant (zero inside)    not important (all numbers < 9)
d) (6.8, 9.5)    Significant (zero not inside)    can't tell if important (some numbers < 9, some > 9)
e) (-6.8, 9.5)   Not significant (zero inside)    can't tell if important (some numbers < 9, some > 9)

Optional: Pooled t-test under the assumption that σ1 = σ2

The pooled sample standard deviation estimates the common standard deviation:

$$ s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} $$

The test statistic

$$ t_s = \frac{\bar{y}_1-\bar{y}_2}{s_p\sqrt{1/n_1 + 1/n_2}} $$

has a t distribution with df = n1 + n2 - 2 (if the null hypothesis is true).

EX4. The number of friends consulted for advice before purchasing a car or a computer was examined in a certain consumer research paper. Two independent samples of consumers were selected. Summary statistics consistent with the information in the paper are given in the following table:

Type of purchase   Number of purchases (n)   Mean number of friends consulted (X̄)   Standard deviation (S)
(1) car            12                        3.65                                   0.42
(2) computer       15                        4.26                                   0.46

Is the mean number of friends consulted before each purchase greater for people purchasing computers? Test the appropriate hypothesis at the 5% significance level. Assume that the populations are normal and have equal standard deviations.

H0: μ1 = μ2 vs Ha: μ1 < μ2

Pooled t-test: s_p = .443, t = (3.65 - 4.26)/(.443 √(1/12 + 1/15)) = -3.56, df = 25.

P-value: the area left of -3.56 under the t-curve with 25 df is computed as follows:
tcdf(-10^6, -3.56, 25) = 7.7*10^-4 < .05. Reject H0; there is evidence that more friends are consulted before a computer purchase.
P-value from the tables: 3.56 > 2.787 = t.005, so P-value < .005.

Calculator (TI 83, 84)
All tests and CI-s are available in the STAT TESTS option:
CI for 2 population means: 2-SampTInt (select POOLED YES or NO)
Tests for 2 population means: 2-SampTTest (select POOLED YES or NO)
(We only covered non-pooled t-tests and CI-s, so select NO.)
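For the optional pooled t-test, the arithmetic of EX4 can be reproduced with a short Python sketch. It is meant only as a check on the hand computation; the function name `pooled_t` is invented for illustration, and the tail area is still left to the calculator (tcdf) or the tables as in the notes.

```python
import math


def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Pooled two-sample t statistic from summary statistics.

    Assumes equal population standard deviations.
    Returns (s_p, t_s, df).
    """
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two variances.
    s_p = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    t_s = (mean1 - mean2) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return s_p, t_s, df


# Summary statistics from EX4: (1) car, (2) computer.
s_p, t_s, df = pooled_t(12, 3.65, 0.42, 15, 4.26, 0.46)
print(f"s_p = {s_p:.3f}, t_s = {t_s:.2f}, df = {df}")
```

The result should match s_p = .443, t = -3.56 and df = 25 from EX4; the left-tail area for Ha: μ1 < μ2 is then tcdf(-10^6, t_s, df) on the calculator.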