Review: Large Sample Confidence Intervals
(Large sample: n > 30 or so for means; np and n(1-p) both > 5 for proportions.)
• $1-\alpha$ confidence interval for a mean: $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$
• $1-\alpha$ confidence interval for a proportion: $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$
• $1-\alpha$ confidence interval for the difference between two means: $\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2}$

In General:
(Estimate that is normally distributed via the Central Limit Theorem) $\pm\ z_{\alpha/2} \times$ (standard deviation of estimate)
This gives an interval: (Lower Bound, Upper Bound)
Interpretation: This is a plausible range for the true value of the number that we're estimating. $\alpha$ is a tuning parameter for the level of plausibility: smaller $\alpha$ = more conservative estimate.

Large Sample Confidence Intervals
(np and n(1-p) > 5 for all p's.)
• $1-\alpha$ confidence interval for the difference between two proportions: $\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$

Designing an Experiment and Choosing a Sample Size
• Example: Compare the shrinkage in a tumor due to a "new" cancer treatment relative to standard treatment
• 100 patients randomly assigned to "new" treatment or standard treatment
$x_i^{new}$ = reduction in tumor size for person i under new treatment
$x_j^{std}$ = reduction in tumor size for person j under std treatment
$\bar{x}_{new}$ and $s^2_{new}$, $\bar{x}_{std}$ and $s^2_{std}$: mean and sample variance of the changes in size for the new and standard treatments

Suppose the data are: $\bar{x}_{new} = 25.3$, $s_{new} = 2.0$, $\bar{x}_{std} = 24.8$, $s_{std} = 2.3$
95% Confidence Interval for difference: $\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2} = 0.5 \pm 0.84$

What can we conclude?
• There's no difference?
• Can't see a difference?
• There's a difference, but it's too small to care about?

There is a difference between:
• Can't see a difference (the situation for the cancer example)
• There's no difference
(In the cancer experiment, we can assume we care about small differences.)
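The 0.5 +/- 0.84 interval for the cancer example can be checked with a short sketch. The slides say only that 100 patients were randomly assigned, so the 50/50 split below is an assumption, and the function name `diff_means_ci` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(x1, s1, n1, x2, s2, n2, alpha=0.05):
    """Large-sample (1 - alpha) confidence interval for mean1 - mean2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}; about 1.96 for alpha = 0.05
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)     # std deviation of the estimate
    diff = x1 - x2
    margin = z * se
    return diff - margin, diff + margin


# ASSUMPTION: 50 patients per arm (not stated on the slides).
lower, upper = diff_means_ci(25.3, 2.0, 50, 24.8, 2.3, 50)
print(f"95% CI for the difference: ({lower:.2f}, {upper:.2f})")
print("zero is inside the interval:", lower < 0 < upper)
```

Under that assumption the half-width comes out near 0.84, and the interval contains zero, which is the "can't see a difference" situation.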
• Can't see a difference (that is big enough to care about) = wasted experiment
• AVOID / PREVENT THE WASTE AND ASSOCIATED TEARS: USE SAMPLE SIZE PLANNING

Sample Size Planning
• Length of a $1-\alpha$ level confidence interval is "$2 z_{\alpha/2} \times$ std deviation of estimate":
  – mean: $2 z_{\alpha/2}\, s/\sqrt{n}$
  – proportion: $2 z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$
  – difference between two means: $2 z_{\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2}$
  – difference between two proportions: $2 z_{\alpha/2} \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
1. Suppose we want a 95% confidence interval no wider than W units.
2. $\alpha$ is fixed. Assume a value for the standard deviation (or variance) of the estimator.
3. Solve for an n (or $n_1$ and $n_2$) so that the width is less than W units.
4. When there are two sample sizes ($n_1$ and $n_2$), we often assume that $n_1 = n_2$.

Cancer example
• Let W = 0.1. Want 95% CI for difference between means with width less than W.
• Suppose $s^2_{new} = s^2_{std} = 6$ (conservative guess)
$W > 2 z_{\alpha/2} \sqrt{s^2_{new}/n_1 + s^2_{std}/n_2}$
$0.1 > 2(1.96)\sqrt{6/n + 6/n}$   (Book's B = our W/2)
$0.1 > 3.92\sqrt{12/n}$
$0.01 > (3.92^2)(12/n)$
$n > 18439.68$ (each group…)

Hypothesis testing and p-values (Chapter 9)
We used confidence intervals in two ways:
1. To determine an interval of plausible values for the quantity that we estimate. Level of plausibility is determined by $1-\alpha$. 90% ($\alpha$ = 0.1) is less conservative than 95% ($\alpha$ = 0.05), which is less conservative than 99% ($\alpha$ = 0.01)...
2. To see if a certain value is plausible in light of the data: if that value is not in the interval, it is not plausible (at a certain level of confidence). Zero is a common value to test, but not the only one.
Hypothesis tests address the second use directly.

Example: Dietary Folate
• Data from the Framingham Heart Study
• n = 333 elderly men
• Mean = $\bar{x}$ = 336.4
• Std Dev = s = 193.4
• Can we conclude that the mean is greater than 300 at 5% significance? (same as 95% confidence)
[Figure: histogram of dietary folate (micrograms / day, calorie adjusted to 2000 calorie diet); vertical axis: count.]

Five Components of the Hypothesis Test:
1.
Null Hypothesis = "What we want to disprove" = "$H_0$" = "H naught" = Mean dietary folate in the population represented by these data is at most 300, i.e. $\mu \le 300$.
2. Alternative Hypothesis = "What we want to prove" = "$H_A$" = Mean dietary folate in the population represented by these data is greater than 300, i.e. $\mu > 300$.
3. Test Statistic: To test a mean with a large sample, the statistic is $z = (\bar{x} - \mu)/(s/\sqrt{n})$ (i.e. how many standard deviations (of $\bar{X}$) away from the hypothesized mean is the observed $\bar{x}$?)
4. Significance Level of Test, Rejection Region, and P-value (details follow item 5).
5. Conclusion: Reject $H_0$ and conclude $H_A$ if the test statistic is in the rejection region. Otherwise, "fail to reject". This is not the same as concluding $H_0$; we can only cite a "lack of evidence" (think "innocent until proven guilty"). (Equivalently, reject $H_0$ if the p-value is less than $\alpha$.)

• Significance Level: $\alpha$ = 1% or 5% or 10%... (smaller is more conservative) (Significance = 1 - Confidence)
• Rejection Region:
  – Reject if the test statistic is in the rejection region.
  – The rejection region is set by:
    • Assume $H_0$ is true "at the boundary".
    • Choose the region so that the probability of seeing a test statistic that far or further from the null hypothesis is less than or equal to $\alpha$.
• P-value:
  – Assume $H_0$ is true "at the boundary".
  – The p-value is the probability of seeing the observed test statistic or something further from the null hypothesis.
  – = "observed level of significance"
Note that you reject if the p-value is less than $\alpha$. (Small p-values mean "more observed significance".)

Example:
• $H_0$: $\mu \le 300$, $H_A$: $\mu > 300$
• $z = (\bar{x} - \mu)/(s/\sqrt{n}) = (336.4 - 300)/(193.4/\sqrt{333}) = 3.43$
• Significance level = 0.05
• When $H_0$ is true (at the boundary), Z ~ N(0,1). As a result, the cutoff is $z_{0.05} = 1.645$. (Pr(Z > 1.645) = 0.05.)
• P-value = Pr(Z > 3.43 when true mean is 300) = 0.0003
• Reject. Mean is greater than 300.
• Would you reject at significance level 0.0001?

Picture: Distribution of $Z = (\bar{X} - 300)/(193.4/\sqrt{333})$ when the true mean is 300.
[Figure: standard normal density with the rejection region to the right of 1.645 (area = 0.05 = significance level) and the observed test statistic at 3.43 (area to the right = 0.0003 = p-value); horizontal axis: test statistic, vertical axis: density.]

One Sided versus Two Sided Tests
• Previous test was "one sided" since we'd only reject if the test statistic is far enough to "one side" (i.e. if $z > z_{0.05}$)
• Two sided tests are more common (my opinion): $H_0$: $\mu = 0$, $H_A$: $\mu \ne 0$

Two Sided Tests (cntd)
Test Statistic (large sample test of mean): $z = (\bar{x} - \mu)/(s/\sqrt{n})$
Rejection Region: reject $H_0$ at significance level $\alpha$ if $|z| > z_{\alpha/2}$, i.e. if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Note that this "doubles" p-values. See next example.

Example:
• $H_0$: $\mu = 300$, $H_A$: $\mu \ne 300$
• $z = (\bar{x} - \mu)/(s/\sqrt{n}) = (336.4 - 300)/(193.4/\sqrt{333}) = 3.43$
• Significance level = 0.05
• When $H_0$ is true, Z ~ N(0,1). As a result, the cutoff is $z_{0.025} = 1.96$. (Pr(|Z| > 1.96) = 2 Pr(Z > 1.96) = 0.05.)
• P-value = Pr(|Z| > 3.43 when true mean is 300) = Pr(Z > 3.43) + Pr(Z < -3.43) = 2(0.0003) = 0.0006
• Reject. Mean is not equal to 300.
• Would you reject at significance level 0.0005?

Picture: Distribution of $Z = (\bar{X} - 300)/(193.4/\sqrt{333})$ when the true mean is 300.
[Figure: standard normal density with rejection regions to the left of -1.96 and to the right of 1.96 (total area = 0.05 = $\alpha$ = significance level) and the observed test statistic at 3.43; area to the right of 3.43 = 0.0003 and area to the left of -3.43 = 0.0003, so p-value = 0.0006 = Pr(|Z| > 3.43); horizontal axis: test statistic, vertical axis: density.]

Power and Type 1 and Type 2 Errors

                         Action
Truth        Fail to Reject H0     Reject H0
H0 True      correct               Type 1 error
HA True      Type 2 error          correct

Significance level = $\alpha$ = Pr(Making type 1 error)
Power = 1 - Pr(Making type 2 error)

• Assuming $H_0$ is true, what's the probability of making a type 1 error?
• $H_0$ is true means the true mean is $\mu_0$.
• This means that the test statistic has a N(0,1) distribution.
• Type 1 error means rejecting, which means |test statistic| is greater than $z_{\alpha/2}$.
• This has probability $\alpha$.
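The two-sided folate example and the claim that a type 1 error has probability $\alpha$ can both be checked numerically. The sketch below is illustrative only: the helper names `z_test` and `type1_error_rate` are made up, and the simulation draws normally distributed data with mean 300 and standard deviation 193.4, which is an assumption for convenience and not a description of the real Framingham data.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()  # N(0, 1), the distribution of Z when H0 is true


def z_test(xbar, mu0, s, n, alpha=0.05, two_sided=True):
    """Large-sample z test of a mean; returns (z, p_value, reject)."""
    z = (xbar - mu0) / (s / sqrt(n))
    if two_sided:
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))  # both tails
    else:
        p_value = 1 - STD_NORMAL.cdf(z)             # upper tail only
    return z, p_value, p_value < alpha


def type1_error_rate(mu0=300.0, sigma=193.4, n=333, alpha=0.05,
                     trials=2000, seed=0):
    """Fraction of simulated samples, drawn with H0 true, that get rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # ASSUMPTION: normal data; chosen only to make H0 exactly true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        _, _, reject = z_test(fmean(sample), mu0, stdev(sample), n, alpha)
        rejections += reject
    return rejections / trials


z, p_two, reject_two = z_test(336.4, 300, 193.4, 333)
_, p_one, _ = z_test(336.4, 300, 193.4, 333, two_sided=False)
print(f"z = {z:.2f}, one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
print("estimated type 1 error rate:", type1_error_rate())
```

The two-sided p-value is exactly twice the one-sided one here because z is positive, and the simulated rejection rate should land close to the chosen $\alpha$, up to Monte Carlo noise.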