Lecture Notes Number 9 - Michigan State University's Statistics and

FINAL EXAMINATION STUDY MATERIAL II
• Additional Recommended Reading From Course-Pack
• CHAPTER 25
• CHAPTER 26
• CHAPTER 27

Final Examination Study Material II
• CONFIDENCE INTERVALS FOR ONE PROPORTION
• CONFIDENCE INTERVALS FOR THE DIFFERENCE BETWEEN TWO PROPORTIONS
• CONFIDENCE INTERVALS FOR ONE MEAN
• CONFIDENCE INTERVALS FOR THE DIFFERENCE BETWEEN TWO MEANS

Point Estimate and Interval Estimate
A point estimate is a single number that is our “best guess” for the parameter. Point estimation produces a number (an estimate) that is believed to be close to the value of the unknown parameter.
An interval estimate is an interval of numbers within which the parameter value is believed to fall. Interval estimation produces an interval that contains the parameter being estimated with a prescribed confidence.

Point Estimate and Interval Estimate (Figure 1)
• Figure 1: A point estimate predicts a parameter by a single number. An interval estimate is an interval of numbers that are believable values for the parameter.
• Question: Why is a point estimate alone not sufficiently informative?

Point Estimate and Interval Estimate
A point estimate doesn’t tell us how close the estimate is likely to be to the parameter. An interval estimate is more useful because it incorporates a margin of error, which helps us gauge the accuracy of the point estimate.

Properties of Point Estimators
Property 1: A good estimator has a sampling distribution that is centered at the parameter. An estimator with this property is unbiased.
• The sample mean is an unbiased estimator of the population mean.
• The sample proportion is an unbiased estimator of the population proportion.

SOME POINT ESTIMATORS
Parameter                        Estimator
PROPORTION $p$                   $\hat{p}$
MEAN $\mu$                       $\bar{x}$
STANDARD DEVIATION $\sigma$      $s$
Note: $\hat{p}$ and $\bar{x}$ are unbiased. Strictly, $s^2$ is unbiased for $\sigma^2$, while $s$ itself is only approximately unbiased for $\sigma$.

Properties of Point Estimators
Property 2: A good estimator has a small standard deviation compared to other estimators.
• This means it tends to fall closer to the parameter than other estimators do.
• The sample mean has a smaller standard error than the sample median when estimating the population mean of a normal distribution.

The Logic behind Constructing a Confidence Interval
To construct a confidence interval for a population proportion, start with the sampling distribution of a sample proportion. The sampling distribution:
• Is approximately a normal distribution for large random samples, by the CLT.
• Has mean equal to the population proportion.
• Has standard deviation called the standard error.

Constructing a Confidence Interval to Estimate a Population Proportion
• A CONFIDENCE INTERVAL OFTEN HAS THE FORM: POINT ESTIMATE ± MARGIN OF ERROR (ME)
• IT IS CONSTRUCTED WITH A PRESCRIBED CONFIDENCE KNOWN AS THE CONFIDENCE LEVEL

Confidence Interval or Interval Estimate
Sample estimate ± Multiplier × Standard Error
Sample estimate ± Margin of error
• Multiplier is a number based on the confidence level desired and determined from the standard normal distribution (for proportions) or Student’s t-distribution (for means).

How Do We Find The Multiplier?
• Multiplier, denoted as z*, is the standardized score such that the area between -z* and z* under the standard normal curve corresponds to the desired confidence level.
• Note: A higher confidence level requires a larger multiplier.

The Multiplier
For 90% Confidence Level

SOME CRITICAL VALUES FOR STANDARD NORMAL DISTRIBUTION
C% CONFIDENCE LEVEL      CRITICAL VALUE Z*
80%                      1.282
90%                      1.645
95%                      1.960
98%                      2.326
99%                      2.576

Interpretation of the Confidence Level
So what does it mean to say that we have “95% confidence”? The meaning refers to a long-run interpretation: how the method performs when used over and over with many different random samples.
If we used the 95% confidence interval method over time to estimate many population proportions, then in the long run about 95% of those intervals would give correct results, containing the population proportion.

WHAT DOES C% CONFIDENCE REALLY MEAN?
• FORMALLY, WHAT WE MEAN IS THAT C% OF SAMPLES OF THIS SIZE WILL PRODUCE CONFIDENCE INTERVALS THAT CAPTURE THE TRUE PROPORTION.
• C% CONFIDENCE MEANS THAT ON AVERAGE, IN C OUT OF 100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE TRUE VALUE OF THE PARAMETER BEING ESTIMATED.
• E.G. 95% CONFIDENCE MEANS THAT ON AVERAGE, IN 95 OUT OF 100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE TRUE VALUE OF THE PARAMETER BEING ESTIMATED.

CONFIDENCE INTERVAL FOR PROPORTION P [ONE-PROPORTION Z-INTERVAL]
ASSUMPTIONS AND CONDITIONS
• RANDOMIZATION CONDITION
• 10% CONDITION
• SAMPLE SIZE ASSUMPTION OR SUCCESS/FAILURE CONDITION
• INDEPENDENCE ASSUMPTION
NOTE: PROPER RANDOMIZATION CAN HELP ENSURE INDEPENDENCE.

CONSTRUCTING CONFIDENCE INTERVALS
ESTIMATOR: SAMPLE PROPORTION $\hat{p}$
STANDARD ERROR: $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$, WHERE $\hat{q} = 1 - \hat{p}$
C% MARGIN OF ERROR: $ME(\hat{p}) = z^* \, SE(\hat{p})$
C% CONFIDENCE INTERVAL: $\hat{p} \pm ME(\hat{p})$

Compact Formula For a Confidence Interval For a Population Proportion p
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
where
• $\hat{p}$ is the sample proportion.
• $z^*$ denotes the multiplier.
• $\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ is the standard error of $\hat{p}$.

Reminder About Standard Deviation and Standard Error
• The exact standard deviation of a sample proportion equals $\sqrt{\frac{p(1 - p)}{n}}$.
• This formula depends on the unknown population proportion, p.
• In practice, we don’t know p, so we estimate the standard error as $se = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$.

Margin of Error
– The margin of error measures how accurate the point estimate is likely to be in estimating a parameter.
– It is a multiple of the standard error of the sampling distribution when the sampling distribution is a normal distribution.
– A distance of 1.96 standard errors is the margin of error for a 95% confidence interval when the sampling distribution is approximately normal.
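The compact formula for the one-proportion z-interval can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function names `z_multiplier` and `one_proportion_interval` are made up here, and the counts 40 out of 100 are hypothetical inputs chosen to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist


def z_multiplier(confidence):
    """Return z* such that the area between -z* and z* equals `confidence`."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def one_proportion_interval(successes, n, confidence=0.95):
    """One-proportion z-interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1.0 - p_hat) / n)      # estimated standard error
    me = z_multiplier(confidence) * se        # margin of error
    return p_hat - me, p_hat + me


# Multipliers for the usual confidence levels.
for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z* = {z_multiplier(level):.3f}")

# Hypothetical data (not from the notes): 40 successes in 100 trials.
low, high = one_proportion_interval(40, 100, confidence=0.95)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

The sketch does not check the randomization, 10%, or success/failure conditions; those still have to be judged from how the data were collected.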
Intuitive Explanation of Margin of Error
• Margin of Error Characteristics (for a 95% confidence level):
• The difference between the sample proportion and the population proportion is less than the margin of error about 95% of the time, or for about 19 of every 20 sample estimates.
• The difference between the sample proportion and the population proportion is more than the margin of error about 5% of the time, or for about 1 of every 20 sample estimates.

SAMPLE SIZE NEEDED TO PRODUCE A CONFIDENCE INTERVAL WITH A GIVEN MARGIN OF ERROR, ME
$ME(\hat{p}) = z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}$
SOLVING FOR n GIVES
$n = \frac{(z^*)^2 \hat{p}\hat{q}}{(ME)^2}$
WHERE $\hat{p}$ AND $\hat{q}$ ARE REASONABLE GUESSES. IF WE CANNOT MAKE A GUESS, WE TAKE $\hat{p} = \hat{q} = 0.5$.

EXAMPLE
DIRECT MAIL ADVERTISERS SEND SOLICITATIONS (a.k.a. “junk mail”) TO THOUSANDS OF POTENTIAL CUSTOMERS IN THE HOPE THAT SOME WILL BUY THE COMPANY’S PRODUCT. THE RESPONSE RATE IS USUALLY QUITE LOW. SUPPOSE A COMPANY WANTS TO TEST THE RESPONSE TO A NEW FLYER, AND SENDS IT TO 1000 PEOPLE RANDOMLY SELECTED FROM THEIR MAILING LIST OF OVER 200,000 PEOPLE. THEY GET ORDERS FROM 123 OF THE RECIPIENTS.
(A) CREATE A 90% CONFIDENCE INTERVAL FOR THE PERCENTAGE OF PEOPLE THE COMPANY CONTACTS WHO MAY BUY SOMETHING.
(B) EXPLAIN WHAT THIS INTERVAL MEANS.
(C) EXPLAIN WHAT “90% CONFIDENCE” MEANS.
(D) THE COMPANY MUST DECIDE WHETHER TO NOW DO A MASS MAILING. THE MAILING WON’T BE COST-EFFECTIVE UNLESS IT PRODUCES AT LEAST A 5% RETURN. WHAT DOES YOUR CONFIDENCE INTERVAL SUGGEST? EXPLAIN.

SOLUTION

EXAMPLE
A MAY 2002 GALLUP POLL FOUND THAT ONLY 8% OF A RANDOM SAMPLE OF 1012 ADULTS APPROVED OF ATTEMPTS TO CLONE A HUMAN.
(A) FIND THE MARGIN OF ERROR FOR THIS POLL IF WE WANT 95% CONFIDENCE IN OUR ESTIMATE OF THE PERCENT OF AMERICAN ADULTS WHO APPROVE OF CLONING HUMANS.
(B) EXPLAIN WHAT THAT MARGIN OF ERROR MEANS.
(C) IF WE ONLY NEED TO BE 90% CONFIDENT, WILL THE MARGIN OF ERROR BE LARGER OR SMALLER? EXPLAIN.
(D) FIND THAT MARGIN OF ERROR.
(E) IN GENERAL, IF ALL OTHER ASPECTS OF THE SITUATION REMAIN THE SAME, WOULD SMALLER SAMPLES PRODUCE SMALLER OR LARGER MARGINS OF ERROR?

SOLUTION

EXAMPLE
IN 1998 A SAN DIEGO REPRODUCTIVE CLINIC REPORTED 49 BIRTHS TO 207 WOMEN UNDER THE AGE OF 40 WHO HAD PREVIOUSLY BEEN UNABLE TO CONCEIVE.
(A) FIND A 90% CONFIDENCE INTERVAL FOR THE SUCCESS RATE AT THIS CLINIC.
(B) INTERPRET YOUR INTERVAL IN THIS CONTEXT.
(C) EXPLAIN WHAT “90% CONFIDENCE” MEANS.
(D) WOULD IT BE MISLEADING FOR THE CLINIC TO ADVERTISE A 25% SUCCESS RATE? EXPLAIN.
(E) THE CLINIC WANTS TO CUT THE STATED MARGIN OF ERROR IN HALF. HOW MANY PATIENTS’ RESULTS MUST BE USED?
(F) DO YOU HAVE ANY CONCERNS ABOUT THIS SAMPLE? EXPLAIN.

SOLUTION

How Can We Use Confidence Levels Other than 95%?
In practice, the confidence level 0.95 is the most common choice, but some applications require greater (or less) confidence.
• To increase the chance of a correct inference, we can use a larger confidence level, such as 0.99.

A 99% Confidence Interval Is Wider Than a 95% Confidence Interval.
Question: If you want greater confidence, why would you expect a wider interval?
• In using confidence intervals, we must compromise between the desired margin of error and the desired confidence of a correct inference.
– As the desired confidence level increases, the margin of error gets larger.

Effects of Confidence Level and Sample Size on Margin of Error
The margin of error for a confidence interval:
• Increases as the confidence level increases
• Decreases as the sample size increases
For instance, a 99% confidence interval is wider than a 95% confidence interval, and a confidence interval with 200 observations is narrower than one with 100 observations at the same confidence level. These properties apply to all confidence intervals, not just the one for the population proportion.

What is the Error Probability for the Confidence Interval Method?
• The general formula for the confidence interval for a population proportion is:
• Sample estimate ± Multiplier × Standard Error
– which in symbols is $\hat{p} \pm z(se)$

What is the Error Probability for the Confidence Interval Method?

Confidence Intervals for the Difference Between Two Proportions
$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
where z* is the value of the standard normal variable with area between -z* and z* equal to the desired confidence level.

Necessary Conditions
• Condition 1: Sample proportions are available based on independent, randomly selected samples from the two populations.
• Condition 2: All of the quantities $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1 - \hat{p}_2)$ are at least 10.

Example: Age and Using the Internet
Young: 92 of 262 use Internet as main news source, $\hat{p}_1 = .351$
Old: 59 of 632 use Internet as main news source, $\hat{p}_2 = .093$
$\hat{p}_1 - \hat{p}_2 = .351 - .093 = .258$ and $s.e.(\hat{p}_1 - \hat{p}_2) = .0317$
• Approximate 95% Confidence Interval: $.258 \pm 1.96(.0317)$, or .196 to .320
• We are 95% confident that somewhere between 19.6% and 32.0% more young adults than older adults use the Internet as their main news source.

Using Confidence Intervals to Guide Decisions
Principle 1. A value not in a confidence interval can be rejected as a possible value of the population proportion. A value in a confidence interval is an “acceptable” possibility for the value of a population proportion.
Principle 2. When a confidence interval for the difference in two population proportions does not cover 0, it is reasonable to conclude the two population proportions are different.
Principle 3. When the confidence intervals for proportions in two different populations do not overlap, it is reasonable to conclude the two population proportions are different.

Example: Which Drink Tastes Better?
• Taste Test: A sample of 60 people taste both drinks, and 55% like the taste of Drink A better than Drink B. Makers of Drink A want to advertise these results.
Makers of Drink B make a 95% confidence interval for the population proportion who prefer Drink A.
95% Confidence Interval: $.55 \pm 2\sqrt{\frac{.55(1 - .55)}{60}} = .55 \pm .13$
• Note: Since .50 is in the interval, there is not enough evidence to claim that Drink A is preferred by a majority of the population represented by the sample.

From Proportions To Means
ESTIMATING MEANS WITH CONFIDENCE

CONFIDENCE INTERVALS FOR ONE POPULATION MEAN
The confidence interval again has the form
Point estimate ± margin of error
• The sample mean is the point estimate of the population mean.
• The exact standard error of the sample mean is $\sigma/\sqrt{n}$.
• In practice, we estimate σ by the sample standard deviation, s, so $s.e.(\bar{x}) = s/\sqrt{n}$.

Confidence Intervals for One Population Mean
• For large n… from any population, and also,
• For small n, from an underlying population that is normal…
• The confidence interval for the population mean is: $\bar{x} \pm z\left(\frac{\sigma}{\sqrt{n}}\right)$

Confidence Intervals for One Population Mean
In practice, we don’t know the population standard deviation σ.
• Substituting the sample standard deviation s for σ to get $s.e.(\bar{x}) = s/\sqrt{n}$ introduces extra error.
To account for this increased error, we must replace the z-score by a slightly larger score, called a t-score. The confidence interval is then a bit wider. The distribution that t-scores come from is called the t distribution.

Summary: Properties of the t-Distribution
• The t-distribution is bell shaped and symmetric about 0.
• The probabilities depend on the degrees of freedom, $df = n - 1$.
• The t-distribution has thicker tails than the standard normal distribution, i.e., it is more spread out.
• A t-score multiplied by the standard error gives the margin of error for a confidence interval for the mean.

t - Distribution
• The t Distribution Relative to the Standard Normal Distribution: The t distribution gets closer to the standard normal as the degrees of freedom (df) increase. The two are practically identical when $df \geq 30$.
• Question: Can you find z-scores (such as 1.96) for a normal distribution on the t table?

t - Distribution
• Part of t-Table Displaying t-Scores. The scores have right-tail probabilities of 0.100, 0.050, 0.025, 0.010, 0.005, and 0.001. When n = 7, df = 6
• and $t_{.025} = 2.447$ is the t-score with right-tail probability = 0.025 and two-tail probability = 0.05. It is used in a 95% confidence interval, $\bar{x} \pm 2.447(se)$

t - Distribution
• The t Distribution with df = 6. 95% of the distribution falls between -2.447 and 2.447.
• These t-scores are used with a 95% confidence interval when n = 7.
• Question: Which t-scores with df = 6 contain the middle 99% of a t distribution (for a 99% confidence interval)?

Using the t Distribution to Construct a Confidence Interval for a Mean
• Summary: 95% Confidence Interval for a Population Mean
• When the standard deviation of the population is unknown, a 95% confidence interval for the population mean μ is:
$\bar{x} \pm t_{.025}\left(\frac{s}{\sqrt{n}}\right)$; $df = n - 1$
• To use this method, you need:
  • Data obtained by randomization
  • An approximately normal population distribution

SUMMARY

ASSUMPTIONS AND CONDITIONS
• INDEPENDENCE ASSUMPTION: THE DATA VALUES SHOULD BE INDEPENDENT. THERE’S REALLY NO WAY TO CHECK INDEPENDENCE OF THE DATA BY LOOKING AT THE SAMPLE, BUT WE SHOULD THINK ABOUT WHETHER THE ASSUMPTION IS REASONABLE.
• RANDOMIZATION CONDITION: THE DATA SHOULD ARISE FROM A RANDOM SAMPLE OR A SUITABLY RANDOMIZED EXPERIMENT.

ASSUMPTIONS AND CONDITIONS
• 10% CONDITION: THE SAMPLE IS NO MORE THAN 10% OF THE POPULATION.
• NORMAL POPULATION ASSUMPTION OR NEARLY NORMAL CONDITION: THE DATA COME FROM A DISTRIBUTION THAT IS UNIMODAL AND SYMMETRIC.
REMARK: CHECK THIS CONDITION BY MAKING A HISTOGRAM OR NORMAL PROBABILITY PLOT.
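The t-scores quoted above (for example 2.447 for df = 6) normally come from a table or a calculator. As a rough cross-check, the sketch below finds t* numerically using only the Python standard library: it integrates the t density with Simpson's rule and searches for the cutoff by bisection. The function names and the inputs to `mean_interval` (a sample mean of 10 and standard deviation of 2 with n = 7) are hypothetical and chosen only for illustration; in practice a library routine or table would be used instead.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def central_area(t, df, steps=2000):
    """Area under the t curve between -t and t (Simpson's rule on [0, t], doubled)."""
    if t == 0:
        return 0.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_multiplier(df, confidence=0.95):
    """Find t* whose central area equals `confidence` (assumes t* < 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):                      # bisection on the central area
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_interval(xbar, s, n, confidence=0.95):
    """t-interval for a mean: xbar +/- t* * s / sqrt(n), with df = n - 1."""
    me = t_multiplier(n - 1, confidence) * s / sqrt(n)
    return xbar - me, xbar + me


print(f"df = 6, 95%: t* = {t_multiplier(6, 0.95):.3f}")
print(f"df = 6, 99%: t* = {t_multiplier(6, 0.99):.3f}")

# Hypothetical summary statistics (not from the notes).
low, high = mean_interval(xbar=10.0, s=2.0, n=7)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

The first printed value should agree with the 2.447 used for n = 7, and the second addresses the question about the middle 99% of the distribution with df = 6.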
CONSTRUCTING CONFIDENCE INTERVALS FOR MEANS
• POINT ESTIMATOR: $\bar{x}$
• STANDARD ERROR: $SE(\bar{x}) = s/\sqrt{n}$
• C% MARGIN OF ERROR: $ME = t^*_{n-1} \, SE(\bar{x})$
WHERE $t^*_{n-1}$ IS A CRITICAL VALUE FOR STUDENT’S t-MODEL WITH n – 1 DEGREES OF FREEDOM THAT CORRESPONDS TO C% CONFIDENCE LEVEL.
SOLVING THE MARGIN OF ERROR FOR n GIVES
$n = \frac{(t^*_{n-1})^2 s^2}{ME^2}$

REMARK

ILLUSTRATIVE PICTURE

FINDING CRITICAL t-VALUES
• Using t tables (Table T) and/or a calculator, find or estimate the
• 1. critical value $t^*_7$ for 90% confidence level if the number of degrees of freedom is 7
• 2. one-tail probability if t = 2.56 and the number of degrees of freedom is 7
• 3. two-tail probability if t = 2.56 and the number of degrees of freedom is 7
• NOTE: If t has a Student's t-distribution with degrees of freedom df, then the TI-83 function tcdf(a,b,df) computes the area under the t-curve between a and b.

EXAMPLES FROM PRACTICE SHEET

Choosing the Sample Size for Estimating a Population Mean
In practice, you don’t know the value of the standard deviation, σ.
• You must substitute an educated guess for σ.
• Sometimes you can use the sample standard deviation from a similar study.
When no prior information is available, a crude estimate is the estimated range of the data divided by 6, since for a bell-shaped distribution we expect almost all of the data to fall within 3 standard deviations of the mean.

Other Factors That Affect the Choice of the Sample Size
• The first is the desired precision, as measured by the margin of error, m.
• The second is the confidence level.
• The third factor is the variability in the data.
• The fourth factor is cost.

What if You Have to Use a Small n?
The t methods for a mean are valid for any n. However, you need to be extra cautious to look for extreme outliers or great departures from the normal population assumption.
– In the case of the confidence interval for a population proportion, the method works poorly for small samples because the normal approximation from the CLT is no longer reliable.
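The sample-size reasoning for a mean can be sketched as follows. Two simplifying assumptions are made here and are not from the notes: the multiplier z* is used in place of $t^*_{n-1}$ as a first pass (because t* depends on the unknown n), and the inputs (data expected to range from 10 to 100, margin of error 3) are hypothetical. The function names are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def sd_guess_from_range(low, high):
    """Crude guess for sigma: estimated range divided by 6."""
    return (high - low) / 6.0


def sample_size_for_mean(sd_guess, margin, confidence=0.95):
    """Smallest n with z* * sd_guess / sqrt(n) <= margin (z* approximates t*)."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return ceil((z_star * sd_guess / margin) ** 2)


# Hypothetical planning inputs (not from the notes).
sigma_guess = sd_guess_from_range(10, 100)
n_needed = sample_size_for_mean(sigma_guess, margin=3.0, confidence=0.95)
print(f"guess for sigma: {sigma_guess:.1f}, sample size needed: {n_needed}")
```

Because z* is slightly smaller than the matching t*, the result is a lower bound; one reasonable refinement is to recompute with the t multiplier for the resulting df and round up again.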
Confidence Intervals for Difference in Two Population Means (Independent Samples)

Confidence Intervals for the Difference Between Two Population Means
Approximate CI for $\mu_1 - \mu_2$:
$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
where t* is the value in a t-distribution with area between -t* and t* equal to the desired confidence level.
The approximate df is difficult to specify. Use computer software, or conservatively use the smaller of the two sample sizes and subtract 1.

Degrees of Freedom
The t-distribution is only approximately correct, and the df formula is complicated (Welch’s approximation):
$df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$
Statistical software can use the above approximation, but if the calculation is done by hand, use a conservative df = smaller of $n_1 - 1$ and $n_2 - 1$.

Necessary Conditions
Two samples must be independent and either:
Situation 1: Populations of measurements are both bell-shaped, and random samples of any size are measured.
Situation 2: Large ($n \geq 30$) random samples are measured. But if there are extreme outliers, or extreme skewness, it is better to have an even larger sample than n = 30.

Example: Effect of a Stare on Driving
• Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, and timed how long (sec) it took the driver to proceed from the sign to a mark on the other side of the intersection.
No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8
• Task: Make a 95% CI for the difference between the mean crossing times for the two populations represented by these two independent samples.

Example: Effect of a Stare on Driving
Checking Conditions
Boxplots show …
• No outliers and no strong skewness.
• Crossing times in stare group generally faster and less variable.
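The unpooled (Welch) interval asked for in the stare example can be worked through with a short script. This is a sketch, not the software output used in the course: the function name `welch_summary` is made up, and the multiplier 2.080 is a t-table value for df = 21 at 95% confidence that is typed in by hand rather than computed.

```python
from math import sqrt
from statistics import mean, variance

# Crossing times (seconds) from the example.
no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]


def welch_summary(a, b):
    """Return (difference in means, unpooled standard error, Welch df)."""
    va = variance(a) / len(a)    # s1^2 / n1
    vb = variance(b) / len(b)    # s2^2 / n2
    diff = mean(a) - mean(b)
    se = sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return diff, se, df


diff, se, df = welch_summary(no_stare, stare)

# Assumption: t* looked up in a t table for df = 21 (Welch df rounded down).
T_STAR = 2.080
low, high = diff - T_STAR * se, diff + T_STAR * se

print(f"difference = {diff:.3f}, s.e. = {se:.3f}, Welch df = {df:.1f}")
print(f"approximate 95% CI: ({low:.2f}, {high:.2f})")
```

The Welch df comes out between 21 and 22; rounding down to 21 matches the usual software convention. Using the conservative by-hand df (the smaller of $n_1 - 1$ and $n_2 - 1$, here 12) would give a larger multiplier and a somewhat wider interval.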
Example: Effect of a Stare on Driving
Note: The df = 21 was reported by the computer package based on Welch’s approximation formula.

Equal Variance Assumption and the Pooled Standard Error
• It may be reasonable to assume the two populations have equal population standard deviations, or equivalently, equal population variances: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
• The estimate of this variance based on the combined or “pooled” data is called the pooled variance. The square root of the pooled variance is called the pooled standard deviation:
Pooled standard deviation $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Pooled Standard Error
Pooled $s.e.(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Pooled Degrees of Freedom (df)
• Note: Pooled df = $(n_1 - 1) + (n_2 - 1) = (n_1 + n_2 - 2)$.

Pooled Confidence Interval
Pooled CI for the Difference Between Two Means (Independent Samples):
$\bar{x}_1 - \bar{x}_2 \pm t^* \left(s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$
where t* is found using a t-distribution with df = $(n_1 + n_2 - 2)$ and $s_p$ is the pooled standard deviation.

Example: Male and Female Sleep Times
• Q: How much difference is there between how long female and male students slept the previous night?
• Data: The 83 female and 65 male responses from students in an intro stat class.
• Task: Make a 95% CI for the difference between the two population mean sleep hours for females versus males.
• Note: We will assume equal population variances.

Example: Male and Female Sleep Times
Two-sample T for sleep [with “Assume Equal Variance” option]
Sex      N    Mean   StDev   SE Mean
Female   83   7.02   1.75    0.19
Male     65   6.55   1.68    0.21
Difference = mu (Female) – mu (Male)
Estimate for difference: 0.461
95% CI for difference: (-0.103, 1.025)
T-Test of difference = 0 (vs not =): T-Value = 1.62  P = 0.108  DF = 146
Both use Pooled StDev = 1.72

Example: Male and Female Sleep Times
Notes:
• Two sample standard deviations are very similar.
• Sample mean for females higher than for males.
• The 95% confidence interval contains 0, so we cannot rule out that the population means may be equal.

Example: Male and Female Sleep Times
• Pooled Standard Deviation and Pooled Standard Error “by hand”:
Pooled std dev $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(83 - 1)(1.75)^2 + (65 - 1)(1.68)^2}{83 + 65 - 2}} = \sqrt{2.957} = 1.72$

Example: Male and Female Sleep Times
Pooled $s.e.(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 1.72\sqrt{\frac{1}{83} + \frac{1}{65}} = 0.285$

Pooled or Unpooled?
• If the larger sample size produced the larger standard deviation, the pooled procedure is acceptable because it will be conservative.
• If the smaller standard deviation accompanies the larger sample size, the pooled test can be quite misleading and is not recommended.
• If sample sizes are equal, the pooled and unpooled standard errors are equal. Unless the sample standard deviations are quite similar, it is best to use the unpooled procedure.

Confidence Interval for the Difference in Two Population Means
$\bar{x}_1 - \bar{x}_2 \pm t^* \times s.e.(\bar{x}_1 - \bar{x}_2)$
1. Make sure appropriate conditions apply by checking sample size and/or a shape picture of the differences.
2. Choose a confidence level.
3. Compute the mean and std dev for each sample.
4. Determine whether the std devs are similar enough that the pooled procedure can be used.
5. Calculate the appropriate standard error (pooled or unpooled).
6. Calculate the appropriate df.
7. Use Table A.2 (or software) to find the multiplier t*.

Examples From Practice Sheet
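The by-hand pooled calculations from the sleep-times example can be checked with a short script. It is a sketch under stated assumptions: the function names are made up, the multiplier 1.976 is an approximate t-table value for df = 146 entered by hand, and the estimated difference 0.461 is copied from the software output in the example rather than recomputed from raw data (which the notes do not include).

```python
from math import sqrt


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation from two sample sizes and standard deviations."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def pooled_se(n1, s1, n2, s2):
    """Pooled standard error of the difference between two sample means."""
    return pooled_sd(n1, s1, n2, s2) * sqrt(1 / n1 + 1 / n2)


# Summary statistics from the example: females first, then males.
n_f, s_f = 83, 1.75
n_m, s_m = 65, 1.68

sp = pooled_sd(n_f, s_f, n_m, s_m)
se = pooled_se(n_f, s_f, n_m, s_m)
df = n_f + n_m - 2

DIFF = 0.461     # estimate for difference, taken from the software output
T_STAR = 1.976   # assumption: approximate table value for df = 146, 95% level
low, high = DIFF - T_STAR * se, DIFF + T_STAR * se

print(f"pooled sd = {sp:.2f}, pooled s.e. = {se:.3f}, df = {df}")
print(f"approximate 95% CI: ({low:.3f}, {high:.3f})")
```

The interval printed here may differ from the software's (-0.103, 1.025) in the third decimal place, because the inputs are rounded summary statistics and the multiplier is approximate. The conclusion is unchanged: the interval contains 0.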
