* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Download m - Images
Survey
Document related concepts
Transcript
Chapter 11 Inference with Means Sampling Distribution, Confidence Intervals, and Hypothesis Tests The campus of Fritz University has a fish pond. Suppose there are 20 fish in the pond. The lengths of the fish (in inches) are given below: 4.5 5.4 10.3 7.9 6.3 4.3 9.6 8.5 6.6 11.7 8.9 2.2 9.8 8.7 13.3 4.6 10.7 13.4 7.7 5.6 This is a statistic! We caught fish with lengths 6.3 The true mean m = 8. Let’s catch inches,Suppose 2.2 inches, and 13.3 inches. Notice that some we randomly catch a sample ofan This is two more x = 7.27 inches sample means are example of 3 fish from this pond and measure their samples and closer and some 2nd sample 8.5, 4.6, and 5.6 inches. sampling length. What would the mean length of farther away; some look at the x = 6.23 inches variability above and some below the sample be? sample means. rd 3 sample – 10.3, 8.9, and 13.4 inches.the mean. x = 10.87 inches Fish Pond Continued . . . 4.5 5.4 10.3 7.9 6.3 4.3 9.6 8.5 6.6 11.7 8.9 2.2 9.8 8.7 13.3 4.6 10.7 13.4 7.7 5.6 There are 1140 (20C3) different possible samples of size 3 from this population. If we were to catch all those different samples and calculate the mean length of each sample, we would have a distribution of all possible x. This would be the sampling distribution of x. Sampling Distributions of x • The distribution that would be formed by considering the value of a sample statistic for every possible sample of a given size from a population. In this case, the sample statistic is the sample mean x. Fish Pond Revisited . . . Suppose there are only 5 fish in the pond. The lengths of the fish (in inches) are given below: 6.6 11.7 8.9 2.2 What is the mean mand 7.84 x = standard deviation of this sxpopulation? = 3.262 9.8 We will keep the population size small so that we can find ALL the possible samples. Fish Pond Revisited . . . 6.6 11.7 8.9 2.2 9.8 mx = 7.84 and sx = 3.262 Pairs 6.6 & 11.7 6.6 & 8.9 6.6 & 2.2 6.6 & 9.8 x 9.15 7.75 4.4 8.2 11.7 & 8.9 11.7 & 2.2 11.7 & 9.8 8.9 & 2.2 8.9 & 9.8 the9.35 10.3Let’s 6.95 find 10.75all5.55 samples of size 2. 2.2 & 9.8 6 These values determine the How doisthese What the mean mx = 7.84 sampling distribution of x for valuesand compare to standard samples of size 2. the population deviation of these sx = 1.998 meansample and standard means? deviation? Fish Pond Revisited . . . 6.6 11.7 8.9 2.2 9.8 mx = 7.84 and sx = 3.262 11.7, Now6.6,let’s11.7, find 11.7, all the 2.2, 8.9, 8.9, 2.2, samples of size 3. 9.8 2.2 9.8 9.8 Triples 6.6, 11.7, 8.9 6.6, 11.7, 2.2 6.6, 11.7, 9.8 6.6, 8.9, 2.2 6.6, 8.9, 9.8 x 9.067 6.833 9.367 5.9 8.433 6.2 7.6 10.133 7.9 What is the mean mThese values values determine How the do x = 7.84 andthese standard compare to sampling distribution of x for deviation ofthe these sx =samples 1.332of size 3. population and samplemean means? standard deviation? 8.9, 2.2, 9.8 6.967 What do you notice? • The mean of the sampling distribution EQUALS the mean of the population. mx = m • As the sample size increases, the standard deviation of the sampling distribution decreases. as n sx General Properties of Sampling Distributions of x Rule 1: Rule 2: mx m sx s n Note that in the previous fish pond examples this standard deviation formula was not correct because the sample sizes were more than 10% of the population. This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample Sampling Distribution Activity – Day 1 General Properties Continued . . . Rule 3: When the population distribution is normal, the sampling distribution of x is also normal for any sample size n. The paper “Mean Platelet Volume in Patients with Metabolic Syndrome and Its Relationship with Coronary Artery Disease” (Thrombosis Research, 2007) includes data that suggests that the distribution of platelet volume of use patients whotodogenerate not haverandom metabolic syndrome We can Minitab samples from population.normal We will generate random samples is this approximately with mean m500 = 8.25 and standard of n s= =5 0.75. and compute the sample mean for each. deviation Platelets Continued . . . Similarly, we will generate 500 random samples of n = 10, n = 20, and n = 30. The density histograms below display the resulting 500 x for each of the given sample sizes. What do do you you notice notice What do you notice What about the themeans standard about the shape of about deviation of these these histograms? these histograms? histograms? General Properties Continued . . . Rule 4: Central Limit Theorem When n is sufficiently large, the sampling distribution of x is well approximated by a normal curve, even when the population distribution is not itself normal. How large is “sufficiently large” CLT can safely be anyway? applied if n exceeds 30. The paper “Is the Overtime Period in an NHL Game Long Enough?” (American Statistician, 2008) gave data on the time (in minutes) from the start of the game to the first goal scored for the 281 regular season games from the 2005-2006 season that went into overtime. The density histogram for the data is shown below. Let’s consider these 281 values as a population. The distribution is strongly positively skewed with mean m = 13 minutes and with a median of 10Using minutes. Minitab, we will generate 500 samples of the following sample sizes from this distribution: n = 5, n = 10, n = 20, n = 30. What do you notice These Are these are histograms What dothe youdensity notice about the standard histograms centered for the at 500 about the shape of deviations of these samples approximately m = 13? these histograms? histograms? A hot dog manufacturer asserts that one of its brands of hot dogs has a average fat content of 18 grams per hot dog with standard deviation of 1 gram. Consumers of this brand would probably not be disturbed if the mean was less than 18 grams, but would be unhappy if it exceeded 18 grams. An independent testing organization is asked to analyze a random sample of 36 hot dogs. Suppose the resulting sample mean is 18.4 grams. Does this Since the sample size is result suggest that the manufacturer’s claim is greater than 30, the incorrect? Central Limit Theorem applies. So the distribution of x is approximately normal with mx 18 and sx 1 .1667 36 Hot Dogs Continued . . . mx 18 and sx 1 .1667 36 Suppose the resulting sample mean is 18.4 grams. What do we Does this result suggest that the manufacturer’s call this? claim is incorrect? P(x > 18.4) = Yes, the probability of getting a sample mean of 18.4 is small enough to cause us to doubt that the manufacturer’s claim is correct. Now let’s look at confidence intervals to estimate the mean m of a population. Cosmic radiation levels rise with increasing altitude, promoting researchers to consider how pilots and flight crews might be affected by increased exposure cosmic A study The use of theto value of sradiation. introduces reportedextra a mean annual cosmic radiation we dose of 219 variability. Therefore, will mrems with a standard deviation of 35 mrems for a use the distribution of t values since it sample of flight personnel of Xinjiang Airlines. a standard Supposehas thismore mean variability is based on than a random sample of normal curve. 100 flight crew members. Remember that: sx s Are these values parameters or statistics? n What value would you use for s? What are t-distributions? • Created by William Gosset Let’s look at some graphs of t-distributions Important Properties of t Distributions t distributions are described The t distribution corresponding to any by particular degreesofoffreedom freedom (df).shaped and number of degrees is bell centered at zero (just like the standard normal (z) distribution). 2) Each t distribution is more spread out than the standard normal distribution. 1) z curve t curve for 2 df 0 Why is the z curve taller than the t curve for 2 df? Important Properties of t Distributions Continued . . . 3) As the number of degrees of freedom increases, the spread of the corresponding t distribution decreases. For what df would the 4) As the number of degrees oftfreedom increases, distribution be the corresponding sequence approximately of t distributions the approaches the standard normal samedistribution. as a standard normal z curve distribution? t curve for 2 df t curve for 5 df 0 Confidence intervals for m when s is unknown Confidence intervals for m when s is unknown The general formula for a confidence interval for a population mean m based on a sample of size n is. . . But this is unknown – so what What is this called? What What is thisiscalled? this called? do we use? Standard error s x (t critical value) n estimate Margin of error Where the t critical value is based on df = n - 1. How to find t* • Use Table B for t distributions • Look up confidence level at bottom & df on the sides • df = n – 1 Can also use invT on the calculator! Find these t* 90% confidence when n = 5 95% confidence when n = 15 t* = 2.132 t* = 2.145 Steps for a Confidence Interval We typically use either a boxplot or a normal • Verify assumptions probability plot to assess normality. – Random sample – Distribution is (approximately) normal AND why • Given • Large sample size • Graph data You MUST show the graph • Calculate interval • Write conclusion in context We are ____% confident that the true mean ______ is between ___ and ___. Cosmic radiation levels rise with increasing altitude, promoting researchers to consider how pilots and flight crews might be affected by increased exposure to cosmic radiation. A study reported a mean annual cosmic radiation dose of 219 mrems and a standard deviation of 35 mrems for a sample of flight personnel of Xinjiang Airlines. Suppose this mean is based on a random sample of 100 flight crew members. Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members. 1)Data is from a random sample of crew members First, verify that the 2)Sample size n is large (n > 30) conditions are met. Cosmic Radiation Continued . . . Let What would happen to the width of x = 219 mrems this interval if the confidence level wascrew 90%members instead of 95%? n = 100 flight s = 35 mrems. Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members. 35 219 1.984 (212.06, 225.94) 100 What does this mean in context? We are 95% confident that the actual mean annual cosmic radiation exposure for Xinjiang flight crew members is between 212.06 mrems and 225.94 mrems. The article “Chimps Aren’t Charitable” (Newsday, November 2, 2005) summarized the results of a research study published in the journal Nature. In this study, chimpanzees learned to use an apparatus that dispersed food when either of two ropes was pulled. When one of the ropes was pulled, only the chimp controlling the Firstthe verify apparatus received food. When otherthat rope was conditions forcontrolling a pulled, food was dispensedthe both to the chimp t-interval are met.cage. The the apparatus and also a chimp in the adjoining accompanying data represent the number of times out of 36 trials that each of seven chimps chose the option that would provide food to both chimps (charitable response). 23 22 21 24 19 20 20 Compute a 99% confidence interval for the mean number of charitable responses for the population of all chimps. Chimps Continued . . . 23 Normal Scores 2 1 -1 -2 22 21 24 19 20 20 The plot is straight, Let’s suppose itreasonable is reasonable to Since n is small, we this needsample toitverify ifplausible it is so seems regard of seven plausible that this sample is from a the that the population chimps as representative of 20 22 24 Number of Charitable Responses population that ischimp approximately normal. distribution of population. number of charitable responses is Let’s use a normal probability plot. approximately normal. Chimps Continued . . . 23 22 21 24 19 20 x = 21.29 and s = 1.80 20 df = 7 – 1 = 6 s x (t critical value) n 1.80 21.29 3.71 (18.77, 23.81) 7 We are 99% confident that the mean number of charitable responses for the population of all chimps is between 18.77 and 23.81. Choosing Sample Size Thisarequires s to be known – since it is rarely known, MUST assume The margin ofwe error associated with a it is some specified confidence interval is value. s m z * n Solve this for n: We can use this to find the necessary sample size for a particular margin of error. The financial aid office wishes to estimate the mean cost of textbooks per quarter for students at a particular university. For the estimate to be useful, it should be within $20 of the true population mean. How large a sample should be used to be 95% confident of achieving this level of accuracy? Assume s = $100 Always round sample size up to the next whole number! 100 20 1.96 n n 96.04 n 97 Hypothesis Tests for a Population Mean Let’s review the assumptions for a confidence interval for a population mean 1) x is the sample mean from a random sample, 2) The distribution is (approximately) normal because sample size n is large (n > 30) or the graph indicates it’s plausible, and 3) s, the population standard deviation, is unknownThe assumptions are the same for a large-sample hypothesis test for a population mean. The One-Sample t-test for a Population Mean Null hypothesis: Alternative Hypothesis: Test Statistic: H0: m = hypothesized value Ha: m > hypothesized value Ha: m < hypothesized value Ha: m ≠ hypothesized value x m t s n How to find a p-value To find a p-value: • Use tcdf(LB, UB, df) What would you do if this were a two-sided test? Find the p-value for a one-sided test with a test statistic t = -1.5 when n = 15. tcdf(-∞,-1.5,14) = .0779 A study conducted by researchers at Pennsylvania State University investigated whether time perception, an indication of a person’s ability to concentrate, is impaired during nicotine withdrawal. After a 24-hour smoking abstinence, 20 smokers were asked to estimate how much time had elapsed during a 45-second period. Researchers wanted to see whether smoking abstinence had a negative impact on time perception, causing elapsed time to be overestimated. Suppose the resulting data on perceived elapsed time (in seconds) were as follows: 69 55 72 73 59 55 39 52 57 57 56 50 70 47 56 45 70 54 57 53 What is the mean and standard x = 57.30 s = 9.30 n = 20 deviation of the sample? Smoking Abstinence Continued . . . 69 55 72 73 59 55 39 52 57 57 56 50 70 47 56 45 70 54 57 53 x = 57.30 s = 9.3 n = 20 Where m is the true mean perceived elapsed H0: m = 45 time for smokers who haveState abstained from the Ha: m > 45 smoking for 24-hours Since the boxplot is approximately hypotheses. Assumptions: symmetrical, it is plausible that the 1) It is reasonable todistribution believe thatisthe sampleVerify of smokers population To do this, we need to graph the assumptions. is representative of all smokers. approximately normal. data using a boxplot or normal 2) Since the sample size is probability not plot at least 30, we must determine if it is plausible that the population distribution is approximately 40 50 60 70 normal. Smoking Abstinence Continued . . . 69 55 72 73 59 55 39 52 57 57 56 50 70 47 56 45 70 54 57 53 x = 57.30 s = 9.3 n = 20 H0: m = 45 Ha: m > 45 Where m is the true mean perceived elapsed time for smokers whoCompute have abstained from the test smoking for 24-hours Test statistic: P-value ≈ 0 statistic and P-value. 57.30 45 t 5.92 9.30 20 a = .05 Since P-value < a, we reject H0. There is convincing evidence that the mean perceived elapsed time is greater than the actual elapsed time of 45 seconds. Smoking Abstinence Continued . . . 69 55 72 73 59 55 39 52 57 57 56 50 70 47 56 45 70 54 57 53 x = 57.30 s = 9.3 n = 20 Compute the appropriate Where m is the true mean perceived elapsed H0: m = 45 confidence time for smokers who have interval. abstained from Ha: m > 45 smoking for 24-hours Since P-value < a, we reject H0. Notice that the a = .05 hypothesized value 57.30 1.729 9.30 of 45 is NOT in the 20 90% confidence Since this is a one-tailed test, a ) ( 53 . 705 , 60 . 895 interval wetail. .05 goes in goes inand thethat upper “rejected” H0! leaving .90 in the the lower tail, middle. A growing concern of employers is time spent in activities like surfing the Internet and emailing friends during work hours. The San Luis Obispo Tribune summarized the findings of a large survey of workers in an article that ran under the headline “Who Goofs Off More than 2 Hours a Day? Most Workers, Survey Says” (August 3, 2006). Suppose that the CEO of a large company wants to determine whether the average amount of wasted time during an 8-hour day for employees of her company is less than the reported 120 minutes. Each person in a random sample of 10 employees was contrasted and asked about daily wasted time at work. The resulting data are the following: 108 112 117 130 111 131 113 113 105 128 What is the mean and standard x = 116.80 s = 9.45 n = 10 deviation of the sample? Surfing Internet Continued . . . 108 112 117 130 111 131 113 113 105 128 x = 116.80 s = 9.45 n = 10 H0: m = 120 Ha: m < 120 Where m is the true mean daily wasted time for employees of this company The boxplot reveals some skewness, State the Verify the but there is no outliers. It is plausible Assumptions: hypotheses. assumptions. that the population distribution is 1) The given sample was a random sample of employees approximately normal. 2) Since the sample size is not at least 30, we must determine if it is plausible that the population 110 120 130 distribution is approximately normal. Surfing Internet Continued . . . 108 112 117 130 111 131 113 113 105 128 x = 116.80 s = 9.45 n = 10 H0: m = 120 Ha: m < 120 Where m is the true mean daily wasted time for employees of this company the testwe 116potential .80 Compute 120 error What could and 1.07 Test Statistic: thave statistic TypeP-value. II 9made? .45 P-value =.150 10 a = .05 Since p-value > a, we fail to reject H0. There is not sufficient evidence to conclude that the mean daily wasted time for employees of this company is less than 120 minutes.