Math 120 Section 6.1 Estimating With Confidence

Exercise 1: In this exercise, we will find some special z values, denoted z*, with the property that the middle C% of all the z values fall between -z* and +z*.

a) Find z* so that 90% of the z's in a standard normal distribution fall between -z* and +z*. (Hint: if 90% is in the middle, where is the remaining 10%?) Shade in the middle 90% on the graph.

b) Find z* so that 95% of the z's in a standard normal distribution fall between -z* and +z*. Shade in the middle 95% on the graph.

c) Find z* so that 98% of the z's in a standard normal distribution fall between -z* and +z*. Shade in the middle 98% on the graph below.

The percentage C is called the confidence level, because we are C% confident that any given z will fall into the corresponding interval. Use the above results to complete the following table:

Confidence level C     90%       95%       98%
z*                     ____      ____      ____

A more complete version of this table is given in the last 2 rows on page 583 of the textbook.

Math 120 Lecture Notes 42

Exercise 2: Suppose the eggs produced by the chickens on Sleepy Hollow Farm have a mean weight (μ) of μ = 2 ounces and a standard deviation of σ = .125 ounces (assume that the x's are normally distributed). Each carton of one dozen eggs produced at Sleepy Hollow Farm is a random sample of the eggs. Use the z* values found in Exercise 1 to help complete this exercise:

If we select a carton of eggs at random, there is a 90% probability that the sample mean of the eggs in the carton, x̄, will be within _________ ounces of the population mean μ. That is, you are 90% confident that x̄ will not be further than ____________ ounces away from 2 ounces.

[Graphs: Distribution of x̄; Distribution of z]

Math 120 Lecture Notes 43

Exercise 3: Consider the following statement: "There is a 90% chance that x̄ will not be further than .25 ounces from μ."
Fill in the blank below to make another sentence that says the same thing: "There is a 90% chance that μ will not be further than .25 ounces from ______."

In the following exercise, we will use a sample mean, x̄, to estimate an unknown population mean, μ.

Exercise 4: At Morning Star Farm, no one knows the mean weight, μ, of the eggs produced there. (However, suppose it is known that the standard deviation is σ = 1.5 ounces.) You buy a dozen eggs from Morning Star Farm and find that the mean weight of the eggs in the carton, x̄, is 1.9 ounces. Use the z* values found in Exercise 1 to help complete the following sentence:

"There is a 90% chance that μ will not be further than __________ ounces from the sample mean, x̄ = 1.9. That is, there is a 90% chance that the population mean μ will be between ___________ ounces and ___________ ounces."

Math 120 Lecture Notes 44

In most of the examples we have done so far, the true population mean μ is given. But usually the true population mean μ is not known. We use the sample mean, x̄, to estimate the unknown parameter μ. How good an estimate of μ is a particular x̄? We can answer this question because we know about the distribution of the x̄'s; that is, we know whether most of the x̄'s are clustered close to the true mean μ or whether they tend to be more spread out.

Remember Exercise 1 in the previous Section 6.1 handout? For various values of C (90%, 95%, 98%), and using the standard normal distribution, whose mean is 0, we found z* values with the property that C% of all the z's fall between 0 - z* and 0 + z*. Using what we know about the distribution of the x̄'s, we can convert the z* values into numbers that are meaningful on the x̄ distribution, as we did in Exercises 3 and 4 on page 44 of the notes.
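The z* values from Exercise 1 are normally read from the table in the textbook. As a cross-check, the lookup can be sketched in Python using the standard library's `statistics.NormalDist`. The function name `z_star` is an illustrative choice, not something taken from the notes.

```python
from statistics import NormalDist


def z_star(confidence):
    """Return z* such that the middle `confidence` fraction of a
    standard normal distribution lies between -z* and +z*."""
    tail = (1 - confidence) / 2          # area left over in each tail
    # z* is the point with area (1 - tail) to its left
    return NormalDist().inv_cdf(1 - tail)


if __name__ == "__main__":
    for c in (0.90, 0.95, 0.98):
        print(f"C = {c:.0%}: z* = {z_star(c):.3f}")
```

The key step is the hint from Exercise 1: if C is in the middle, the remaining 1 - C is split equally between the two tails.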
In the following exercise we also do this, but this time we will leave the answer in terms of z*, σ and n:

Exercise 5: Suppose we know that the x̄'s (calculated from samples of size n) are normally distributed with mean μ and standard deviation σ/√n. Fill in the blanks in the statements below. (Assume that z* is the correct z* to go along with the C%.)

If we select an x̄ at random, there is a C% chance that it will lie between μ - _________ and μ + _________.

This is the same as saying that there is a C% chance that μ will lie between x̄ - _________ and x̄ + _________.

This interval is called a C% confidence interval for the mean μ.

Math 120 Lecture Notes 45

The purpose of a confidence interval is to estimate an unknown parameter with an indication of how accurate the estimate is and how confident we are that the result is correct. It consists of two parts:
1. The interval: estimate ± margin of error
2. The confidence level, which states the probability that an interval containing the true parameter will be produced by the methods used. Typical confidence levels are 90%, 95% and 99%.

In general, if the confidence level is C, and x̄ is the mean of a random sample of size n taken from a normal population¹ with standard deviation σ, then the confidence interval is given by:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$

where C% of the z's lie between -z* and +z*. As we have seen, the z* values associated with the most commonly used C levels are conveniently listed on the last row of the table on page 583.

Exercise 6: The mean weight (μ) of eggs produced at Sleepy Chicken Farm is unknown. (Assume the weights are normally distributed with a standard deviation of σ = .23 ounces.) A carton of one dozen eggs produced by the farm contains eggs with a mean weight of x̄ = 2.6 ounces.
a) Find a 90% confidence interval for μ.
b) Find a 98% confidence interval for μ.
c) Repeat (a) and (b) given that the carton contained 36 eggs instead of twelve.

¹ Also if the population that the sample is drawn from is not normal, provided n > 30.
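The confidence interval formula above (sample mean plus or minus z* times σ/√n) can be sketched as a short Python function. This is a minimal illustration that assumes σ is known and the sample mean is normally distributed, as the notes do. The function name and the sample numbers (x̄ = 10, σ = 2, n = 25) are made up for illustration and are not taken from any exercise.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence):
    """Return (low, high) for the interval xbar +/- z* * sigma / sqrt(n)."""
    tail = (1 - confidence) / 2
    z = NormalDist().inv_cdf(1 - tail)   # z* for this confidence level
    margin = z * sigma / sqrt(n)         # margin of error
    return xbar - margin, xbar + margin


# Illustrative numbers only (hypothetical, not from the exercises)
low, high = z_confidence_interval(xbar=10.0, sigma=2.0, n=25, confidence=0.95)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

Calling the function again with a larger `n` or a higher `confidence` shows how each one changes the width of the interval.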
Math 120 Lecture Notes 46

Recall:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$

is a C% confidence estimate for a population mean μ. The quantity

$$m = z^* \frac{\sigma}{\sqrt{n}}$$

is called the margin of error.

We may produce a confidence interval having a desired confidence level and margin of error by adjusting the sample size. The sample size required to produce a confidence interval with margin of error m is:

$$n = \left( \frac{z^* \sigma}{m} \right)^2$$

Exercise 1: The mean weight (μ) of the eggs produced by chickens at Windy Poplar Farm is not known. (Assume the weights are normally distributed with σ = .25.)
a) You buy a random sample of one dozen eggs and find that the mean weight in your sample is x̄ = 2.3 ounces.
   i) Find a 90% confidence interval to estimate μ.
   ii) Find a 98% confidence interval to estimate μ.
b) How large a sample would be required to produce a margin of error of .08 ounces at the 98% confidence level?

Math 120 Lecture Notes 47

c) Plot all three confidence intervals on the number lines below.
   Sample size = _________ Confidence level = ________
   Sample size = _________ Confidence level = ________
   Sample size = _________ Confidence level = ________
d) Based on the above result, complete the sentences below:
   i) When the confidence level increases, the margin of error (increases/decreases).
   ii) When the sample size increases, the margin of error (increases/decreases).

Exercise 2: A carton of Canada Grade A Extra Large eggs is supposed to weigh at least 27 ounces.
a) If a carton of 12 eggs weighs at least 27 ounces, then the mean weight x̄ of the eggs in the carton will be at least: _____________
b) Spook's Lane Farm claims that their chickens produce eggs that have a mean weight of μ = 2.4, which is plenty large enough to be classified as Canada Grade A. You buy a carton of eggs from Spook's Lane Farm and find that the mean weight of the eggs in this carton is only x̄ = 2.25 ounces.
(Assume the weights are normally distributed with a standard deviation of σ = .18.)
   i) If their claim is true, and μ really does equal 2.4, what is the probability that a random sample of 12 eggs will have a mean of x̄ = 2.25 ounces or less, as your sample did?
   ii) Find a 95% confidence interval based on your sample mean x̄ = 2.2 ounces.

Math 120 Lecture Notes 48

Math 120 Section 6.2 Tests of Significance

In this section we are introduced to the concept of hypothesis testing. Hypothesis testing is a very simple idea, but it uses formal language and terminology that might sound confusing.

Recall the last example we did in Section 6.1: Spook's Lane Farm claims that the mean weight of eggs produced on their farm is μ = 2.4 ounces. You buy a sample of 12 eggs and find that the mean weight in your sample is x̄ = 2.24 ounces. Does this give you enough evidence to reject the farm's claim? If we assume that their claim is true, what was the probability that x̄ ≤ 2.24 for a sample of 12?

Spook's Lane's claim is called the null hypothesis, denoted H0. During the test, we assume this hypothesis is true.

H0: μ = 2.4

However, there is some suspicion that it is not true. In this case, we suspect that μ < 2.4. This is called the alternate hypothesis, denoted Ha:

Ha: μ < 2.4

The test statistic is simply the value of z that x̄ converts to. In this case the test statistic was __________

In this case the p-value is the probability of getting an x̄ value as small as or smaller than the one actually observed. We compute this probability under the assumption that H0 is true. In this case, the p-value is ______________ .

When the p-value is quite small, we are likely to decide that H0 is not true, as we did in this case. The smaller the P-value, the more likely we are to reject H0.

The above example is called a one-tailed test because our alternate hypothesis only considered the possibility that μ < 2.4, rather than considering that it might be either less than or greater than 2.4.
A two-tailed test would have the alternate hypothesis Ha: μ ≠ 2.4.

Math 120 Lecture Notes 49

Stating the Hypotheses

Null Hypothesis H0
The statement being tested in a hypothesis test is called the null hypothesis. The hypothesis test is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of “no effect” or “no difference”.

The alternate hypothesis, which is accepted if we reject the null hypothesis, is the statement we hope or suspect is true instead of the null hypothesis. The alternate hypothesis can be either one-sided or two-sided.

Each of the following situations requires a significance test about a population mean μ. State the appropriate null hypothesis H0 and the alternative hypothesis Ha in each case.

a) The mean area of the several thousand apartments in a new development is advertised to be 1250 square feet. A tenants group thinks that the apartments are smaller than advertised. They hire an engineer to measure a sample of apartments to test their suspicion.
   H0 ___________________ Ha ___________________

b) Larry’s car averages 32 miles per gallon on the highway. He now switches to a new motor oil that is advertised as increasing gas mileage. After driving 3000 highway miles with the new oil, he wants to determine if his gas mileage has actually increased.
   H0 ___________________ Ha ___________________

c) The diameter of a spindle in a small motor is supposed to be 5 millimeters. If the spindle is either too small or too large, the motor will not perform properly. The manufacturer measures the diameter in a sample of motors to determine whether the mean diameter has moved away from the target.
   H0 ___________________ Ha ___________________

Math 120 Lecture Notes 50

P-Value
The probability, computed under the assumption that H0 is true, that the test statistic would take a value as extreme or more extreme than it actually was, is called the P-value of the test.
The smaller the P-value, the more likely we are to reject H0.

The test statistic is:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Example of a one-tailed test: Statistics can help decide the authorship of literary works. Sonnets by an Elizabethan poet are known to contain an average of μ = 6.9 new words (words not used in the poet’s other works). The standard deviation of the number of new words is σ = 2.7. Now a manuscript with n = 5 new sonnets has come to light, and scholars are debating whether it is the poet’s work. The new sonnets contain an average of x̄ = 10 words not used in the poet’s known sonnets. We expect poems by another author to contain more new words, so to see if we have evidence that the new sonnets are not by our poet we test:

H0: ___________________
Ha: ___________________

Give the z test statistic and its P-value. Shade in the area on the graphs that is equal to the P-value. What do you conclude about the authorship of the new poems?

[Graph: the distribution that x̄ should follow if H0 is actually true]

Math 120 Lecture Notes 51

Exercise: If a null hypothesis is H0: μ = 18, state three possible alternative hypotheses. Identify each as one-tailed or two-tailed.

Exercise: For each of the following situations, formulate a null and an alternative hypothesis. Identify each test as being one-tailed or two-tailed.
a) The population mean IQ is 100. A psychologist wants to test the hypothesis that the mean IQ for alcoholics is different from 100.
   H0 ________________________ Ha ___________________________
b) A physician claims that the mean cost for an MRI test is less than $1000.
   H0 ________________________ Ha ___________________________
c) A consumer advocate claims that the mean price for a cellular phone is more than $300.
   H0 ________________________ Ha ___________________________

Exercise: Let x be the number of ounces of liquid contained in each Super-Duper can that is manufactured at a particular chemical company.
Juanita Lopez, a production supervisor at the company, wants to be sure that the Super-Duper can is filled with an average of μ = 16 ounces of product. If the mean volume is significantly less than 16 ounces, customers will likely complain, prompting undesirable publicity. The physical size of the can doesn't allow a mean volume significantly above 16 ounces. A random sample of 36 cans shows a sample mean of 15.8 ounces. (Assume σ = 0.6 ounces.) If the true mean really is 16 ounces, find the p-value: the probability that x̄ would be at least as small as it actually was. Shade in the areas corresponding to the p-value on the graphs.

H0: __________________
Ha: __________________

[Graph: the distribution that x̄ should follow if H0 is actually true]

Math 120 Lecture Notes 52

Exercise: Let x be the number of cakes sold in a day at the Kate and Edith Cake Company. Kate Flower, President of the company, says that the mean number of cakes sold daily is μ = 1,500. An employee thinks that perhaps the mean is different from 1,500. A random sample of 36 days shows that the mean daily sales were x̄ = 1,450 cakes. If Kate’s claim is true, find the p-value: the probability that x̄ would be at least as far from 1,500 as it was. (Assume σ = 120 cakes.) Shade in the areas corresponding to the p-value on the graphs below.

H0: __________________
Ha: __________________

[Graph: the distribution that x̄ should follow if H0 is actually true]

Math 120 Lecture Notes 53

Fixed Levels of Significance

Recall how tests of significance work: There is a population with an unknown mean μ and a known standard deviation σ. It has been hypothesized that the mean is μ = μ0 (this is called the null hypothesis, denoted H0). To test H0, we take a random sample of size n from the population and compute the sample mean x̄. We will reject H0 if x̄ is surprisingly far away from the hypothesized mean μ0. How do we decide whether or not x̄ is far enough away from the hypothesized mean to justify rejecting H0?
First we compute the z test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

and we look z up in Table A (Standard normal probabilities) to find the p-value. The p-value is the probability of x̄ being at least as far away from μ0 as it actually was. If the p-value is significantly small, we reject H0. But how small is “significantly” small? This question is answered for us if the test has a fixed level of significance (denoted α). We reject H0 if p-value < α.

Exercise: A company produces boxes of dog biscuits that are supposed to weigh 335 grams. The quality control department has taken a sample of 40 boxes and found that the mean weight of the boxes in the sample is 337 grams. Does this provide enough evidence to conclude that the true mean weight of all the boxes produced is different from 335 grams? (Assume it is known that the standard deviation of all the weights is σ = 10 grams.) Test at the α = .05 level of significance. Shade in the regions corresponding to the p-value of the test.

[Graph: the distribution that x̄ should follow if H0 is actually true (sample size 40)]

Math 120 Lecture Notes 54

Sometimes it is easier not to find the p-value. Instead, we compare the test statistic, z, to critical z* values associated with the level of significance α. These z* values can be found in the second-to-last row of Table C. We look for α (one-tailed test) or α/2 (two-tailed test) in the top row of Table C.

Exercise: Use Table C to find the z* critical values associated with the given levels of significance. Mark the z* values on the graph, and label the regions where we will reject H0. Shade in the region whose area is α. In each case the alternate hypothesis is given.

a) α = .10, H0: μ = μ0, Ha: μ > μ0   This is a _________-tailed test.
z* = ________   Reject H0 if: ________

b) α = .10, H0: μ = μ0, Ha: μ < μ0   This is a _________-tailed test.
z* = ________   Reject H0 if: ________

c) α = .10, H0: μ = μ0, Ha: μ ≠ μ0   This is a _________-tailed test.
z* = ________   Reject H0 if: ________

Math 120 Lecture Notes 55

d) α = .05, H0: μ = μ0, Ha: μ > μ0   This is a _________-tailed test.
z* = ________   Reject H0 if: ________

e) α = .05, H0: μ = μ0, Ha: μ < μ0   This is a _________-tailed test.
z* = ________   Reject H0 if: ________

f) α = .05, H0: μ = μ0, Ha: μ ≠ μ0   This is a _________-tailed test.
z* = ________   Reject H0 if: ________

Math 120 Lecture Notes 56

Exercise: According to a label on the bottle, Multivitamin Tablets contain 100 mg of vitamin C. A consumer group suspects that the tablets contain less than 100 mg of vitamin C. A random sample of 30 tablets yields x̄ = 98.2 mg. (Assume σ = 5.)
a) State the null and alternate hypotheses.
b) Calculate the value of the test statistic and mark it on the graphs.
c) Is this result significant at the α = 5% level?
d) Is this result significant at the 1% level?

Exercise: One night gnomes weighed all the chocolate Easter eggs in a bulk food bin at Overwaitea. They found that the weights follow a normal distribution with standard deviation σ = 0.5 grams, but unfortunately they can’t quite remember the mean weight μ. They guess that the mean weight is 10 grams. You weigh a sample of five eggs and get the following data (in grams): 10.5 10.1 9.8 11.1 10.4. Does this give you evidence to conclude that the true mean is different from 10 grams?
a) State the null and alternate hypotheses.
b) Calculate the value of the test statistic and mark it on the graphs.
c) Is this result significant at the 10% level?
d) Is this result significant at the 5% level?

Math 120 Lecture Notes 57

Math 120 Section 6.3 Making Sense of Statistical Significance

Sometimes significance tests can give misleading or meaningless results. Several situations where this can occur are discussed in this section. In particular, this section says:
1. A statistically significant effect need not be practically important.
2. If the data were collected using a faulty technique, or if they contain outliers or errors, they cannot be redeemed by conducting significance tests.
3. Many tests run at once will probably produce some significant results by chance alone, even if all the null hypotheses are true.

1. Statistical significance and practical significance

Sometimes an effect can have a very small P-value and thus be statistically significant, yet be of no practical importance.

Example: A manufacturer produces cassette tapes that are supposed to play for 90 minutes. A study of 900 cassette tapes found that the average playing time for the tapes was x̄ = 89 minutes 59 seconds (89.983 minutes). (Assume σ = 0.1 in the population.) Are these results statistically significant at the α = .01 level? Do you think the results are of practical importance?

Math 120 Lecture Notes 58

2. Statistical inference is not valid for all sets of data

Formal statistical inference cannot correct basic flaws in the design of an experiment or survey.

Example: A local television station announces a question for a call-in opinion poll on the six o’clock news and then gives the response on the eleven o’clock news. On one occasion, the question concerned the approval rating of Premier Glen Clark. Of the 2372 calls received, 1921 said that Glen Clark should resign. The station, following standard statistical practice, makes a confidence statement: “81% of the Channel 8 Poll sample believe Glen Clark should resign. We can be 95% confident that the proportion of all viewers who want Clark to resign is within 1.6% of the sample result.” Are the station’s conclusions justified? Explain your answer.

3. Beware of multiple analyses

If you carry out many individual tests, finding a few small P-values is not conclusive. For example, suppose you make many tests on the same set of data.

Example: Three psychiatrists studied a sample of schizophrenic persons and a sample of nonschizophrenic persons. They measured 77 variables for each subject: religion, family background, childhood experiences, and so on.
Their goal was to discover what distinguishes persons who later become schizophrenic. Having measured 77 variables, they made 77 separate tests of the significance of the differences between the two groups of subjects. The psychiatrists found 2 of their 77 tests significant at the 5% level and immediately published this exciting news. Discuss these results.

Math 120 Lecture Notes 59

Math 120 Section 6.4 Type I and Type II Errors

There are two types of errors that might occur in a hypothesis test:
Type I Error: We reject H0 (and accept Ha) when in fact H0 is true.
Type II Error: We accept H0 (and reject Ha) when in fact Ha is true.

                             The truth about the population:
Decision based on sample:    H0 is true           Ha is true
Reject H0                    Type I error         Correct decision
Accept H0                    Correct decision     Type II error

Note that α, the level of significance, is the probability of a Type I error. In order to calculate the probability of a Type II error, we must have a specific value for μ as an alternative hypothesis.

Example: According to the label on the bottle, each tablet in the bottle contains 5 mg of Lorazepam, a sedative. (Assume the amount of Lorazepam in each tablet is normally distributed with a standard deviation of 1 mg.) A random sample of 16 tablets is analyzed and found to contain an average of 5.6 mg of Lorazepam. Given the hypotheses H0: μ = 5 mg and Ha: μ = 5.6 mg:
a) Shade in the area that is equal to α = 5%, the probability of a Type I error.
b) Shade in the area that is equal to the probability of a Type II error.
c) To decrease the chance of a Type I error, we should make α (smaller / larger).
   To decrease the chance of a Type II error, we should make α (smaller / larger).

Math 120 Lecture Notes 60

Example: Suppose that an individual complaining of stomach pains comes to a doctor’s office. Mentally, the doctor immediately formulates the hypotheses:
H0: The person has appendicitis
Ha: The person has a stomachache
Make a table describing the 4 possible decision/truth combinations.
Which type of error is most serious in this case?

Example:
H0: Offshore oil drilling does not cause significant environmental damage.
Ha: Offshore oil drilling does cause significant environmental damage.
Describe the Type I and Type II errors. Which type of error is worse:
a) From an environmentalist's point of view?
b) From the point of view of an advocate of economic growth?

Example: A jury is supposed to convict an individual on trial for a string of murders and assaults if the individual is guilty beyond a “reasonable” doubt. The word “reasonable” is related to the numeric value of α in testing the hypotheses
H0: The individual is innocent
Ha: The individual is guilty
a) To avoid convicting an innocent person, should the jury choose a high value for α or a low value?
b) To avoid releasing a guilty person to possibly continue the wave of attacks, should the jury select a high value for α or a low value?

Math 120 Lecture Notes 61

Math 120 Objectives and Suggested Exercises
Section 6.1 Estimating With Confidence

1. To understand that the purpose of a confidence interval is to estimate an unknown population parameter.
2. To understand that a confidence interval has two parts:
   The interval: estimate ± margin of error
   The confidence level: the probability that the method will give a confidence interval that contains the actual population parameter.
   Pg. 303 #6.1
3. To be able to calculate a confidence interval for a given confidence level, according to the formula $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$.
   Pg. 310 #6.8 a) b)
4. To understand that when the confidence level increases, the margin of error (and hence the width of the interval) must get larger. (Think of the confidence interval as a net that we use to try to capture μ. We need a larger net to be more confident that we will actually capture μ.)
   Pg. 310 #6.8 c)
5. To be able to calculate the sample size needed to produce a desired margin of error, according to the formula $n = \left( \frac{z^* \sigma}{m} \right)^2$.
   Pg. 312 #6.11

Math 120 Lecture Notes 62

Math 120 Objectives and Suggested Exercises
Section 6.2 Tests of Significance

1. To understand that a test of significance is intended to assess whether or not the data provide enough evidence to reject the null hypothesis in favour of the alternate hypothesis.
2. To understand that the null hypothesis is usually a statement that there is “no difference” or “no change”, while the alternate hypothesis is a statement of the effect we suspect or hope might be present.
   Pg. 325 #6.27, 6.29
3. To understand the difference between one-tailed and two-tailed tests:
   One-tailed: Ha: μ > μ0 OR Ha: μ < μ0
   Two-tailed: Ha: μ ≠ μ0
4. To be able to calculate the z test statistic using the formula $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, and hence to compute the p-value, the probability (computed under the assumption that the null hypothesis is true) of x̄ being at least as extreme as it actually was.
   Pg. 332 #6.34 (one-tailed)
   Pg. 333 #6.35 (two-tailed)
5. To understand that the smaller the P-value is, the more evidence there is to reject H0, and if the P-value is less than the level of significance (α), then we say the data are statistically significant at significance level α.
   Pg. 342 #6.52, pg. 337 #6.37

Math 120 Lecture Notes 63
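The z test statistic, its P-value, and the sample-size formula listed in the objectives can be sketched together in Python. This is a minimal illustration under the notes' assumptions (known σ, normally distributed sample mean). The function names, the `alternative` labels, and the sample numbers are hypothetical choices made for this sketch and do not come from the notes or the textbook.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a z test of H0: mu = mu0 with known sigma.

    alternative is "less" (Ha: mu < mu0), "greater" (Ha: mu > mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()                      # standard normal distribution
    if alternative == "less":
        p = std.cdf(z)                      # area to the left of z
    elif alternative == "greater":
        p = 1 - std.cdf(z)                  # area to the right of z
    elif alternative == "two-sided":
        p = 2 * (1 - std.cdf(abs(z)))       # area in both tails
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return z, p


def required_sample_size(z_star, sigma, m):
    """Smallest whole n with margin of error at most m: n = (z* sigma / m)^2."""
    return ceil((z_star * sigma / m) ** 2)


# Illustrative numbers only (hypothetical, not from the exercises)
z, p = z_test(xbar=52, mu0=50, sigma=6, n=36, alternative="two-sided")
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0 at alpha = {alpha}: {p < alpha}")
print("n needed:", required_sample_size(z_star=1.96, sigma=6, m=1))
```

The sample size is rounded up because rounding down would give a margin of error slightly larger than the one requested.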