The Central Limit Theorem (CLT) is stated as follows: Given a large random sample of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{X}$ is approximately normally distributed with mean and standard deviation given by

$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \sigma/\sqrt{n}.$$

Note:
1. $n > 30$ is usually large enough for the CLT to apply.
2. If the population from which we sample is normal, then $\bar{X}$ is exactly normally distributed with mean and standard deviation as above for any sample size.

Empirical Rule for $\bar{X}$

Consider a sample of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$. Suppose $\bar{X}$ is normal (or approximately normal), with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. (This would be the case if the population is normal or if the sample size is large.) Find the probability that $\bar{X}$ will be within (a) $2\sigma_{\bar{X}}$ of $\mu$, (b) $3\sigma_{\bar{X}}$ of $\mu$.

(a) P($\bar{X}$ will be within $2\sigma_{\bar{X}}$ of $\mu$) = ______

(b) P($\bar{X}$ will be within $3\sigma_{\bar{X}}$ of $\mu$) = ______

In general the statement "$\bar{X}$ will be within $k\sigma_{\bar{X}}$ of $\mu$" means that $\bar{X}$ lies between $\mu - k\sigma_{\bar{X}}$ and $\mu + k\sigma_{\bar{X}}$. If $\bar{X}$ is normal (or approximately normal), then

$$P(\bar{X} \text{ will be within } k\sigma_{\bar{X}} \text{ of } \mu) = P(-k < Z < k).$$

Z Confidence Interval

Suppose we are given the following:
    Normal Population: Scores on a standardized test.
    Population Mean: $\mu$ (unknown)
    Population S.D.: $\sigma = 1.5$

To estimate $\mu$ we will take an srs of size $n = 25$ and use $\bar{X}$ as our estimator. Recall that since the population is normal, $\bar{X}$ is normally distributed with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1.5/5 = .3$.

We would like to be able to express this estimate in the form $\bar{X} \pm E$ or $(\bar{X} - E,\ \bar{X} + E)$. Here $E$ is some error which determines the accuracy of our estimate. Let's take $E = 2\sigma_{\bar{X}}$ for now. Thus we have the interval $(\bar{X} - 2\sigma_{\bar{X}},\ \bar{X} + 2\sigma_{\bar{X}})$.

For any given sample this interval may or may not contain the true mean $\mu$. It would be useful to know what the probability is that this interval covers $\mu$. If the interval covers the true mean, then $\mu$ is somewhere in the interval above, so that $\bar{X}$ is in fact within $2\sigma_{\bar{X}}$ (= 0.6) of $\mu$.
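The probability $P(-k < Z < k)$ used above can be evaluated numerically instead of from a normal table. The following is a minimal sketch using Python's standard library; the function name `prob_within_k_sd` is an illustrative choice, not part of the notes.

```python
from statistics import NormalDist


def prob_within_k_sd(k: float) -> float:
    """Return P(-k < Z < k) for a standard normal random variable Z."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


# Probability that the sample mean falls within k standard deviations
# (of the sample mean) of the population mean, for k = 2 and k = 3.
for k in (2, 3):
    print(f"P(-{k} < Z < {k}) = {prob_within_k_sd(k):.4f}")
```

The same function can be called with any positive k, for example to see which k makes the probability close to a chosen level.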
Thus

$$P[(\bar{X} - 2\sigma_{\bar{X}},\ \bar{X} + 2\sigma_{\bar{X}}) \text{ covers } \mu] = P(\bar{X} \text{ is within } 2\sigma_{\bar{X}} \text{ of } \mu)$$

= ______ = ______

To make the probability above a nice number, .95, we should replace 2 by 1.96. Thus we can say

"For 95% of all samples of size n = 25, the interval $(\bar{X} - 1.96\sigma_{\bar{X}},\ \bar{X} + 1.96\sigma_{\bar{X}})$ will cover the true value of $\mu$."

Or, "For 95% of all samples of size n = 25, $\bar{X}$ will be within $1.96\sigma_{\bar{X}}$ of the true population mean $\mu$."

The 95% value is called the LEVEL OF CONFIDENCE. This tells us the probability the interval will cover $\mu$.

The $1.96\sigma_{\bar{X}} = .588$ is called the margin of error. This tells us how accurate $\bar{X}$ is (i.e. how close $\bar{X}$ will be to $\mu$ for 95% of all samples).

The interval $(\bar{X} - 1.96\sigma_{\bar{X}},\ \bar{X} + 1.96\sigma_{\bar{X}})$ is called a 95% Z-CONFIDENCE INTERVAL.

The simulation below will illustrate how confidence intervals work.

    MTB > random 25 c1-c40;
    SUBC> norm 10 1.5.
    MTB > zint 95 1.5 c1-c40.

[The first two command lines select 40 random samples each of size n = 25 from a normal distribution with $\mu = 10$ and $\sigma = 1.5$. The third command line forms the 95% Z-CONFIDENCE INTERVAL for each sample.]

Confidence Intervals (The assumed sigma = 1.5)

    Variable   N     Mean   StDev  SE Mean        95.0% CI
    C1        25   10.459   1.661    0.300   ( 9.871, 11.047)
    C2        25    9.826   1.486    0.300   ( 9.238, 10.414)
    C3        25   10.388   1.600    0.300   ( 9.800, 10.976)
    C4        25    9.741   1.297    0.300   ( 9.153, 10.329)
    C5        25   10.441   1.766    0.300   ( 9.853, 11.029)
    C6        25   10.331   1.637    0.300   ( 9.743, 10.919)
    C7        25    8.941   1.264    0.300   ( 8.353,  9.529)
    C8        25   10.205   1.627    0.300   ( 9.617, 10.793)
    C9        25   10.163   1.560    0.300   ( 9.575, 10.751)
    C10       25   10.009   1.619    0.300   ( 9.421, 10.597)
    C11       25   10.455   1.787    0.300   ( 9.867, 11.043)
    C12       25   10.365   1.220    0.300   ( 9.777, 10.953)
    C13       25   10.626   1.475    0.300   (10.038, 11.214)
    C14       25   10.090   1.677    0.300   ( 9.502, 10.678)
    C15       25   10.339   1.103    0.300   ( 9.751, 10.927)
    C16       25   10.208   1.480    0.300   ( 9.620, 10.796)
    C17       25   10.356   1.508    0.300   ( 9.768, 10.944)
    C18       25    9.943   1.388    0.300   ( 9.355, 10.531)
    C19       25   10.015   1.318    0.300   ( 9.427, 10.603)
    C20       25    9.924   1.473    0.300   ( 9.336, 10.512)
    C21       25   10.037   1.271    0.300   ( 9.449, 10.625)
    C22       25    9.490   1.345    0.300   ( 8.902, 10.078)
    C23       25    9.972   1.484    0.300   ( 9.384, 10.560)
    C24       25   10.330   1.644    0.300   ( 9.742, 10.918)
    C25       25    9.635   1.609    0.300   ( 9.047, 10.223)
    C26       25    9.292   1.558    0.300   ( 8.704,  9.880)
    C27       25   10.053   1.072    0.300   ( 9.465, 10.641)
    C28       25    9.484   1.726    0.300   ( 8.896, 10.072)
    C29       25   10.666   1.402    0.300   (10.078, 11.254)
    C30       25    9.896   1.640    0.300   ( 9.308, 10.484)
    C31       25    9.942   1.583    0.300   ( 9.354, 10.530)
    C32       25   10.100   1.657    0.300   ( 9.512, 10.688)
    C33       25    9.483   1.496    0.300   ( 8.895, 10.071)
    C34       25    9.691   1.623    0.300   ( 9.103, 10.279)
    C35       25   10.390   1.369    0.300   ( 9.802, 10.978)
    C36       25   10.569   1.178    0.300   ( 9.981, 11.157)
    C37       25    9.813   1.326    0.300   ( 9.225, 10.401)
    C38       25    9.905   1.489    0.300   ( 9.317, 10.493)
    C39       25   10.442   1.405    0.300   ( 9.854, 11.030)
    C40       25    9.945   1.919    0.300   ( 9.357, 10.533)

QUESTIONS

1. (a) In theory, how many of the above intervals would you expect to cover the true population mean ($\mu = 10$)?
   (b) In fact how many actually do?

2. Suppose you selected 40 samples of size n = 25 from a real population (where typically the population mean $\mu$ and standard deviation $\sigma$ are unknown).
   (a) Could you form a 95% Z-confidence interval for each sample? Explain.
   (b) If you knew $\sigma$ and formed forty 95% Z-confidence intervals, how many of the intervals would you expect to cover the population $\mu$? Could you tell which? Explain.

Note:
(i) A $100(1-\alpha)\%$ Z-confidence interval of $\mu$ is given by $\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
(ii) For a 95% Z-confidence interval, $\alpha = .05$; hence the 95% Z-confidence interval of $\mu$ is $\bar{X} \pm 1.96\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
(iii) The 99% Z-confidence interval of $\mu$ is $\bar{X} \pm 2.576\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
(iv) The 90% Z-confidence interval of $\mu$ is $\bar{X} \pm 1.645\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

The t-distribution

The t-distribution depends on a single parameter. This parameter is called its degrees of freedom (df). If sampling is done from a normal distribution whose mean is $\mu$ and standard deviation is $\sigma$, then

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

follows the standard normal distribution.
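Because the standardized mean $Z$ is standard normal when $\sigma$ is known, the Z-confidence interval and the Minitab simulation described earlier can be imitated in a few lines. This is only a sketch under stated assumptions: the helper name `z_interval` and the seed value are hypothetical, and the random samples will not reproduce the Minitab output shown in the notes.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Z-confidence interval for the mean when sigma is known."""
    alpha = 1 - level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    se = sigma / len(sample) ** 0.5               # sigma / sqrt(n)
    xbar = fmean(sample)
    return xbar - z_crit * se, xbar + z_crit * se


# Imitate the simulation: 40 samples of size 25 from N(mu=10, sigma=1.5).
rng = random.Random(1)  # hypothetical seed, chosen only for repeatability
mu, sigma, n, reps = 10, 1.5, 25, 40
covered = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    low, high = z_interval(sample, sigma)
    covered += low < mu < high  # True counts as 1

print(f"{covered} of {reps} intervals cover mu = {mu}")
```

In the long run about 95% of such intervals cover $\mu$, but for any single run of 40 the count varies, which is the point of Question 1.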
Since in practice $\sigma$ is mostly unknown, we replace it by its estimate $s$. The random variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

follows the t-distribution with $n-1$ degrees of freedom.

Sketch of t-distribution

In comparison with the standard normal distribution, the t-distribution has more area in the tails, while the standard normal distribution has more area in the middle. The t-curve approaches the Z-curve if df is large.

T-Interval: Confidence Interval for the Mean of a Normal Population ($\sigma$ unknown)

If a random sample $X_1, X_2, \ldots, X_n$ is chosen from a normal distribution, then a $100(1-\alpha)\%$ Confidence Interval of $\mu$ is

$$\bar{X} \pm t_{\alpha/2}\, SE$$

where: df for t is $n-1$, and $SE = s/\sqrt{n}$ = standard error of $\bar{X}$ (the estimated sd of $\bar{X}$), with

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad s = \sqrt{s^2}.$$

Margin of Error: $E = t_{\alpha/2}\, SE = t_{\alpha/2}\, s/\sqrt{n}$
Level of Confidence (Reliability): $100(1-\alpha)\%$

Notes:
1. For all n, $t_{\alpha/2} > z_{\alpha/2}$.
2. For df = $\infty$, $t_{\alpha/2} = z_{\alpha/2}$, which are the entries at the bottom of the t-table.
3. For large n (n > 30), the normality assumption may be ignored because of the Central Limit Theorem.
4. The estimate of $\mu$, $\bar{X}$, is the mid-point of the CI and the margin of error is one half the width of the CI. If the interval is $(L, U)$, with $\bar{X}$ at its center, then

$$\bar{X} = (L+U)/2 \quad \text{and} \quad E = (U-L)/2.$$

Example: In a health study the birth weights of a random sample of 100 newborns from mothers with a low socioeconomic status in a large US city were recorded. The sample yielded a mean of 3.21 kg with a standard deviation of 0.71 kg.
(a) Find a 90% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.
(b) Interpret the confidence interval.

Solution: Here we wish to estimate $\mu$ = mean birth weight of all newborns from mothers with a low socioeconomic status in this US city.

Given: n = ______, $\bar{x}$ = ______ [estimate of $\mu$], s = ______ [estimate of $\sigma$]

Since n > 30, it is not necessary that the population be normal (due to the CLT).
For a 90% CI, $t_{\alpha/2}$ = ______ = ______, df = n - 1 = 99.

$\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ = ______ = ______ or, ______

(b) Interpretation: $\bar{x}$ = _________ estimates the true population mean $\mu$ with margin of error E = ____________ and level of confidence (Reliability) ____________. The level of confidence gives the proportion of intervals found this way that would cover $\mu$.

Note: The interpretation of a confidence interval as given in the example above is the popular interpretation often heard on television or reported in newspapers. A mathematically precise interpretation of the confidence interval for this example would be "Prior to sampling there was a .90 probability that the confidence interval to be formed would contain the true population mean $\mu$."

Example: For the data in the example above, find a 95% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.

Solution: Recall, n = 100, $\bar{x}$ = 3.21, s = 0.71.

For a 95% CI, $t_{\alpha/2}$ = ______ = ______, df = n - 1 = 99.

$\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ = ______ = ______ or, ______

Interpretation: $\bar{x}$ = _________ estimates the true population mean $\mu$ with margin of error E = ____________ and level of confidence (Reliability) ____________.

Example: For the data in the example above, find a 99% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.

Solution: Recall, n = 100, $\bar{x}$ = 3.21, s = 0.71.

For a 99% CI, $t_{\alpha/2}$ = ______ = ______, df = n - 1 = 99.

$\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ = ______ = ______ or, ______

Interpretation: $\bar{x}$ = _________ estimates the true population mean $\mu$ with margin of error E = ____________ and level of confidence (Reliability) ____________.

Question: Considering these three examples, if the level of confidence is increased and all other things remain the same, the width of the confidence interval will _______________.

Example: A study was conducted to determine the effect of acid rain on the lake water in an industrial region of the country. The data below give the pH levels from a random sample of 10 lakes from this region. (It was assumed that the sample came from a normal distribution.)
Minitab was used to find a 95% confidence interval for the mean pH level for all lakes in this region.

    C1: 6.6 7.1 7.3 6.7 6.8 6.2 6.5 5.9 6.9 6.3

    MTB > tint 95 c1

    One-Sample T: C1

    Variable   N    Mean   StDev  SE Mean       95.0% CI
    C1        10   6.630   0.424    0.134   (6.326, 6.934)

From the Minitab output answer the following:
(a) What is the 95% confidence interval of $\mu$?
(b) What is the estimate of $\mu$ and the estimated standard deviation of this estimate?
(c) What is the margin of error E and level of confidence (reliability) for the estimate of $\mu$?

Sample Size Determination for Estimating $\mu$

Problem: Suppose you wish to estimate a population mean $\mu$ with a specified margin of error E and level of confidence. What sample size should be used?

Solution: We know that $E = t_{\alpha/2}\, s/\sqrt{n}$. Now we solve this equation for n.

$$E^2 = \frac{t_{\alpha/2}^2\, s^2}{n}, \qquad nE^2 = t_{\alpha/2}^2\, s^2, \qquad n = \frac{t_{\alpha/2}^2\, s^2}{E^2} = \left[\frac{t_{\alpha/2}\, s}{E}\right]^2$$

Of course, since we have not sampled yet, we do not have values for $s$ or $t_{\alpha/2}$. In practice $t_{\alpha/2}$ is replaced by $z_{\alpha/2}$ and $s$ is replaced by a prior estimate of $\sigma$. Thus

$$n = \left[\frac{z_{\alpha/2}\, \sigma}{E}\right]^2,$$

rounded up to the next whole number.

Example: How large a sample would be required to estimate the mean pH level for all lakes in the industrial region to within .1 with level of confidence 95%? Assume that the prior estimate for $\sigma$ is 0.424.

STATISTICAL INFERENCE

Let us begin with a review of some basic definitions.

POPULATION: The set of all measurements or objects of interest in a particular study.

If the entire population were available for analysis we would know everything about it. However, in practice one cannot know the entire population because it is either too expensive, or simply impossible or impractical, to examine each member. Thus a sample from the population is used to obtain information about the population.

SAMPLE: A subset of the population.

The sample picked should be "representative" of the population from which it comes and should avoid any bias which might skew our view of the population. One way to achieve this is to use a SIMPLE RANDOM SAMPLE (srs), i.e.
a sample chosen in such a way that each member of the population has an equal chance of being chosen.

INFERENTIAL STATISTICS: deals with procedures which use the sample to draw conclusions about the population (from which it was drawn). The procedures of interest to us are CONFIDENCE INTERVALS and HYPOTHESIS TESTS.

In particular we will be interested in drawing conclusions about certain characteristics of the population. Such characteristics are known as POPULATION PARAMETERS. Examples of such characteristics are a POPULATION MEAN (denoted by the Greek letter $\mu$) and a POPULATION PROPORTION (denoted by the letter p).

EXAMPLE: Consider the population of weights (in kg) of all newborn babies in Canada for a particular year. In this case, the POPULATION MEAN $\mu$ is the average weight of all newborns in the population. An investigator may want to use a simple random sample of these weights to determine if there is sufficient evidence to answer questions like:
Is $\mu > 3.2$ kg? or Is $\mu < 3.2$ kg? or Is $\mu \neq 3.2$ kg?

EXAMPLE: Consider the population of all lakes in Nova Scotia. A biologist may be interested in the following POPULATION PROPORTION: p = the proportion of all lakes in Nova Scotia that are seriously affected by acid rain. She may want to use a simple random sample of lakes from this population to determine if there is sufficient evidence to answer questions like:
Is $p > .7$? or, Is $p < .7$? or, Is $p \neq .7$?

When drawing conclusions about a population using information from a sample it is important to realize that one can NEVER be absolutely certain the conclusion is correct. This is because a sample, though it may be "representative" of the population, only contains part of all the information contained in the population.

HYPOTHESIS TESTING

Example: A graduate student claims that over 70% of the lakes in Nova Scotia have been seriously affected by acid rain. To justify this claim she proposes the following "test".

“Choose a simple random sample of 15 lakes in Nova Scotia.
If 11 or more of the sampled lakes are seriously affected by acid rain, the claim is justified.”

Formally, we set up this test as follows. First notice that the population of interest to this graduate student is the set of all lakes in Nova Scotia. The parameter of interest in her investigation is p = the true proportion of all lakes in Nova Scotia affected by acid rain [p = the unknown population proportion].

NULL HYPOTHESIS: What we want to reject. The viewpoint opposite to Ha.
    H0: ______

ALTERNATIVE HYPOTHESIS: Research hypothesis. What we want to prove.
    Ha: ______

TEST STATISTIC (evidence from the sample used to make a decision)
    X = ______
    Distribution of X: ______

Now in conducting this test we should make use of the fact that large values of X would be consistent with the _______________________ hypothesis that $p > .7$. How large an X? Let's pick some number c and decide that if $X \ge c$ we conclude that _____________. Thus if $X < c$ we must conclude that ___________.

The value c is called a CRITICAL VALUE. In this example, the graduate student has decided to use c = 11. Her method for making a decision can be described as follows.

REJECTION OR CRITICAL REGION (rule for making a decision)
    ______

Now suppose she conducts her study and that she observes that $X \ge 11$. Then she would claim to have shown that Ha: $p > .7$ is true. If you had to use her study to make a policy decision, the first question you should ask is "What is the probability that her claim is wrong? That is, what is the probability of getting $X \ge 11$ when in fact H0: $p \le .7$ is true?" Let's find out by doing the calculations below.

    Suppose that H0: p <= .7 is true    Probability of a wrong decision, P(Reject H0 | H0 is true)
    p = .5                              P(X >= 11 | p = .5) = 1 - P(X <= 10) = ______ = ______
    p = .6                              P(X >= 11 | p = .6) = 1 - P(X <= 10) = ______ = ______
    p = .7                              P(X >= 11 | p = .7) = 1 - P(X <= 10) = ______ = ______

The error of rejecting H0 when in fact H0 is true is called a TYPE I ERROR.
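The type I error probabilities in the table above are upper-tail binomial probabilities, so they can be checked without a printed table. The sketch below assumes X is binomial with n = 15 and uses the student's rule "reject H0 if X >= 11"; the function names `binom_cdf` and `reject_probability` are illustrative, not from the notes.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Cumulative binomial probability P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def reject_probability(c: int, n: int, p: float) -> float:
    """P(X >= c | p) = 1 - P(X <= c - 1): chance of rejecting H0 with rule X >= c."""
    return 1.0 - binom_cdf(c - 1, n, p)


# Probability of wrongly rejecting H0 for several values of p inside H0: p <= .7
for p in (0.5, 0.6, 0.7):
    print(f"p = {p}: P(X >= 11) = {reject_probability(11, 15, p):.4f}")
```

Running the loop shows how the probability of a wrong rejection changes as p moves toward the boundary value .7.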
Notice that in this example the largest probability of making a type I error is _____________ and that it occurs when the value of p is _____________ (that is, on the boundary between H0 and Ha). The largest probability of making a type I error is called the LEVEL OF SIGNIFICANCE or TYPE I ERROR RATE of the test and is denoted by the Greek letter $\alpha$.

Conversely, suppose that the graduate student observed $X < 11$ (i.e. $X \le 10$), thus leading to the claim that H0: $p \le .7$ is true. In this case you should ask "What is the probability that her claim is wrong? That is, what is the probability of getting $X < 11$ when in fact Ha: $p > .7$ is true?" Let's find out by doing the calculations below.

    Suppose that           Probability of a wrong decision     Probability of a correct decision
    Ha: p > .7 is true     P(Accept H0 | Ha true)              P(Reject H0 | Ha true)
    p = .8                 P(X < 11 | p = .8)                  P(X >= 11 | p = .8)
                           = P(X <= 10) = ______               = 1 - P(X <= 10) = ______ = ______
    p = .9                 P(X < 11 | p = .9)                  P(X >= 11 | p = .9)
                           = P(X <= 10) = ______               = 1 - P(X <= 10) = ______ = ______

The error of accepting H0 when in fact Ha is true is called a TYPE II ERROR. For a particular value of p, say $p_1$, in the alternative (i.e. $p_1 > .7$) the probability of making a type II error is called the TYPE II ERROR RATE evaluated at $p = p_1$. This probability is denoted by $\beta(p_1)$. Thus,

$$\beta(p_1) = P(\text{Accept } H_0 \mid p = p_1 \text{ in } H_a)$$

Also, for a particular value of p, say $p_1$, in the alternative (i.e. $p_1 > .7$) we can calculate the probability of a correct decision (see the last column of the table above). The probability of making a correct decision, that is, rejecting H0 when in fact Ha is true, is called the POWER OF THE TEST AGAINST THE ALTERNATIVE $p_1$ in Ha and is denoted $K(p_1)$. Thus

$$K(p_1) = P(\text{Reject } H_0 \mid p = p_1 \text{ in } H_a)$$

Notice that $K(p_1)$ and $\beta(p_1)$ are related by $K(p_1) = 1 - \beta(p_1)$.

If in fact Ha is true, power is a measure of a test's ability to detect this. For example, if in fact p were actually .8 (.9), this test will detect this with probability ___________ (__________).
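The relation $K(p_1) = 1 - \beta(p_1)$ can be illustrated numerically for the same test (n = 15, reject H0 if X >= 11). This is a minimal sketch; the helper names `type_ii_error` and `power` are hypothetical labels for $\beta(p_1)$ and $K(p_1)$.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Cumulative binomial probability P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def type_ii_error(c: int, n: int, p1: float) -> float:
    """beta(p1) = P(X < c | p = p1): accept H0 although p1 lies in Ha."""
    return binom_cdf(c - 1, n, p1)


def power(c: int, n: int, p1: float) -> float:
    """K(p1) = P(X >= c | p = p1) = 1 - beta(p1)."""
    return 1.0 - type_ii_error(c, n, p1)


# Evaluate both quantities at the two alternatives used in the table above.
for p1 in (0.8, 0.9):
    print(f"p1 = {p1}: beta = {type_ii_error(11, 15, p1):.4f}, "
          f"power = {power(11, 15, p1):.4f}")
```

Changing the first argument (the critical value c) shows the trade-off between the two error rates for a fixed sample size.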
A good test, that is, one in whose results we can be confident, will be one in which the probabilities of the type I and type II errors are small.

The ideas discussed above are referred to as the ERROR STRUCTURE of a test. A summary is provided below.

                                        ACTUAL SITUATION
    DECISION                            H0 is True           Ha is True (H0 is false)
    Accept H0 (Do not reject H0)        Correct Decision     Type II Error
    Reject H0 (Accept Ha)               Type I Error         Correct Decision

QUESTION: For the student's test above, state in words the consequence of making a
(a) Type I Error:
(b) Type II Error:

ERROR RATES AND POWER OF A TEST

TYPE I ERROR: Reject H0 when H0 is true.
    P(Type I Error) = P(Reject H0 | H0 true). The largest possible probability of a type I error is denoted by $\alpha$ and is called the LEVEL OF SIGNIFICANCE or TYPE I ERROR RATE of the test. In calculating $\alpha$ = P(Reject H0 | H0 true), use the value of p right on the boundary between H0 and Ha.

TYPE II ERROR: Accept H0 when Ha is true.
    $\beta(p_1)$ = P(Type II Error) = P(Accept H0 | $p = p_1$ in Ha)

POWER AGAINST THE ALTERNATIVE $p_1$:
    $K(p_1)$ = P(Reject H0 | $p = p_1$ in Ha) = $1 - \beta(p_1)$. In the case that Ha is true, power is a measure of the sensitivity of the test, i.e. the ability of the test to detect that Ha is true.

Changing the Rejection Region

Question: If we use the same sample size, how can we modify this test in order to reduce the type I error rate $\alpha$?

Suppose we take c = 14, so we reject H0 if $X \ge 14$. What is $\alpha$?

In this case what will happen to the type II error rate $\beta(p_1)$ and the power $K(p_1)$?

NOTE: Ideally, we would like $\alpha$ and $\beta(p)$ to be zero and $K(p)$ to be 1; but for fixed n, decreasing $\alpha$ causes $\beta(p)$ to increase and $K(p)$ to decrease.

NOTE: The only way to decrease both $\alpha$ and $\beta(p)$ is to increase the sample size.

The P-value

Consider the test: H0: $p \le .70$, Ha: $p > .7$, n = 30; Reject H0 if $X \ge 26$.

Suppose we conduct the test and observe X to be $x_0 = 28$. According to the rejection region we would reject H0. We would in fact have rejected H0 even if our critical value had been 28.
But with a critical value of 28, the type I error rate would be smaller.

The P-value is the smallest type I error rate at which one can reject H0 on the basis of the observed outcome $x_0$. It is obtained by replacing the critical value 'c' by $x_0$ in the calculation of the type I error rate.

$$\text{P-value} = P(X \ge x_0 \mid H_0 \text{ is true})$$

For example, consider the cases where $x_0$ is 28 and $x_0$ is 24.

    Type I error rate               P-value when x0 = 28            P-value when x0 = 24
    alpha = P(X >= 26 | p = .7)     P(X >= 28 | p = .7)             P(X >= 24 | p = .7)
    = 1 - P(X <= 25 | p = .7)       = 1 - P(X <= 27 | p = .7)       = 1 - P(X <= 23 | p = .7)
    = 1 - .9698                     = 1 - .9979                     = 1 - .8405
    = .0302                         = .0021                         = .1595

Notice: If $x_0$ is in the rejection region, the P-value is $\le \alpha$. If $x_0$ is not in the rejection region, then the P-value is $> \alpha$.

Thus it is clear that we can conduct our test at $\alpha = .03$ without using a rejection region. We just have to calculate the P-value and use the following rule.

    If the P-value $\le \alpha$ then reject H0.
    If the P-value $> \alpha$ then do not reject H0.

Summary: Hypothesis Testing

    Concept                               Left-Tailed Test                     Right-Tailed Test
    Hypotheses                            H0: p >= p0, Ha: p < p0              H0: p <= p0, Ha: p > p0
    Critical Region                       Reject H0 if X <= c                  Reject H0 if X >= c
    Type I Error Rate alpha               P(X <= c | p = p0)                   P(X >= c | p = p0)
      = P(Reject H0 | H0 true)
    Type II Error Rate beta(p1)           P(X > c | p = p1)                    P(X < c | p = p1)
      = P(Accept H0 | p = p1 in Ha)
    Power K(p1)                           P(X <= c | p = p1), or 1 - beta(p1)  P(X >= c | p = p1), or 1 - beta(p1)
      = P(Reject H0 | p = p1 in Ha)
    P-Value                               P(X <= x0 | p = p0)                  P(X >= x0 | p = p0)
    P-value Decision Rule                 Reject H0 if the P-value <= alpha    Reject H0 if the P-value <= alpha

Note: A similar theory also applies to a Two-tailed test, i.e., a test of H0: $p = p_0$, Ha: $p \neq p_0$. While we will conduct such tests in our applications, we will not discuss the theory here.

An analogy of statistical hypotheses

In practice we use $\alpha = .01$ or $\alpha = .05$. Thus to reject H0 we need strong evidence.

In our judicial system, we use the phrase innocent until proven guilty beyond a reasonable doubt. We may define null and alternative hypotheses as follows:
    H0: defendant is innocent
    Ha: defendant is guilty.
To prove defendant is guilty we need strong evidence.

PROBLEM: Given: Ha: $p < .6$, n = 30; Reject H0 if $X \le 13$.
(a) Find the level of significance $\alpha$.
(b) Find $\beta(p_1)$ if in fact $p_1 = .4$.
(c) Find the power against the alternative $p_1 = .4$.
(d) Suppose that X is observed to be $x_0 = 12$.
    (i) What is your decision?
    (ii) What type of error are you subject to?
    (iii) Find the p-value.

PROBLEM: Given: Ha: $p > .4$, n = 20; Reject H0 if $X \ge 16$.
(a) Find the level of significance $\alpha$.
(b) Find $\beta(p_1)$ if in fact $p_1 = .6$.
(c) Find the power against the alternative $p_1 = .6$.
(d) Suppose that X is observed to be $x_0 = 14$.
    (i) What is your decision?
    (ii) What type of error are you subject to?
    (iii) Find the p-value.

CUMULATIVE BINOMIAL PROBABILITIES: $P(X \le x)$

                                       p
     n    x    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
    15    0  .2059  .0352  .0047  .0005  .0000  .0000  .0000  .0000  .0000
          1  .5490  .1671  .0353  .0052  .0005  .0000  .0000  .0000  .0000
          2  .8159  .3980  .1268  .0271  .0037  .0003  .0000  .0000  .0000
          3  .9444  .6482  .2969  .0905  .0176  .0019  .0001  .0000  .0000
          4  .9873  .8358  .5155  .2173  .0592  .0093  .0007  .0000  .0000
          5  .9978  .9389  .7216  .4032  .1509  .0338  .0037  .0001  .0000
          6  .9997  .9819  .8689  .6098  .3036  .0950  .0152  .0008  .0000
          7  1.000  .9958  .9500  .7869  .5000  .2131  .0500  .0042  .0000
          8  1.000  .9992  .9848  .9050  .6964  .3902  .1311  .0181  .0003
          9  1.000  .9999  .9963  .9662  .8491  .5968  .2784  .0611  .0022
         10  1.000  1.000  .9993  .9907  .9408  .7827  .4845  .1642  .0127
         11  1.000  1.000  .9999  .9981  .9824  .9095  .7031  .3518  .0556
         12  1.000  1.000  1.000  .9997  .9963  .9729  .8732  .6020  .1841
         13  1.000  1.000  1.000  1.000  .9995  .9948  .9647  .8329  .4510
         14  1.000  1.000  1.000  1.000  1.000  .9995  .9953  .9648  .7941
         15  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

    20    0  .1216  .0115  .0008  .0000  .0000  .0000  .0000  .0000  .0000
          1  .3917  .0692  .0076  .0005  .0000  .0000  .0000  .0000  .0000
          2  .6769  .2061  .0355  .0036  .0002  .0000  .0000  .0000  .0000
          3  .8670  .4114  .1071  .0160  .0013  .0000  .0000  .0000  .0000
          4  .9568  .6296  .2375  .0510  .0059  .0003  .0000  .0000  .0000
          5  .9887  .8042  .4164  .1256  .0207  .0016  .0000  .0000  .0000
          6  .9976  .9133  .6080  .2500  .0577  .0065  .0003  .0000  .0000
          7  .9996  .9679  .7723  .4159  .1316  .0210  .0013  .0000  .0000
          8  .9999  .9900  .8867  .5956  .2517  .0565  .0051  .0001  .0000
          9  1.000  .9974  .9520  .7553  .4119  .1275  .0171  .0006  .0000
         10  1.000  .9994  .9829  .8725  .5881  .2447  .0480  .0026  .0000
         11  1.000  .9999  .9949  .9435  .7483  .4044  .1133  .0100  .0001
         12  1.000  1.000  .9987  .9790  .8684  .5841  .2277  .0321  .0004
         13  1.000  1.000  .9997  .9935  .9423  .7500  .3920  .0867  .0024
         14  1.000  1.000  1.000  .9984  .9793  .8744  .5836  .1958  .0113
         15  1.000  1.000  1.000  .9997  .9941  .9490  .7625  .3704  .0432
         16  1.000  1.000  1.000  1.000  .9987  .9840  .8929  .5886  .1330
         17  1.000  1.000  1.000  1.000  .9998  .9964  .9645  .7939  .3231
         18  1.000  1.000  1.000  1.000  1.000  .9995  .9924  .9308  .6083
         19  1.000  1.000  1.000  1.000  1.000  1.000  .9992  .9885  .8784
         20  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

    30    0  .0424  .0012  .0000  .0000  .0000  .0000  .0000  .0000  .0000
          1  .1837  .0105  .0003  .0000  .0000  .0000  .0000  .0000  .0000
          2  .4114  .0442  .0021  .0000  .0000  .0000  .0000  .0000  .0000
          3  .6474  .1227  .0093  .0003  .0000  .0000  .0000  .0000  .0000
          4  .8245  .2552  .0302  .0015  .0000  .0000  .0000  .0000  .0000
          5  .9268  .4275  .0766  .0057  .0002  .0000  .0000  .0000  .0000
          6  .9742  .6070  .1595  .0172  .0007  .0000  .0000  .0000  .0000
          7  .9922  .7608  .2814  .0435  .0026  .0000  .0000  .0000  .0000
          8  .9980  .8713  .4315  .0940  .0081  .0002  .0000  .0000  .0000
          9  .9995  .9389  .5888  .1763  .0214  .0009  .0000  .0000  .0000
         10  .9999  .9744  .7304  .2915  .0494  .0029  .0000  .0000  .0000
         11  1.000  .9905  .8407  .4311  .1002  .0083  .0002  .0000  .0000
         12  1.000  .9969  .9155  .5785  .1808  .0212  .0006  .0000  .0000
         13  1.000  .9991  .9599  .7145  .2923  .0481  .0021  .0000  .0000
         14  1.000  .9998  .9831  .8246  .4278  .0971  .0064  .0001  .0000
         15  1.000  .9999  .9936  .9029  .5722  .1754  .0169  .0002  .0000
         16  1.000  1.000  .9979  .9519  .7077  .2855  .0401  .0009  .0000
         17  1.000  1.000  .9994  .9788  .8192  .4215  .0845  .0031  .0000
         18  1.000  1.000  .9998  .9917  .8998  .5689  .1593  .0095  .0000
         19  1.000  1.000  1.000  .9971  .9506  .7085  .2696  .0256  .0001
         20  1.000  1.000  1.000  .9991  .9786  .8237  .4112  .0611  .0005
         21  1.000  1.000  1.000  .9998  .9919  .9060  .5685  .1287  .0020
         22  1.000  1.000  1.000  1.000  .9974  .9565  .7186  .2392  .0078
         23  1.000  1.000  1.000  1.000  .9993  .9828  .8405  .3930  .0258
         24  1.000  1.000  1.000  1.000  .9998  .9943  .9234  .5725  .0732
         25  1.000  1.000  1.000  1.000  1.000  .9985  .9698  .7448  .1755
         26  1.000  1.000  1.000  1.000  1.000  .9997  .9907  .8773  .3526
         27  1.000  1.000  1.000  1.000  1.000  1.000  .9979  .9558  .5886
         28  1.000  1.000  1.000  1.000  1.000  1.000  .9997  .9895  .8163

z Test for a Population Mean

To test the hypothesis H0: $\mu = \mu_0$ based on an SRS of size n from a population with unknown mean $\mu$ and known standard deviation $\sigma$, compute the test statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

In terms of a standard normal random variable Z, the P-value for a test of H0 against

    Ha: $\mu > \mu_0$ is $P(Z \ge z)$
    Ha: $\mu < \mu_0$ is $P(Z \le z)$
    Ha: $\mu \neq \mu_0$ is $2P(Z \ge |z|)$

These P-values are exact if the population distribution is normal and are approximately correct for large n in other cases (Page 445, Text Book).

To illustrate the test we consider the following problem.

Problem 6.45 (Page 455): The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures the motivation, attitude toward school, and study habits of students. Scores range from 0 to 200. The mean score for U.S. college students is about 115, and the standard deviation is about 30. A teacher who suspects that older students have better attitudes toward school gives the SSHA to 20 students who are at least 30 years of age. Their mean score is $\bar{x} = 135.2$.

(a) Assuming that $\sigma = 30$ for the population of older students, carry out a test of H0: $\mu = 115$, Ha: $\mu > 115$. Report the P-value of your test, and state your conclusion clearly.
(b) Your test in (a) required two important assumptions in addition to the assumption that the value of $\sigma$ is known. What are they? Which of these assumptions is most important to the validity of your conclusion in (a)?

Solution: Given: n = ______, $\bar{x}$ = ______, $\sigma$ = ______. Assume $\alpha = .05$.

(a) (i) Ha: $\mu > 115$; therefore this is a right-sided test.
    (ii) The test statistic in this case is z = ______
    (iii) p-value = $P(Z \ge 3.01)$ = ______
    (iv) Decision:
    (v) Concluding Sentence:

(b) Assumptions:
    (i)
    (ii)

Note: For the above problem, let the alternative hypothesis be as follows:
    (i) Ha: $\mu < 145$; $\alpha = .05$. Therefore, this is a left-sided test.
    (ii) The test statistic in this case is z = ______
    (iii) p-value = $P(Z \le -1.46)$ = ______
    (iv) Decision:
    (v) Concluding Sentence:

Note: Let the alternative hypothesis now be
    (i) Ha: $\mu \neq 145$; $\alpha = .05$. Therefore, this is a two-sided test.
    (ii) The test statistic is z = ______
    (iii) p-value = ______ = ______ = ______
    (iv) Decision:
    (v) Concluding Sentence:

The One-Sample t Test

Suppose that an SRS of size n is drawn from a population having unknown mean $\mu$. To test the hypothesis H0: $\mu = \mu_0$ based on an SRS of size n, compute the one-sample t statistic

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

In terms of a random variable T having the t(n-1) distribution, the P-value for a test of H0 against

    Ha: $\mu > \mu_0$ is $P(T \ge t)$
    Ha: $\mu < \mu_0$ is $P(T \le t)$
    Ha: $\mu \neq \mu_0$ is $2P(T \ge |t|)$

These P-values are exact if the population distribution is normal and are approximately correct for large n in other cases (Page 496, Text Book).

To illustrate the test we consider the following example.

Example: A random sample of 120 high school graduates were given an IQ test. The sample mean IQ was 103.21 with a standard deviation of 16.18. Test at $\alpha = .10$ if there is sufficient evidence to conclude that the mean of the population from which the sample comes exceeds 100.
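The z statistic and the one-sample t statistic above share the form (sample mean - hypothesized mean) / (standard deviation / sqrt(n)); only the standard deviation used and the reference distribution differ. The sketch below computes the statistic for the SSHA problem and for the IQ example just stated. The function name `standardized_mean` is an illustrative assumption, and the t P-value is left out because Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def standardized_mean(xbar: float, mu0: float, sd: float, n: int) -> float:
    """(xbar - mu0) / (sd / sqrt(n)); a z statistic if sd is the known sigma,
    a t statistic if sd is the sample standard deviation s."""
    return (xbar - mu0) / (sd / sqrt(n))


Z = NormalDist()  # standard normal, used only for the z tests

# SSHA problem: sigma = 30 known, n = 20, xbar = 135.2
z_right = standardized_mean(135.2, 115, 30, 20)   # H0: mu = 115, Ha: mu > 115
p_right = 1 - Z.cdf(z_right)                      # P(Z >= z)
z_left = standardized_mean(135.2, 145, 30, 20)    # H0: mu = 145, Ha: mu < 145
p_left = Z.cdf(z_left)                            # P(Z <= z)

# IQ example: s = 16.18 from the sample, n = 120, xbar = 103.21, mu0 = 100
t_iq = standardized_mean(103.21, 100, 16.18, 120)  # compare with t(119)

print(f"z = {z_right:.2f}, right-tail P-value = {p_right:.4f}")
print(f"z = {z_left:.2f}, left-tail P-value = {p_left:.4f}")
print(f"t = {t_iq:.3f} with df = 119")
```

The two z values agree with the 3.01 and -1.46 that appear in the worked solution; the t statistic would be looked up in a t-table with 119 degrees of freedom.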
Solution: Given: n = 120, $\bar{x}$ = 103.21, s = 16.18; $\alpha = .10$
    (i) Ha: $\mu > 100$
    (ii) t = ______
    (iii) p-value = ______
    (iv) Decision:
    (v) Concluding Sentence:

Example: A psychological test, used to assess an individual's ability to appraise other people, was given to a random sample of 12 supervisors in a large corporation. Their scores are given below.

    64 97 73 71 68 74 60 78 60 74 73 75

Is there sufficient evidence at $\alpha = .05$ to conclude that the mean score for the population of supervisors is below 75?

Solution: Given: n = ______, $\bar{x}$ = ______, s = ______, $\alpha = .05$
    (i) Ha: $\mu < 75$; therefore, this is a left-sided test.
    (ii) t = ______
    (iii) p-value = ______
    (iv) Decision:
    (v) Concluding Sentence:

Example: A manufacturing process is supposed to produce ball bearings for use in industry with a diameter of 2 cm. A random sample of 40 ball bearings was chosen and their diameters were measured. The mean and standard deviation of this random sample are given below:

    n = 40, $\bar{x}$ = 1.9991, s = .0089.

Test the hypothesis Ha: $\mu \neq 2$ at $\alpha = .05$.
    (i) Ha: $\mu \neq 2$; therefore, this is a two-sided test.
    (ii) t = ______
    (iii) p-value = ______
    (iv) Decision:
    (v) Concluding Sentence:

NORMAL APPROXIMATION FOR COUNTS AND PROPORTIONS

An srs of size n is drawn from a population having population proportion p of successes. Let X be the number of successes in the sample and let $\hat{p} = X/n$ be the sample proportion of successes. If n is large, then

    X is approximately $N\left(np,\ \sqrt{np(1-p)}\right)$
    $\hat{p}$ is approximately $N\left(p,\ \sqrt{p(1-p)/n}\right)$

Note: As a rule of thumb we will use the above approximation if $np \ge 10$ and $n(1-p) \ge 10$.

Note: The above result is on Page 376 in the text book.

Large-Sample Significance Test for a Population Proportion

Draw an SRS of size n from a large population with unknown proportion p of successes.
To test the hypothesis H0: $p = p_0$, compute the z statistic

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

In terms of a standard normal random variable Z, the approximate P-value for a test of H0 against

    Ha: $p > p_0$ is $P(Z \ge z)$
    Ha: $p < p_0$ is $P(Z \le z)$
    Ha: $p \neq p_0$ is $2P(Z \ge |z|)$

In practice we will use this test if $np_0 > 10$ and $n(1-p_0) > 10$. This test is given on Page 575 in the Text Book.

Problem 8.20 (Page 585): A matched pairs experiment compares the taste of instant versus fresh-brewed coffee. Each subject tastes two unmarked cups of coffee, one of each type, in random order and states which he or she prefers. Of the 50 subjects who participate in the study, 19 prefer the instant coffee. Let p be the probability that a randomly chosen subject prefers freshly brewed coffee to the instant coffee. (In practical terms, p is the proportion of the population who prefer fresh-brewed coffee.)

(a) Test the claim that a majority of people prefer the taste of fresh-brewed coffee. Report the z statistic and its p-value. Is your result significant at the 5% level? What is your practical conclusion?
(b) Find a 90% confidence interval for p.

$100(1-\alpha)\%$ CONFIDENCE INTERVAL FOR p

For large n, $\hat{p}$ is approximately $N\left(p,\ \sqrt{p(1-p)/n}\right)$. Therefore,

$$P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) = 1 - \alpha$$

A simple mathematical calculation would show that the above equation is equivalent to the following:

$$P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) = 1 - \alpha$$

The standard deviation of $\hat{p}$ is given by

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

Since p is unknown in practice, we replace it by its estimate $\hat{p}$ and define the standard error of the sample proportion as follows:

$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

An approximate $100(1-\alpha)\%$ Confidence Interval for p is given by

$$\hat{p} \pm z_{\alpha/2}\, SE_{\hat{p}}$$

Note: In practice we will use this formula if both $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.
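The large-sample z test and the confidence interval for p can be tried on the coffee problem, where 50 - 19 = 31 of the 50 subjects prefer fresh-brewed coffee. The sketch below is one way to set up the arithmetic, assuming H0: p = .5 against Ha: p > .5 for the "majority" claim; the function names `proportion_z` and `proportion_ci` are illustrative, not from the notes or the text book.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(x: int, n: int, p0: float) -> float:
    """z = (p_hat - p0) / sqrt(p0 (1 - p0) / n); uses p0 in the standard deviation."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def proportion_ci(x: int, n: int, level: float):
    """Approximate CI: p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error uses p_hat, not p0
    return p_hat - z_crit * se, p_hat + z_crit * se


x, n = 31, 50                              # 31 of 50 prefer fresh-brewed
z = proportion_z(x, n, 0.5)
p_value = 1 - NormalDist().cdf(z)          # right-tailed: P(Z >= z)
low, high = proportion_ci(x, n, 0.90)

print(f"z = {z:.3f}, P-value = {p_value:.4f}")
print(f"90% CI for p: ({low:.4f}, {high:.4f})")
```

Note the design difference the formulas imply: the test standardizes with $p_0$ because it is computed assuming H0 is true, while the interval uses $\hat{p}$ because no hypothesized value is available.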