1. Estimating a population mean: large samples

Large-sample $(1-\alpha)100\%$ confidence interval for a population mean $\mu$:

$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the z-value that locates an area of $\alpha/2$ to its right, $\sigma$ is the standard deviation of the population from which the sample was selected, $n$ is the sample size, and $\bar{x}$ is the value of the sample mean.

Assumption: $n \ge 30$

[When the value of $\sigma$ is unknown, the sample standard deviation $s$ may be used to approximate $\sigma$ in the formula for the confidence interval. The approximation is generally quite satisfactory when $n \ge 30$.]

Example: Suppose that in the previous year all graduates at a certain university reported the number of hours spent on their studies during a certain week; the average was 40 hours and the standard deviation was 10 hours. Suppose we want to investigate whether students now are studying more than they used to. This year a random sample of $n = 50$ students is selected. Each student in the sample was interviewed about the number of hours spent on his/her study. This experiment produced the following statistics:

$\bar{x} = 41.5$ hours, $s = 9.2$ hours

Estimate $\mu$, the mean number of hours spent on study, using a 99% confidence interval. Interpret the interval in terms of the problem.

Solution: The general form of a large-sample 99% confidence interval for $\mu$ is

$$\bar{x} \pm z_{.005}\,\frac{\sigma}{\sqrt{n}} \approx 41.5 \pm 2.58\,\frac{9.2}{\sqrt{50}} = 41.5 \pm 3.36$$

or (38.14, 44.86). We can be 99% confident that the interval (38.14, 44.86) encloses the true mean weekly time spent on study this year. Since all the values in the interval fall above 38 hours and below 45 hours, we conclude that students now tend to spend more than 6 hours and less than 7.5 hours per day on study, on average (supposing that they do not study on Sunday). Because the interval also contains last year's mean of 40 hours, it does not by itself show that students are studying more than they used to.

2. Estimating a population mean: small samples

Assumption required for estimating $\mu$ based on small samples ($n < 30$): the population from which the sample is selected has an approximately normal distribution.

Small-sample $(1-\alpha)100\%$ confidence interval for $\mu$:

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

where the distribution of $t$ is based on $(n - 1)$ degrees of freedom.
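As a numerical check on the large-sample example in Section 1, the interval can be recomputed with a short script. This is only a sketch: the helper name `mean_confidence_interval` is hypothetical, and the z-value comes from Python's standard-library `statistics.NormalDist` rather than a printed table, so it is about 2.576 instead of the rounded 2.58 used in the text.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, s, n, critical):
    """Return (lower, upper) for xbar +/- critical * s / sqrt(n).

    `critical` is z_{alpha/2} for large samples, or a t-table value
    t_{alpha/2} with n - 1 degrees of freedom for small samples.
    """
    half_width = critical * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# 99% confidence: area alpha/2 = .005 lies to the right of z
alpha = 0.01
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576

# Statistics from the study-hours example: xbar = 41.5, s = 9.2, n = 50
lower, upper = mean_confidence_interval(xbar=41.5, s=9.2, n=50, critical=z)
print(f"z = {z:.3f}, interval = ({lower:.2f}, {upper:.2f})")
```

The result agrees with (38.14, 44.86) to within the rounding of the z-value. The same helper accepts a value read from a t-table as `critical` when the small-sample interval of Section 2 is needed.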
Example: Determine the t-value that would be used in constructing a 95% confidence interval for $\mu$ based on a sample of size $n = 14$.

Solution: For a confidence coefficient of .95, we have $1 - \alpha = .95$, $\alpha = .05$, and $\alpha/2 = .025$. We require the value of $t_{.025}$ for a t-distribution based on $(n - 1) = (14 - 1) = 13$ degrees of freedom. In the t-table, at the intersection of the column labeled $t_{.025}$ and the row corresponding to df = 13, we find the entry 2.160 (see Figure 7.6). Hence, a 95% confidence interval for $\mu$, based on a sample of $n = 14$ observations, would be given by

$$\bar{x} \pm t_{.025}\,\frac{s}{\sqrt{n}} = \bar{x} \pm 2.160\,\frac{s}{\sqrt{14}}$$

3. Determining the sample size required to estimate $\mu$ (s.d. known and not known)

Choosing the sample size for estimating a population mean $\mu$ to within $d$ units with probability $(1 - \alpha)$:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{d}\right)^2$$

(Note: The population standard deviation $\sigma$ will usually have to be approximated.)

Choosing the sample size for estimating a population proportion $p$ to within $d$ units with probability $(1 - \alpha)$:

$$n = pq\left(\frac{z_{\alpha/2}}{d}\right)^2$$

where $p$ is the value of the population proportion that we are attempting to estimate and $q = 1 - p$. (Note: This technique requires previous estimates of $p$ and $q$. If none are available, use $p = q = .5$ for a conservative choice of $n$.)

4. Estimating a population proportion

Large-sample $(1-\alpha)100\%$ confidence interval for a population proportion $p$:

$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $\hat{p}$ is the sample proportion of observations with the characteristic of interest, and $\hat{q} = 1 - \hat{p}$.

Example: A commission on crime is interested in estimating the proportion of crimes related to firearms in an area with one of the highest crime rates in a country. The commission selects a random sample of 300 files of recently committed crimes in the area and determines that a firearm was reportedly used in 180 of them. Estimate the true proportion $p$ of all crimes committed in the area in which some type of firearm was reportedly used. Then construct a 95% confidence interval for $p$, the population proportion of crimes committed in the area in which some type of firearm is reportedly used.
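The point estimate and interval requested in this example can also be computed directly, ahead of the worked solution. The following is a minimal sketch under stated assumptions: the function name `proportion_confidence_interval` is hypothetical, and the z-value is taken from the standard-library `statistics.NormalDist` (about 1.960) rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample interval p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n          # sample proportion
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat, p_hat - half_width, p_hat + half_width


# 180 of the 300 sampled crime files reportedly involved a firearm
p_hat, lower, upper = proportion_confidence_interval(180, 300)
print(f"p_hat = {p_hat:.2f}, interval = ({lower:.3f}, {upper:.3f})")
```

Carried to three decimals the half-width is about .055, which the text rounds to .06 when reporting the interval.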
Solution: A logical candidate for a point estimate of the population proportion $p$ is the proportion of observations in the sample that have the characteristic of interest (called a "success"). This is called the sample proportion $\hat{p}$ (read "p hat"). In this example, the sample proportion of crimes related to firearms is given by

$$\hat{p} = \frac{180}{300} = .60$$

That is, 60% of the crimes in the sample were related to firearms; the value $\hat{p} = .60$ serves as our point estimate of the population proportion $p$.

For a confidence coefficient of .95, we have $1 - \alpha = .95$; $\alpha = .05$; $\alpha/2 = .025$; and the required z-value is $z_{.025} = 1.96$. We obtained $\hat{p} = .60$. Thus, $\hat{q} = 1 - \hat{p} = .40$. Substitution of these values into the formula for an approximate confidence interval for $p$ yields

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = .60 \pm 1.96\sqrt{\frac{(.60)(.40)}{300}} \approx .60 \pm .06$$

or (.54, .66). Note that the approximation is valid since the interval does not contain 0 or 1.

We are 95% confident that the interval from .54 to .66 contains the true proportion of crimes committed in the area that are related to firearms. That is, in repeated construction of 95% confidence intervals, 95% of all samples would produce confidence intervals that enclose $p$.

5. Estimation of a population variance

A $(1 - \alpha)100\%$ confidence interval for a population variance $\sigma^2$:

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are values of $\chi^2$ that locate an area of $\alpha/2$ to the right and $\alpha/2$ to the left, respectively, of a chi-square distribution based on $(n - 1)$ degrees of freedom.

Assumption: The population from which the sample is selected has an approximately normal distribution.

Example: There was a study of contaminated fish in a river. Suppose it is important for the study to know how stable the weights of the contaminated fish are. That is, how large is the variance $\sigma^2$ in the fish weights? The sample of 144 fish in the study produced summary statistics, including the sample variance $s^2$ of the weights. Use this information to construct a 95% confidence interval for the true variation in weights of contaminated fish in the river.

Solution: For a 95% confidence interval, $(1 - \alpha) = .95$ and $\alpha/2 = .05/2 = .025$.
Therefore, we need the tabulated values $\chi^2_{.025}$ and $\chi^2_{.975}$ for $(n - 1) = 143$ df. Looking in the df = 150 row of the $\chi^2$ table (the row with the df value closest to 143), we find $\chi^2_{.025} = 185.800$ and $\chi^2_{.975} = 117.985$. Substituting into the formula given in the box, we obtain

$$\frac{143\,s^2}{185.800} \le \sigma^2 \le \frac{143\,s^2}{117.985}, \quad\text{that is,}\quad 109{,}156.8 \le \sigma^2 \le 171{,}898.4$$

We are 95% confident that the true variance in weights of contaminated fish in the river falls between 109,156.8 and 171,898.4.

Figure 7.11 The location of $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ for a chi-square distribution

6. Finding the value of the test statistic z

$z$ is the standard normal variate, written as

$$z = \frac{x - \mu}{\sigma}, \qquad\text{or, for a sample mean,}\qquad z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

In confidence interval problems, we define $z_{\alpha/2}$ to be the z-value such that an area of $\alpha/2$ lies to its right. Now, if an area of $\alpha/2$ lies beyond $z_{\alpha/2}$ in the right tail of the standard normal (z) distribution, then an area of $\alpha/2$ lies to the left of $-z_{\alpha/2}$ in the left tail because of the symmetry of the distribution. The remaining area, $(1 - \alpha)$, is equal to the confidence coefficient; that is, the probability that $\bar{x}$ falls within $z_{\alpha/2}$ standard deviations of $\mu$ is $(1 - \alpha)$.

7. Testing a claim about a mean: large samples

Large-sample test of hypothesis about a population mean

ONE-TAILED TEST
H0: $\mu = \mu_0$
Ha: $\mu > \mu_0$ (or Ha: $\mu < \mu_0$)
Rejection region: $z > z_\alpha$ (or $z < -z_\alpha$)

TWO-TAILED TEST
H0: $\mu = \mu_0$
Ha: $\mu \ne \mu_0$
Rejection region: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

Test statistic (both cases):

$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

where $z_\alpha$ is the z-value such that $P(z > z_\alpha) = \alpha$; and $z_{\alpha/2}$ is the z-value such that $P(z > z_{\alpha/2}) = \alpha/2$. [Note: $\mu_0$ is our symbol for the particular numerical value specified for $\mu$ in the null hypothesis.]

Assumption: The sample size must be sufficiently large (say, $n \ge 30$) so that the sampling distribution of $\bar{x}$ is approximately normal and $s$ provides a good approximation to $\sigma$.

Example: The mean time spent on studies of all students at a university last year was 40 hours per week. This year, a random sample of 35 students at the university was drawn, and summary statistics (the sample mean $\bar{x}$ and the sample standard deviation $s$) were computed. Test the hypothesis that $\mu$, the population mean time spent on studies per week, is equal to 40 hours against the alternative that $\mu$ is larger than 40 hours.
Use a significance level of $\alpha = .05$.

Solution: We formulate the hypotheses as:

H0: $\mu = 40$
Ha: $\mu > 40$

Note that the sample size $n = 35$ is sufficiently large so that the sampling distribution of $\bar{x}$ is approximately normal and $s$ provides a good approximation to $\sigma$. Since the required assumption is satisfied, we may proceed with a large-sample test of hypothesis about $\mu$.

Using a significance level of $\alpha = .05$, we will reject the null hypothesis for this one-tailed test if $z > z_\alpha = z_{.05}$, i.e., if $z > 1.645$. This rejection region is shown in the accompanying figure.

Computing the value of the test statistic $z = (\bar{x} - 40)/(s/\sqrt{35})$ from the sample statistics, we obtain a value smaller than 1.645. Since this value does not fall within the rejection region, we do not reject H0. We say that there is insufficient evidence (at $\alpha = .05$) to conclude that the mean time spent on studies per week of all students at the university this year is greater than 40 hours. We would need to take a larger sample before we could detect whether $\mu > 40$, if in fact this were the case.

8. P-value method of testing hypotheses

A P-value, or probability value, is the probability of getting a value of the sample test statistic that is at least as extreme as the one found from the sample data, assuming the null hypothesis is true.

In testing with the P-value you follow the same steps, but after you calculate the test statistic, you find a P-value. You may find a P-value from a table, but a table usually gives only a rough range for it.

Once you have a P-value, the guideline is:
Reject the null hypothesis if the P-value is less than or equal to the significance level.
Fail to reject the null hypothesis if the P-value is greater than the significance level.

Example 2: Because of the expense involved, car crash tests often involve small samples. When 5 BMW cars are crashed under standard conditions, the repair costs are as shown in the accompanying table.
Use a 0.05 significance level to test the claim that the mean repair cost for all BMW cars is less than \$1000.

Repair costs: \$797, \$571, \$904, \$1147, \$418

The hypotheses are H0: $\mu = 1000$ and Ha: $\mu < 1000$. The sample gives $\bar{x} = 767.4$ and $s \approx 284.7$, so the test statistic is $t = (767.4 - 1000)/(284.7/\sqrt{5}) \approx -1.83$ with $n - 1 = 4$ degrees of freedom. The corresponding one-tailed P-value is about .07.

Since the P-value is not less than the significance level, we fail to reject the null hypothesis. The final conclusion is that there is not sufficient sample evidence to support the claim that the average cost is less than \$1000.

9. Testing a claim about a proportion

Large-sample test of hypothesis about a population proportion

ONE-TAILED TEST
H0: $p = p_0$
Ha: $p > p_0$ (or Ha: $p < p_0$)
Rejection region: $z > z_\alpha$ (or $z < -z_\alpha$)

TWO-TAILED TEST
H0: $p = p_0$
Ha: $p \ne p_0$
Rejection region: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

Test statistic (both cases):

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$

where $q_0 = 1 - p_0$.

Assumption: The sample is large enough that the interval around the sample proportion does not contain 0 or 1.

Example: Suppose it is claimed that in a very large batch of components, about 10% of items contain some form of defect. It is proposed to check whether this proportion has increased, and this will be done by drawing a random sample of 150 components. In the sample, 20 are defective. Does this evidence indicate that the true proportion of defective components is significantly larger than 10%? Test at significance level $\alpha = .05$.

Solution: We wish to perform a large-sample test about a population proportion $p$:

H0: $p = .10$ (i.e., no change in proportion of defectives)
Ha: $p > .10$ (i.e., proportion of defectives has increased)

where $p$ represents the true proportion of defects. At significance level $\alpha = .05$, the rejection region for this one-tailed test consists of all values of $z$ for which $z > z_{.05} = 1.645$.

The test statistic requires the calculation of the sample proportion, $\hat{p}$, of defects:

$$\hat{p} = \frac{20}{150} \approx .133$$

Noting that $q_0 = 1 - p_0 = 1 - .10 = .90$, we obtain the following value of the test statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.133 - .10}{\sqrt{(.10)(.90)/150}} \approx 1.36$$

This value of $z$ lies outside the rejection region, so we conclude that the excess of the sample proportion over 10% is not statistically significant. We have no evidence to reject the null hypothesis that the proportion defective is .10 at the 5% level of significance.
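The upper-tailed proportion test just worked through can be sketched in a few lines, which also yields the P-value described in Section 8. This is an illustration under stated assumptions, not a definitive implementation: the function name `proportion_z_test` is hypothetical, and normal areas come from the standard-library `statistics.NormalDist` instead of a z-table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha=0.05):
    """Upper-tailed large-sample test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    q0 = 1 - p0
    z = (p_hat - p0) / sqrt(p0 * q0 / n)          # test statistic
    z_critical = NormalDist().inv_cdf(1 - alpha)  # z_alpha, about 1.645
    p_value = 1 - NormalDist().cdf(z)             # area to the right of z
    return z, z_critical, p_value


# 20 defectives in a sample of 150; claimed proportion p0 = .10
z, z_critical, p_value = proportion_z_test(20, 150, 0.10)
reject_h0 = z > z_critical
print(f"z = {z:.2f}, critical value = {z_critical:.3f}, "
      f"P-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Both decision rules agree here: the statistic falls short of the critical value, and the P-value exceeds the significance level of .05, so H0 is not rejected.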
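Because we did not reject H0, the only error we could have made is a Type II error (accepting H0 when, in fact, it is not true); its probability $\beta$ is not given by this calculation. The significance level $\alpha = .05$ is the probability of a Type I error (rejecting H0 when, in fact, it is true). [Note that the interval does not contain 0 or 1. Thus, the sample size is large enough to guarantee the validity of the hypothesis test.]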
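Returning to the crash-test example of Section 8, where the test statistic and P-value are not shown, the calculation can be sketched as follows. The approach is an assumption made for illustration: because the Python standard library has no t-distribution, the lower-tail area is obtained by integrating the t density numerically with Simpson's rule, and the helper names `t_pdf` and `t_lower_tail` are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_lower_tail(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry (steps is even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


# Repair costs of the 5 crashed cars; H0: mu = 1000, Ha: mu < 1000
costs = [797, 571, 904, 1147, 418]
n = len(costs)
xbar, s = mean(costs), stdev(costs)
t_stat = (xbar - 1000) / (s / sqrt(n))
p_value = t_lower_tail(t_stat, df=n - 1)  # lower-tailed test
print(f"xbar = {xbar:.1f}, s = {s:.1f}, t = {t_stat:.2f}, P-value = {p_value:.3f}")
```

The P-value comes out near .07, above the .05 significance level, which is consistent with the conclusion in Section 8 that the null hypothesis is not rejected.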