INTRODUCTION TO STATISTICS
ST.PAULS UNIVERSITY

INFERENCE ABOUT A POPULATION VARIANCE

Variance can be used to describe a number of different situations, e.g.:

In quality control, engineers must ensure that products coming off a production line meet specifications such as size, weight and volume.

In finance, the variance of the returns on a portfolio of investments is a measure of the uncertainty and risk inherent in the portfolio.

The sample variance $s^2$ is an unbiased, consistent estimator of the population variance $\sigma^2$.

Chi-square sampling distribution

In repeated sampling from a normal population whose variance is $\sigma^2$, the variable $(n-1)s^2/\sigma^2$ is chi-square distributed with $(n-1)$ degrees of freedom. This variable is called the chi-square statistic and is denoted by $\chi^2$:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

The $\chi^2$ variable can take any value between $0$ and $\infty$.

Chi-square notation

The value of $\chi^2$ such that the area to its right under the chi-square curve is equal to $\alpha$ is denoted by $\chi^2_{\alpha}$. The value $\chi^2_{1-\alpha}$ is the point such that the area to its right is $1-\alpha$. Hence, the area to its left is $\alpha$.

Testing the population variance

Test statistic for $\sigma^2$

The test statistic used to test hypotheses about $\sigma^2$ is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

which is chi-square distributed with $(n-1)$ degrees of freedom, provided the population random variable is normally distributed.

Estimating the population variance

The confidence interval estimator of $\sigma^2$ is

$$\text{LCL} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \qquad \text{UCL} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

Examples

1. A manufacturer of a bottle-filling machine claims that the standard deviation of the fills from his machine is less than 2 cc. In a random sample of 10 fills, the sample standard deviation was 1.19 cc. Is this sufficient evidence at the 5% level of significance to support the manufacturer's claim? (Assume a normal population.)

2. A company manufactures steel shafts for use in engines. One method of judging inconsistencies in the production process is to determine the variance of the lengths of the shafts.
A random sample of 10 shafts produced the following measurements of their lengths in centimetres:

20.5, 19.8, 21.1, 20.2, 18.9, 19.6, 20.7, 20.1, 19.8, 19.0

Find a 90% confidence interval estimate for the population variance $\sigma^2$, assuming that the lengths of the steel shafts are normally distributed.

3. One important factor in inventory control is the variance of the daily demand for a product. Management believes that demand is normally distributed with variance equal to 250. In an experiment to test this belief about $\sigma^2$, the daily demand was recorded for 25 days. The data have been summarized as follows:

$\bar{x} = 50.6$, $S^2 = 500$

Do the data provide sufficient evidence to show that management's belief about the variance is untrue? (Use $\alpha = 0.01$.)

4. Some traffic experts claim that the variability of automobile speeds is a critical factor in determining how many accidents are likely to occur on a highway: the greater the variability, the more accidents. Suppose a random sample of 101 cars reveals that the mean and variance of their speeds are 57.3 km/h and 88.7 (km/h)$^2$ respectively.
i. Can we conclude at the 5% significance level that the variance of all cars' speeds exceeds 50 (km/h)$^2$?
ii. Estimate the variance of all cars' speeds with 99% confidence.

Sampling distribution of the sample proportion

Sampling distribution of $\hat{p}$

The sample proportion $\hat{p}$ is approximately normally distributed, with mean $p$ and standard deviation $\sqrt{pq/n}$, where $q = 1 - p$, provided that $n$ is large ($np \ge 5$ and $nq \ge 5$).

Since $\hat{p}$ is approximately normal, it follows that the standardized variable

$$Z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

is approximately standard normally distributed.

Testing the population proportion

The null and alternative hypotheses of tests of proportions are set up in the same way as the hypotheses of tests about the mean and the variance.
The test statistic for $p$ is

$$Z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

Example: An inventor has developed a system that allows visitors to museums, zoos and other attractions to get information at the touch of a digital code. For example, zoo patrons can listen to an announcement (recorded on a microchip) about each animal they see. It is anticipated that the device would rent for $3.00 each. The installation cost for the complete system is expected to be about $400,000. The ABC zoo is interested in having the system installed, but the management is uncertain about whether to take the risk. A financial analysis of the problem indicates that if more than 10% of the zoo visitors rent the system, the zoo will make a profit. To help make the decision, a random sample of 400 zoo visitors is given details of the system's capabilities and cost. If 48 people say that they would rent the device, can the management of the zoo conclude at the 5% significance level that the investment would result in a profit?

The confidence interval estimator of $p$ is

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Examples

1. A factory produces a component that is used in manufacturing computers. Each component is tested prior to shipment to determine whether or not it is defective. In a random sample of 250 units, 20 were found to be defective. Estimate with 99% confidence the true proportion of defective components produced by the factory. Answer: (0.036, 0.124)

2. In a random sample of 100 units from an assembly line, 22 were defective.
(a) Does this provide sufficient evidence at the 10% significance level to allow us to conclude that the defective rate among all units exceeds 10%?
(b) Find the p-value of the above test.
(c) Find a 99% confidence interval estimate of the defective rate.

3. A manufacturer of computer chips claims that more than 90% of his products conform to specifications. In a random sample of 1,000 chips drawn from a large production run, 75 were defective.
Do the data provide sufficient evidence at the 1% level of significance to enable us to conclude that the manufacturer's claim is true? What is the p-value of the test?

INFERENCE ABOUT THE DIFFERENCE BETWEEN TWO MEANS WHEN THE POPULATION VARIANCES ARE KNOWN

Sampling distribution of $\bar{x}_1 - \bar{x}_2$ when $\sigma_1^2$ and $\sigma_2^2$ are known

$\bar{x}_1 - \bar{x}_2$ is normally distributed if the populations that have been sampled are normal. If the populations are not normal, $\bar{x}_1 - \bar{x}_2$ is approximately normal if the samples are large.

The expected value of $\bar{x}_1 - \bar{x}_2$ is

$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

The standard deviation of $\bar{x}_1 - \bar{x}_2$ is

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

It then follows that

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is standard normally distributed. This is the test statistic for $\mu_1 - \mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are known.

The confidence interval estimator of $\mu_1 - \mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are known is

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Example

The selection of a new store location depends on many factors, one of which is the level of household income in areas around the proposed site. A large department store chain is trying to decide whether to build a new store in Nakuru or in the nearby city of Nairobi. Building costs are lower in Nairobi, and the company decides it will build there unless the average household income is higher in Nakuru than in Nairobi. In a survey of 100 residences in each of the cities, the mean household income was Sh. 29,980 in Nakuru and Sh. 28,650 in Nairobi. From other sources, it is known that the population standard deviations of household incomes are Sh. 4,740 in Nakuru and Sh. 5,365 in Nairobi.
(a) At the 5% significance level, can it be concluded that the mean household income in Nakuru exceeds that of Nairobi?
(b) Estimate, with 90% confidence, the difference between the mean household incomes in Nakuru and Nairobi.
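As an illustration of how the formulas above might be applied, the following minimal sketch works through the Nakuru and Nairobi example using only the Python standard library. The function name `two_sample_z` and the variable names are hypothetical choices for this sketch, and part (a) is read as a right-tailed test of $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 > 0$, with Nakuru as population 1.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, d0=0.0):
    """Return (z, standard error) for H0: mu1 - mu2 = d0, known sigmas."""
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = ((xbar1 - xbar2) - d0) / se
    return z, se


# Figures from the example (incomes in Sh.); population 1 = Nakuru.
xbar1, xbar2 = 29980, 28650
z, se = two_sample_z(xbar1, xbar2, sigma1=4740, sigma2=5365, n1=100, n2=100)

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# (a) Right-tailed test at alpha = 0.05: reject H0 if z > z_alpha.
alpha = 0.05
z_crit = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z)
reject_h0 = z > z_crit

# (b) 90% confidence interval: (xbar1 - xbar2) +/- z_{alpha/2} * se.
z_half = std_normal.inv_cdf(1 - 0.10 / 2)
diff = xbar1 - xbar2
ci = (diff - z_half * se, diff + z_half * se)

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print(f"reject H0: {reject_h0}")
print(f"90% CI for mu1 - mu2: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The same function could be reused for a two-tailed or left-tailed test by changing only the rejection rule and the p-value line.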
INFERENCE ABOUT THE DIFFERENCE BETWEEN TWO MEANS WHEN THE POPULATION VARIANCES ARE UNKNOWN

The test statistic for $\mu_1 - \mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are unknown and $n_1 > 30$ and $n_2 > 30$ is

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

The confidence interval estimator of $\mu_1 - \mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are unknown and $n_1 > 30$ and $n_2 > 30$ is

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

The test statistic for $\mu_1 - \mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are unknown and $n_1 \le 30$ and $n_2 \le 30$ is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_P^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$S_P^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

The test statistic is Student t distributed with $n_1 + n_2 - 2$ degrees of freedom, provided that the following conditions are satisfied:

The two population random variables ($x_1$ and $x_2$) are normally distributed.

The two population variances are equal, i.e. $\sigma_1^2 = \sigma_2^2$.

The quantity $S_P^2$ is called the pooled variance estimate.

The confidence interval estimator of $\mu_1 - \mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are unknown and $n_1 \le 30$ and $n_2 \le 30$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\; n_1 + n_2 - 2}\sqrt{S_P^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Examples

1. Despite some controversy, scientists generally agree that high-fibre cereals reduce various forms of cancer. However, one scientist claims that people who eat high-fibre cereal for breakfast will consume, on average, fewer calories for lunch than people who do not eat high-fibre cereal for breakfast. If this is true, high-fibre cereal manufacturers will be able to claim another advantage of eating their products: potential weight reduction for dieters. To test the claim, 200 people were randomly sampled and asked what they regularly eat for breakfast and lunch. Each person was identified as either a consumer or a non-consumer of high-fibre cereal, and the number of calories consumed at lunch was measured and recorded. These data are summarized below:

Calories consumed at lunch

Consumers of high-fibre cereal     Non-consumers of high-fibre cereal
$n_1 = 41$                         $n_2 = 159$
$\bar{x}_1 = 603$                  $\bar{x}_2 = 639$
$S_1 = 110$                        $S_2 = 141$

(a) Is there sufficient evidence at the 5% significance level to support the scientist's claim?
(b) Estimate with 95% confidence the difference in mean consumption of calories at lunch between those who regularly eat and those who do not eat high-fibre cereals for breakfast.

2. The manager of a large production facility believes that worker productivity is a function of, among other things, the design of the job, which refers to the sequence of worker movements. Two designs are being considered for the production of a new product. To help decide which should be used, an experiment was performed. Six randomly selected workers assembled the product using design A and another eight workers assembled the product using design B. The assembly times, which are normally distributed, are shown below:

Design A: 8.3, 5.3, 6.5, 5.1, 9.7, 10.8
Design B: 9.5, 8.3, 7.5, 10.9, 11.3, 9.3, 8.8, 8.0

(a) Can the manager conclude at the 5% significance level that the assembly times differ for the two designs?
(b) Estimate with 99% confidence the difference in mean assembly times between design A and design B.

3. High blood pressure is a leading cause of strokes. Medical researchers are constantly seeking ways to treat patients suffering from this condition. A specialist in hypertension claims that regular aerobic exercise can reduce high blood pressure just as successfully as drugs, with none of the adverse side effects. To test the claim, 50 patients who suffer from high blood pressure were chosen to participate in an experiment. For 60 days, half the sample exercised three times per week for one hour; the other half took the standard medication. The percentage reduction in blood pressure was recorded for each individual; the resulting data are shown in the table below:

Exercise                  Medication
$\bar{X}_1 = 14.31$       $\bar{X}_2 = 13.28$
$S_1 = 1.63$              $S_2 = 1.82$

Can we conclude at the 1% significance level that exercise is at least as effective as medication in reducing hypertension?
Inference about the difference between two population proportions

Sampling distribution of $\hat{p}_1 - \hat{p}_2$

1. The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed provided the sample sizes are large enough so that $n_1\hat{p}_1$, $n_1\hat{q}_1$, $n_2\hat{p}_2$ and $n_2\hat{q}_2$ are all $\ge 5$.

2. The mean of $\hat{p}_1 - \hat{p}_2$ is

$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$

3. The variance of $\hat{p}_1 - \hat{p}_2$ is

$$V(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$$

Test statistic for $p_1 - p_2$: Case 1

If the null hypothesis specifies that $H_0: p_1 - p_2 = 0$, the test statistic is

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $\hat{p}$ is the pooled sample proportion (the total number of successes in both samples divided by $n_1 + n_2$) and $\hat{q} = 1 - \hat{p}$.

Test statistic for $p_1 - p_2$: Case 2

If the null hypothesis specifies that $H_0: p_1 - p_2 = D$ where $D \ne 0$, the test statistic is

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}}$$

Examples

1. An insurance company is thinking about offering discounts on its life insurance policies to nonsmokers. As part of its analysis, it randomly selects 200 men who are 50 years old and asks them if they smoke at least one pack of cigarettes per day and if they have ever suffered from heart disease. The results indicate that 20 out of 80 smokers and 15 out of 120 nonsmokers suffer from heart disease. Can we conclude at the 5% level of significance that smokers have a higher incidence of heart disease than nonsmokers?

2. The process that is used to produce a complex component used in medical instruments typically results in defective rates in the 40% range. Recently, two innovations have been developed. Innovation one appears to be more promising but is considerably more expensive to purchase and operate than innovation two. After a careful analysis of the costs, management decides that it will adopt innovation one only if the proportion of defective components it produces is at least 8% smaller than that produced by innovation two. In a random sample of 300 units produced by innovation one, 33 are found to be defective.
At the 1% significance level, can we conclude that there is sufficient evidence to justify adopting innovation one? Determine the p-value of the test.

Estimating the difference between two population proportions

The confidence interval estimator of $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

Examples

1. In 1998, a survey of 1654 Kenyans found that 37% believed that the “energy crisis” was a hoax. In 2004, of 1814 Kenyans, 42% believed that the energy crisis was a hoax. In order to determine the real size of the change, estimate with 90% confidence the difference between the 1998 and 2004 proportions.

2. In a public opinion survey, 60 out of a sample of 100 high-income voters and 40 out of a sample of 75 low-income voters supported a decrease in sales tax.
i. Can we conclude at the 5% level of significance that the proportion of voters favouring a sales tax decrease differs between high-income and low-income voters?
ii. What is the p-value of the test?
iii. Estimate the difference in proportions with 99% confidence.

3. In a random sample of 500 television sets from a large production line, there were 80 defective sets. In a random sample of 200 television sets from a second production line, there were 10 defective sets. Do these data provide sufficient evidence that the proportion of defective sets from the first line exceeds the proportion of defective sets from the second by more than 3%? (Use $\alpha = 0.05$.)
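To show how the Case 1 test statistic and the confidence interval estimator fit together, here is a minimal sketch applied to the sales-tax survey in Example 2 above, using only the Python standard library. The variable names are hypothetical, and part i is read as a two-tailed test of $H_0: p_1 - p_2 = 0$, with high-income voters as sample 1. The test uses the pooled proportion, while the interval uses the separate sample proportions.

```python
from math import sqrt
from statistics import NormalDist

# Survey counts from Example 2: supporters and sample sizes.
x1, n1 = 60, 100   # high-income voters
x2, n2 = 40, 75    # low-income voters

p1_hat, p2_hat = x1 / n1, x2 / n2
std_normal = NormalDist()  # standard normal distribution

# i. Case 1 test statistic (H0: p1 - p2 = 0) with the pooled proportion.
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se_pooled

# ii. Two-tailed p-value.
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_h0 = p_value < 0.05

# iii. 99% confidence interval with the unpooled standard error.
se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_half = std_normal.inv_cdf(1 - 0.01 / 2)
diff = p1_hat - p2_hat
ci = (diff - z_half * se_unpooled, diff + z_half * se_unpooled)

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
print(f"99% CI for p1 - p2: ({ci[0]:.4f}, {ci[1]:.4f})")
```

For a Case 2 hypothesis ($D \ne 0$), the numerator would become `(p1_hat - p2_hat) - D` and the unpooled standard error would replace the pooled one.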