TEST OF HYPOTHESIS: LARGE SAMPLES

In many circumstances, to arrive at decisions about the population on the basis of sample information, we make assumptions about the population parameters involved. Such an assumption is called a statistical hypothesis, which may or may not be true. The procedure which enables us to decide, on the basis of sample results, whether a hypothesis is true or not is called a Test of Hypothesis or Test of Significance.

Procedure for testing a hypothesis. A test of hypothesis involves the following steps:

1. Statement of hypothesis: There are two types of hypothesis: (a) Null hypothesis (b) Alternative hypothesis.

Null Hypothesis: For applying the tests of significance, we first set up a hypothesis, that is, a definite statement about the population parameter. Such a hypothesis, usually a hypothesis of no difference, is called a Null Hypothesis. It is of the form $H_0: \mu = \mu_0$. A null hypothesis asserts that there is no significant difference between the statistic and the population parameter, and that whatever difference is observed is merely due to fluctuations in sampling from the same population.

Alternative Hypothesis: Any hypothesis which contradicts the Null Hypothesis is called an Alternative Hypothesis. It is usually denoted by $H_1$. The alternative hypothesis would be (a) $H_1: \mu \neq \mu_0$ (b) $H_1: \mu > \mu_0$ (c) $H_1: \mu < \mu_0$. Alternative (a) is called the two-tailed alternative, (b) the right-tailed alternative and (c) the left-tailed alternative.

2. Specification of the level of significance: The level of significance is denoted by $\alpha$; it is the probability of rejecting the null hypothesis when it is in fact true. It is usually taken as 5%.

3. Test statistic: Compute the test statistic $Z = \dfrac{t - E(t)}{S.E.(t)}$ under the null hypothesis.

4. Conclusion: We compare the computed value of the test statistic $Z$ with the critical value $Z_\alpha$ at the given level of significance. If $|Z| < Z_\alpha$, i.e. if the absolute value of the calculated value of $Z$ is less than the critical value $Z_\alpha$, we conclude that it is not significant and we accept the null hypothesis. Otherwise we reject the null hypothesis.

CRITICAL VALUES OF Z

Level of significance | 1% | 5% | 10%
Critical values for two-tailed test | $|Z_\alpha| = 2.58$ | $|Z_\alpha| = 1.96$ | $|Z_\alpha| = 1.645$
Critical values for right-tailed test | $Z_\alpha = 2.33$ | $Z_\alpha = 1.645$ | $Z_\alpha = 1.28$
Critical values for left-tailed test | $Z_\alpha = -2.33$ | $Z_\alpha = -1.645$ | $Z_\alpha = -1.28$

Test of significance for large samples: Suppose we wish to test the hypothesis that the probability of success in each trial is $p$. Assuming it to be true, the mean and the standard deviation of the sampling distribution of the number of successes in $n$ trials are $np$ and $\sqrt{npq}$ respectively, where $q = 1 - p$. If $x$ is the observed number of successes in the sample and $Z$ is the standard normal variate, then $Z = \dfrac{x - np}{\sqrt{npq}}$.

Test of significance of a single mean (large samples):

i. The null hypothesis: $H_0: \bar{x} = \mu$, i.e. there is no significant difference between the sample mean and the population mean, or the sample has been drawn from the parent population.
ii. The alternative hypothesis: $H_1: \bar{x} \neq \mu$ or $H_1: \bar{x} > \mu$ or $H_1: \bar{x} < \mu$. Since $n$ is large, the sampling distribution of $\bar{x}$ is approximately normal.
iii. Level of significance: Set the level of significance $\alpha$.
iv. Test statistic:
Case 1: When the standard deviation of the population is known. In this case the standard error of the mean is $S.E.(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$, where $n$ = sample size and $\sigma$ = standard deviation of the population. The test statistic is given by $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, where $\mu$ is the population mean.
Case 2: When the standard deviation of the population is not known. In this case we take $s$, the standard deviation of the sample, to compute the standard error of the mean, which is given by $S.E.(\bar{x}) = \dfrac{s}{\sqrt{n}}$. Hence the test statistic is given by $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$.
v. Find the critical value $z_\alpha$ of $z$ at the level of significance $\alpha$ from the normal table.
vi. If $|z| < z_\alpha$, we accept the null hypothesis; otherwise we reject the null hypothesis.

PROBLEMS:
1.
According to the norms established for a mechanical aptitude test, persons who are 18 years old have an average height of 73.2 with a standard deviation of 8.6. If 4 randomly selected persons of that age averaged 76.7, test the hypothesis that $\mu = 73.2$ against the alternative hypothesis $\mu > 73.2$ at the 0.01 level of significance.
2. A sample of 64 students has a mean weight of 70 kgs. Can this be regarded as a sample from a population with mean weight 56 kgs and standard deviation 25 kgs?
3. A sample of 900 members has a mean of 3.4 cms and S.D. 2.61 cms. Has this sample been taken from a large population of mean 3.25 cm with S.D. 2.61 cms? If the population is normal and its mean is unknown, find the 95% confidence limits of the true mean.
4. A sample of 400 items is taken from a population whose standard deviation is 10. The mean of the sample is 40. Test whether the sample has come from a population with mean 38. Also calculate the 95% confidence interval for the population mean.
5. An ambulance service claims that it takes on average less than 10 minutes to reach its destination in emergency calls. A sample of 36 calls has a mean of 11 minutes and a variance of 16. Test the claim at 0.05 level of significance.

TEST FOR EQUALITY OF TWO MEANS (LARGE SAMPLES):
(Test of significance for difference of means of two large samples)

Let $\bar{x}_1, \bar{x}_2$ be the sample means of two independent large random samples of sizes $n_1, n_2$ drawn from two populations having means $\mu_1, \mu_2$ and standard deviations $\sigma_1, \sigma_2$. To test whether the two population means are equal, let the null hypothesis be $H_0: \mu_1 = \mu_2$ and the alternative hypothesis be $H_1: \mu_1 \neq \mu_2$.

S.E. of $\bar{x}_1 - \bar{x}_2 = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$, where $\sigma_1, \sigma_2$ are the standard deviations of the two populations.

To test whether there is any significant difference between $\bar{x}_1$ and $\bar{x}_2$ we use the following test statistic:
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$.

If the samples have been drawn from a population with common S.D. $\sigma$, then $\sigma_1^2 = \sigma_2^2 = \sigma^2$ and
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$.
If $\sigma$ is not known, then $\sigma^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$.

Problems:
1. A research investigator is interested in studying whether there is a significant difference in the salaries of MBA graduates in two metropolitan cities. A random sample of size 100 from Mumbai yields an average income of 20150/-; another random sample of 60 from Chennai results in an average income of 20250/-. The variances of the two populations are given as $\sigma_1^2 = 40000$ and $\sigma_2^2 = 32400$ respectively.
2. The mean lifetime of a sample of 10 electric bulbs was found to be 1456 hours with S.D. of 423 hours. A second sample of 17 bulbs chosen from a different batch showed a mean life of 1280 hours with S.D. of 398 hours. Is there a significant difference between the means of the two batches?
3. A company claims that its bulbs are superior to those of its main competitor. A study showed that a sample of 40 of its bulbs had a mean lifetime of 647 hours of continuous use with S.D. 27 hours, while a sample of 40 bulbs made by its main competitor had a mean lifetime of 638 hours of continuous use with S.D. of 31 hours. Test the significance of the difference between the two means at 5% level.
4. The nicotine contents in milligrams of two samples of tobacco were found to be as follows. Find the standard error and confidence limits for the difference between the means at 5% level.
Sample-A: 24, 27, 26, 23, 25
Sample-B: 29, 30, 30, 31, 24, 36

TEST OF SIGNIFICANCE FOR SINGLE PROPORTION (LARGE SAMPLES):

Suppose a large sample of size $n$ has a sample proportion $p$ of members possessing a certain attribute. To test the hypothesis that the population proportion $P$ has a specified value $P_0$, the test statistic
$z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}}$
is approximately normally distributed, where $p$ is the sample proportion, $P$ is the population proportion and $Q = 1 - P$.

Note:
(a) Limits for the population proportion $P$ are given by $p \pm 3\sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$.
(b) The confidence interval for the proportion $P$ for a large sample at level of significance $\alpha$ is $p - z_{\alpha/2}\sqrt{\dfrac{PQ}{n}} < P < p + z_{\alpha/2}\sqrt{\dfrac{PQ}{n}}$, where $Q = 1 - P$.

Problems:
1. A manufacturer claimed that at least 95% of the equipment which he supplied to a factory conformed to specifications. An examination of a sample of 200 pieces of equipment revealed that 18 were faulty. Test his claim at 5% level of significance.
2. In a big city 325 men out of 600 men were found to be smokers. Does this information support the conclusion that the majority of men in this city are smokers?
3. A die was thrown 9000 times and of these 3220 yielded a 3 or 4. Is this consistent with the hypothesis that the die was unbiased?
4. Among 900 people in a state 90 are found to be chapatti eaters. Construct 99% confidence limits for the true proportion.
5. 20 people were attacked by a disease and only 18 survived. Will you reject the hypothesis that the survival rate, if attacked by this disease, is 85% in favour of the hypothesis that it is more, at 5% level?

TEST OF SIGNIFICANCE FOR TWO PROPORTIONS (LARGE SAMPLES):

Let $p_1, p_2$ be the proportions in two large random samples of sizes $n_1, n_2$ drawn from two populations having proportions $P_1, P_2$. To test whether the two population proportions are equal, the null hypothesis is $H_0: P_1 = P_2$ and the alternative hypothesis is $H_1: P_1 \neq P_2$. Assuming that the null hypothesis is true, the test statistic is defined as
$z = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $q = 1 - p$.
Here $x_1, x_2$ are the numbers of members possessing the attribute in the two samples. The statistic $z$ is approximately normally distributed with mean 0 and standard deviation 1.

Problems:
1. Random samples of 400 men and 600 women were asked whether they would like to have a flyover near their residence. 200 men and 325 women were in favour of the proposal. Test the hypothesis that the proportions of men and women in favour of the proposal are the same at 5% level.
2. On the basis of their total scores, 200 candidates of a civil service examination are divided into two groups, the upper 30% and the remaining 70%. Consider the first question of the examination.
Among the first group, 40 had the correct answer, whereas among the second group, 80 had the correct answer. On the basis of these results, can one conclude that the first question is not good at discriminating ability of the type being examined here?
3. In two large populations, there are 30% and 25% respectively of fair haired people. Is this difference likely to be hidden in samples of 1200 and 900 respectively from the two populations?
4. In a random sample of 1000 persons from town A, 400 are found to be consumers of wheat. In a sample of 800 from town B, 400 are found to be consumers of wheat. Do these data reveal a significant difference between town A and town B, so far as the proportion of wheat consumers is concerned?
5. In a city A, 20% of a random sample of 900 school boys has a certain slight physical defect. In another city B, 18.5% of a random sample of 1600 school boys has the same defect. Is the difference between the proportions significant at 0.05 level of significance?

TEST OF HYPOTHESIS: SMALL SAMPLES

Degree of Freedom: It is a number which indicates how many of the values of a variable may be independently chosen. In general the number of degrees of freedom is equal to the total number of observations less the number of independent constraints imposed on the observations.

t-Distribution (or Student's t-distribution): It is used for testing of hypothesis when the sample size is small and the population S.D. is not known. If $x_1, x_2, x_3, \ldots, x_n$ is any random sample of size $n$ drawn from a normal population with mean $\mu$ and variance $\sigma^2$, then the test statistic $t$ is defined by
$t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$, where $\bar{x}$ = sample mean and $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is an unbiased estimate of $\sigma^2$.
The test statistic $t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$ is a random variable having the t-distribution with $\nu = n - 1$ degrees of freedom.

Student's t-Test: Let $\bar{x}$ = sample mean, $n$ = sample size, $s$ = standard deviation of the sample, $\mu$ = mean of the population, supposed to be normal. Then Student's $t$ is defined by
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n-1}}$.
If $s^2$ is the sample variance, $s^2 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$.

Note: If the standard deviation of the sample is given directly, then the test statistic is given by $t = \dfrac{\bar{x} - \mu}{S.D./\sqrt{n-1}}$.

If $t_{0.05}$ is the table value of $t$ for $(n - 1)$ degrees of freedom at 5% level of significance, then the 95% confidence limits for $\mu$ are given by $\bar{x} \pm t_{0.05}\cdot\dfrac{S}{\sqrt{n}}$; similarly the 99% confidence limits for $\mu$ are given by $\bar{x} \pm t_{0.01}\cdot\dfrac{S}{\sqrt{n}}$.

Problems:
1. A sample of 26 bulbs gives a mean life of 990 hours with a S.D. of 20 hours. The manufacturer claims that the mean life of bulbs is 1000 hours. Is the sample not up to the standard?
2. A random sample of 6 steel beams has a mean compressive strength of 58,392 psi with a standard deviation of 648 psi. Use this information and the level of significance 0.05 to test whether the true average compressive strength of the steel from which this sample came is 58,000 psi.
3. A random sample of 10 boys had the following I.Q.s: 70, 120, 110, 101, 88, 83, 95, 98, 107 and 100. (a) Do these data support the assumption of a population mean I.Q. of 100? (b) Find a reasonable range in which most of the mean I.Q. values of samples of 10 boys lie.
4. A random sample from a company's very extensive files shows that the orders for a certain kind of machinery were filled, respectively, in 10, 12, 19, 14, 15, 18, 11 and 13 days. Use the level of significance 0.01 to test the claim that on the average such orders are filled in 10.5 days. Choose the alternative hypothesis so that rejection of the null hypothesis $\mu = 10.5$ days implies that it takes longer than indicated.

Student's t-Test for difference of means: Let $\bar{x}, \bar{y}$ be the means of two independent samples of sizes $n_1, n_2$ respectively drawn from two normal populations having means $\mu_1, \mu_2$. To test whether the two population means are equal, let the null hypothesis be $H_0: \mu_1 = \mu_2$, against the alternative hypothesis $H_1: \mu_1 \neq \mu_2$. If $\sigma_1 = \sigma_2 = \sigma$, then an unbiased estimate $S^2$ of the common variance $\sigma^2$ is given by $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$, where $s_1^2, s_2^2$ are the two sample variances.
The test statistic is given by
$t = \dfrac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$,
which follows the t-distribution with $n_1 + n_2 - 2$ degrees of freedom. Here
$\bar{x} = \dfrac{1}{n_1}\sum_{i=1}^{n_1} x_i$, $\bar{y} = \dfrac{1}{n_2}\sum_{i=1}^{n_2} y_i$ and $S^2 = \dfrac{1}{n_1 + n_2 - 2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{i=1}^{n_2}(y_i - \bar{y})^2\right]$ or $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$.

The 95% confidence limits for the difference of two population means are $(\bar{x} - \bar{y}) \pm t_\alpha \cdot S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, where $\alpha = 0.025$.
The 99% confidence limits for the difference of two population means are $(\bar{x} - \bar{y}) \pm t_\alpha \cdot S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, where $\alpha = 0.005$.

Problems:
1. Two horses A and B were tested according to the time in seconds to run a particular track, with the following results. Test whether the two horses have the same running capacity.
Horse-A: 28, 30, 32, 33, 33, 29, 34
Horse-B: 29, 30, 30, 24, 27, 29
2. To examine the hypothesis that husbands are more intelligent than wives, an investigator took a sample of 10 couples and administered them a test which measures the I.Q. The results are as follows.
Husbands: 117, 105, 97, 105, 123, 109, 86, 78, 103, 107
Wives: 106, 98, 87, 104, 116, 95, 90, 69, 108, 85
3. Measuring the specimens of nylon yarn taken from two machines, it was found that 8 specimens from the first machine had a mean denier of 9.67 with a standard deviation of 1.81, while 10 specimens from the second machine had a mean denier of 7.43 with a standard deviation of 1.48. Assuming that the populations are normal, test the hypothesis $H_0: \mu_1 - \mu_2 = 1.5$ against $H_1: \mu_1 - \mu_2 > 1.5$ at 0.05 level of significance.
4. Random samples of specimens of coal from two mines A and B are drawn and their heat producing capacities were measured, yielding the following results.
Mine-A: 8350, 8070, 8340, 8130, 8260
Mine-B: 7900, 8140, 7920, 7840, 7890, 7950
Is there a significant difference between the means of these two samples at 0.01 level of significance?

PAIRED-SAMPLE t-TEST:

Suppose a business concern is interested to know whether a particular medium of promoting sales of a product is really effective or not. In this case we have to test whether the average sales before and after the sales promotion are equal. If $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are the pairs of sales data before and after the sales promotion in a business concern, we apply the paired t-test to examine the significance of the difference between the two situations.

Let $d_i = x_i - y_i$ (or $y_i - x_i$) for $i = 1, 2, \ldots, n$. Then the null hypothesis is $H_0: \mu_1 = \mu_2$, i.e. the mean of the differences is zero and there is no significant difference between the means in the two situations. The alternative hypothesis is $H_1: \mu_1 \neq \mu_2$.

Assuming the null hypothesis, the test statistic is defined by
$t = \dfrac{\bar{d}}{S/\sqrt{n}}$; $\bar{d} = \dfrac{1}{n}\sum_{i=1}^{n} d_i$; $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2$.
The above statistic follows Student's t-distribution with $n - 1$ degrees of freedom.

Problems:
1. Scores obtained in a shooting competition by 10 soldiers before and after intensive training are given below.
Before: 67, 24, 57, 55, 63, 54, 56, 68, 33, 43
After: 70, 38, 58, 58, 56, 67, 68, 75, 42, 38
Test whether the intensive training is useful at 0.05 level of significance.

SNEDECOR'S F-TEST OF SIGNIFICANCE:

Let two independent random samples of sizes $n_1, n_2$ be drawn from two normal populations. To test the hypothesis that the two population variances $\sigma_1^2, \sigma_2^2$ are equal, let the null hypothesis be $H_0: \sigma_1^2 = \sigma_2^2$. Then the alternative hypothesis is $H_1: \sigma_1^2 \neq \sigma_2^2$.

The estimates of $\sigma_1^2, \sigma_2^2$ are given by
$S_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1} = \dfrac{\sum (x_i - \bar{x})^2}{n_1 - 1}$ and $S_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1} = \dfrac{\sum (y_i - \bar{y})^2}{n_2 - 1}$,
where $s_1^2, s_2^2$ are the variances of the two samples.

Assuming that $H_0$ is true, the test statistic $F = \dfrac{S_1^2}{S_2^2}$ or $F = \dfrac{S_2^2}{S_1^2}$, according as $S_1^2 > S_2^2$ or $S_2^2 > S_1^2$, follows the F-distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom (or $(n_2 - 1, n_1 - 1)$ when $S_2^2$ is the numerator).

Problems:
1. The measurements of the output of two units have given the following results. Assuming that both samples have been obtained from normal populations, test at 10% level of significance whether the two populations have the same variance.
Unit-A: 14.1, 10.1, 14.7, 13.7, 14.0
Unit-B: 14.0, 14.5, 13.7, 12.7, 14.1
2. The following samples are measurements of the heat producing capacity of specimens of coal from two mines.
Mine-1: 8260, 8130, 8350, 8070, 8340
Mine-2: 7950, 7890, 7900, 8140, 7920, 7840
Use the 0.02 level of significance to test whether it is reasonable to assume that the variances of the two populations sampled are equal.

CHI-SQUARE TEST ($\chi^2$ test):

Suppose a set of events $A_1, A_2, A_3, \ldots, A_n$ are observed to occur with frequencies $O_1, O_2, O_3, \ldots, O_n$ respectively, and according to probability rules $A_1, A_2, A_3, \ldots, A_n$ are expected to occur with frequencies $E_1, E_2, E_3, \ldots, E_n$ respectively. Then $O_1, O_2, O_3, \ldots, O_n$ are called observed frequencies and $E_1, E_2, E_3, \ldots, E_n$ are called expected frequencies, and $\chi^2$ is defined as
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$ with $n - 1$ degrees of freedom.
This test is used to decide whether the differences between observed and expected frequencies are significant.

Note: If the data is given in a series of $n$ numbers then degrees of freedom = $n - 1$.
In case of Binomial distribution, degrees of freedom = $n - 1$.
In case of Poisson distribution, degrees of freedom = $n - 2$.
In case of Normal distribution, degrees of freedom = $n - 3$.

Problems:
1. A pair of dice is thrown 360 times and the frequency of each sum is indicated below.
Sum | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Frequency | 8 | 24 | 35 | 37 | 44 | 65 | 51 | 42 | 26 | 14 | 14
Would you say that the dice are fair on the basis of the chi-square test at 0.05 level of significance?
2. A sample analysis of examination results of 500 students was made. It was found that 220 students had failed, 170 had secured a third class, 90 were placed in second class, and 20 got a first class. Are these figures commensurate with the general examination result, which is in the ratio 4:3:2:1 for the various categories respectively?

CHI-SQUARE TEST ($\chi^2$ test) FOR INDEPENDENCE OF ATTRIBUTES:

Literally, an attribute means a quality or a characteristic. Examples of attributes are drinking, smoking, blindness, honesty, beauty etc.
In this case the expected frequency for any cell is calculated as
Expected frequency = $\dfrac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$.
The test statistic $\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$ approximately follows the chi-square distribution with (no. of rows $- 1$) $\times$ (no. of columns $- 1$) degrees of freedom.

Problems:
1. Four methods are under development for making discs of a super conducting material. Fifty discs are made by each method and they are checked for super conductivity when cooled with liquid.
 | I-method | II-method | III-method | IV-method
Super conductors | 31 | 42 | 22 | 25
Failures | 19 | 8 | 28 | 25
Test the significance of the difference between the proportions of super conductors at 0.05 level.
2. From the following data, find whether there is any significant liking in the habit of taking soft drinks among the categories of employees.
Soft drinks | Clerks | Teachers | Officers
Pepsi | 10 | 25 | 65
Thumsup | 15 | 30 | 65
Fanta | 50 | 60 | 30

CHI-SQUARE TEST ($\chi^2$ test) FOR POPULATION VARIANCES:

Suppose that a random sample $x_i$ $(i = 1, 2, \ldots, n)$ is drawn from a normal population with mean $\mu$ and variance $\sigma^2$. To test the hypothesis that the population variance $\sigma^2$ has a specified value $\sigma_0^2$, let the null hypothesis be $H_0: \sigma^2 = \sigma_0^2$. The test statistic is given by
$\chi^2 = \dfrac{\sum (x_i - \bar{x})^2}{\sigma_0^2} = \dfrac{n s^2}{\sigma_0^2}$, where $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n}$ and $n s^2 = (n - 1) S^2$.
Under $H_0$ this statistic follows the chi-square distribution with $n - 1$ degrees of freedom.

Problem:
1. A firm manufacturing rivets wants to limit variations in their length as much as possible. The lengths (in cms) of 10 rivets manufactured by a new process are 2.15, 1.99, 2.05, 2.12, 2.17, 2.01, 1.98, 2.03, 2.25, 1.93. Examine whether the new process can be considered superior to the old if the old population has a standard deviation of 0.145 cm.
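
As an illustration of the chi-square test for a population variance, the following minimal Python sketch computes the statistic $\sum (x_i - \bar{x})^2 / \sigma_0^2$ for the rivet data of the problem above. The function name `chi_square_variance_statistic` is a hypothetical helper introduced here for illustration and is not part of the original notes; the sketch assumes a normal population and leaves the table lookup of the critical value to the reader.

```python
def chi_square_variance_statistic(sample, sigma0):
    """Return (chi2, df) for testing H0: sigma^2 = sigma0^2.

    Hypothetical helper for illustration; assumes the sample is drawn
    from a normal population, as the test itself requires.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    if sigma0 <= 0:
        raise ValueError("sigma0 must be positive")
    mean = sum(sample) / n
    # Sum of squared deviations from the sample mean; this equals n * s^2.
    sum_sq_dev = sum((x - mean) ** 2 for x in sample)
    return sum_sq_dev / sigma0 ** 2, n - 1


# Rivet lengths (cm) from the problem; 0.145 cm is the S.D. of the old process.
rivets = [2.15, 1.99, 2.05, 2.12, 2.17, 2.01, 1.98, 2.03, 2.25, 1.93]
chi2, df = chi_square_variance_statistic(rivets, 0.145)
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
```

The computed value is then compared with the tabulated chi-square value for $n - 1$ degrees of freedom at the chosen level of significance. The table lookup is deliberately left out so that the sketch relies only on the Python standard library.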