Statistical Quality Control in Textiles
Module 3: Statistical Inferences on Quality
Dr. Dipayan Das, Assistant Professor
Dept. of Textile Technology, Indian Institute of Technology Delhi
Phone: +91-11-26591402, E-mail: [email protected]

Terminologies and Definitions

Population & Sample [1]
By population we mean the aggregate or totality of objects or individuals about which inferences are to be made. The number of objects or individuals in the population is known as the size of the population. By sample we mean a collection consisting of a part of the objects or individuals of a population, selected as a basis for making inferences about certain population facts. The number of objects or individuals in the sample is known as the size of the sample. The technique of obtaining a sample is called sampling.

Parameter and Statistic
A parameter is a population fact which depends upon the values of the individuals comprising the population. For example, the mean, variance, etc. associated with a population are known as population parameters. They are constants for a given population. A statistic is a sample fact which depends upon the values of the individuals comprising the sample. For example, the mean, variance, etc. associated with a sample are known as sample statistics. Many different samples of a given size can be formed from a given population, and the statistics vary from sample to sample; they are therefore not constants but variables. The difference between the value of a population parameter and the value of the corresponding statistic for a particular sample is known as the sampling error.

Estimation of Population Parameters
We can calculate the value of a statistic for a sample, but it is practically impossible to calculate the value of a parameter for a population. Therefore, we often estimate the population parameters from the sample statistics.
The two methods of estimation are point estimation and interval estimation.

Sampling Distribution
The sampling distribution of a sample statistic is the relative frequency distribution of a large number of determinations of the value of this statistic, each determination being based on a separate sample of the same size, selected independently but by the same sampling technique from the same population.

Standard Error
The standard error of a statistic is the standard deviation of its sampling distribution.

Bias
If the mean of the sampling distribution of a statistic is equal to the corresponding population parameter, the statistic is said to be unbiased. If, on the other hand, the mean of the sampling distribution of a statistic is not equal to the corresponding population parameter, the statistic is said to be biased. Bias may arise from two sources: (1) the technique of sample selection (troublesome, since there is no way to assess its magnitude); (2) the character of the statistic itself (less troublesome, since it is possible to find its magnitude and direction and make allowance accordingly).

Sampling Technique

Simple Random Sample
If a sample of a given size is selected from a given population in such a way that all possible samples of this size which could be formed from this population have an equal chance of selection, the sample is called a simple random sample.

Simple Random Sampling Scheme
Step 1: Assign an identification number to each individual of the population.
Step 2: Select an individual at random, using a random number table.
Step 3: Repeat Step 2 until the desired number of individuals is obtained.
Note: No individual may be taken more than once; two sample members cannot carry the same identification number.

Random Number Table
This is a large collection of random digits in which each of the ten digits 0-9 occurs with equal frequency and the digits are arranged in random order.
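As a quick sketch of the scheme above (not part of the original module), Python's `random.sample` can play the role of the random number table; it selects without replacement, so no individual appears twice. The lot of 100 yarn cones and the seed are hypothetical:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement:
    every possible sample of size n has the same chance of selection,
    and no individual is taken twice."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical lot: 100 yarn cones with identification numbers 1..100
lot = list(range(1, 101))
sample = simple_random_sample(lot, 10, seed=1)
print(sorted(sample))
```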
51772 24033 45939 30586 03585 64937 15630 09448 21631 91097 …
74640 23491 60173 02133 79353 03355 64759 56301 91157 17480 …
42331 83587 52078 75797 81938 95863 51135 57683 77331 29414 …
29044 06568 25424 45406 82322 20790 98527 30277 60710 06829 …
46621 21960 11645 31041 96799 65304 62586 94623 52290 87843 …
62898 21387 55870 86707 85659 55189 41889 85418 16835 28195 …
93582 76105 56974 12973 36081 00745 25439 68829 48653 27279 …
04186 10863 37428 17169 50884 65253 88036 06652 71590 47152 …
19640 97453 93507 88116 14070 11822 24034 41982 16159 35683 …
87056 90581 94271 42187 74950 15804 67283 49159 14676 47280 …

An Assumption
It is often practically impossible to numerically identify each individual of a population, either because of the large size of the population or because of the inaccessibility or current non-existence of some of the individuals. In such situations, some of the available individuals may be used as a sample; these should be selected at random from those available. Although such samples do not comply with the definition of a simple random sample, they are often treated as such; that is, they are simply assumed to be random samples of the population involved.
Note: This assumption has been widely used in sampling of textile materials.

Point Estimation of Population Parameters

Estimation of Population Mean
Suppose from a population we draw a number of samples, each containing n random variables x₁, x₂, …, xₙ. Each sample then has a mean x̄, which is itself a random variable. The mean (expected) value of this variable is

E(x̄) = E[(x₁ + x₂ + ⋯ + xₙ)/n] = (1/n)[E(x₁) + E(x₂) + ⋯ + E(xₙ)] = (1/n)(nμ) = μ,

where μ is the population mean. Thus, the sample mean x̄ is an unbiased estimator of the population mean μ.

Estimation of Population Mean (Contd.)
The variance of the variable x̄ is

s²_x̄ = E[(x̄ − μ)²] = E[((x₁ + x₂ + ⋯ + xₙ)/n − μ)²]
     = (1/n²) E[((x₁ − μ) + (x₂ − μ) + ⋯ + (xₙ − μ))²]
     = (1/n²) [ Σᵢ E(xᵢ − μ)² + Σᵢ≠ⱼ E{(xᵢ − μ)(xⱼ − μ)} ]
     = (1/n²)(nσ²) = σ²/n,

since the cross terms E{(xᵢ − μ)(xⱼ − μ)} vanish for independent observations, where σ² is the variance of the population. Note that the standard deviation of the variable x̄ is σ/√n, which is known as the standard error of x̄. Clearly, larger samples give more precise estimates of the population mean than do smaller samples.

Estimation of Population Variance
Suppose from a large population of mean μ and variance σ² we draw a number of samples, each containing n random variables x₁, x₂, …, xₙ. Let x̄ be the mean of such a sample and s² = (1/n) Σᵢ (xᵢ − x̄)² its variance. Clearly, x̄ and s² are variables. The expected value of the sample variance is

E(s²) = E[(1/n) Σᵢ (xᵢ − x̄)²]
      = E[(1/n) Σᵢ xᵢ² − x̄²]
      = (1/n) Σᵢ E(xᵢ²) − E(x̄²)
      = (σ² + μ²) − (σ²/n + μ²)
      = ((n − 1)/n) σ².

Estimation of Population Variance (Contd.)
Since E(s²) = ((n − 1)/n) σ² ≠ σ², the estimator s² is a biased estimate of the population variance. However, E[(n/(n − 1)) s²] = σ², so defining

S² = (n/(n − 1)) s² = Σᵢ (xᵢ − x̄)² / (n − 1),

we have E(S²) = σ²; hence S² is an unbiased estimate of the population variance.
Note: As n → ∞, lim E(s²) = lim ((n − 1)/n) σ² = σ², so for large samples s² is an asymptotically unbiased estimate of σ².

Estimation of Population Standard Deviation
Suppose from a large population of mean μ and variance σ² we draw a number of samples, each containing n random variables x₁, x₂, …, xₙ.
Let x̄ be the mean of such a sample and s² its variance; both are random variables. As shown above, the expected value of the sample variance is E(s²) = ((n − 1)/n) σ². It is possible to derive that the expected value of the sample standard deviation is

E(s) = cₙ σ, where cₙ = √(2/n) · Γ(n/2) / Γ((n − 1)/2),

Γ denotes the gamma function, and Γ(n) = (n − 1) Γ(n − 1). Then E(s/cₙ) = σ; thus s/cₙ gives an unbiased estimate of σ.

Estimation of Population Standard Deviation (Contd.)
The variance of s is

s²ₛ = E(s²) − [E(s)]² = ((n − 1)/n) σ² − cₙ² σ² = σ² [(n − 1)/n − cₙ²].

Hence the standard error of s is σ √((n − 1)/n − cₙ²).

Estimation of Difference Between Two Population Means
Suppose we have two independent populations with means μₓ and μᵧ and variances σₓ² and σᵧ², respectively. Let N pairs of random samples be formed from the two populations, each pair having n₁ variates from the first population and n₂ from the second. Let the means of these samples be x̄₁, x̄₂, …, x̄_N from the first population and ȳ₁, ȳ₂, …, ȳ_N from the second, and consider the differences of the means x̄₁ − ȳ₁, x̄₂ − ȳ₂, …, x̄_N − ȳ_N. The mean of the difference of the means is

μ_{x̄−ȳ} = E(x̄ − ȳ) = E(x̄) − E(ȳ) = μₓ − μᵧ.

Thus, the difference of the means of two samples is an unbiased estimator of the difference of the means of the two populations.

Estimation of Difference Between Two Population Means (Contd.)
The variance of the difference of the sample means is

s²_{x̄−ȳ} = E[(x̄ − ȳ) − (μₓ − μᵧ)]² = E[(x̄ − μₓ) − (ȳ − μᵧ)]²
         = E(x̄ − μₓ)² + E(ȳ − μᵧ)² − 2 E(x̄ − μₓ) E(ȳ − μᵧ)
         = s²_x̄ + s²_ȳ = σₓ²/n₁ + σᵧ²/n₂,

since E(x̄ − μₓ) = E(ȳ − μᵧ) = 0 for independent samples, where s²_x̄ and s²_ȳ are the variances of x̄ and ȳ, respectively. The standard deviation of the variable x̄ − ȳ is s_{x̄−ȳ}, which is known as the standard error of x̄ − ȳ.

Estimation of Population Proportion
Assume a population consists of "good" items and "bad" items, and let the proportion of good items in this population be p. Hence, the proportion of bad items in this population is 1 − p.
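Before continuing, the bias result E(s²) = ((n − 1)/n) σ² derived above can be checked numerically. This is a small Monte Carlo sketch (not from the module) with an assumed standard normal population; the small sample size n = 5 exaggerates the bias:

```python
import random

random.seed(0)
sigma2 = 1.0           # true population variance (assumed)
n, trials = 5, 20000   # small samples make the bias visible

biased, unbiased = [], []
for _ in range(trials):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)
    biased.append(ss / n)          # s^2, divisor n   -> biased
    unbiased.append(ss / (n - 1))  # S^2, divisor n-1 -> unbiased

print(sum(biased) / trials)    # close to (n-1)/n * sigma2 = 0.8
print(sum(unbiased) / trials)  # close to sigma2 = 1.0
```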
Let us draw n items from this population such that the drawing resembles a series of n independent Bernoulli trials with constant probability p of selecting a good item in each trial. Then the probability of selecting x good items in n trials, as given by the binomial distribution, is ⁿCₓ pˣ (1 − p)ⁿ⁻ˣ, where x = 0, 1, 2, …, n. The mean (expected) number of good items in n trials is np, and the standard deviation of the number of good items in n trials is √(np(1 − p)). Suppose we draw n items from this population to form a sample, and let x be the number of good items in this sample. The proportion of good items in this sample is then p′ = x/n.

Estimation of Population Proportion (Contd.)
The mean (expected) proportion of good items in this sample is

E(p′) = E(x/n) = (1/n) E(x) = (1/n)(np) = p.

Hence, the sample proportion is an unbiased estimator of the population proportion. The variance of the sample proportion is

E[p′ − E(p′)]² = E(x/n − p)² = (1/n²) E(x − np)² = (1/n²) np(1 − p) = p(1 − p)/n.

The standard deviation of the sample proportion is therefore √(p(1 − p)/n). This is known as the standard error of the sample proportion.

Interval Estimation of Population Parameters

Probability Distribution of Sample Means
If the population from which samples are taken is normally distributed with mean μ and variance σ², then the means of samples of size n are also normally distributed, with mean μ and variance σ²/n:

f(x) = (1/(σ√(2π))) exp[−(x − μ)²/(2σ²)],
f(x̄) = (1/((σ/√n)√(2π))) exp[−(x̄ − μ)²/(2σ²/n)].

If the population from which samples are taken is not normally distributed but has mean μ and variance σ², the means of samples of size n are still normally distributed with mean μ and variance σ²/n as n → ∞ (large samples).

Estimation of Population Mean
Assume the population distribution is normal (regardless of sample size), or take large samples (n → ∞, regardless of the population distribution). The sample mean x̄ then follows a normal distribution with mean μ and standard deviation σ/√n. Let u = (x̄ − μ)/(σ/√n), where u is known as the standard normal variable.
Then

P(−u_{α/2} ≤ u ≤ u_{α/2}) = 1 − α
P(−u_{α/2} ≤ (x̄ − μ)/(σ/√n) ≤ u_{α/2}) = 1 − α
P(x̄ − u_{α/2} σ/√n ≤ μ ≤ x̄ + u_{α/2} σ/√n) = 1 − α.

The limits x̄ − u_{α/2} σ/√n and x̄ + u_{α/2} σ/√n are called the (1 − α)100% confidence interval of μ.

Popular Confidence Intervals for Population Mean
Often the 95 percent confidence limits of the population mean μ are estimated; they are
x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n.
The 99 percent confidence limits of the population mean μ are
x̄ − 2.58 σ/√n, x̄ + 2.58 σ/√n.

Illustration
Consider a theoretical population of yarn strength that follows a normal distribution with mean 14.56 cN/tex and standard deviation 1.30 cN/tex. Then, for random samples of 450 yarns selected from this population, the probability distribution of sample means follows a normal distribution with mean 14.56 cN/tex and standard deviation 0.0613 cN/tex (1.30/√450 = 1.30/21.21 = 0.0613 cN/tex). This distribution is shown in the next slide.

Illustration (Contd.)
(Figure: normal distribution of sample means, centred at μ = 14.56 cN/tex.)
In the long run, 68.26 percent of the mean strengths of random samples of 450 yarns selected from this population will involve sampling errors of less than 0.0613 cN/tex. Equivalently, the probability of a sample mean being in error by 0.1201 cN/tex (1.96 × 0.0613) or more is 0.05.

Probability Distribution of Sample Means (Contd.)
Assume the population from which samples are taken is normally distributed, but both the mean μ and variance σ² are unknown. Then we consider the statistic

T₁ = (x̄ − μ)/(S/√n), where S = √[Σᵢ (xᵢ − x̄)² / (n − 1)].

The probability density of T₁ follows the t-distribution

f(T₁) = [Γ(n/2) / (√((n − 1)π) Γ((n − 1)/2))] [1 + T₁²/(n − 1)]^(−n/2),

where n − 1 is known as the degrees of freedom and Γ denotes the gamma function.

Probability Distribution of Sample Means (Contd.)
(Figure: t-distribution for n = 3, 10, 30 compared with the normal distribution.)
One can see that for n ≥ 30 the t-distribution practically coincides with the normal distribution. In practice, a sample is treated as small when n < 30 and as large when n ≥ 30.
Note: For a large sample, one can therefore find the confidence interval of the population mean from the normal distribution, as discussed earlier.
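For a large sample, the normal-based interval above is a few lines of code. This sketch reuses the yarn illustration's σ = 1.30 cN/tex and n = 450, with a hypothetical sample mean of 14.50 cN/tex (the observed x̄ is not given in the module):

```python
import math

sigma, n = 1.30, 450
xbar = 14.50                  # hypothetical sample mean (cN/tex)
se = sigma / math.sqrt(n)     # standard error of the mean
lo = xbar - 1.96 * se         # lower 95 % confidence limit
hi = xbar + 1.96 * se         # upper 95 % confidence limit

print(round(se, 4))           # 0.0613, as in the illustration
print(round(lo, 3), round(hi, 3))
```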
Estimation of Population Mean (Contd.)
The statistic T₁ follows the t-distribution with n − 1 degrees of freedom:

P(−t_{α/2} ≤ T₁ ≤ t_{α/2}) = 1 − α
or, P(−t_{α/2} ≤ (x̄ − μ)/(S/√n) ≤ t_{α/2}) = 1 − α
or, P(x̄ − t_{α/2} S/√n ≤ μ ≤ x̄ + t_{α/2} S/√n) = 1 − α.

The (1 − α)100% confidence limits for μ are therefore
x̄ − t_{α/2} S/√n, x̄ + t_{α/2} S/√n.
The value of t_{α/2} for n − 1 degrees of freedom can be found from a t-table.

Illustration
Consider a population of cotton fibers with mean length 25 mm and standard deviation of length 0.68 mm. Then, for random samples of 10 fibers selected from this population, the probability distribution of sample means follows a t-distribution with mean length 25 mm, standard deviation 0.23 mm, and 9 degrees of freedom. This distribution is shown in the next slide.

Illustration (Contd.)
(Figure: t-distribution of mean fiber length; x̄-scale from 23.85 mm to 26.15 mm in steps of 0.23 mm, centred at 25.00 mm.)
The probability of the mean length of 10 fibers selected from this population being in error by 0.52 mm (2.262 × 0.23) or more is 0.05.

Probability Distribution of Difference Between Two Sample Means
Let x₁, x₂, …, xₙ₁ and y₁, y₂, …, yₙ₂ be two independent sample observations from two normal populations with means μₓ, μᵧ and variances σₓ², σᵧ², respectively; or let them be two independent large-sample observations from two populations with means μₓ, μᵧ and variances σₓ², σᵧ², respectively. Then the variable

U = [(x̄ − ȳ) − (μₓ − μᵧ)] / √(σₓ²/n₁ + σᵧ²/n₂)

is a standard normal variable with mean zero and variance one.

Estimation of Difference Between Two Population Means
P(−U_{α/2} ≤ U ≤ U_{α/2}) = 1 − α
or, P(−U_{α/2} ≤ [(x̄ − ȳ) − (μₓ − μᵧ)] / √(σₓ²/n₁ + σᵧ²/n₂) ≤ U_{α/2}) = 1 − α
or, P((x̄ − ȳ) − U_{α/2} √(σₓ²/n₁ + σᵧ²/n₂) ≤ μₓ − μᵧ ≤ (x̄ − ȳ) + U_{α/2} √(σₓ²/n₁ + σᵧ²/n₂)) = 1 − α.

Hence the 100(1 − α)% confidence limits for μₓ − μᵧ are
(x̄ − ȳ) − U_{α/2} √(σₓ²/n₁ + σᵧ²/n₂), (x̄ − ȳ) + U_{α/2} √(σₓ²/n₁ + σᵧ²/n₂).

Probability Distribution of Difference Between Two Sample Means (Contd.)
Let x₁, x₂, …, xₙ₁ and y₁, y₂, …, yₙ₂ be two independent small-sample observations from two populations with means μₓ, μᵧ and variances σₓ², σᵧ², respectively.
Then the variable

T₂ = [(x̄ − ȳ) − (μₓ − μᵧ)] / [S_{x−y} √(1/n₁ + 1/n₂)],
where S²_{x−y} = [(n₁ − 1) Sₓ² + (n₂ − 1) Sᵧ²] / (n₁ + n₂ − 2),

follows the t-distribution with n₁ + n₂ − 2 degrees of freedom.

Estimation of Difference Between Two Population Means (Contd.)
P(−t_{α/2} ≤ T₂ ≤ t_{α/2}) = 1 − α
or, P(−t_{α/2} ≤ [(x̄ − ȳ) − (μₓ − μᵧ)] / [S_{x−y} √(1/n₁ + 1/n₂)] ≤ t_{α/2}) = 1 − α
or, P((x̄ − ȳ) − t_{α/2} S_{x−y} √(1/n₁ + 1/n₂) ≤ μₓ − μᵧ ≤ (x̄ − ȳ) + t_{α/2} S_{x−y} √(1/n₁ + 1/n₂)) = 1 − α.

Hence the 100(1 − α)% confidence limits for μₓ − μᵧ are
(x̄ − ȳ) − t_{α/2} S_{x−y} √(1/n₁ + 1/n₂), (x̄ − ȳ) + t_{α/2} S_{x−y} √(1/n₁ + 1/n₂).

Probability Distribution of Sample Variances
Suppose we draw a number of samples, each containing n random variables x₁, x₂, …, xₙ, from a population that is normally distributed with mean μ and variance σ² (so that the sample mean x̄ is also normally distributed, with mean μ and variance σ²/n). Then the variable

χ² = ns²/σ² = (n − 1)S²/σ² = Σᵢ (xᵢ − x̄)²/σ²

follows the χ² distribution with n − 1 degrees of freedom. The probability density of χ² with n degrees of freedom is

f(χ²) = [1 / (2^(n/2) Γ(n/2))] e^(−χ²/2) (χ²)^((n/2) − 1), χ² ≥ 0.

(Figure: χ² density for n = 1, 2, 3, 4, 5 degrees of freedom.)

Estimation of Population Variance
Then
P(χ²_{1−α/2, n−1} ≤ χ² ≤ χ²_{α/2, n−1}) = 1 − α
or, P(ns²/χ²_{α/2, n−1} ≤ σ² ≤ ns²/χ²_{1−α/2, n−1}) = 1 − α
or, P((n − 1)S²/χ²_{α/2, n−1} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2, n−1}) = 1 − α.

The 100(1 − α)% confidence limits for σ² are
(n − 1)S²/χ²_{α/2, n−1}, (n − 1)S²/χ²_{1−α/2, n−1}.

Probability Distribution of Sample Variances (Contd.)
When n → ∞, the statistic

√(2χ²) − √(2n − 1) = √(2(n − 1)S²/σ²) − √(2n − 1) = √(2ns²/σ²) − √(2n − 1)

approaches a standard normal distribution with mean zero and variance one.

Estimation of Population Variance (Contd.)
P(−u_{α/2} ≤ √(2(n − 1)S²/σ²) − √(2n − 1) ≤ u_{α/2}) = 1 − α
or, P(2(n − 1)S²/(√(2n − 1) + u_{α/2})² ≤ σ² ≤ 2(n − 1)S²/(√(2n − 1) − u_{α/2})²) = 1 − α.

The 100(1 − α)% confidence limits for σ² are
2(n − 1)S²/(√(2n − 1) + u_{α/2})², 2(n − 1)S²/(√(2n − 1) − u_{α/2})².

Probability Distribution of Sample Proportions
For large samples (n → ∞), the binomial distribution approaches the normal distribution. Then the variable

V = (p′ − p) / √(p(1 − p)/n)

is a standard normal variable with mean zero and variance one, where p′ = x/n and x is the number of successes in the observed sample of size n.
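That p′ is approximately normal with mean p and standard deviation √(p(1 − p)/n) can be checked by simulation; a sketch (not from the module) with assumed values p = 0.8 and n = 100:

```python
import random
import math

random.seed(2)
p, n, trials = 0.8, 100, 20000   # assumed population proportion and sample size
props = []
for _ in range(trials):
    good = sum(1 for _ in range(n) if random.random() < p)
    props.append(good / n)       # sample proportion p' = x/n

mean_p = sum(props) / trials
sd_p = math.sqrt(sum((q - mean_p) ** 2 for q in props) / trials)
print(round(mean_p, 3))          # close to p = 0.8
print(round(sd_p, 3))            # close to sqrt(p(1-p)/n) = 0.04
```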
Note: Earlier we showed that the standard deviation of p′ is √(p(1 − p)/n). When p is not known, p′ can be taken as an unbiased estimator of p, and the standard deviation of p′ can then be written as √(p′(1 − p′)/n).

Estimation of Population Proportion
P(−V_{α/2} ≤ V ≤ V_{α/2}) = 1 − α
or, P(−V_{α/2} ≤ (p′ − p)/√(p′(1 − p′)/n) ≤ V_{α/2}) = 1 − α
or, P(p′ − V_{α/2} √(p′(1 − p′)/n) ≤ p ≤ p′ + V_{α/2} √(p′(1 − p′)/n)) = 1 − α.

The 100(1 − α)% confidence limits for p are
p′ − V_{α/2} √(p′(1 − p′)/n), p′ + V_{α/2} √(p′(1 − p′)/n).

Illustration
Consider a population consisting of "good" garments and "bad" garments. A random sample of 100 garments selected from this population contained 20 bad garments; hence the proportion of good garments in this sample was p′ = 0.80. Then, for random samples of 100 garments taken from this population, the probability distribution of p′ follows a normal distribution with mean 0.8 and standard deviation 0.04 (= √(0.8 × 0.2/100)). This distribution is shown in the next slide.

Illustration (Contd.)
(Figure: normal distribution of the sample proportion p′, centred at 0.8.)
In the long run, 68.26 percent of the proportions of random samples of 100 garments selected from this population will involve sampling errors of less than 0.04. Equivalently, the probability of a sample proportion being in error by 0.0784 (1.96 × 0.04) or more is 0.05.

Testing of Hypothesis

Need for Testing
Testing of a statistical hypothesis is a process for drawing an inference about the value of a population parameter from the information obtained in a sample selected from the population.

Types of Test
1. One-tailed test
2. Two-tailed test

Illustration
Sometimes we are interested only in extreme values to one side of the statistic, i.e., in one "tail" of the distribution, as for example when we are testing the hypothesis that one process is better than another. Such tests are called one-tailed or one-sided tests. In such cases, the critical region occupies one side of the distribution, with an area equal to the level of significance.
Sometimes we are interested in extreme values on both sides of the statistic, i.e., in the two "tails" of the distribution, as for example when we are testing the hypothesis that one process differs from another. Such tests are called two-tailed or two-sided tests. In such cases, the critical region occupies both sides of the distribution, with the combined area of the two sides equal to the level of significance.

Testing Procedure
Step 1: State the statistical hypothesis.
Step 2: Select the level of significance to be used.
Step 3: Specify the critical region to be used.
Step 4: Find the value of the test statistic.
Step 5: Take the decision.

Statement of Hypothesis
Suppose we are given a sample from which a certain statistic, such as the mean, is calculated. We assume that this sample is drawn from a population for which the corresponding parameter tentatively takes a specified value. We call this the null hypothesis, usually denoted by H. The null hypothesis is tested for possible rejection under the assumption that it is true. The alternative hypothesis is complementary to the null hypothesis and is usually denoted by H_A. For example, if H: μ = μ₀ is rejected, then H_A may be μ ≠ μ₀, μ > μ₀, or μ < μ₀.

Selection of Level of Significance
The level of significance, usually denoted by α, is stated as some small probability such as 0.10 (one in ten), 0.05 (one in twenty), 0.01 (one in a hundred), or 0.001 (one in a thousand). It equals the probability that the test statistic falls in the critical region when H is true, thus indicating falsity of H.

Specification of Critical Region
A critical region is a portion of the scale of possible values of the statistic, so chosen that if the obtained value of the statistic falls within it, rejection of the hypothesis is indicated.

Test Statistic
The phrase "test statistic" simply refers to the statistic employed in effecting the test of hypothesis.
The Decision
In this step, we refer the value of the test statistic obtained in Step 4 to the critical region adopted. If the value falls in this region, we reject the hypothesis; otherwise, we retain (accept) the hypothesis as a tenable (not disproved) possibility.

Illustration: A Problem
A fiber purchaser placed an order with a fiber producer for a large quantity of basalt fibers of 1.4 GPa breaking strength. Upon delivery, the purchaser found that the basalt fibers, on the whole, were weaker and asked the producer to replace them. The producer, however, replied that the fibers met the purchaser's specification, and hence no replacement would be made. The matter went to court, and a technical advisor was appointed to find out the truth. The advisor conducted a statistical test.

Illustration: The Test
Step 1: Null hypothesis H: μ = 1.4 GPa; alternative hypothesis H_A: μ < 1.4 GPa, where μ is the population mean breaking strength of the basalt fibers as ordered by the purchaser.
Step 2: The level of significance was chosen as α = 0.01.
Step 3: The advisor needed the population standard deviation of strength, so he took a random sample of 65 fibers (n = 65) and observed a sample standard deviation of breaking strength s = 0.80 GPa. He then estimated the population standard deviation σ̂ as
σ̂ = √(n/(n − 1)) s = √(65/64) × 0.80 = 0.8062 GPa.

Illustration: The Test (Contd.)
Step 4: The critical region for the mean breaking strength x̄ was found as
x̄ < μ − u_α σ̂/√n = 1.4 − 2.33 × 0.10 = 1.1670 GPa,
where σ̂/√n = 0.8062/√65 = 0.10 GPa.
Step 5: The advisor observed a sample mean breaking strength x̄ = 1.12 GPa.
Step 6: Referring the observed value x̄ = 1.12 GPa to the critical region established above, he noted that it fell inside this region. Hence he rejected the null hypothesis and accepted the alternative hypothesis μ < 1.4 GPa.
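The advisor's computation can be reproduced step by step; the numbers below are those of the basalt-fiber illustration (u = 2.33 is the standard normal quantile for α = 0.01, one-tailed):

```python
import math

n, s = 65, 0.80            # sample size and sample standard deviation (GPa)
mu0, xbar = 1.4, 1.12      # hypothesized mean and observed sample mean (GPa)
u_alpha = 2.33             # standard normal quantile for alpha = 0.01

sigma_hat = math.sqrt(n / (n - 1)) * s   # estimated population sd
se = sigma_hat / math.sqrt(n)            # standard error of the mean
critical = mu0 - u_alpha * se            # lower-tail critical value

print(round(sigma_hat, 4))   # 0.8062
print(round(critical, 4))    # 1.167
print(xbar < critical)       # True -> x̄ falls in the critical region, reject H
```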
Errors Associated with Testing of Hypothesis
Let us analyze the possible situations:
- H is true and is accepted: desired correct action.
- H is true but is rejected: undesired erroneous action (Type I error).
- H is false but is accepted: undesired erroneous action (Type II error).
- H is false and is rejected: desired correct action.

Type I Error: rejecting H when it is true.
Type II Error: accepting H when it is false.

In situations where a Type I error is possible, the level of significance represents the probability of such an error. The higher the level of significance, the higher the probability of a Type I error.

Type I Error
Setting α = 0 would completely eliminate the occurrence of Type I errors. Of course, it implies that no critical region exists, so H is always retained; in that case there is no need to analyze, or even collect, any data at all. While such a procedure would completely eliminate the possibility of a Type I error, it provides no guarantee against error: every time the stated H is false, a Type II error necessarily occurs. Similarly, by letting α = 1 it would be possible to eliminate the occurrence of Type II errors entirely, at the cost of committing a Type I error for every true H tested. Thus, the choice of a level of significance represents a compromise aimed at controlling the two types of errors that may occur in testing a statistical hypothesis.

Type II Error
For a given choice of α, there is always a probability of a Type II error. Let us denote this probability by β. It depends upon:
(1) the value of α chosen;
(2) the location of the critical region;
(3) the variability of the statistic;
(4) the amount by which the actual population parameter differs from its hypothesized value stated in H.
Because in any real situation the actual value of a population parameter can never be known, the degree of control exercised by a given statistical test on Type II error can never be determined exactly.
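Once the critical region is fixed, β follows directly from the standard normal cumulative distribution function. This sketch uses the critical value 1.1670 GPa and standard error 0.10 GPa from the basalt-fiber test above, and evaluates β and the power 1 − β for several assumed true means:

```python
from statistics import NormalDist

phi = NormalDist().cdf        # standard normal cumulative distribution function
critical, se = 1.1670, 0.10   # critical region: xbar < 1.1670 GPa; SE of mean

for mu in [0.9, 1.0, 1.1, 1.2, 1.3]:   # assumed true population means (GPa)
    u = (critical - mu) / se
    beta = 1 - phi(u)         # P(xbar outside the critical region | mu) = Type II
    power = 1 - beta          # probability of rejecting the false H
    print(mu, round(beta, 4), round(power, 4))
```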
Illustration: The Beta Value
Let us assume that the actual population mean breaking strength of the basalt fibers supplied by the producer was 1.0 GPa. In this case, a Type II error occurs when x̄ ≥ 1.1670 GPa (H is then accepted although false). The probability of this is
β = P(x̄ ≥ 1.1670) = P(u ≥ (1.1670 − 1.0)/0.10) = P(u ≥ 1.67) = 0.0475.
Hence the probability of a Type II error is 0.0475. That is, if this test were repeated indefinitely in this situation, 4.75 percent of the decisions would be Type II errors.

Illustration: Effect of Alpha on Beta
Let us take α = 0.001. Then the critical region is
x̄ < μ − u_α σ̂/√n = 1.4 − 3.09 × 0.10 = 1.091 GPa.
Then
β = P(x̄ ≥ 1.091) = P(u ≥ (1.091 − 1.0)/0.10) = P(u ≥ 0.91) = 0.1814.
In this way we obtain:
α = 0.001 → β = 0.1814
α = 0.005 → β = 0.0778
α = 0.010 → β = 0.0475
α = 0.050 → β = 0.0094
α = 0.100 → β = 0.0033
As α increases, β decreases.

Illustration: Effect of Location of Critical Region on Beta
Let us suppose that H: μ = 1.4 GPa and H_A: μ ≠ 1.4 GPa; that is, H_A: μ < 1.4 GPa or μ > 1.4 GPa. Then, at α = 0.01, the two critical regions are
x̄ < μ − u_{α/2} σ̂/√n = 1.4 − 2.58 × 0.10 = 1.1420 GPa,
x̄ > μ + u_{α/2} σ̂/√n = 1.4 + 2.58 × 0.10 = 1.6580 GPa.
Assume μ = 1.0 GPa. Then β is the probability that x̄ lies between +1.1420 and +1.6580. This is the same as the probability of u lying between +1.42 and +6.58, which for all practical purposes is simply the probability of u > +1.42. Hence β = 0.0778.

Illustration: Effect of Sample Variability on Beta
Let us suppose that H: μ = 1.4 GPa and H_A: μ < 1.4 GPa, and assume the estimated standard error of the mean is increased to 0.20 GPa. Then the critical region of x̄ at α = 0.01 is
x̄ < 1.4 − 2.33 × 0.20 = 0.9340 GPa.
Now, if we assume μ = 1.0 GPa,
β = P(x̄ ≥ 0.9340) = P(u ≥ (0.9340 − 1.0)/0.20) = P(u ≥ −0.33) = 0.6293.

Illustration: Effect of Difference Between Actual and Hypothesized Values of the Population Parameter
Let us suppose that H: μ = 1.4 GPa and H_A: μ < 1.4 GPa. Choose α = 0.01 and take σ̂/√n = 0.10 GPa.
The critical region is
x̄ < μ − u_α σ̂/√n = 1.4 − 2.33 × 0.10 = 1.1670 GPa.
Let us find β assuming the actual value of μ is 0.9 GPa:
β = P(x̄ ≥ 1.1670) = P(u ≥ (1.1670 − 0.9)/0.10) = P(u ≥ 2.67) = 0.0038.
Let us now find β assuming the actual value of μ is 1.3 GPa:
β = P(x̄ ≥ 1.1670) = P(u ≥ (1.1670 − 1.3)/0.10) = P(u ≥ −1.33) = 0.9082.

Power of a Statistical Test
Suppose the actual value of a population parameter differs by some particular amount from the value hypothesized for it in H, so that rejection of H is the desired correct outcome. The probability of reaching this outcome is the probability that the test statistic falls in the critical region. We refer to this probability as the power (P) of the statistical test. Since β represents the probability that the test statistic does not fall in the critical region, the probability P that it does is P = 1 − β. Hence, the power of a test is the probability that it will detect falsity in the hypothesis.

Power Curve
The power curve of a test of a statistical hypothesis H is the plot of the P-values corresponding to all values of the parameter that are possible alternatives to H. In other words, the power curve may be used to read the probability of rejecting H for any given possible alternative value of μ.

Power Curve: Illustration
Let us draw the power curve for the previous example. There is an infinite collection of μ-values (μ < 1.4 GPa) that are possible alternatives to the hypothesized value of 1.4 GPa, and P-values exist accordingly. Some are shown here:

μ (GPa)   u       β        P = 1 − β
0.7       4.67    ≈0       ≈1
0.8       3.67    0.0001   0.9999
0.9       2.67    0.0038   0.9962
1.0       1.67    0.0475   0.9525
1.1       0.67    0.2514   0.7486
1.2      −0.33    0.6293   0.3707
1.3      −1.33    0.9082   0.0918

Power Curve: Illustration (Contd.)
(Figure: power curve, P plotted against μ in GPa.)

Inadequacy of Statistical Hypothesis Test
We have seen that the decision of a statistical hypothesis test is based on whether the null hypothesis is rejected or not rejected at a specified level of significance (α-value).
Often this decision is inadequate, because it gives the decision maker no idea whether the computed value of the test statistic is just barely in the rejection region or very far into it. Also, some decision makers may be uncomfortable with the risks implied by a specified level of significance, say α = 0.05. To avoid these difficulties, the P-value approach is widely used.

P-value Approach [2]
The P-value is the probability that the test statistic takes on a value at least as extreme as the observed (computed) value of the test statistic when the null hypothesis H is true. Equivalently, the P-value is the smallest level of significance that would lead to rejection of the null hypothesis with the given data. The P-value is calculated as follows:

for a two-tailed test (H: μ = μ₀, H_A: μ ≠ μ₀): P-value = 2[1 − Φ(|z₀|)]
for an upper-tailed test (H: μ = μ₀, H_A: μ > μ₀): P-value = 1 − Φ(z₀)
for a lower-tailed test (H: μ = μ₀, H_A: μ < μ₀): P-value = Φ(z₀)

where Φ(z) is the cumulative distribution function of the standard normal variable z and z₀ is the computed value of the test statistic.

Testing Procedure
Step 1: State the statistical hypothesis.
Step 2: Find the value of the test statistic.
Step 3: Find the P-value.
Step 4: Select the level of significance to be used.
Step 5: Take the decision.

Illustration: The Test
We refer to the problem of the strength of basalt fibers.
Step 1: Null hypothesis H: μ = 1.4 GPa; alternative hypothesis H_A: μ < 1.4 GPa, where μ is the population mean breaking strength of the basalt fibers as ordered by the purchaser.
Step 2: The test statistic is computed as
z₀ = (x̄ − μ₀)/(σ̂/√n) = (1.12 − 1.4)/(0.8062/√65) = −0.28/0.10 = −2.8.
Step 3: The P-value is computed as Φ(−2.80) = 0.0026.

Illustration: The Test (Contd.)
Step 4: The level of significance is compared with the P-value = 0.0026.
Step 5: The null hypothesis would be rejected at any level of significance α ≥ 0.0026. For example, it would be rejected at α = 0.01, but it would not be rejected at α = 0.001.
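The lower-tailed P-value in Step 3 follows directly from Φ, available as `statistics.NormalDist().cdf`:

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf        # standard normal cumulative distribution function

xbar, mu0 = 1.12, 1.4         # observed mean and hypothesized mean (GPa)
sigma_hat, n = 0.8062, 65     # estimated population sd and sample size
z0 = (xbar - mu0) / (sigma_hat / math.sqrt(n))
p_value = phi(z0)             # lower-tailed test: H_A is mu < mu0

print(round(z0, 2))           # -2.8
print(round(p_value, 4))      # 0.0026
```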
Frequently Asked Questions & Answers

Q1: What is the difference between a parameter and a statistic?
A1: A parameter, representing a statistical characteristic of the population, is a constant for the given population; a statistic, representing a statistical characteristic of a sample, is a variable.

Q2: What is the standard error of mean fiber length?
A2: The standard deviation of the sampling distribution of mean fiber length is the standard error of mean fiber length.

Q3: Why is it not practically possible to obtain a simple random sample as defined?
A3: It is practically impossible to numerically identify each individual of a population, either because of the large size of the population or because of the inaccessibility or current non-existence of some of the individuals; therefore, it is not possible to obtain a simple random sample as defined.

Q4: Why is it often said that a larger sample gives a more precise estimate of the population mean?
A4: Because as the sample size increases, the standard deviation of the sample mean (that is, the standard error of the sample mean) decreases.

Q5: In calculating a variance, the divisor is sometimes n − 1, where n is the sample size, and sometimes n. Why is this so?
A5: When the divisor of the sample variance is n, the result is a biased estimator of the population variance; when the divisor is n − 1, the resulting expression is an unbiased estimator of the population variance.

Q6: To find out whether a newly developed process is superior to the existing process, which test, one-tailed or two-tailed, is recommended?
A6: A one-tailed test.

Q7: To find out whether a newly developed process is different from the existing process, which test, one-tailed or two-tailed, is recommended?
A7: A two-tailed test.
Q8: Can the probability of a Type II error be reduced by choosing a larger sample size?
A8: Yes; a larger sample reduces the standard error of the statistic, which reduces β for a given α.

References
1. Gupta, S. C. and Kapoor, V. K., Fundamentals of Mathematical Statistics, Sultan Chand & Sons, New Delhi, 2002.
2. Montgomery, D. C. and Runger, G. C., Applied Statistics and Probability for Engineers, John Wiley & Sons, Inc., New Delhi, 2003.

Sources of Further Reading
1. Leaf, G. A. V., Practical Statistics for the Textile Industry: Part I, The Textile Institute, UK, 1984.
2. Leaf, G. A. V., Practical Statistics for the Textile Industry: Part II, The Textile Institute, UK, 1984.
3. Gupta, S. C. and Kapoor, V. K., Fundamentals of Mathematical Statistics, Sultan Chand & Sons, New Delhi, 2002.
4. Gupta, S. C. and Kapoor, V. K., Fundamentals of Applied Statistics, Sultan Chand & Sons, New Delhi, 2007.
5. Montgomery, D. C., Introduction to Statistical Quality Control, John Wiley & Sons, Inc., Singapore, 2001.
6. Grant, E. L. and Leavenworth, R. S., Statistical Quality Control, Tata McGraw Hill Education Private Limited, New Delhi, 2000.
7. Montgomery, D. C. and Runger, G. C., Applied Statistics and Probability for Engineers, John Wiley & Sons, Inc., New Delhi, 2003.