Module Eight: Comparative Study for Inter-laboratory Testing

When inter-laboratory testing is conducted, the analysis of the testing results may include:
• Determining the best estimate of the variable of interest and its corresponding uncertainty: $x_{Best} \pm U_{x_{Best}}$
• Making an interval estimate of the variable of interest based on the corresponding distribution (a confidence interval): $x_{Best} \pm k\,U_{x_{Best}}$
• Conducting a comparative study:

1. Comparing with a reference standard.

2. Comparing the effects between two groups when the two samples are tested independently. For example, two testing procedures are to be compared. Twenty units of similar material are randomly assigned to the two methods, 10 to each. The purpose is to compare the difference between the two testing methods.

3. Comparing the change in a response before and after (or with/without) a treatment. For example, the toxicity of a chemical compound is tested with and without an additive in ten labs. Each compound sample is divided into two sub-samples. Each lab tests the pair, one with the additive and the other without. The difference within each pair tested by a lab is due to the additive. Note that in this comparative study the two sub-samples in each pair are the same or very similar. This is a paired sample problem.

4. Comparing the effects among several groups, when a treatment has more than two levels. This type of comparative study is common in inter-laboratory testing. For example, one is interested in studying the compressive strength of concrete made with five different formulas. Ten specimens are produced using each formula, and their compressive strengths are tested. This is a one-factor experiment with five factor levels. Our interest is to compare the strengths and to determine which formula gives the highest strength. If the only difference among these formulas is the dosage of an additive (1%, 1.5%, 2%, 2.5% and 3%), then, in addition to comparing the strength among the formulas, we can also fit a prediction model to determine the dosage level that results in the maximum strength.

5. In many experiments there is more than one factor. The study aims not only to understand the effect of each factor, but also to study the interaction effect between factors. This is a multifactor study. For example, in the concrete compressive strength study, in addition to the five levels of formula, the temperature during concrete formation is another critical factor. We should consider both the formula factor and the temperature factor when producing the specimens for the strength test. Suppose we would like to test three levels of temperature. We then have a 5×3 two-factor factorial design. For each treatment combination, four specimens are produced, giving a total of 3×5×4 = 60 specimens for strength testing. We are interested in comparing the strength among different formulas, among different temperatures, and among different formulas at each temperature level.

6. Another type of study in lab testing examines the variance components of factors, for the purpose of identifying factor levels that will reduce the variability of the response variable. For example, in a metal alloy casting process, each casting is broken into small bars that are used for other applications. The tensile strength of the alloy is critical to its intended use, and there is a specification for the strength. If the variation of the strength is excessively large, a large number of bars will not meet the specification limits. An experiment can be designed to identify factors and their level combinations that will produce bars with small variability. This is a variance component problem.

In this module, we will discuss comparative studies 1, 2 and 3. In Module Ten, we will discuss comparative study 4, the one-factor design and analysis. In Module Eleven we will focus on comparative study 5, multifactor designs and analysis. Module Twelve will study variance components problems.

Comparative Study One: Comparing testing results with a given reference or a given standard

In a lab testing study, one may be interested in comparing the testing results with a given standard or a reference measurement. The following steps may be applied to plan such a study:
1. Identify the given standard or reference measurement, and make sure the source that developed the standard suits your purpose.
2. Set up an adequate lab testing environment and testing procedure.
3. Make sure the operator of the testing is adequately trained, to reduce unexpected errors.
4. Plan the experimental procedure and determine the number of experimental runs to be conducted.
5. Prepare the needed experimental units, and make sure these units are as homogeneous as possible.
6. Conduct the lab testing and carefully collect the data of interest. It is good practice to record any special events that occur during the testing.

Now that a data set is collected, we would like to compare it with a given reference. Steps for this analysis may include:
1. Carefully check the data for unusual measurements that may be due to systematic error or special causes. Techniques for detecting outliers can be applied here.
2. Compute descriptive summaries and graph a histogram, a box plot for identifying outliers, and a normal probability plot for checking the normality assumption.
3. If there is a serious violation of the normality assumption, one may choose to transform the data. If there are outliers, one should go back to check for possible special causes, and decide whether to keep or drop these outliers before the analysis.
4. The comparison is a one-sample test. The procedure for conducting it follows.

One-sample t-test for comparing the testing results with a given reference
Example: The brightness of a certain type of paper is defined on a scale of 1 to 100. The reference brightness for this type of paper is 60. A lab is experimenting with a new process for producing this type of paper, and would like to test its brightness to see whether the paper meets the required brightness. A random sample of 30 sheets is chosen and tested by a lab. Here are the collected data:

55 42 59 64 59 68 60 52 56 59
55 62 59 57 63 58 52 55 58 61
65 63 52 58 62 54 58 59 64 63

A quick visual check immediately identifies the value 42, which is much smaller than the rest. We first draw a box plot and a normal probability plot to identify outliers and to check the normality assumption.

[Figure: Boxplot of brightness using the entire data set]
[Figure: Boxplot of brightness_1, with the outlier '42' deleted]
[Figure: Normality test for the brightness, excluding the outlier (normal probability plot of brightness_1). Average: 58.9655, StDev: 4.11862, N: 29. Anderson-Darling Normality Test: A-Squared: 0.302, P-Value: 0.554]

• The normality test indicates that the data follow the normal curve very well.
• On reviewing the records from the lab testing, it was noticed that the reading of '42' was due to a special cause: wrong timing in the testing process. It is therefore removed from further analysis.

The Concept and Procedure for Performing the One-sample t-test

When we conduct a hypothesis test for comparing with a given reference, there are usually two choices: one is the hypothesis we intend to establish in our study, the other is its opposite. To make the testing procedure easier, we define these two hypotheses as H0 and Ha. Ha is the one we intend to establish. For this paper brightness test, our Ha is that the actual average brightness of the paper is significantly different from the given reference.

Typical notation for the hypotheses is:
$H_0: \mu = \mu_0$
$H_a: \mu \ne \mu_0$

For the paper brightness study, we have:
$H_0: \mu = 60$
$H_a: \mu \ne 60$

Q: When and how do we decide to take H0 or Ha?
As we see, if the average of the sample data is either much larger or much smaller than 60, we will choose Ha; otherwise, we choose H0.

Q: But how far is far enough to draw such a conclusion?
If the sample average is, say, 59.5 or 60.4, we would not consider it far enough from 60 to conclude Ha. Therefore, we need two critical average brightness values, $\bar{x}_1$ and $\bar{x}_2$, so that when the sample average obtained from the data falls beyond these two values, we conclude Ha, that is, the brightness of the paper is significantly different from the reference brightness, 60.

Q: How do we determine the two critical values?
This can be answered by bringing in the distribution of $\bar{X}$. The following distribution is the distribution of $\bar{X}$ under H0.

[Figure: distribution of $\bar{X}$ under H0, centered at 60. H0 is rejected in the two tails, below $\bar{x}_1$ and above $\bar{x}_2$, each with area $\alpha/2 = .025$, and accepted in between. On the standardized scale the critical values are $-t_{(\alpha/2,\,n-1)}$ and $t_{(\alpha/2,\,n-1)}$, where $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.]

Common experience suggests that the probability of rejecting H0 when it is true should be small, so that only when the sample average is far from 60 will we conclude Ha. Therefore, a typical probability for rejecting H0 is 5% or 1%. The standardized form of $\bar{X}$, which follows the t-distribution, is used for making the comparison.

Procedure for conducting the one-sample t-test:
1. Set up H0 and Ha.
2. Determine the rejection and acceptance regions for H0 from the t-distribution, according to the type of hypothesis.
3. From the sample data, compute the t-value from the sample average: $t_{observed} = \frac{\bar{x}_{observed} - \mu_0}{s/\sqrt{n}}$
4. Compare $t_{observed}$ with the critical t-values, $-t_{(\alpha/2,\,n-1)}$ and $t_{(\alpha/2,\,n-1)}$, from the t-table to determine whether $t_{observed}$ falls in the acceptance region or in the rejection region.

NOTE: Computer output gives us both $t_{observed}$ and the observed level of significance, namely, the p-value. The p-value for this two-sided test is $2P(t > |t_{observed}|)$, and the decision based on the p-value is:
If p-value < $\alpha$, we reject H0, that is, we decide to take Ha.
If p-value ≥ $\alpha$, we conclude H0.

Right-side and Left-side Tests

Ha is the hypothesis we intend to establish. Therefore, in applications, other than two-sided tests, there are two common hypotheses:
• Right-side test: $H_0: \mu = \mu_0$, $H_a: \mu > \mu_0$
• Left-side test: $H_0: \mu = \mu_0$, $H_a: \mu < \mu_0$

How do we choose the test for our need?
• If our intention is to find out whether the population mean is much larger than the reference value, the right-side test should be applied. For example, suppose the reference brightness of the paper, 60, is the minimum, and our goal is to decide whether the new process produces significantly brighter paper: $H_0: \mu = 60$, $H_a: \mu > 60$.
• If our intention is to find out whether the population mean is much lower than the reference value, the left-side test should be applied. For example, suppose the reference brightness of the paper, 60, is the maximum allowed, and our goal is to decide whether the new process produces significantly less bright paper.
• If our intention is to find out whether the population mean is different from the reference value, in either direction, the two-sided test should be applied. For example, suppose the reference brightness of the paper, 60, is the given standard, and our goal is to decide whether the new process produces paper of significantly different brightness.

Hands-on Activity: Comparative Study with a Given Reference

In testing the tensile strength of a new type of concrete, the goal is to make sure that the tensile strength meets the minimum of 300 psi. A lab is assigned to test this new concrete. Twenty samples are tested. The tensile strengths are:

320 305 293 295 313 306 298 325 304 316
307 308 307 305 319 294 295 295 300 312

Perform an appropriate test to determine if the new type of concrete meets the minimum tensile strength of 300 psi.
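As an illustration of the one-sample procedure described in this section, the following minimal sketch recomputes the two-sided test for the paper brightness example (reference 60, outlier 42 removed). It uses only the Python standard library. The critical value 2.048 is the t-table entry for $t_{(.025,\,28)}$ and assumes $\alpha = 5\%$; the function name `one_sample_t` is introduced here for illustration only.

```python
import math
import statistics

# Brightness readings from the example, with the outlier 42 removed (n = 29).
brightness = [55, 59, 64, 59, 68, 60, 52, 56, 59, 55, 62, 59, 57, 63, 58,
              52, 55, 58, 61, 65, 63, 52, 58, 62, 54, 58, 59, 64, 63]

MU_0 = 60        # reference brightness
T_CRIT = 2.048   # t(.025, 28) from a t-table; assumes alpha = 5%, two-sided


def one_sample_t(data, mu_0):
    """Return (mean, s, SE, t) for testing H0: mu = mu_0."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)        # sample s.d., n - 1 in the denominator
    se = s / math.sqrt(n)             # standard error of the mean
    return mean, s, se, (mean - mu_0) / se


mean, s, se, t_obs = one_sample_t(brightness, MU_0)

# Two-sided rule: reject H0 when t_obs falls outside (-T_CRIT, T_CRIT).
reject_h0 = abs(t_obs) > T_CRIT

print(f"n = {len(brightness)}, mean = {mean:.4f}, s = {s:.5f}, SE = {se:.4f}")
print(f"t_observed = {t_obs:.3f}, critical values = -{T_CRIT} and {T_CRIT}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The mean and standard deviation reproduce the values shown on the normality plot (58.9655 and 4.11862). The observed t of about -1.35 lies inside the acceptance region, so under these assumptions the sample does not show a significant difference from the reference of 60. The same function can be reused for the tensile strength activity by changing the data, the reference value, and the critical value to match a one-sided test.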
Comparative Study for Inter-laboratory Testing: Two-group Cases

Using the example of the brightness of paper, there are many situations in which the testing involves two groups of treatment. Here are some possible situations:
1. When a chemical component is changed, the brightness could change dramatically. A comparative study can be planned to compare the effect of two different levels of this chemical component.
2. When papers are tested by two different labs, there may be between-lab differences. Such differences should be controlled to minimize the systematic error of a given lab when testing the same material using the same testing procedure.
3. When papers are tested using two different testing procedures, it is important to identify the difference between these two procedures.

A two-group comparative study may compare two types of material, two different treatments, two testing procedures, or two labs. We now discuss a method for making the two-group comparison. As in the comparison between a given reference and sample data, it is important to keep in mind that we need to conduct outlier analysis and distribution checking.

The Issue of Designing Experiments for a Two-sample Comparative Study

Consider the example of comparing the reaction of a chemical component in lab testing.
Treatment: two levels of the chemical component.
We will discuss two types of experimental design:

1. Design A, paired sample design: The units assigned to the two treatments each time are very similar, since they come from the same specimen.
[Diagram, Design A: each specimen is split into two sub-samples; the Level A component is added to one and the Level B component to the other. Test n = 15 pairs; each pair is tested together.]

2. Design B, independent sample design: Each treatment is assigned to 15 units, which are independent of the units for the other treatment.
[Diagram, Design B: the Level A component is added to one group of units and the Level B component to another group. Test n = 15 units for each level.]

NOTE: A paired-sample comparison is usually referred to as a Before/After Treatment or Pre/Post Treatment experiment. The variable of interest is observed before and after a treatment. This type of design occurs often when testing the effect of a treatment over time. For example, one may be interested in studying the chemical residue 5 days and 10 days after a chemical is sprayed on a certain vegetable.

[Timeline: Treatment: spray the chemical on n randomly chosen subjects → test the residue from the subjects five days after → test the residue from the same subjects ten days after.]
[Timeline, e.g. a diet treatment for three months: before the diet treatment, observe weight, BMI, age, gender, etc. from each subject → treatment is given → three months after, observe weight, BMI, etc. from the same subjects.]

Hands-on Activity
For the same study, one can design a two-independent-sample study as well. Design a two-independent-sample study for the chemical residue, and discuss the advantages and disadvantages of paired-sample vs. independent-sample designs.

The difference between Experiment A and Experiment B is:

Samples obtained from Experiment A can be considered as 15 pairs, each pair sampled from the same sub-group. The possible sources of error are the same for the two samples except for the level of the component. The experimental units are similar.

Samples obtained from Experiment B are two independent samples. Each is obtained from a process that is independent of the other. Possible sources of error include not only the levels of the component but also the differences between the processes. Therefore, the paper units for testing the brightness may have higher variation.

The analyses of data resulting from these two experiments are different. Experiment A is a paired sample problem, while Experiment B is an independent sample problem.

Hands-on Activity
From the projects you have conducted, identify one paired sample project and one independent sample project.

Analysis of the Paired Sample Problem

Consider the experiment for testing the chemical residue.

Experiment: 15 pots of a certain vegetable are used as the experimental units. The residue is measured and recorded five days and ten days after the spray.
X: the residue five days after the chemical treatment.
Y: the residue ten days after the chemical treatment.
Testing Procedure: Each residue is the average of the residues of two specimens taken from the same pot, for the purpose of reducing random error.

Pot        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
x         59  61  64  62  59  63  58  59  64  65  64  60  67  65  63
y         54  52  59  60  61  60  56  61  58  59  62  61  61  58  57
d = y-x   -5  -9  -5  -2   2  -3  -2   2  -6  -6  -2   1  -6  -7  -6

For each pot, the residues are observed five days and ten days after the spray. Hence the difference Y - X is the change in residue over the five-day period, and a negative difference is a reduction. To understand whether the reduction of residue is statistically significant, we can then perform a one-sample test based on the difference, d.
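The quantities that the one-sample test on the differences needs, namely $\bar{d}$, $s_d$ and $SE_{\bar{d}}$, can be computed directly from the x and y columns. The sketch below is a minimal illustration using only the Python standard library. It recomputes each difference from the data rather than copying the d row, tests $H_0: \mu_d = 0$, and uses the t-table value $t_{(.025,\,14)} = 2.145$, which assumes a 95% confidence level.

```python
import math
import statistics

# Residue readings for the 15 pots in the paired experiment.
x = [59, 61, 64, 62, 59, 63, 58, 59, 64, 65, 64, 60, 67, 65, 63]  # 5 days after
y = [54, 52, 59, 60, 61, 60, 56, 61, 58, 59, 62, 61, 61, 58, 57]  # 10 days after

# Paired differences d = y - x, one per pot.
d = [yi - xi for xi, yi in zip(x, y)]

n = len(d)
d_bar = statistics.mean(d)      # mean difference
s_d = statistics.stdev(d)       # sample s.d. of the differences
se_d = s_d / math.sqrt(n)       # standard error of the mean difference
t_obs = d_bar / se_d            # t-value for H0: mu_d = 0

T_CRIT = 2.145                  # t(.025, 14) from a t-table (95% level assumed)
half_width = T_CRIT * se_d
ci = (d_bar - half_width, d_bar + half_width)

print("d =", d)
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, SE = {se_d:.3f}, t = {t_obs:.2f}")
print(f"95% CI for the mean difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the test works on the single column d, the pot-to-pot variation that is common to both readings cancels out, which is the main advantage of the paired design.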
The hypotheses are: $H_0: \mu_d = 0$, $H_a: \mu_d \ne 0$

Recall: To perform a one-sample t-test, we need $\bar{d}$, $s_d$ and $SE_{\bar{d}}$. The following is the output from Minitab:

Paired T for 10 days - 5 days

                  N     Mean   StDev   SE Mean
10 days (y)      15   58.600   2.849     0.735
5 days (x)       15   62.200   2.731     0.705
Difference (d)   15   -3.600   3.376     0.872

95% CI for mean difference: (-5.470, -1.730)
T-Test of mean difference = 0 (vs not = 0): T-Value = -4.13  P-Value = 0.001

[Figure: Boxplot of differences of residues between 10 days and 5 days, with H0 and the 95% t-confidence interval for the mean]

• Based on the p-value = .001 < 5%, we can conclude that the residue reduction is statistically significant at $\alpha$ = 5%. The average reduction is 3.6, based on data from 15 pots.
• The 95% confidence interval is -5.47 to -1.73. That is, we are 95% sure that the mean change in residue lies within:
$$\bar{d} \pm t_{(.025,\,14)}\,SE_{\bar{d}} = -3.6 \pm 2.145(0.872) = -3.6 \pm 1.87$$

Analysis of the Two-independent-samples Problem

Consider the experiment for testing the chemical residue. We can design a two-independent-sample experiment for the residue study.

Experiment: 30 pots of a certain vegetable are used as the experimental units. 15 pots are randomly chosen for the 5-day residue testing. The other 15 are for the 10-day residue testing.
X: the residue five days after the chemical treatment, from 15 randomly selected pots.
Y: the residue ten days after the chemical treatment, from the other 15 pots.
Testing Procedure: Each residue is the average of the residues of two specimens taken from the same pot, for the purpose of reducing random error.
NOTE: This design is appropriate if each pot can be used for only one residue test.

Pot    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
x     59  61  64  62  59  63  58  59  64  65  64  60  67  65  63

Pot   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
y     54  52  59  60  61  60  56  61  58  59  62  61  61  58  57

For each pot, the residue can only be measured either five days or ten days after the spray. The assignment of pots to residue testing is random, and thus the two samples are considered independent. The difference Y - X no longer reflects only the residue reduction, but also includes the differences between pots.

The residue after 5 days is a population with mean $\mu_1$ and variance $\sigma_1^2$. Similarly, the residue after 10 days is a different population with mean $\mu_2$ and variance $\sigma_2^2$.

Our purpose is to determine whether $\mu_2$ is statistically lower than $\mu_1$. This is a left-side test:
$H_0: \mu_2 - \mu_1 = 0$, $H_a: \mu_2 - \mu_1 < 0$

Ha is concluded if the corresponding sample mean difference, $\bar{y} - \bar{x}$, is indeed much lower than zero. How far below zero is considered significant? As in the one-sample problem, we need to determine the distribution of $\bar{y} - \bar{x}$ or, equivalently, the distribution of the standardized form, $(\bar{y} - \bar{x}) / SE_{\bar{y}-\bar{x}}$.

NOTE: Most statistical hypothesis testing or estimation problems require the distributional form of the best estimate of the variable of interest. This is usually accomplished by finding the distribution of the standardized best estimate. This is true for any test that involves the t-distribution, the chi-square distribution, the F-distribution, and so on.

What is the distribution of $\frac{\bar{y} - \bar{x}}{SE_{\bar{y}-\bar{x}}}$? How do we determine $SE_{\bar{y}-\bar{x}}$?

Based on statistical theory, the t-distribution holds when the samples are randomly chosen from each population. The quantity $SE_{\bar{y}-\bar{x}}$ is the uncertainty of the mean difference. The way to determine $SE_{\bar{y}-\bar{x}}$ depends on the sample sizes and on whether the variances of the two populations are homogeneous or not. Below, $s_x^2$ and $s_y^2$ are the sample variances and $n_1$ and $n_2$ the sample sizes of the x and y samples.

When the population variances are not equal,
$$s^2_{(\bar{y}-\bar{x})} = \frac{s_x^2}{n_1} + \frac{s_y^2}{n_2},$$
and therefore $SE_{\bar{y}-\bar{x}}$ is given by:
$$SE_{\bar{y}-\bar{x}} = \sqrt{\frac{s_x^2}{n_1} + \frac{s_y^2}{n_2}}$$

However, the degrees of freedom for this uncertainty measurement is a weighted d.f. of $n_1$ and $n_2$:
$$df = \frac{\left(\dfrac{s_x^2}{n_1} + \dfrac{s_y^2}{n_2}\right)^2}{\dfrac{(s_x^2/n_1)^2}{n_1 - 1} + \dfrac{(s_y^2/n_2)^2}{n_2 - 1}}$$

When the population uncertainties can be assumed equal, that is, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we can combine the two samples to obtain a better estimate of the common measurement uncertainty for $\bar{y} - \bar{x}$:

1. Obtain the pooled estimate of the common variance $\sigma^2$:
$$s_p^2 = \frac{(n_1 - 1)s_x^2 + (n_2 - 1)s_y^2}{n_1 + n_2 - 2}$$
2. Compute the SE of $\bar{y} - \bar{x}$, with $df = n_1 + n_2 - 2$:
$$SE_{\bar{y}-\bar{x}} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

The $100(1-\alpha)\%$ confidence interval for $\mu_2 - \mu_1$ can be determined by:
$$(\bar{y} - \bar{x}) \pm t_{(\alpha/2,\,df)}\,SE_{\bar{y}-\bar{x}}$$

To test whether population mean $\mu_2$ is statistically different from (greater or less than) population mean $\mu_1$:
Two-side Test: $H_0: \mu_2 - \mu_1 = 0$, $H_a: \mu_2 - \mu_1 \ne 0$
Right-side Test: $H_0: \mu_2 - \mu_1 = 0$, $H_a: \mu_2 - \mu_1 > 0$
Left-side Test: $H_0: \mu_2 - \mu_1 = 0$, $H_a: \mu_2 - \mu_1 < 0$

We apply the t-test as follows:
1. Compute the t-value: $t_{obs} = \frac{\bar{y} - \bar{x}}{SE_{\bar{y}-\bar{x}}}$
2. Compare $t_{obs}$ with the critical t-value:
For the two-side test: if $t_{obs}$ falls outside of $-t_{(\alpha/2,\,df)}$ and $t_{(\alpha/2,\,df)}$, then reject H0.
For the right-side test: if $t_{obs} > t_{(\alpha,\,df)}$, then reject H0.
For the left-side test: if $t_{obs} < -t_{(\alpha,\,df)}$, then reject H0.
Note that a one-sided test places the whole rejection probability $\alpha$ in one tail, so its critical value is $t_{(\alpha,\,df)}$ rather than $t_{(\alpha/2,\,df)}$.

When computer software is available, the p-value is used for decision making. The same rule applies when using the p-value, regardless of the type of test: if p-value < $\alpha$, then reject H0 and conclude Ha.

Case Example: A Chemical Residue Study

Purpose: To determine whether the chemical residue ten days after the spray is significantly reduced compared with five days after.
Experiment: 30 pots of a certain vegetable are used as the experimental units. 15 pots are randomly chosen for the 5-day residue testing. The other 15 are for the 10-day residue testing.
X: the residue five days after the chemical treatment, from 15 randomly selected pots.
Y: the residue ten days after the chemical treatment, from the other 15 pots.
Testing Procedure: Each residue is the average of the residues of two specimens taken from the same pot, for the purpose of reducing random error.
Pot    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
x     59  61  64  62  59  63  58  59  64  65  64  60  67  65  63

Pot   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
y     54  52  59  60  61  60  56  61  58  59  62  61  61  58  57

Variable   Treatment    N    Mean   Median   StDev   SE Mean
Residue    5-days      15   62.20    63.00   2.731     0.705
           10-days     15   58.60    59.00   2.849     0.735

[Figure: Normal probability plot of residue after 5 days. Average: 62.2, StDev: 2.73078, N: 15. Anderson-Darling Normality Test: A-Squared: 0.413, P-Value: 0.296]
[Figure: Normal probability plot of residue after 10 days. Average: 58.6, StDev: 2.84856, N: 15. Anderson-Darling Normality Test: A-Squared: 0.594, P-Value: 0.101]
[Figure: Test for equal variances for residue, with 95% confidence intervals for sigmas by factor level and boxplots of raw data. F-Test: Test Statistic: 0.919, P-Value: 0.877. Levene's Test: Test Statistic: 0.044, P-Value: 0.835]

Diagnosis of assumptions:
• Both samples follow the normal distribution.
• The variances are similar.

Two-Sample T-Test and CI: Residue, Treatment (without assuming equal variances)

Treatment    N    Mean   StDev   SE Mean
1           15   62.20    2.73      0.71
2           15   58.60    2.85      0.74

Difference = mu (1) - mu (2)
Estimate for difference: 3.60
95% CI for difference: (1.51, 5.69)
T-Test of difference = 0 (vs >): T-Value = 3.53  P-Value = 0.001  DF = 27
Note: DF = 27 is computed to adjust for the unequal variances.

Two-Sample T-Test: Residue, Treatment (assuming equal variances)

Difference = mu (1) - mu (2)
Estimate for difference: 3.60
T-Test of difference = 0 (vs >): T-Value = 3.53  P-Value = 0.001  DF = 28
Both use Pooled StDev = 2.79
Note: $s_p$ is used as the common s.d.

The output states the difference as mu (1) - mu (2), 5-days minus 10-days, and tests it against "greater than zero". This is equivalent to the left-side test on $\mu_2 - \mu_1$, with the sign of the t-value reversed.

[Figure: Box plots for the residues, two independent samples. Treatment 1 mean: 62.2, Treatment 2 mean: 58.6]

Conclusion: The s.d.'s are similar. Levene's test of homogeneity of variances shows p-value = .835. We can therefore use either t-test to test the hypothesis that the residue 10 days after the spray is significantly reduced from 5 days after. The two t-test results (assuming and not assuming equal variances) give the same conclusion: p-value < 5%; therefore, the reduction of residue from 5 days to 10 days after the chemical spray is statistically significant.

Hands-on Activity
Perform the two-independent-sample test manually, and compare with the computer output.
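One possible way to carry out the manual calculation is sketched below, using only the Python standard library. It computes both versions of $SE_{\bar{y}-\bar{x}}$ from the residue data: the pooled version, and the unequal-variance version with its weighted degrees of freedom (commonly called the Welch or Satterthwaite approximation, a name not used in the slides). The critical value 1.701 is the t-table entry for $t_{(.05,\,28)}$ and assumes a left-side test at $\alpha = 5\%$. Variable names are illustrative.

```python
import math
import statistics

# Residue readings from the two independent samples.
x = [59, 61, 64, 62, 59, 63, 58, 59, 64, 65, 64, 60, 67, 65, 63]  # 5-day, pots 1-15
y = [54, 52, 59, 60, 61, 60, 56, 61, 58, 59, 62, 61, 61, 58, 57]  # 10-day, pots 16-30

n1, n2 = len(x), len(y)
diff = statistics.mean(y) - statistics.mean(x)     # y-bar minus x-bar
var_x = statistics.variance(x)                     # sample variance s_x^2
var_y = statistics.variance(y)                     # sample variance s_y^2

# Equal-variance case: pooled s.d. and df = n1 + n2 - 2.
s_p = math.sqrt(((n1 - 1) * var_x + (n2 - 1) * var_y) / (n1 + n2 - 2))
se_pooled = s_p * math.sqrt(1 / n1 + 1 / n2)
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Unequal-variance case: separate variances and weighted df.
a, b = var_x / n1, var_y / n2
se_unequal = math.sqrt(a + b)
t_unequal = diff / se_unequal
df_unequal = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Left-side test, H_a: mu_2 - mu_1 < 0, at an assumed alpha of 5%.
T_CRIT = 1.701                                     # t(.05, 28) from a t-table
reject_h0 = t_pooled < -T_CRIT

print(f"difference = {diff:.2f}, pooled s.d. = {s_p:.2f}")
print(f"pooled:  SE = {se_pooled:.4f}, t = {t_pooled:.2f}, df = {df_pooled}")
print(f"unequal: SE = {se_unequal:.4f}, t = {t_unequal:.2f}, df = {df_unequal:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

With equal sample sizes the two standard errors coincide, so both t-values have magnitude 3.53, matching the computer output up to sign, since the output reports mu (1) - mu (2). The weighted degrees of freedom come out just under 28, which the software reports as 27.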