Design & Analysis of Experiments 8E 2012 Montgomery
STT 511-STT411: DESIGN OF EXPERIMENTS AND ANALYSIS OF VARIANCE
Dr. Cuixian Chen
Chapter 2: Some Basic Statistical Concepts

Review of STT215: Chapter 3

3.1 Design Of Experiments (Outline of a randomized design)
Completely randomized experimental designs: Individuals are randomly assigned to groups, then the groups are randomly assigned to treatments.

Example 3.13, page 179
What are the effects of repeated exposure to an advertising message (digital camera)? The answer may depend on the length of the ad and on how often it is repeated. Outline the design of this experiment with the following information.
Subjects: 150 undergraduate students.
Two factors: length of the commercial (30 seconds and 90 seconds: 2 levels) and repeat times (1, 3, or 5 times: 3 levels).
Response variables: their recall of the ad, their attitude toward the camera, and their intention to purchase it. (See page 187 for the diagram.)
HWQ: 3.18, 3.30(b), 3.32

3.1 Design Of Experiments (Block designs)
In a block, or stratified, design, subjects are divided into groups, or blocks, prior to experiments to test hypotheses about differences between the groups.
The blocking, or stratification, here is by gender (blocking factor).
EX3.19. Ex: 3.17 (p182), 3.18. HWQ: 3.47(a,b), 3.126.

3.1 Design Of Experiments (Matched pairs designs)
Matched pairs: Choose pairs of subjects that are closely matched (e.g., same sex, height, weight, age, and race). Within each pair, randomly assign who will receive which treatment.
It is also possible to use a single person and give the two treatments to this person over time in random order. In this case, the "matched pair" is just the same person at different points in time.
The most closely matched pair studies use identical twins.
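The completely randomized design of Example 3.13 can be sketched in R. This is only an illustration: the equal split of 25 subjects per treatment, the seed, and the names `treatments`, `design`, and `cell_counts` are assumptions, not part of the textbook example.

```r
# Sketch: completely randomized design for Example 3.13.
# Assumption: the 150 subjects are split equally (25 each) over 2 x 3 = 6 treatments.
set.seed(411)  # arbitrary seed, only so the assignment can be reproduced

# All combinations of the two factors.
treatments <- expand.grid(length_sec = c(30, 90), repeats = c(1, 3, 5))
n_subjects <- 150
per_cell <- n_subjects / nrow(treatments)

# Shuffle the subject IDs, then deal them out to the 6 treatment combinations.
shuffled <- sample(seq_len(n_subjects))
design <- data.frame(
  subject = shuffled,
  treatments[rep(seq_len(nrow(treatments)), each = per_cell), ]
)
row.names(design) <- NULL

# Each length-by-repeats cell should contain 25 subjects.
cell_counts <- table(design$length_sec, design$repeats)
print(cell_counts)
```

Shuffling the subject IDs first and then cutting the shuffled list into equal pieces keeps the group sizes balanced, which a coin flip per subject would not guarantee.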
HWQ 3.120

STT511-411: Chapter 2 - Some Basic Statistical Concepts
Design of Engineering Experiments
Describing sample data: random samples; sample mean, variance, standard deviation.
Populations versus samples: population mean, variance, standard deviation; estimating parameters.
Simple comparative experiments: the hypothesis testing framework; the two-sample t-test; checking assumptions, validity.

Review of one sample inference in STT215

Estimation of Parameters
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ estimates the population mean $\mu$.
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ estimates the variance $\sigma^2$.

Sampling Distribution
$Z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, if $\sigma$ is known.
$t = \frac{\bar{y} - \mu}{s/\sqrt{n}} \sim t(df = n-1)$, if $\sigma$ is unknown.

Normal and T distribution
When n is very large, s is a very good estimate of $\sigma$ and the corresponding t distributions are very close to the normal distribution.
The t distributions become wider for smaller sample sizes, reflecting the lack of precision in estimating $\sigma$ from s.

How can we use a computer to help us understand the distributions?
Introducing R.
1. Where to find it? Google "R" -> the first link.
2. Download and install R from CRAN.
3. Then we have R.
How do we use it?
1. Assign a value to a variable x: x=3; or x<-3;
2. A sequence of numbers: 1:5; or 6:3;
3. A vector: x=c(4,5,6); or x=4:6; then x[2] is 5.
4. Loop: for (i in 1:5) {print(i)};
5. Average: mean(x);
6. Sum: sum(x);
Entry level of R in 10 mins.

Normal distribution in R
dnorm(x, µ, σ), for the density;
pnorm(x, µ, σ), for the left tail probability: Pr(X<=x);
qnorm(per, µ, σ), for the quantile: given Pr(X<=x)=per, find x;
rnorm(N, µ, σ), for random number generation.
Eg: use R to find the following for N(µ=3, σ=4): 1. P(X<4); 2. P(X>2); 3. P(1<X<4); 4. the 95th percentile.
Note: if X ~ N(µ, σ^2), then Z=(X-µ)/σ ~ N(0,1), which is the standard normal distribution.

T distribution in R
dt(x, df), for the density;
pt(x, df), for the left tail probability: Pr(X<=x);
qt(per, df), for the quantile: given Pr(X<=x)=per, find x;
rt(N, df), for random number generation.
Eg: use R to find the following for t(df=6): 1. P(X<4); 2. P(X>2); 3. P(1<X<4); 4. the 95th percentile.

One sample confidence interval and hypothesis testing

Review: Confidence levels
Confidence intervals contain the population mean in C% of samples. Different areas under the curve give different confidence levels C.
z* is related to the chosen confidence level C: C is the area under the standard normal curve between -z* and z*.
The one sample Z-confidence interval is thus: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
Example: For an 80% confidence level C, 80% of the normal curve's area is contained in the interval.

Review: 5 Steps for Hypothesis testing
1. State H0 and Ha.
2. State the level of significance (usually α is 5%).
3. Calculate the test statistic, ASSUMING THE NULL HYPOTHESIS IS TRUE.
4. Find the P-value, that is, the probability (assuming H0 is true) that the test statistic would take a value as extreme as or more extreme than the one actually observed (in the direction of Ha).
5. Draw a conclusion: If P-value ≤ α, then we reject H0 (enough evidence). If P-value > α, then we do not reject H0 (not enough evidence).
Note: The two possible conclusions are rejecting or not rejecting H0.

P-value in one-sided and two-sided tests
One-sided (one-tailed) test; two-sided (two-tailed) test.
To calculate the P-value for a two-sided test, use the symmetry of the normal curve. Find the P-value for a one-sided test, and double it.
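The normal and t exercises posed in this review can be worked with the four function families listed in the slides. The sketch below is one way to set them up; the variable names are illustrative, and the last line adds the z* for the 80% confidence level mentioned in the confidence-level review.

```r
# Exercises for N(mu = 3, sigma = 4). Note: R takes the SD, not the variance.
mu <- 3; sigma <- 4
p1  <- pnorm(4, mu, sigma)                         # P(X < 4)
p2  <- 1 - pnorm(2, mu, sigma)                     # P(X > 2)
p3  <- pnorm(4, mu, sigma) - pnorm(1, mu, sigma)   # P(1 < X < 4)
q95 <- qnorm(0.95, mu, sigma)                      # 95th percentile

# The same four questions for t(df = 6).
df <- 6
t1   <- pt(4, df)                                  # P(X < 4)
t2   <- 1 - pt(2, df)                              # P(X > 2)
t3   <- pt(4, df) - pt(1, df)                      # P(1 < X < 4)
tq95 <- qt(0.95, df)                               # 95th percentile

# z* for an 80% confidence level: 10% of the area lies in each tail.
z_star_80 <- qnorm(0.90)

print(c(p1, p2, p3, q95))
print(c(t1, t2, t3, tq95))
print(z_star_80)
```

For a continuous distribution P(X < x) and P(X <= x) coincide, which is why `pnorm` and `pt` can be used directly for the strict inequalities.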
Review: Find P-value
The P-value is the area under the sampling distribution for values at least as extreme, in the direction of Ha, as that of our random sample.
Use the R functions we just learned to find the p-value: e.g. H0: µ = 2.6 hours versus Ha: µ < 2.6 hours gives test statistic Z = -1.6. Q: Find the p-value.
[Figure: sampling distribution of x̄ with standard deviation σ/√n, centered at the µ defined by H0.]

Example 1: One-sample Z-test
A test of the null hypothesis H0: µ = µ0 gives test statistic Z = -1.6.
a) What is the P-value if the alternative is Ha: µ > µ0?
b) What is the P-value if the alternative is Ha: µ < µ0?
c) What is the P-value if the alternative is Ha: µ ≠ µ0?

Example 1 (cont.): One-sample Z-test
A test of the null hypothesis H0: µ = µ0 gives test statistic Z = 2.1.
a) What is the P-value if the alternative is Ha: µ > µ0?
b) What is the P-value if the alternative is Ha: µ < µ0?
c) What is the P-value if the alternative is Ha: µ ≠ µ0?

Example 2: One sample Z-test or One sample Z-Confidence Interval
The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 with a population SD=15. The medical director of a company looks at the medical records of 72 company executives in this age group and finds that the mean systolic blood pressure in this sample is 126.07.
1) Is this evidence that executives' blood pressures are different from the national average?
2) Find the 95% confidence interval for the average SBP of all company executives.

Answer:
Hypothesis: H0: µ = 128 v.s. Ha: µ ≠ 128; α = 5%.
Test statistic: with $\bar{x} = 126.07$, $\sigma = 15$, $n = 72$,
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{126.07 - 128}{15/\sqrt{72}} = -1.09$
P-value = 2*pnorm(-1.09) = 0.2757131.
Conclusions: ….
The 95% confidence interval for the average SBP of all company executives, with Z* = qnorm(0.975) = 1.96, is:
$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = \left(126.07 - 1.96\frac{15}{\sqrt{72}},\ 126.07 + 1.96\frac{15}{\sqrt{72}}\right) = (122.61,\ 129.53)$
The conclusions from the two-sided HT (α=5%) and the CI (95%) are consistent.

Example 3: One sample t-test or One sample t-Confidence Interval
The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128. The medical director of a company looks at the medical records of 72 company executives in this age group and finds that the mean systolic blood pressure in this sample is 126.07 with sample SD 15.
1) Is this evidence that executives' blood pressures are different from the national average?
2) Find the 95% confidence interval for the average SBP of all company executives.

Answer:
Hypothesis: H0: µ = 128 v.s. Ha: µ ≠ 128; α = 5%.
Test statistic: with $\bar{X} = 126.07$, $s = 15$, $n = 72$,
$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{126.07 - 128}{15/\sqrt{72}} = -1.09$; df = 72 - 1 = 71
P-value = 2*pt(-1.09, df=71) = 0.2793988.
Conclusions: ….
The 95% confidence interval for the average SBP of all company executives, with t* = qt(0.975, 71) = 1.99, is:
$\left(\bar{x} - t^*_{(n-1)}\frac{S}{\sqrt{n}},\ \bar{x} + t^*_{(n-1)}\frac{S}{\sqrt{n}}\right) = \left(126.07 - 1.99\frac{15}{\sqrt{72}},\ 126.07 + 1.99\frac{15}{\sqrt{72}}\right) = (122.55,\ 129.59)$
The conclusions from the two-sided HT (α=5%) and the CI (95%) are consistent.

One sample test (Shall we use Z-test or T-test ??)
Example 4: A new medicine treating cancer was introduced to the market decades ago and the company claimed that on average it will prolong a patient's life for 5.2 years. Suppose the SD of all cancer patients is 2.52. In a 10-year study with 64 patients, the average prolonged lifetime is 4.6 years.
1) With the normality assumption, do the 10-year study's data show a different average prolonged lifetime?
2) Find the 95% confidence interval for the average prolonged lifetime for all patients.
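Example 4 states the population SD, so the one-sample Z procedures from Example 2 apply. The sketch below follows the same steps from the summary numbers in the problem; the variable names are illustrative and the slides do not give a worked answer, so treat this as one possible solution rather than the official one.

```r
# Example 4: sigma is given, so use the one-sample Z-test and Z confidence interval.
mu0   <- 5.2    # claimed mean prolonged lifetime (years) under H0
sigma <- 2.52   # population SD, given in the problem
n     <- 64     # number of patients in the study
ybar  <- 4.6    # observed mean prolonged lifetime

se <- sigma / sqrt(n)
z  <- (ybar - mu0) / se
p_value <- 2 * pnorm(-abs(z))            # two-sided, since Ha is mu != 5.2

z_star <- qnorm(0.975)
ci <- ybar + c(-1, 1) * z_star * se      # 95% Z confidence interval

print(c(z = z, p_value = p_value))
print(ci)
```

The two-sided decision at α = 5% and the 95% interval can then be compared, as in Examples 2 and 3, by checking whether 5.2 lies inside `ci`.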
One sample test (Shall we use Z-test or T-test ??)
Example 5: A new medicine treating cancer was introduced to the market decades ago and the company claimed that on average it will prolong a patient's life for 5.2 years. In a 10-year study with 20 patients, the average prolonged lifetime is 4.7 years with sample SD 2.50.
1) With the normality assumption, do the 10-year study's data show a different average prolonged lifetime?
2) Find the 95% confidence interval for the average prolonged lifetime for all patients.

Review: Link between confidence level and margin of error for one-sample z-CI
The margin of error depends on z*: $MOE = z^* \frac{\sigma}{\sqrt{n}}$
Higher confidence C implies a larger margin of error m (thus less precision in our estimates).
A lower confidence level C produces a smaller margin of error m (thus better precision in our estimates).

Example 6: Finding sample size
In a clinical study with a certain number of patients, a new medicine can on average prolong life by 4 years. Suppose the SD of all cancer patients is 0.75. Q: How large a sample of cancer patients would be needed to estimate the mean within ±0.1 years with 90% confidence?
Z*=1.645; MOE=(1.645)*(0.75)/sqrt(n)=0.1; so n=(1.645*0.75/0.1)^2=152.2139. We will take n=153.

Confidence intervals to test hypotheses
A level α two-sided significance test rejects H0: µ = µ0 exactly when the hypothesized value µ0 falls outside a level (1-α)100% confidence interval for µ.
In a two-sided test, C = 1 - α, where C is the confidence level and α is the significance level (α/2 in each tail).

One sample test in R
One-sample t-test; one-sample Z-test; one-sample Z-CI's and one-sample t-CI's.
For H0: mu=32, v.s. Ha: mu<32

    ## Suppose these are the prices of 5 randomly selected houses in Wilmington
    x <- c(20, 25, 28, 33, 37)
    mean(x)
    var(x)
    sd(x)

    ## We want to test whether the average house price is less than 32
    ## One-sample t-test by hand in R
    ## For H0: mu=32, v.s. Ha: mu<32
    n <- length(x)
    t.val <- (mean(x) - 32) / (sd(x) / sqrt(n))
    df.dat <- n - 1
    p.value <- pt(t.val, df.dat)   # left-tail area, because Ha is mu < 32
    print(p.value)

    ## One-sample test with t.test in R
    ## For H0: mu=32, v.s. Ha: mu<32
    t.test(x, alternative = "less", mu = 32)
    ## For H0: mu=32, v.s. Ha: mu>32
    t.test(x, alternative = "greater", mu = 32)
    ## For H0: mu=32, v.s. Ha: mu not equal to 32
    t.test(x, mu = 32)

    ## 95% CI in R
    LB <- mean(x) - qt(.975, df.dat) * sd(x) / sqrt(n)
    UB <- mean(x) + qt(.975, df.dat) * sd(x) / sqrt(n)
    print(c(LB, UB))

One sample test in SAS

    title 'One-Sample t Test';
    data value;
      input value @@;
      datalines;
    20 25 28 33 37
    ;
    run;

    /* 3 options for sides: 2 = not equal (default), L = less than, U = greater than */
    proc ttest data=value h0=32 sides=2;
      var value;
    run;
    proc ttest data=value h0=32 sides=U;
      var value;
    run;
    proc ttest data=value h0=32 sides=L;
      var value;
    run;
    proc ttest data=value h0=32;
      var value;
    run;

Matched-pair sample (one-sample) confidence interval and hypothesis testing

Matched pairs t procedures for dependent samples
Subjects are matched in "pairs" and outcomes are compared within each unit.
Example: Pre-test and post-test studies look at data collected on the same sample elements before and after some experiment is performed.
Example: Twin studies often try to sort out the influence of genetic factors by comparing a variable between sets of twins.
We perform hypothesis testing on the difference in each unit.

Matched pairs (one-sample)
The variable studied becomes Xdifference = (X1 - X2).
The null hypothesis states NO difference between the two paired groups.
H0: µdifference = 0; Ha: µdifference > 0 (or < 0, or ≠ 0)
When stating the alternative, be careful how you are calculating the difference (after - before or before - after).
Conceptually, this is not different from tests on one population.

Matched Pairs (one-sample)
If we take After - Before, and we want to show that:
the "After group" has increased over the "Before group": Ha: µdiff > 0
the "After group" has decreased: Ha: µdiff < 0
the two groups are different: Ha: µdiff ≠ 0
The test statistic is $t = \frac{\bar{x}_{diff} - \mu_{diff}}{s_{diff}/\sqrt{n}}$

Example 4: Matched Pairs t-test
Many people believe that the moon influences the actions of some individuals. A study of dementia patients in nursing homes recorded various types of disruptive behaviors every day for 12 weeks. Days were classified as moon days and other days. For each patient the average number of disruptive behaviors was computed for moon days and for other days. The data for 5 subjects whose behaviors were classified as aggressive are presented below:

    Moon days   Other days   Difference
    3.33        0.27         3.06
    3.67        0.59         3.08
    2.67        0.32         2.35
    3.33        0.19         3.14
    3.33        1.26         2.07

We want to test whether there is any difference in aggressive behavior on moon days and other days.

Answer to Example 4
Let difference = aggressive behavior on moon days minus aggressive behavior on other days.
$H_0: \mu_d = 0$ versus $H_a: \mu_d \neq 0$, $\alpha = 0.05$.
t-statistic=12.377, df=5-1=4, p-value=2.449*10^(-4).
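The matched pairs answer can be checked in R from the five rows of the table. The sketch below computes the statistic by hand on the differences and then with `t.test(..., paired = TRUE)`; the variable names are illustrative. Because the table values are rounded to two decimals, the result may differ slightly in the last digits from the 12.377 quoted in the answer.

```r
# Moon-day data for the 5 subjects, as printed in the table.
moon  <- c(3.33, 3.67, 2.67, 3.33, 3.33)
other <- c(0.27, 0.59, 0.32, 0.19, 1.26)

# A matched pairs test is a one-sample t-test on the differences.
d <- moon - other
n <- length(d)
t_stat  <- mean(d) / (sd(d) / sqrt(n))        # H0: mu_d = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided alternative

# The same test through t.test with paired = TRUE.
res <- t.test(moon, other, paired = TRUE)

print(c(t_stat = t_stat, p_value = p_value))
print(res)
```

Running `t.test(d, mu = 0)` on the differences gives the same statistic, which is the sense in which the slides call this a one-sample procedure.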
Reject H0 at the 5% level. There is enough evidence to conclude that there is a difference in aggressive behavior on moon days and other days.

Does lack of caffeine increase depression?
Individuals diagnosed as caffeine-dependent are deprived of caffeine-rich foods and assigned to receive daily pills. Sometimes the pills contain caffeine and other times they contain a placebo. Depression was assessed.
Q: Does lack of caffeine increase depression?
There are 2 data points for each subject, but we'll only look at the difference. The sample distribution appears appropriate for a t-test.

    Subject   Depression with Caffeine   Depression with Placebo   Placebo - Caffeine
    1         5                          16                        11
    2         5                          23                        18
    3         4                          5                         1
    4         3                          7                         4
    5         8                          14                        6
    6         5                          24                        19
    7         0                          6                         6
    8         0                          3                         3
    9         2                          15                        13
    10        11                         12                        1
    11        1                          0                         -1

For each individual in the sample, we have calculated a difference in depression score (placebo minus caffeine).
There were 11 "difference" points, thus df = n - 1 = 10. We calculate that $\bar{x} = 7.36$; $s = 6.92$.
H0: µdifference = 0; Ha: µdifference > 0
$t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{7.36}{6.92/\sqrt{11}} = 3.53$
For df = 10, p-value = 0.0027.
(1) Since p-value < 0.05, reject H0.
(2) We have enough evidence to conclude that caffeine deprivation causes a significant increase in depression.

Two independent samples confidence interval and hypothesis testing

Two (independent) sample scenario
Portland Cement Formulation (page 26)
An engineer is studying the formulation of a Portland cement mortar. He has added a polymer latex emulsion during mixing to determine if this impacts the curing time and tension bond strength of the mortar. The experimenter prepared 10 samples of the original formulation and 10 samples of the modified formulation.
Q: How many factor(s)? How many levels?
Factor: mortar formulation; Levels: the two different formulations, treated as two treatments or as two levels.

Graphical View of the Data
Dot Diagram, Fig. 2.1, pp 26
Q: Visually, do you see any difference between these two samples?
Q: If yes, do you see a large, modest or very small difference?
Q: How do we compare the difference between these two samples?

The Hypothesis Testing Framework for two sample t-test
Statistical hypothesis testing is a useful framework for many experimental situations.
Origins of the methodology date from the early 1900s.
We will use procedures known as the two-sample Z-test and the two-sample t-test.

Another example: If you have a large sample, a histogram may be useful
Graphical description of variability, with 200 observations.
Noise: called experimental error.

Box Plots, Fig. 2.3, pp. 28

Inferences about the differences in means, Randomized designs
The Hypothesis Testing Framework: sampling from a normal distribution.
Statistical hypotheses: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$

Errors in Hypothesis testing
If the null hypothesis is rejected when it is true, a type I error has occurred:
α = Pr(Type I error) = Pr(Reject H0 | H0 is true). α is also called the significance level.
If the null hypothesis is not rejected when it is false, a type II error has occurred:
β = Pr(Type II error) = Pr(Fail to reject H0 | H0 is false)
Power = 1 - β = Pr(Reject H0 | H0 is false)

Portland Cement Summary Statistics (pg. 38)
If we want to test: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$

    Modified Mortar ("New recipe")    Unmodified Mortar ("Original recipe")
    ybar1 = 16.76                     ybar2 = 17.04
    S1^2  = 0.100                     S2^2  = 0.061
    S1    = 0.316                     S2    = 0.248
    n1    = 10                        n2    = 10

Portland Cement Example
If we want to test: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$
We will consider three cases for this example:
Case 1: Assume σ1 and σ2 are known: let σ1 = σ2 = 0.30.
Case 2: Assume σ1 and σ2 are unknown, and σ1 = σ2.
Case 3: Assume σ1 and σ2 are unknown, and σ1 ≠ σ2.
Case 1 gives the two-sample Z-test; Case 2 gives the two-sample (pooled) t-test; Case 3 gives the two-sample t-test.

Two-Sample Z-Test: if σ1 and σ2 are known
Case 1: Assume σ1 and σ2 are known: let σ1 = σ2 = 0.30.
Use the sample means to draw inferences about the population means.
Difference in sample means: $\bar{y}_1 - \bar{y}_2 = 16.76 - 17.04 = -0.28$
Standard deviation of the difference in sample means:
$\sigma^2_{\bar{y}} = \frac{\sigma^2}{n}$, and $\sigma^2_{\bar{y}_1 - \bar{y}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$, with $\bar{y}_1$ and $\bar{y}_2$ independent. REALLY? (STT315)
This suggests a statistic:
$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
If the variances were known we could use the normal distribution as the basis of a test.
Z0 has a N(0,1) distribution if the two population means are equal.

If we knew the two variances how would we use Z0 to test H0?
Case 1: Assume σ1 and σ2 are known: let σ1 = σ2 = 0.30.
Suppose that σ1 = σ2 = 0.30. Then we can calculate
$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{-0.28}{\sqrt{\frac{0.3^2}{10} + \frac{0.3^2}{10}}} = \frac{-0.28}{0.1342} = -2.09$
How "unusual" is the value Z0 = -2.09 if the two population means are equal?
It turns out that 95% of the area under the standard normal curve (probability) falls between the values -Z0.025 = -1.96 and Z0.025 = 1.96 (that is: the critical value Z* = 1.96).
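The Case 1 calculation can be reproduced in R from the summary statistics alone. This is a minimal sketch with illustrative variable names; it carries full precision, so Z0 comes out near -2.087 rather than the rounded -2.09.

```r
# Two-sample Z-test for the Portland cement example, Case 1 (sigmas known).
ybar1 <- 16.76; ybar2 <- 17.04     # sample means: modified, unmodified mortar
n1 <- 10; n2 <- 10
sigma1 <- 0.30; sigma2 <- 0.30     # assumed known population SDs (Case 1)

# Standard deviation of the difference in sample means.
se_diff <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)
z0 <- (ybar1 - ybar2) / se_diff

# Fixed significance level approach: compare |Z0| with the critical value.
z_star <- qnorm(0.975)
reject <- abs(z0) > z_star

# P-value approach for the two-sided alternative.
p_value <- 2 * pnorm(-abs(z0))

print(c(se_diff = se_diff, z0 = z0, z_star = z_star, p_value = p_value))
print(reject)
```

Both approaches use the same reference distribution, N(0,1); they differ only in whether the comparison is made on the scale of the statistic or on the scale of probability.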
So the value Z0 = -2.09 is pretty unusual in that it would happen less than 5% of the time if the population means were equal.

Standard Normal Table (see appendix) for critical values
Critical Value: Z* = qnorm(0.975) = 1.959964, so Z0.025 = 1.96.

Inferences about the differences in means, Randomized designs
Case 1: Assume σ1 and σ2 are known: let σ1 = σ2 = 0.30.
So if the variances were known we would conclude that we should reject the null hypothesis at the 5% level of significance,
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$,
and conclude that the alternative hypothesis is true.
This is called a fixed significance level test, because we compare the value of the test statistic to a critical value (1.96) that we selected in advance, before running the experiment.
The standard normal distribution is the reference distribution for the test.
Another way to do this that is very popular is to use the P-value approach. The P-value can be thought of as the observed significance level.
For the Z-test, it is easy to find the P-value.

Normal Table
Case 1: Assume σ1 and σ2 are known: let σ1 = σ2 = 0.30.
Find the tail probability beyond |Z0| = 2.09 from the table. This is 1 - 0.98169 = 0.01831.
The P-value is twice this probability, or 0.03662.
So we would reject the null hypothesis at any level of significance that is greater than or equal to 0.03662.
Typically 0.05 is used as the cutoff.
In R, we use 2*pnorm(-2.09) = 0.0366178.

Two-sample t-Test if σ1 and σ2 are unknown
The two-sample Z-test just described would work perfectly if we knew the two population variances.
Since we usually don't know the true population variances, what would happen if we just plugged in the sample variances?
The answer is that if the sample sizes were large enough (say both n > 30 or 40) the Z-test would work just fine. It is a good large-sample test for the difference in means.
But many times that isn't possible (as Gosset wrote in 1908, "…but what if the sample size is small…?").
It turns out that if the sample size is small we can no longer use the N(0,1) distribution as the reference distribution for the test.

How the Two-Sample t-Test Works
Case 2: Assume σ1 and σ2 are unknown, and σ1 = σ2.
Use $S_1^2$ and $S_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$.
The previous ratio becomes $\frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
However, we have the case where $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
Pool the individual sample variances:
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
$S_p^2$ is an estimate of the common variance, with df = n1 + n2 - 2.

How the Two-Sample (pooled) t-Test Works
Case 2: Assume σ1 and σ2 are unknown, and σ1 = σ2.
The test statistic is
$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, df = n1 + n2 - 2
The denominator is called the standard error of the difference in means, SE($\bar{y}_1 - \bar{y}_2$).
Values of t0 that are very different from zero are consistent with the alternative hypothesis.
t0 is a "distance" measure: how far apart the averages are, expressed in standard deviation units.
Notice the interpretation of t0 as a signal-to-noise ratio.

The Two-Sample (Pooled) t-Test
Case 2: Assume σ1 and σ2 are unknown, and σ1 = σ2.
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081$
$S_p = 0.284$
$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{16.76 - 17.04}{0.284\sqrt{\frac{1}{10} + \frac{1}{10}}} = -2.20$
The two sample means are a little over two standard deviations apart.
Is this a "large" difference?
In R: 2*pt(-2.20, 18) = 0.04110859.
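The pooled calculation above can be wrapped in a small helper that works from summary statistics. `pooled_t_test` is a hypothetical name introduced here for illustration, not a base R function; the sketch keeps full precision, so its output differs slightly from the rounded slide values (0.081, 0.284, -2.20).

```r
# Pooled two-sample t-test from summary statistics (Case 2: equal variances).
# pooled_t_test is a hypothetical helper written for this example.
pooled_t_test <- function(ybar1, ybar2, s1_sq, s2_sq, n1, n2) {
  df    <- n1 + n2 - 2
  sp_sq <- ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df   # pooled variance
  se    <- sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)          # SE of the difference
  t0    <- (ybar1 - ybar2) / se
  list(sp_sq = sp_sq, sp = sqrt(sp_sq), t0 = t0, df = df,
       p_value = 2 * pt(-abs(t0), df))                  # two-sided P-value
}

# Portland cement summary statistics from the slides.
res <- pooled_t_test(ybar1 = 16.76, ybar2 = 17.04,
                     s1_sq = 0.100, s2_sq = 0.061, n1 = 10, n2 = 10)
print(res)
```

When the raw observations are available, `t.test(x, y, var.equal = TRUE)` performs the same test directly; the helper is only useful when all that is reported is the means, variances, and sample sizes.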
Two-Sample (Pooled) t-Test: a more general case
Case 2: Assume σ1 and σ2 are unknown, and σ1 = σ2.
If we would like to test a more general case, for example:
H0: µ1-µ2=10 v.s. Ha: µ1-µ2≠10
Then the test statistic will be (see page 43):
$t_0 = \frac{\bar{y}_1 - \bar{y}_2 - 10}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

William Sealy Gosset (1876-1937)
Gosset's interest in barley cultivation led him to speculate that design of experiments should aim not only at improving the average yield, but also at breeding varieties whose yield was insensitive (robust) to variation in soil and climate.
Developed the t-test (1908).
Gosset was a friend of both Karl Pearson and R.A. Fisher, an achievement, for each had a monumental ego and a loathing for the other.
Gosset was a modest man who cut short an admirer with the comment that "Fisher would have discovered it all anyway."

The Two-Sample (Pooled) t-Test
We need an objective basis for deciding how large the test statistic t0 really is.
In 1908, W. S. Gosset derived the reference distribution for t0, called the t distribution.
Tables of the t distribution: see textbook appendix, page 614.
t0 = -2.20. Critical Value: t*=qt(0.975, 18)=2.100922.

The Two-Sample (Pooled) t-Test
A value of t0 between -2.101 and 2.101 is consistent with equality of means.
It is possible for the means to be equal and for t0 to either exceed 2.101 or fall below -2.101, but it would be a "rare event", which leads to the conclusion that the means are different.
Could also use the P-value approach.
t0 = -2.20. Critical Value: t*=qt(0.975, 18)=2.100922.
The Two-Sample (Pooled) t-Test
t0 = -2.20. Critical Value: t*=qt(0.975, 18)=2.100922.
The P-value is the area (probability) in the tails of the t-distribution beyond -2.20 plus the probability beyond +2.20 (it's a two-sided test).
The P-value is a measure of how unusual the value of the test statistic is given that the null hypothesis is true.
The P-value is the risk of wrongly rejecting the null hypothesis of equal means (it measures the rareness of the event).
The exact P-value in our problem is P = 0.042 (found from a computer).

Approximating the P-value
Our t-table only gives probabilities greater than positive values of t. So take the absolute value of t0 = -2.20, or |t0| = 2.20.
Now with 18 degrees of freedom, find the values of t in the table that bracket this value. These are 2.101 < |t0| = 2.20 < 2.552.
The right-tail probability for t = 2.101 is 0.025 and for t = 2.552 is 0.01. Double these probabilities because this is a two-sided test.
Therefore the P-value must lie between these two probabilities, or 0.02 < P-value < 0.05.
These are lower and upper bounds on the P-value.
We know that the actual P-value is 0.042.

Computer Two-Sample t-Test Results

Checking Assumptions: Normal Probability Plot (called QQ-plot)

Importance of the t-Test
Provides an objective framework for simple comparative experiments.
Could be used to test all relevant hypotheses in a two-level factorial design, because all of these hypotheses involve the mean response at one "side" of the cube versus the mean response at the opposite "side" of the cube. (See page 6-7)

Confidence Intervals (See pg. 43)
Hypothesis testing gives an objective statement concerning the difference in means, but it doesn't specify "how different" they are.
General form of a confidence interval: $L \le \theta \le U$ where $P(L \le \theta \le U) = 1 - \alpha$
The 100(1-α)% confidence interval on the difference in two means:
$\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Critical Value: t*=qt(0.975, 18)=2.100922.
Example, page 43-44.

The two sample t-test (with unequal variance)
Case 3: Assume σ1 and σ2 are unknown, and σ1 ≠ σ2.
The degrees of freedom v associated with this variance estimate are approximated using the Welch-Satterthwaite equation.

What if the Two Variances are Different?

Example 2.1, page 48:
Case 3: Assume σ1 and σ2 are unknown, and σ1 ≠ σ2.
P-value = pt(-2.7354, 16.1955) = 0.007274408.

Two sample t-test in R: Table 2.1
Two-sample (pooled) t-test, and two-sample t-test.
For H0: mu1=mu2, v.s. Ha: mu1≠mu2
OR: H0: mu_diff = 0, v.s. Ha: mu_diff ≠ 0

    ## Two sample t-test in R with t.test
    data.tab2.1 <- read.table("http://people.uncw.edu/chenc/STT411/dataset%20backup/Tension-Bond.TXT", header=TRUE)
    x <- data.tab2.1[,1]
    y <- data.tab2.1[,2]
    ## pooled t-test (Case 2)
    t.test(x, y, alternative="two.sided", mu=0, var.equal=TRUE)
    ## Welch t-test (Case 3)
    t.test(x, y, alternative="two.sided", mu=0, var.equal=FALSE)

var.equal: If TRUE then the pooled variance is used to estimate the variance, otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
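The confidence interval formula and the unequal-variance case can both be evaluated from the Portland cement summary statistics. The sketch below is an illustration with made-up variable names. The Welch-Satterthwaite degrees of freedom are computed with the standard formula $v = \frac{(S_1^2/n_1 + S_2^2/n_2)^2}{\frac{(S_1^2/n_1)^2}{n_1-1} + \frac{(S_2^2/n_2)^2}{n_2-1}}$, which is assumed here to be the equation the slide refers to.

```r
# Summary statistics from the Portland cement slides.
ybar1 <- 16.76; s1_sq <- 0.100; n1 <- 10   # modified mortar
ybar2 <- 17.04; s2_sq <- 0.061; n2 <- 10   # unmodified mortar

# Case 2 (pooled): 95% confidence interval for mu1 - mu2.
df_pooled <- n1 + n2 - 2
sp        <- sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df_pooled)
se_pooled <- sp * sqrt(1 / n1 + 1 / n2)
ci_pooled <- (ybar1 - ybar2) + c(-1, 1) * qt(0.975, df_pooled) * se_pooled

# Case 3 (Welch): separate variances, Welch-Satterthwaite degrees of freedom.
v1 <- s1_sq / n1
v2 <- s2_sq / n2
se_welch <- sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
t_welch  <- (ybar1 - ybar2) / se_welch
p_welch  <- 2 * pt(-abs(t_welch), df_welch)   # two-sided P-value

print(ci_pooled)
print(c(t_welch = t_welch, df_welch = df_welch, p_welch = p_welch))
```

With equal sample sizes the two standard errors coincide, so the pooled and Welch statistics are the same here and only the degrees of freedom differ.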
Check the normality assumption

    ## check the normality assumption
    qqnorm(x); qqline(x)

    ## what does the qq plot do?
    par(mfrow=c(2,2))
    aa <- rnorm(100)            # standard normal sample
    qqnorm(aa); qqline(aa)
    bb <- rnorm(100, 10, 2)     # normal sample with mean 10, SD 2
    qqnorm(bb); qqline(bb)
    cc <- rexp(100)             # exponential (non-normal) sample
    qqnorm(cc); qqline(cc)
    dev.off()                   ## to close the plotting window

Other Chapter Topics
Hypothesis testing when the variances are known.
One sample inference (t and Z tests), by comparing to a specific value.
Hypothesis tests on variances (F tests).
Paired experiments: this is an example of blocking. (chap 4)

How do we know whether the variances of two samples are different?
With $F_0 = S_1^2 / S_2^2$, F0 follows the F(n1-1, n2-1) distribution. We call n1-1 the numerator df, and n2-1 the denominator df.
For a two-sided F-test: p-value = 2*(one-tail area). But how do we decide which side to double?
First, find the center (the median) with qf(0.5, n1-1, n2-1).
If the test statistic F0 is on the RIGHT side of the center, then double the RIGHT side: P-value = 2*(1-pf(F0, n1-1, n2-1)).
Otherwise, double the LEFT side: P-value = 2*pf(F0, n1-1, n2-1).

Example 2.3, page 58:
In R: One-sided test with f0=14.5/10.8; 1-pf(f0,11,9) = 0.3344771.
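The doubling rule for the two-sided F-test can be written as a short function. `f_test_p` is a hypothetical helper name, not a base R function. The sketch applies it to the Portland cement sample variances (0.100 and 0.061 with n1 = n2 = 10) and also repeats the one-sided calculation quoted for Example 2.3.

```r
# Two-sided P-value for the F-test, following the doubling rule described above.
# f_test_p is a hypothetical helper written for this example.
f_test_p <- function(f0, df1, df2) {
  center <- qf(0.5, df1, df2)        # median of the F(df1, df2) distribution
  if (f0 > center) {
    2 * (1 - pf(f0, df1, df2))       # F0 right of the center: double the right tail
  } else {
    2 * pf(f0, df1, df2)             # otherwise: double the left tail
  }
}

# Portland cement summary variances: S1^2 = 0.100, S2^2 = 0.061, n1 = n2 = 10.
f0_cement <- 0.100 / 0.061
p_cement  <- f_test_p(f0_cement, 9, 9)

# Example 2.3 values quoted above (one-sided, upper tail).
f0_ex <- 14.5 / 10.8
p_ex  <- 1 - pf(f0_ex, 11, 9)

print(c(f0_cement = f0_cement, p_cement = p_cement))
print(c(f0_ex = f0_ex, p_ex = p_ex))
```

The rule is equivalent to doubling the smaller of the two tail areas, so swapping which sample goes in the numerator does not change the two-sided P-value as long as the degrees of freedom are swapped as well.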
Portland Cement Formulation: Check the equal variance assumption

    data.tab2.1 <- read.table("http://people.uncw.edu/chenc/STT411/dataset%20backup/Tension-Bond.TXT", header=TRUE)
    x <- data.tab2.1[,1]
    y <- data.tab2.1[,2]
    ## check the equal variance assumption
    var.x <- var(x); var.y <- var(y)
    n <- length(x)
    2*(1-pf(var.x/var.y, n-1, n-1))   # doubles the right tail of F(n-1, n-1)
    # or
    var.test(x, y)

Summary of Ch2
Sampling distribution:
$\frac{\bar{y} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
$\frac{\bar{y} - \mu}{s/\sqrt{n}} \sim t(df = n-1)$
Steps for hypothesis testing
One sample Z-test
CI for population mean when population SD is given
One sample t-test
CI for population mean when sample SD is given
Two sample Z-test
Two sample t-test with equal variance assumption
CI for the difference of two population means with equal variance assumption
Two sample t-test without equal variance assumption
CI for the difference of two population means with equal variance assumption
Check normality assumption
Check equal variance assumption with F test

Summary about R
pnorm; qnorm; pt; qt; qqnorm; qqline; t.test; var.test; pf;