Susan Kolakowski
Design of Experiments – EQAS 770
Homework #1
March 22, 2006

Problem 1

Photoresist is a light-sensitive material applied to semiconductor wafers so that the circuit pattern can be imaged onto the wafer. After application, the coated wafers are baked to remove the solvent in the photoresist mixture and to harden the resist. Here are the measurements of photoresist thickness (in kA) for eight wafers baked at two different temperatures. Assume that the runs were made in random order and that they are independent. (Problem statement copied from assignment)

Photoresist Thickness (in kA)

 95°C    100°C
11.176   5.263
 7.089   6.748
 8.097   7.461
11.739   7.015
11.291   8.133
10.759   7.418
 6.467   3.772
 8.315   8.963

a) Preliminary Analysis

For the preliminary analysis, the descriptive statistics were calculated and three plots were produced: a boxplot, a dotplot and a histogram of the data. The results of the descriptive statistics calculations were as follows (where N is the number of samples for each temperature, and the mean, standard deviation, minimum, median and maximum are in units of kA):

Temperature  N   Mean   Standard Deviation  Minimum  Median  Maximum
95°C         8   9.367  2.100               6.467    9.537   11.739
100°C        8   6.847  1.640               3.772    7.217    8.963

These statistics suggest that the photoresist thickness may differ depending on the temperature at which the wafers are baked. At this stage this is only a hypothesis, because it rests on a difference between the sample means of 8 wafers per temperature. The mean for 100°C is, however, more than one standard deviation below the mean for 95°C, which supports the hypothesis. In addition, the maximum thickness for 100°C is less than the mean for 95°C, which also suggests that baking temperature affects photoresist thickness.

A boxplot of the data illustrates the spread of the samples at each temperature.
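The descriptive statistics above can be recomputed directly from the raw measurements. The following is a minimal sketch in Python rather than the Minitab procedure used for this report; the names `thickness`, `describe` and `summary` are illustrative only.

```python
import statistics

# Thickness measurements (kA) copied from the problem statement.
thickness = {
    "95C": [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315],
    "100C": [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963],
}


def describe(values):
    """Return N, mean, sample standard deviation, minimum, median, maximum."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


summary = {temp: describe(vals) for temp, vals in thickness.items()}

for temp, stats in summary.items():
    print(temp, {name: round(value, 3) for name, value in stats.items()})
```

Rounded to three decimals, the values should agree with the table of descriptive statistics reported for each temperature.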
The boxplot shows that every sample baked at 100°C is thinner than the median of the samples baked at 95°C. This again suggests that the baking temperature has a significant effect on photoresist thickness.

The dotplot is another illustration of the data collected, but instead of displaying statistics of the data, it displays where each data sample falls. In my opinion it is harder to judge the significance of temperature for photoresist thickness from this plot, although you can see that two wafers baked at 100°C had thicknesses lower than the minimum thickness achieved when baking at 95°C, and that four wafers baked at 95°C exceeded the maximum thickness achieved when baking at 100°C.

The histogram of the two data sets is shown with the Normal distribution curves described by the sample statistics of the 8 samples at each temperature. In this plot, you can again see that the mean for the 8 wafers baked at 95°C is greater than the mean for the 8 wafers baked at 100°C, although the plot does show a fair amount of overlap between the two distributions.

Based only on the descriptive statistics and the three plots, it appears that there may be a significant difference between the photoresist thickness of wafers baked at the two temperatures, and that it is worthwhile to go forward with this data to see whether there is enough evidence to support this difference.

b) Check all assumptions needed to perform the analysis:

1. Samples are from a Normal distribution.
2. Variance for each temperature is equal.
3. Runs were made in random order and are independent.

1. A probability plot was produced in Minitab to test whether the data could be assumed to be Normal. Since the p-values are greater than α=0.05 for both temperatures, there is not enough evidence to say that these two data sets are not Normally distributed.
Therefore the assumption that the data is Normal is met.

2. A test to determine whether the variances for each temperature could be assumed to be equal was run in Minitab, which reported its results as a plot. Since the p-values from both tests (F-test and Levene's test) are greater than α=0.05, there is not enough evidence to reject the assumption that the variances are equal, so the analysis proceeds with equal variances.

3. It was given in the problem statement that runs were made in random order and are independent.

c) A two-sample t-test for equal variances was performed to determine whether there was enough evidence to support the claim that there is a difference in the mean photoresist thickness of wafers baked at 95°C versus 100°C. The assumptions required to perform this test were met as described in part b of this problem. For this test, an α-value of 0.05 was used. The results of the test were produced by Minitab as follows:

Two-sample T for Data

Labels  N  Mean  StDev  SE Mean
T=100   8  6.85   1.64     0.58
T=95    8  9.37   2.10     0.74

Difference = mu (T=100) - mu (T=95)
Estimate for difference:  -2.52000
95% CI for difference:  (-4.54043, -0.49957)
T-Test of difference = 0 (vs not =): T-Value = -2.68  P-Value = 0.018  DF = 14
Both use Pooled StDev = 1.8840

Since the p-value produced by this test is less than α=0.05, there is enough evidence to say that the means are not equal.

d) The 95% confidence interval for the difference in the means was calculated during the two-sample t-test performed for part c: (-4.54043, -0.49957). Since 0 does not fall within this confidence interval, a zero difference between the population means is not plausible at the 95% confidence level, which agrees with the conclusion of part c that the population means differ.

e) The sample size necessary to detect an actual difference in mean thickness of 1.5 kA with a power of 0.9 (a β-risk of 0.1) was determined in Minitab using a process standard deviation of 1.8 kA and an α-value of 0.05.
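As a rough cross-check of the sample size in part e, the calculation can be sketched with the usual normal-approximation formula n = 2((z_{α/2} + z_β)σ/δ)² per group. This is an approximation and not the method Minitab uses; the function name `n_per_group` and its parameter names are illustrative only.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.9):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                            # round up to whole wafers


# Difference of 1.5 kA, standard deviation of 1.8 kA, alpha 0.05, power 0.9.
print(n_per_group(delta=1.5, sigma=1.8))  # 31
```

The approximation gives about 30.3, which rounds up to 31 per group. This is one below the 32 reported by Minitab, presumably because Minitab's calculation is based on the t distribution rather than the normal distribution.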
The results from determining this sample size were:

2-Sample t Test

Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 1.8

Difference  Sample Size  Target Power  Actual Power
       1.5           32           0.9      0.906801

The sample size is for each group.

These results tell us that to detect a difference of 1.5 kA between the means for the two temperatures, a sample of 32 wafers baked at each temperature is necessary. This value was determined under the assumption that the process standard deviation is 1.8 kA, allowing a maximum β-risk of 0.1 and using an α-value of 0.05.

Problem 2 - ANOVA

P-values much greater than α=0.05: not enough evidence to reject the hypothesis that the four data sets are all from Normal distributions.

Descriptive Statistics: Data

Variable  Labels  N    Mean  StDev  Minimum  Median  Maximum
Data      MT1     4  2971.0  120.6   2865.0  2945.0   3129.0
          MT2     4  3156.3  136.0   2975.0  3175.0   3300.0
          MT3     4  2933.8  108.3   2800.0  2942.5   3050.0
          MT4     4  2666.3   81.0   2600.0  2650.0   2765.0

ANOVA check using Minitab

One-way ANOVA: Data versus Labels

Source  DF      SS      MS      F      P
Labels   3  489740  163247  12.73  0.000
Error   12  153908   12826
Total   15  643648

S = 113.3   R-Sq = 76.09%   R-Sq(adj) = 70.11%

Level  N    Mean  StDev
MT1    4  2971.0  120.6
MT2    4  3156.3  136.0
MT3    4  2933.8  108.3
MT4    4  2666.3   81.0

Individual 95% CIs For Mean Based on Pooled StDev (interval plot, axis from 2600 to 3200)

Pooled StDev = 113.3

Problem 3

Two-Sample T-Test and CI: MT1, MT3

Two-sample T for MT1 vs MT3

     N  Mean  StDev  SE Mean
MT1  4  2971    121       60
MT3  4  2934    108       54

Difference = mu (MT1) - mu (MT3)
Estimate for difference:  37.2500
95% CI for difference:  (-160.9986, 235.4986)
T-Test of difference = 0 (vs not =): T-Value = 0.46  P-Value = 0.662
Both use Pooled StDev = 114.5795

Under the assumption that the variances for MT1 and MT3 are equal: DF = 6

c) Not enough evidence to say that the means for these two techniques are not equal.

Problem 4

P-values: all Normal

Descriptive Statistics: Data

Variable  Labels        N   Mean  StDev  Minimum  Median  Maximum
Data      Compact      10  3.900  2.283    1.000   3.500    7.000
          Full Size    10  5.300  2.452    2.000   5.000   10.000
          Midsize      10  3.600  2.221    1.000   3.500    7.000
          Sub-Compact  10  4.100  1.969    1.000   4.000    7.000

Full-size car may have an effect.

Here one outlier for full-size increased the full-size mean and made it appear significant, but at the same time full-size had no counts of 1, while the other types had a total of five counts of 1.

One-way ANOVA: Data versus Labels

Source  DF      SS    MS     F      P
Labels   3   16.68  5.56  1.11  0.358
Error   36  180.30  5.01
Total   39  196.98

S = 2.238   R-Sq = 8.47%   R-Sq(adj) = 0.84%

Level         N   Mean  StDev
Compact      10  3.900  2.283
Full Size    10  5.300  2.452
Midsize      10  3.600  2.221
Sub-Compact  10  4.100  1.969

Individual 95% CIs For Mean Based on Pooled StDev (interval plot, axis from 2.4 to 6.0)

Pooled StDev = 2.238

P greater than alpha=0.1 -> not enough evidence to say that the means are not equal, therefore not enough evidence to state that the type of car affects the rental contract.

Last plot – appears random = good
First plot – residuals fit line well, appear Normal = good
Res vs fit – no pattern = good
P-value is low – data appears to move in a pattern around the fit line = bad
But p-value is greater than alpha, so there is not enough evidence to say that the residuals are not Normally distributed
Test FS vs not FS – sample sizes not equal – just look at plots
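
The one-way ANOVA table for Problem 4 can be cross-checked from the summary statistics alone, since the between-level sum of squares depends only on the level means and the error sum of squares only on the level standard deviations. The sketch below is a hand calculation in Python, not Minitab's procedure; the names `anova_from_summary`, `cars` and `table` are illustrative, and the p-value is not computed because that needs the F distribution.

```python
import math


def anova_from_summary(groups):
    """One-way ANOVA quantities from (n, mean, sample_stdev) per level."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-level variation: spread of the level means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-level variation: pooled from the level standard deviations.
    ss_error = sum((n - 1) * stdev ** 2 for n, _, stdev in groups)
    df_between = len(groups) - 1
    df_error = total_n - len(groups)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "ss_between": ss_between,
        "ss_error": ss_error,
        "df_between": df_between,
        "df_error": df_error,
        "f": ms_between / ms_error,
        "pooled_stdev": math.sqrt(ms_error),
    }


# (N, Mean, StDev) for Compact, Full Size, Midsize, Sub-Compact.
cars = [(10, 3.900, 2.283), (10, 5.300, 2.452), (10, 3.600, 2.221), (10, 4.100, 1.969)]
table = anova_from_summary(cars)
print({name: round(value, 3) for name, value in table.items()})
```

Up to rounding of the inputs, the sums of squares, F value and pooled standard deviation should match the Minitab table. The same function applied to the Problem 2 summary statistics gives values close to that table as well, with small differences because the reported means and standard deviations are rounded.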