Susan Kolakowski
Design of Experiments – EQAS 770
Homework #1
March 22, 2006
Problem 1
Photoresist is a light-sensitive material applied to semiconductor wafers so that the
circuit pattern can be imaged onto the wafer. After application, the coated wafers are
baked to remove the solvent in the photoresist mixture and to harden the resist. Here
are the measurements of photoresist thickness (in kA) for eight wafers baked at each of
two different temperatures. Assume that the runs were made in random order and that they
are independent. (Problem statement copied from assignment)
Temp.    Photoresist Thickness (in kA)
95°C     11.176   7.089   8.097  11.739  11.291  10.759   6.467   8.315
100°C     5.263   6.748   7.461   7.015   8.133   7.418   3.772   8.963
a) Preliminary Analysis
For the preliminary analysis, the descriptive statistics were calculated and three plots
were produced: boxplot, dotplot and histogram of data.
The results of the descriptive statistics calculations were as follows (where N
represents the number of samples for each temperature and the mean, standard
deviation, minimum, median and maximum are in units of kA):
Temperature   N   Mean    Standard Deviation   Minimum   Median   Maximum
95°C          8   9.367   2.100                6.467     9.537    11.739
100°C         8   6.847   1.640                3.772     7.217     8.963
These statistics suggest that the photoresist thickness may depend on the temperature
at which the wafers are baked. At this stage this is only a hypothesis, based on the
difference between the mean values of the 8 samples baked at each temperature. However,
the mean for 100°C is more than one standard deviation below the mean for 95°C, which
supports the hypothesis. In addition, the maximum thickness for 100°C is less than the
mean for 95°C, which also suggests that baking temperature affects photoresist
thickness.
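As a cross-check on the table above, the same descriptive statistics can be recomputed from the raw data. This is a minimal sketch using only the Python standard library. The original analysis was done in Minitab, so the variable and function names here are illustrative only.

```python
import statistics

# Thickness measurements (kA) from the problem statement.
thickness = {
    "95C": [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315],
    "100C": [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963],
}

def describe(values):
    """Return N, mean, sample standard deviation, minimum, median, maximum."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "StDev": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "Minimum": min(values),
        "Median": statistics.median(values),
        "Maximum": max(values),
    }

summary = {temp: describe(values) for temp, values in thickness.items()}

for temp, stats in summary.items():
    print(temp, {name: round(value, 3) for name, value in stats.items()})
```

The rounded values should agree with the Minitab table to three decimal places.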
The boxplot of the data illustrates the spread of the samples at each temperature. It
shows that every sample baked at 100°C has a lower thickness than the median of the
samples baked at 95°C. This again suggests that the baking temperature has a
significant effect on photoresist thickness.
The dotplot is another illustration of the data collected, but instead of displaying
summary statistics, it displays where each data sample falls. In my opinion it is harder
to judge the significance of temperature to photoresist thickness using this plot,
although you can see that two wafers baked at 100°C had thicknesses lower than the
minimum thickness achieved when baking at 95°C, and that four wafers baked at 95°C
exceeded the maximum thickness achieved when baking at 100°C.
The histogram of the two data sets is overlaid with the continuous Normal distributions
described by the mean and standard deviation of the 8 samples at each temperature. In
this plot, you can again see that the mean for the 8 wafers baked at 95°C is greater
than the mean for the 8 wafers baked at 100°C, although the plot does show a fair amount
of overlap between the two distributions.
Based on only the descriptive statistics and the three plots produced, I would say that
there may be a significant difference between the photoresist thickness of wafers baked
at different temperatures and that it is worthwhile to go forward with this data to see
if there is enough evidence to support this difference.
b) Check all assumptions needed to perform the analysis:
1. Samples are from Normal distribution.
2. Variance for each temperature is equal.
3. Runs were made in random order and are independent.
1. A probability plot was produced in Minitab to test if the data could be assumed to
be Normal:
Since the p-values are greater than α=0.05 for both temperatures, there is not
enough evidence to say that either data set is not Normally distributed.
Therefore the assumption that the data is Normal is considered met.
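The idea behind a Normal probability plot can be sketched numerically: sort the data, pair each value with an approximate Normal score, and measure how close the pairs are to a straight line. The sketch below computes that correlation with the Python standard library. It is not the test Minitab reports on its probability plot, and the plotting-position formula (i - 0.375)/(n + 0.25) is one common convention chosen here as an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean

t95 = [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315]
t100 = [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963]

def probability_plot_correlation(values):
    """Correlation between the sorted data and approximate Normal scores."""
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (an assumed convention, not taken from Minitab).
    scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    x_bar, z_bar = mean(ordered), mean(scores)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(ordered, scores))
    sxx = sum((x - x_bar) ** 2 for x in ordered)
    szz = sum((z - z_bar) ** 2 for z in scores)
    return sxz / sqrt(sxx * szz)

r_95 = probability_plot_correlation(t95)
r_100 = probability_plot_correlation(t100)
print(f"95C: r = {r_95:.3f}, 100C: r = {r_100:.3f}")
```

A correlation close to 1 means the points fall near a straight line, which is what the probability plot shows visually. The correlation alone does not give a p-value.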
2. A test to determine if the variances for each temperature could be assumed to be
equal was run in Minitab. This test produced the following plot:
Since the p-values from both tests (F-test and Levene's test) are greater than
α=0.05, there is not enough evidence to reject the assumption that the variances
are equal, so we proceed as if they are equal.
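The F-test part of this check compares the ratio of the two sample variances with an F distribution. A minimal sketch of the ratio is shown below. The critical value 4.99 is the upper 2.5% point of F(7, 7) taken from a standard F table. It is not part of the Minitab output, and Levene's test is not reproduced here.

```python
import statistics

t95 = [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315]
t100 = [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963]

var_95 = statistics.variance(t95)    # sample variance, n - 1 denominator
var_100 = statistics.variance(t100)

# Put the larger variance on top so the ratio is at least 1.
f_ratio = max(var_95, var_100) / min(var_95, var_100)

# Upper 2.5% point of F with (7, 7) degrees of freedom, from an F table
# (assumed here for a two-sided test at alpha = 0.05).
F_CRITICAL = 4.99

reject_equal_variances = f_ratio > F_CRITICAL
print(f"F = {f_ratio:.2f}, reject equal variances: {reject_equal_variances}")
```

Because the ratio is well below the table value, this sketch agrees with the conclusion above that equal variances cannot be rejected.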
3. It was given in the problem statement that runs were made in random order and
are independent.
c) A two-sample t-test assuming equal variances was performed to determine whether there
was enough evidence to support the claim that there is a difference in the mean
photoresist thickness of wafers baked at 95°C versus 100°C. The assumptions required to
perform this test were met as described in part b of this problem. For this test, an
α-value of 0.05 was used.
The results of the test were produced by Minitab as follows:
Two-sample T for Data

Labels   N   Mean   StDev   SE Mean
T=100    8   6.85   1.64    0.58
T=95     8   9.37   2.10    0.74

Difference = mu (T=100) - mu (T=95)
Estimate for difference:  -2.52000
95% CI for difference:  (-4.54043, -0.49957)
T-Test of difference = 0 (vs not =): T-Value = -2.68  P-Value = 0.018  DF = 14
Both use Pooled StDev = 1.8840
Since the p-value produced by this test is less than α=0.05, there is enough evidence
to say that the means are not equal.
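The pooled two-sample t-test can be reproduced by hand as a check on the Minitab output. The sketch below computes the pooled standard deviation and the t statistic from the raw data, and approximates the two-sided p-value by numerically integrating the Student t density with Simpson's rule. The function names are illustrative, and the numerical integration is a stand-in for whatever method Minitab uses internally.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

t100 = [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963]
t95 = [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315]

def pooled_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom, pooled standard deviation)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df, sqrt(pooled_var)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the Student t density."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area)

t_value, df, pooled_sd = pooled_t(t100, t95)
p_value = two_sided_p(t_value, df)
print(f"T = {t_value:.2f}, DF = {df}, pooled StDev = {pooled_sd:.4f}, P = {p_value:.3f}")
```

The results should match the T-Value, DF, Pooled StDev and P-Value lines of the output above to the precision shown there.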
d) The 95% confidence interval for the difference in the means was calculated during
the 2-sample t-test performed for part c: (-4.54043, -0.49957)
Since 0 does not fall within this confidence interval, we can say with 95% confidence
that the difference between the population means is not zero (that is, that the
population means differ).
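The interval can be rebuilt from the summary values in the Minitab output as estimate ± t × standard error. In the sketch below, the critical value 2.1448 is the upper 2.5% point of the t distribution with 14 degrees of freedom, taken from a standard t table rather than from the original output.

```python
from math import sqrt

# Summary values copied from the Minitab two-sample t output.
n_100, n_95 = 8, 8
difference = -2.52000   # mu(T=100) - mu(T=95)
pooled_sd = 1.8840

# Upper 2.5% point of t with 14 degrees of freedom (table value, assumed).
T_CRITICAL = 2.1448

std_err = pooled_sd * sqrt(1 / n_100 + 1 / n_95)
margin = T_CRITICAL * std_err
lower, upper = difference - margin, difference + margin

print(f"95% CI for difference: ({lower:.3f}, {upper:.3f})")
print("Interval contains 0:", lower <= 0 <= upper)
```

Both endpoints are negative, which is the numerical form of the statement that 0 lies outside the interval.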
e) The sample size necessary to detect an actual difference in mean thicknesses of 1.5 kA
with a power of 0.9 (or β-risk of 0.1) was determined in Minitab using a process
standard deviation of 1.8 kA and an α-value of 0.05.
The results from determining this sample size were:
2-Sample t Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 1.8

            Sample  Target
Difference    Size   Power  Actual Power
       1.5      32     0.9      0.906801

The sample size is for each group.
These results tell us that to detect a difference of 1.5 kA between the means for each
temperature, a sample size of 32 wafers baked at each temperature is necessary. This
value was determined under the assumption that the process standard deviation is
1.8 kA, allowing the maximum β-risk to be 0.1 and using an α-value of 0.05.
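A quick sanity check on this sample size uses the Normal approximation n ≈ 2((z_{α/2} + z_β)σ/δ)² per group. This is a simplified stand-in for Minitab's calculation, which is based on the t distribution, so the approximation is expected to come out slightly smaller. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def approx_n_per_group(difference, sigma, alpha=0.05, power=0.9):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sigma / difference) ** 2)

n_approx = approx_n_per_group(difference=1.5, sigma=1.8)
print("Approximate sample size per group:", n_approx)
```

The approximation gives 31 per group, one fewer than the 32 reported by Minitab. The gap comes from replacing the t distribution with the Normal distribution.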
Problem 2 - ANOVA
The p-values are much greater than α=0.05, so there is not enough evidence to reject the
hypothesis that the four data sets are all from Normal distributions.
Descriptive Statistics: Data

Variable  Labels  N    Mean  StDev  Minimum  Median  Maximum
Data      MT1     4  2971.0  120.6   2865.0  2945.0   3129.0
          MT2     4  3156.3  136.0   2975.0  3175.0   3300.0
          MT3     4  2933.8  108.3   2800.0  2942.5   3050.0
          MT4     4  2666.3   81.0   2600.0  2650.0   2765.0
ANOVA check using Minitab

One-way ANOVA: Data versus Labels

Source  DF      SS      MS      F      P
Labels   3  489740  163247  12.73  0.000
Error   12  153908   12826
Total   15  643648

S = 113.3   R-Sq = 76.09%   R-Sq(adj) = 70.11%

Level  N    Mean  StDev
MT1    4  2971.0  120.6
MT2    4  3156.3  136.0
MT3    4  2933.8  108.3
MT4    4  2666.3   81.0

[Character plot: individual 95% CIs for mean based on pooled StDev, one interval per
level, on an axis marked 2600, 2800, 3000, 3200]

Pooled StDev = 113.3
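Because all four groups have the same size, the ANOVA table can be approximately rebuilt from the group means and standard deviations alone. The sketch below does this with the rounded values printed in the descriptive statistics, so small differences from the Minitab sums of squares are expected. The function name is illustrative.

```python
from math import sqrt

def one_way_anova_from_summary(n_per_group, means, stdevs):
    """One-way ANOVA quantities from equal-sized group means and sample SDs."""
    k = len(means)
    grand_mean = sum(means) / k  # valid because every group has the same size
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n_per_group - 1) * sum(s ** 2 for s in stdevs)
    df_between, df_within = k - 1, k * (n_per_group - 1)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within),
        "SS": (ss_between, ss_within),
        "MS": (ms_between, ms_within),
        "F": ms_between / ms_within,
        "pooled_sd": sqrt(ms_within),
    }

# Rounded means and standard deviations for MT1..MT4 from the output above.
anova = one_way_anova_from_summary(
    n_per_group=4,
    means=[2971.0, 3156.3, 2933.8, 2666.3],
    stdevs=[120.6, 136.0, 108.3, 81.0],
)
print(f"F = {anova['F']:.2f}, pooled StDev = {anova['pooled_sd']:.1f}")
```

The F value and pooled standard deviation should land close to the 12.73 and 113.3 reported by Minitab. Any difference in the last digit comes from the rounded inputs.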
Problem 3
Two-Sample T-Test and CI: MT1, MT3

Two-sample T for MT1 vs MT3

      N  Mean  StDev  SE Mean
MT1   4  2971    121       60
MT3   4  2934    108       54

Difference = mu (MT1) - mu (MT3)
Estimate for difference:  37.2500
95% CI for difference:  (-160.9986, 235.4986)
T-Test of difference = 0 (vs not =): T-Value = 0.46  P-Value = 0.662  DF = 6
Both use Pooled StDev = 114.5795

The test was run under the assumption that the variances for MT1 and MT3 are equal.
c) There is not enough evidence to say that the means for these two techniques are not
equal.
Problem 4
Normality test p-values: all four data sets can be assumed Normal.

Descriptive Statistics: Data

Variable  Labels        N   Mean  StDev  Minimum  Median  Maximum
Data      Compact      10  3.900  2.283    1.000   3.500    7.000
          Full Size    10  5.300  2.452    2.000   5.000   10.000
          Midsize      10  3.600  2.221    1.000   3.500    7.000
          Sub-Compact  10  4.100  1.969    1.000   4.000    7.000

Full-size car may have an effect.
Here one outlier for Full Size increased the Full Size mean and made it appear
significant, but at the same time Full Size had no counts of 1, while the other car
types had a total of five counts of 1.
One-way ANOVA: Data versus Labels

Source  DF      SS    MS     F      P
Labels   3   16.68  5.56  1.11  0.358
Error   36  180.30  5.01
Total   39  196.98

S = 2.238   R-Sq = 8.47%   R-Sq(adj) = 0.84%

Level         N   Mean  StDev
Compact      10  3.900  2.283
Full Size    10  5.300  2.452
Midsize      10  3.600  2.221
Sub-Compact  10  4.100  1.969

[Character plot: individual 95% CIs for mean based on pooled StDev, one interval per
level, on an axis marked 2.4, 3.6, 4.8, 6.0]

Pooled StDev = 2.238
P is greater than alpha=0.1 -> not enough evidence to say that the means are not equal,
therefore not enough evidence to state that the type of car affects the rental contract
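The F, R-Sq and R-Sq(adj) values in the ANOVA output follow directly from the sums of squares and degrees of freedom, which gives a quick way to see how little of the variation is explained by car type. A minimal sketch using the numbers copied from the Minitab table (variable names are illustrative):

```python
# Sums of squares and degrees of freedom copied from the Minitab ANOVA table.
ss_labels, ss_error, ss_total = 16.68, 180.30, 196.98
df_labels, df_error, df_total = 3, 36, 39

ms_labels = ss_labels / df_labels
ms_error = ss_error / df_error

f_value = ms_labels / ms_error
r_sq = ss_labels / ss_total                        # share of variation explained
r_sq_adj = 1 - ms_error / (ss_total / df_total)    # adjusted for degrees of freedom

print(f"F = {f_value:.2f}")
print(f"R-Sq = {100 * r_sq:.2f}%  R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```

An adjusted R-Sq under 1% is consistent with the conclusion above that car type shows no detectable effect.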
Last plot – appears random = good
First plot - residuals fit line well – appear normal = good
Res vs fit – no pattern = good
P-value is low – data appears to move in pattern around fit line = bad
But p-value is greater than alpha so there’s not enough evidence to say that the residuals
are not normally distributed
Test FS vs not FS – sample sizes not equal – just look at plots