SUBJECT: Statistics
The formula for a confidence interval on the difference between means (M1 - M2) is:
$$M_d \pm t \cdot s_{M_d}$$
where Md = M1 - M2 is the statistic and $s_{M_d}$ is an estimate of $\sigma_{M_d}$ (the standard error of the
difference between means). t depends on the level of confidence desired and on the
degrees of freedom. The estimated standard error, $s_{M_d}$, is computed assuming that the
variances in the two populations are equal. If the two sample sizes are equal (n1 = n2 = n), then
the population variance $\sigma^2$ (it is the same in both populations) is estimated by using the
following formula:
$$MSE = \frac{s_1^2 + s_2^2}{2}$$
where MSE (which stands for mean square error) is an estimate of $\sigma^2$. Once MSE is
calculated, $s_{M_d}$ can be computed as follows:
$$s_{M_d} = \sqrt{\frac{2\,MSE}{n}}$$
1) H0: M1=M2 or M1-M2=0 vs H1: M1>M2
The first step is to compute the means of each group: M1 = 118 and M2 = 115.
Therefore, Md = 118 - 115 = 3.
$s_1^2$ = 15^2 = 225 and $s_2^2$ = 225.
MSE = (225 + 225)/2 = 225. From the formula $s_{M_d} = \sqrt{2\,MSE/n}$, with n = 50 observations in each group:
$s_{M_d}$ = SQRT(2*225/50) = 3
The t statistic for Md is td = (Md - 0)/$s_{M_d}$ = (3 - 0)/3 = 1.0.
The degrees of freedom are equal to the degrees of freedom for MSE (MSE is used to
estimate $\sigma^2$). Since MSE is made up of two estimates of $\sigma^2$ (one for each sample), the df
for MSE is the sum of the df for these two estimates. Therefore, the df for MSE is (n - 1)
+ (n - 1) = 49 + 49 = 98. A t-table shows that the critical value of t for a one-tailed test
at the 99% level of confidence with 98 df is about t* = 2.37.
As td < t*, we cannot reject H0 (the average score of students in psychology is equal to
that of students in math) at the 99% level of confidence.
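The equal-sample-size test above can be sketched in Python. This is a minimal sketch, assuming both groups have a standard deviation of 15 and n = 50 observations each (the sample size implied by df = 49 + 49). The names `equal_n_t_test` and `T_CRITICAL` are illustrative only, and the critical value is typed in from a t-table, not derived by the code.

```python
import math


def equal_n_t_test(m1, m2, s1, s2, n):
    """Return (t statistic, standard error, df) for M1 - M2 with equal group sizes."""
    mse = (s1 ** 2 + s2 ** 2) / 2   # average of the two sample variances
    se = math.sqrt(2 * mse / n)     # estimated standard error of M1 - M2
    df = (n - 1) + (n - 1)          # one (n - 1) per sample
    return (m1 - m2) / se, se, df


# Problem 1 inputs; n = 50 is an assumption inferred from df = 49 + 49.
t_stat, se, df = equal_n_t_test(118, 115, 15, 15, 50)

# Assumed table value: one-tailed, alpha = 0.01, 98 df.
T_CRITICAL = 2.365
reject_h0 = t_stat > T_CRITICAL

print(f"se = {se:.3f}, t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Passing the critical value in by hand keeps the sketch free of third-party dependencies; a statistics library could supply it instead.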
2) M1 = 9.2 and M2 = 8.8. Therefore, Md = 9.2-8.8 = 0.4.
The calculations are only slightly more complicated when the sample sizes are different
(n1 does not equal n2). The first difference in the calculations is that MSE is computed
differently. If the two values of $s^2$ were simply averaged as they are in the case of equal
sample sizes, then the estimate based on the smaller sample size would count as much as
the estimate based on the larger sample size. Instead the formula for MSE is:
MSE = SSE/df
where df is the degrees of freedom and SSE is the sum of squares error, defined as:
SSE = SSE1 + SSE2
$$SSE_1 = \sum (X - M_1)^2$$
where the X's are from the first group (sample) and M1 is the mean of the first group.
Similarly,
$$SSE_2 = \sum (X - M_2)^2$$
where the X's are from the second group and M2 is the mean of the second group.
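To illustrate how MSE = SSE/df weights each sample by its size, here is a small Python sketch that computes SSE1, SSE2 and MSE from raw scores. The two score lists are hypothetical values invented for this illustration, and `pooled_mse` is an illustrative name. The last lines check that the result equals the average of the two sample variances weighted by their degrees of freedom.

```python
import statistics


def pooled_mse(group1, group2):
    """Return (MSE, df), where MSE = (SSE1 + SSE2) / df."""
    m1 = sum(group1) / len(group1)
    m2 = sum(group2) / len(group2)
    sse1 = sum((x - m1) ** 2 for x in group1)  # squared deviations from M1
    sse2 = sum((x - m2) ** 2 for x in group2)  # squared deviations from M2
    df = (len(group1) - 1) + (len(group2) - 1)
    return (sse1 + sse2) / df, df


# Hypothetical scores, not taken from the problems in this document.
group1 = [4.0, 6.0, 8.0]
group2 = [1.0, 2.0, 3.0, 4.0, 5.0]

mse, df = pooled_mse(group1, group2)

# Same quantity written as a df-weighted average of the sample variances.
weighted = (
    (len(group1) - 1) * statistics.variance(group1)
    + (len(group2) - 1) * statistics.variance(group2)
) / df

print(mse, weighted, df)
```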
The formula
$$s_{M_d} = \sqrt{\frac{2\,MSE}{n}}$$
cannot be used without modification since there is not one
value of n but two (n1 and n2).
The solution is to use the harmonic mean of the two sample sizes for n. The harmonic
mean (nh) of n1 and n2 is:
$$n_h = \frac{2}{\frac{1}{n_1} + \frac{1}{n_2}}$$
Therefore the formula for the estimated standard error of the difference between means is:
$$s_{M_d} = \sqrt{\frac{2\,MSE}{n_h}}$$
$s_1^2$ = 0.3^2 = 0.09 and $s_2^2$ = 0.1^2 = 0.01; nh = 2/(1/27 + 1/30) = 28.42.
The df for MSE is (n1 - 1) + (n2 - 1) = 26 + 29 = 55.
Since $s^2$ = SSE/(n - 1) for each sample, SSE1 = 26*0.09 = 2.34 and SSE2 = 29*0.01 = 0.29.
MSE = SSE/df = (2.34 + 0.29)/55 = 0.0478. From the formula $s_{M_d} = \sqrt{2\,MSE/n_h}$:
$s_{M_d}$ = SQRT(2*0.0478/28.42) = 0.0580
A t-table shows that the value of t for a 95% confidence interval (the interval is two-tailed,
so we look up the t value at the 97.5% level) with 55 df is t* = 2.00.
All the terms needed to construct the confidence interval have now been computed. The
lower limit (LL) of the interval is:
LL = Md - $s_{M_d}$*t* = 0.4 - 0.0580*2.00 = 0.284
and the upper limit (UL) is:
UL = Md + $s_{M_d}$*t* = 0.4 + 0.0580*2.00 = 0.516
Therefore the 95% confidence interval for the true difference between the means is (0.284,
0.516).
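The unequal-sample-size interval can be sketched in Python as follows. This is a sketch under stated assumptions: 0.3 and 0.1 are taken to be sample standard deviations, so each group's sum of squares is (n - 1) times its variance, and the critical value 2.0 is a hand-entered table lookup. The function names are illustrative.

```python
import math


def harmonic_mean_n(n1, n2):
    """Harmonic mean of the two sample sizes."""
    return 2 / (1 / n1 + 1 / n2)


def difference_ci(m1, m2, s1, s2, n1, n2, t_critical):
    """Return (lower, upper, standard error) for a CI on M1 - M2."""
    df = (n1 - 1) + (n2 - 1)
    # Assumption: s1 and s2 are sample SDs, so SSE = (n - 1) * s^2 per group.
    sse = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2
    mse = sse / df
    se = math.sqrt(2 * mse / harmonic_mean_n(n1, n2))
    md = m1 - m2
    return md - t_critical * se, md + t_critical * se, se


# Problem 2 inputs; t_critical = 2.0 is the assumed table value for 55 df.
lower, upper, se = difference_ci(9.2, 8.8, 0.3, 0.1, 27, 30, t_critical=2.0)
print(f"se = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```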
3) H0: M1=M2 or M1-M2=0 (they earn the same) vs H1: M1>M2
The first step is to compute the means of each group: M1 = 23800 and M2 = 23750
Therefore, Md = 23800-23750 = 50.
$s_1^2$ = 300^2 = 90000 and $s_2^2$ = 250^2 = 62500. df = (n1 - 1) + (n2 - 1) = 15 + 19 = 34,
nh = 2/(1/16 + 1/20) = 17.78.
Since $s^2$ = SSE/(n - 1) for each sample, SSE1 = 15*90000 = 1350000 and SSE2 = 19*62500 = 1187500.
MSE = SSE/df = (1350000 + 1187500)/34 = 74632.35. From the formula $s_{M_d} = \sqrt{2\,MSE/n_h}$:
$s_{M_d}$ = SQRT(2*74632.35/17.78) = 91.62
The t statistic for Md is td = (Md - 0)/$s_{M_d}$ = (50 - 0)/91.62 = 0.546.
A t-table shows that the critical value of t for a one-tailed test at the 95% level of
confidence with 34 df is t* = 1.69.
As td < t*, we cannot reject H0 (male nurses earn the same as female nurses) at the 95% level
of confidence; the data do not show that male nurses earn more than female nurses.
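As a cross-check on Problem 3, here is a short Python sketch of the one-tailed test with unequal sample sizes. It assumes 300 and 250 are sample standard deviations, so each sum of squares is (n - 1) times the variance. `pooled_t_statistic` and `T_CRITICAL` are illustrative names, and the critical value is entered by hand from a t-table.

```python
import math


def pooled_t_statistic(m1, m2, s1, s2, n1, n2):
    """Return (t statistic, standard error, df) using the pooled MSE."""
    df = (n1 - 1) + (n2 - 1)
    # Assumption: s1 and s2 are sample SDs, so SSE = (n - 1) * s^2 per group.
    mse = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    nh = 2 / (1 / n1 + 1 / n2)      # harmonic mean of the sample sizes
    se = math.sqrt(2 * mse / nh)    # estimated standard error of M1 - M2
    return (m1 - m2) / se, se, df


# Problem 3 inputs as given in the text.
t_stat, se, df = pooled_t_statistic(23800, 23750, 300, 250, 16, 20)

# Assumed table value: one-tailed, alpha = 0.05, 34 df.
T_CRITICAL = 1.691
reject_h0 = t_stat > T_CRITICAL

print(f"se = {se:.2f}, t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```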