MAIN EXPECTATIONS FOR EXAM 4 (APRIL 29, 2005, IN CLASS):
- Know how to write the null and alternative hypotheses
- Know how to compute or identify the test statistic
- Know how to make a decision using either the P-value method or the rejection region method
- When you make a decision, know how to write the conclusion
- Know what possible error you may have made
- Know how to compute the point estimates
- Know which procedure should be used when the data and the question are given
- Know how to interpret the confidence intervals
- Know what the assumptions are and what you are looking for
- Get familiar with the Minitab outputs.
FORMULAS
One-sided (one-tailed) tests:

Lower tailed (left-sided):
H0: population characteristic ≥ claimed constant value
Ha: population characteristic < claimed constant value

Upper tailed (right-sided):
H0: population characteristic ≤ claimed constant value
Ha: population characteristic > claimed constant value

Two-sided (two-tailed) test:
H0: population characteristic = claimed constant value
Ha: population characteristic ≠ claimed constant value

Significance level, α = P(Type I error) = P(reject H0 when it is true)
β = P(Type II error) = P(fail to reject H0 when it is false)
Power = 1 - β = 1 - P(Type II error) = P(reject H0 when it is false)
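To make α, β, and power concrete, here is a minimal simulation sketch (stdlib Python only; the numbers μ0 = 50, σ = 10, n = 30, and the alternative mean 55 are made up for illustration). Repeatedly running a two-sided z test when H0 is true estimates α; running it when H0 is false estimates the power, 1 - β:

```python
import math
import random

def z_test_two_sided(sample, mu0, sigma):
    """Two-sided z test for a mean with known sigma; True means H0 is rejected."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return abs(z) > 1.96  # z_{0.025} ≈ 1.96, so this is a level-0.05 test

random.seed(1)
mu0, sigma, n, reps = 50.0, 10.0, 30, 2000

# When H0 is true, the rejection rate estimates alpha = P(Type I error).
type1 = sum(z_test_two_sided([random.gauss(mu0, sigma) for _ in range(n)],
                             mu0, sigma) for _ in range(reps)) / reps

# When the true mean is 55 (H0 false), the rejection rate estimates power = 1 - beta.
power = sum(z_test_two_sided([random.gauss(55.0, sigma) for _ in range(n)],
                             mu0, sigma) for _ in range(reps)) / reps

print(round(type1, 3), round(power, 3))
```

The estimated Type I error rate should land near 0.05, and the power near the theoretical value for this effect size.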
Hypothesis testing for a single population characteristic:

p0, σ0², and σ0 are the claimed constants for the population proportion, variance, and standard deviation, respectively.

Population proportion, p
- Assumption: check that np0 ≥ 10 and n(1 - p0) ≥ 10 to be able to use the test statistic. Check that n is sufficiently large (n·p̂ ≥ 10 and n(1 - p̂) ≥ 10) to use the confidence interval.
- Test statistic: z = (p̂ - p0) / √( p0(1 - p0) / n )
- The 100(1-α)% large-sample confidence interval for p is
  ( p̂ - z_{α/2} √( p̂(1 - p̂)/n ), p̂ + z_{α/2} √( p̂(1 - p̂)/n ) ).

Population variance, σ², and standard deviation, σ
- Assumption: the population of interest is normal, so that X1, X2, ..., Xn constitutes a random sample from a normal distribution with parameters μ and σ².
- Test statistic: χ² = (n - 1)s² / σ0², where s² = Σ_{i=1}^{n} (x_i - x̄)² / (n - 1).
- The 100(1-α)% confidence interval for σ² is
  ( (n - 1)s² / χ²_{α/2; n-1}, (n - 1)s² / χ²_{1-α/2; n-1} ).
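As a sketch of the one-proportion procedure above (stdlib Python only; the counts x = 58, n = 100 and the claim p0 = 0.5 are made-up numbers), using the error function for the standard normal CDF:

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function (math.erf is in the stdlib).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prop_test(x, n, p0, z_half=1.96):
    """One-sample z test and 95% CI for a proportion (z_{0.025} ≈ 1.96)."""
    p_hat = x / n
    # Check the large-sample conditions stated above before using z.
    assert n * p0 >= 10 and n * (1 - p0) >= 10, "test-statistic condition fails"
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - norm_cdf(abs(z)))               # two-tailed P-value
    half = z_half * math.sqrt(p_hat * (1 - p_hat) / n)  # CI uses p_hat, not p0
    return z, p_value, (p_hat - half, p_hat + half)

z, p_value, ci = prop_test(x=58, n=100, p0=0.5)
print(round(z, 3), round(p_value, 4), ci)
```

Note the design choice mirrored from the formula sheet: the test statistic's standard error uses the claimed p0, while the confidence interval's uses the sample proportion p̂.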
Hypothesis testing for two population characteristics:

A. Independent Samples

I. Population characteristic: the difference between two population means, μ1 - μ2.

Δ0 is the claimed constant. μ1 and μ2 are the population means for the X's and Y's, respectively. x̄ and ȳ are the sample means for the X's and Y's, respectively. m and n are the sample sizes for the X's and Y's, respectively. σ1² and σ2² are the population variances for the X's and Y's, respectively. s1² and s2² are the sample variances for the X's and Y's, respectively.

Population characteristic                                Table used
μ1 - μ2 (with either independent or dependent samples)   t or z
p1 - p2                                                  z
σ1² / σ2²                                                F
A decision can be made in one of two ways, for both two-population and one-population tests:

a. Let z* or t* be the computed test statistic value.

                   if the test statistic is z   if the test statistic is t
Lower tailed test  P-value = P(z < z*)          P-value = P(t < t*)
Upper tailed test  P-value = P(z > z*)          P-value = P(t > t*)
Two-tailed test    P-value = 2P(z > |z*|)       P-value = 2P(t > |t*|)

In each case, you can reject H0 if P-value ≤ α and fail to reject H0 (accept H0) if P-value > α.
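The three tail cases and the P-value decision rule can be sketched for a z statistic as follows (stdlib Python only; the statistic values -2.1 and 1.3 are made up for illustration):

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the stdlib error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value_z(z_star, tail):
    """P-value for a computed z statistic; tail is 'lower', 'upper', or 'two'."""
    if tail == "lower":
        return norm_cdf(z_star)                  # P(z < z*)
    if tail == "upper":
        return 1.0 - norm_cdf(z_star)            # P(z > z*)
    return 2.0 * (1.0 - norm_cdf(abs(z_star)))   # 2 P(z > |z*|)

def decide(p_value, alpha=0.05):
    # Reject H0 if P-value <= alpha; otherwise fail to reject.
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(p_value_z(-2.1, "lower")))   # P(z < -2.1) is small
print(decide(p_value_z(1.3, "two")))      # 2 P(z > 1.3) is not small
```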
b. Rejection region for a level α test:

Lower tailed test: z ≤ -z_α; t ≤ -t_{α; v}; F ≤ F_{1-α; m-1, n-1}; χ² < χ²_{1-α; n-1}

Upper tailed test: z ≥ z_α; t ≥ t_{α; v}; F ≥ F_{α; m-1, n-1}; χ² > χ²_{α; n-1}

Two-tailed test: z ≤ -z_{α/2} or z ≥ z_{α/2}; t ≤ -t_{α/2; v} or t ≥ t_{α/2; v}; F ≤ F_{1-α/2; m-1, n-1} or F ≥ F_{α/2; m-1, n-1}; χ² < χ²_{1-α/2; n-1} or χ² > χ²_{α/2; n-1}

Do not forget that F_{1-α/2; m-1, n-1} = 1 / F_{α/2; n-1, m-1} when using the F table.
Single-Factor ANOVA:

Model: X_ij = μ_i + ε_ij, or X_ij = μ + α_i + ε_ij, where i = 1, ..., I (number of treatments) and j = 1, ..., J (number of observations in each treatment).

X_ij: the observations. μ_i: the ith treatment mean. α_i = μ_i - μ: the ith treatment effect. ε_ij: the errors, which are normally distributed with mean 0 and constant variance σ².

Assumptions: the X_ij's are independent (the ε_ij's are independent); the ε_ij's are normally distributed with mean 0 and constant variance σ²; equivalently, the X_ij's are normally distributed with mean μ_i and constant variance σ².

Hypotheses:
H0: μ1 = ... = μ_{I-1} = μ_I versus Ha: at least one μ_i ≠ μ_j for i ≠ j, where the μ_i's are the treatment means.
Or
H0: α_i = 0 for all i versus Ha: α_i ≠ 0 for at least one i, where α_i is the ith treatment effect.

Analysis of Variance Table: when a different number of observations is obtained from each population, use n = Σ_{i=1}^{I} J_i with j = 1, ..., J_i.
Source      Df             SS        MS                          F          Prob > F
Treatments  I - 1          SSTr      MSTr = SSTr / df_treatment  MSTr/MSE   P-value
Error       I(J-1) or n-I  SSE       MSE = SSE / df_error
Total       IJ-1 or n-1    SSTotal

where df is the degrees of freedom, SS is the sum of squares, and MS is the mean square. Reject H0 if the P-value ≤ α or if the test statistic F > F_{α; I-1, error df}. If you reject the null hypothesis, you need to use a multiple comparison test such as Tukey-Kramer.

The Tukey-Kramer test is used if H0 is rejected in the ANOVA table.
Bartlett's test is used to check the constant-variance assumption.
If the variance is constant, the point estimate of the constant variance is MSE.
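The ANOVA table entries above can be computed by hand; here is a sketch for I = 3 hypothetical treatments with equal group size J, on made-up data (stdlib Python only; compare the resulting F with F_{α; I-1, I(J-1)} from a table to decide):

```python
# One-way ANOVA sums of squares for equal group sizes.
groups = [
    [20.1, 21.4, 19.8, 22.0],   # treatment 1 (made-up observations)
    [24.5, 23.9, 25.1, 24.0],   # treatment 2
    [21.0, 20.5, 22.2, 21.7],   # treatment 3
]
I = len(groups)
J = len(groups[0])
n = I * J
grand_mean = sum(x for g in groups for x in g) / n

# Between-treatment and within-treatment sums of squares.
sstr = sum(J * (sum(g) / J - grand_mean) ** 2 for g in groups)   # SSTr
sse = sum((x - sum(g) / J) ** 2 for g in groups for x in g)      # SSE

mstr = sstr / (I - 1)        # MSTr = SSTr / df_treatment
mse = sse / (I * (J - 1))    # MSE = SSE / df_error (point estimate of sigma^2)
F = mstr / mse
print(round(sstr, 3), round(sse, 3), round(F, 3))
```

A useful check on the arithmetic: SSTr + SSE must equal SSTotal, the total sum of squared deviations from the grand mean.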
Simple Linear Regression and Correlation

Pearson's correlation coefficient (r): measures the strength and direction of the linear relationship between X and Y. X and Y must be numerical variables.

The formal test for the correlation is H0: ρ_XY = 0 (the true correlation is zero) versus Ha: ρ_XY ≠ 0 (the true correlation is not zero), with test statistic

t = r √(n - 2) / √(1 - r²),

which has n - 2 degrees of freedom.
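A sketch of computing r and its test statistic from raw data (stdlib Python only; the six (x, y) pairs are made up for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient from raw data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0]
r = pearson_r(xs, ys)
n = len(xs)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)  # compare with t_{alpha/2; n-2}
print(round(r, 4), round(t, 3))
```

Since the made-up y values are nearly linear in x, r is close to 1 and t is far out in the tail of the t distribution with n - 2 = 4 degrees of freedom.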
Minitab gives you the following output for simple linear regression, Y = β0 + β1 x + e, where n observations are included and the parameters β0 and β1 are constants whose "true" values are unknown and must be estimated from the data:

Predictor        Coef   SE Coef   T           P
Constant         b0     s_b0      b0 / s_b0   p-value for Ha: β0 ≠ 0
Independent(x)   b1     s_b1      b1 / s_b1   p-value for Ha: β1 ≠ 0

Analysis of Variance

Source          DF     SS     MS                F                           P
Regression      1      SSR    MSR = SSR/1       MSR/MSE = (b1/s_b1)² = t²   p-value for Ha: β1 ≠ 0
Residual Error  n-2    SSE    MSE = SSE/(n-2)
Total           n-1    SST

(The identity F = (b1/s_b1)² = t² holds only in simple linear regression.)
Coefficient of Determination, R²: measures what percent of Y's variation is explained by the X variable via the regression model. It tells us the proportion of SST that is explained by the fitted equation. Note that SSE is the portion of SST that is not explained by the model.

R² = SSR / SST = 1 - SSE / SST.

Only in simple linear regression, R² = r², where r is Pearson's correlation coefficient.

To test H0: β1 = β10 versus Ha: β1 ≠ β10, the test statistic is t = (b1 - β10) / s_b1, where β10 is the value the slope is compared with. Decision making can be done using the error degrees of freedom, as in any other t test we have discussed before (either with the P-value or the rejection region method). The 100(1-α)% confidence interval for β1 is b1 ± t_{α/2; df} · s_b1.
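The regression quantities above can all be computed by hand; here is a least-squares sketch on made-up data (stdlib Python only) that also checks the two identities stated for simple linear regression, R² = r² and F = t²:

```python
import math

# Fit Y = b0 + b1*x by least squares on made-up data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

b1 = sxy / sxx                       # slope estimate
b0 = my - b1 * mx                    # intercept estimate

sst = sum((y - my) ** 2 for y in ys)                      # total SS
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
ssr = sst - sse                                           # regression SS
mse = sse / (n - 2)                  # error degrees of freedom: n - 2
s_b1 = math.sqrt(mse / sxx)          # standard error of the slope
t = b1 / s_b1                        # test statistic for H0: beta1 = 0
F = (ssr / 1.0) / mse                # ANOVA F statistic, MSR/MSE
r = sxy / math.sqrt(sxx * sst)       # Pearson's r
print(round(b1, 4), round(ssr / sst, 4), round(t, 3), round(F, 3))
```

The 100(1-α)% confidence interval b1 ± t_{α/2; n-2} s_b1 then only needs the t critical value from a table.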