EXAMPLE
A recent national survey found that high school students watched an average (mean) of 6.8 videos
per month. A random sample of 36 college students revealed that the mean number of videos
watched last month was 6.2, with a standard deviation of 0.5. At the .05 significance level, can we
conclude (test) that college students watch fewer videos a month than high school students?
CS-College Students
HS-High School Students
Solution
This is a one-tailed test because the claim is directional: college students watch fewer videos a month than
high school students.
Information:
Xbar=6.2
μ=6.8
α=.05
σ=.5
n=36
Step 1: State the hypotheses
Ho: CS≥6.8 (the high school mean)
Ha: CS<6.8
Step 2: Significance Level α=.05
Step 3: Test Statistic
Z-Test because n>30
Step 4: Critical value. For a one-tailed test at α=.05, look up .5-.05=.45 in the Z table; the closest entry is .4505,
which yields Z=1.65. Because this is a lower-tailed test, the critical value lies on the negative side of the number line: -1.65.
Rejection criterion: Reject the null hypothesis if the calculated value
of the test statistic is less than the critical value.
Step 5: Calculate Z test result
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{6.2 - 6.8}{0.5/\sqrt{36}} = -7.2$$
Z Test of Hypothesis for the Mean
Data
Null Hypothesis μ =              6.8
Level of Significance            0.05
Population Standard Deviation    0.5
Sample Size                      36
Sample Mean                      6.2
Intermediate Calculations
Standard Error of the Mean       0.083333333
Z Test Statistic                 -7.2
Lower-Tail Test
Lower Critical Value             -1.644853627
p-Value                          3.01063E-13
Reject the null hypothesis
          Reject Ho             |      Do not reject Ho
z ---------- (-7.2) ---------- (-1.65) ---------0--------- (1.65) -----|
                          Critical Value
Conclusion: Since the calculated value of Z (-7.2) is less than the critical value (-1.65), we
reject the null hypothesis. Thus the sample provides enough evidence to
support the claim that college students watch fewer videos a month than
high school students.
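The five steps above can be checked numerically. The following is a minimal sketch, not the spreadsheet template used in the solution: the function name `lower_tail_z_test` and its return layout are illustrative assumptions, and the sample standard deviation is treated as σ, as the solution does.

```python
from math import erfc, sqrt
from statistics import NormalDist


def lower_tail_z_test(sample_mean, mu0, sigma, n, alpha):
    """One-sample lower-tail Z test for a mean (hypothetical helper).

    Returns (z, critical_value, p_value, reject_null).
    """
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Critical value: the point with area alpha to its left.
    critical_value = NormalDist().inv_cdf(alpha)
    # P(Z <= z), written with erfc so it stays accurate far out in the tail.
    p_value = 0.5 * erfc(-z / sqrt(2))
    return z, critical_value, p_value, z < critical_value


# Inputs taken from the EXAMPLE: Xbar=6.2, mu=6.8, sigma=.5, n=36, alpha=.05
z, critical_value, p_value, reject_null = lower_tail_z_test(6.2, 6.8, 0.5, 36, 0.05)
print(f"z = {z:.4f}, critical = {critical_value:.4f}, p = {p_value:.3e}, reject H0: {reject_null}")
```

Calling the same function with a sample mean above the hypothesised mean gives a positive z, which can never fall in the lower rejection region.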
SET2
Problem # 2
A recent national survey found that high school students watched
an average (mean) of 5 videos per month. A random sample of 48 college
students revealed that the mean number of videos watched last month was
5.7, with a standard deviation of 0.4. At the .05 significance level, can we
conclude (test) that college students watch fewer videos a month than high
school students?
Use the template I followed in the first problem and show all the problem
information and all 5 steps of the hypothesis testing process. Show a
number line along with your final decision, please.
Answer
Step 1: The hypotheses tested are
H0: CS≥5 (the high school mean)
Ha: CS<5
Step 2: Significance Level α=.05
Step 3: Test Statistic
Z-Test because n>30
Step 4: Critical Value = -1.65 (since it is a lower-tailed test)
Rejection criterion: Reject the null hypothesis if the calculated value
of the test statistic is less than the critical value.
Step 5: Calculate Z test result
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{5.7 - 5}{0.4/\sqrt{48}} = 12.1243$$
Details
Z Test of Hypothesis for the Mean
Data
Null Hypothesis μ =              5
Level of Significance            0.05
Population Standard Deviation    0.4
Sample Size                      48
Sample Mean                      5.7
Intermediate Calculations
Standard Error of the Mean       0.057735027
Z Test Statistic                 12.12435565
Lower-Tail Test
Lower Critical Value             -1.644853627
p-Value                          1
Do not reject the null hypothesis
[Distribution plot: standard normal density (Mean=0, StDev=1) against Z; lower-tail area 0.05 to the left of Z = -1.64]
Here the value of the test statistic, 12.12435565, falls in the acceptance region (it is greater than -1.65).
Conclusion: Fail to reject the null hypothesis. The sample does not
provide enough evidence to support the claim that college
students watch fewer videos a month than high school students.
Problem A. At the time she was hired as a server at the Grumney Family
Restaurant, Beth Brigden was told, “You can average more than $20 a day
in tips.” Over the first 35 days she was employed at the restaurant, the mean
daily amount of her tips was $24.85, with a standard deviation of $3.24. At
the .05 significance level, can Ms. Brigden conclude that she is earning an
average of more than $20 in tips? Show all 5 steps
Answer
Step 1: The hypotheses tested are
H0: The mean daily amount of her tips ≤ $20
Ha: The mean daily amount of her tips > $20
Step 2: Significance Level α=.05
Step 3: Test Statistic
Z-Test because n>30
Step 4: Critical Value = 1.65 (since it is an upper-tailed test)
Rejection criterion: Reject the null hypothesis if the calculated value
of the test statistic is greater than the critical value.
Step 5: Calculate Z test result
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{24.85 - 20}{3.24/\sqrt{35}} = 8.855860169$$
Details
Z Test of Hypothesis for the Mean
Data
Null Hypothesis μ =              20
Level of Significance            0.05
Population Standard Deviation    3.24
Sample Size                      35
Sample Mean                      24.85
Intermediate Calculations
Standard Error of the Mean       0.547659957
Z Test Statistic                 8.855860169
Upper-Tail Test
Upper Critical Value             1.644853627
p-Value                          0
Reject the null hypothesis
[Distribution plot: standard normal density (Mean=0, StDev=1) against Z; upper-tail area 0.05 to the right of Z = 1.64]
The value of the test statistic, 8.855860169, falls in the critical region.
Conclusion: Reject the null hypothesis. The sample provides enough
evidence to support the claim that the mean daily amount of her tips is greater than $20.
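The p-value of 0 reported for this problem is presumably a rounding or floating-point effect rather than an exact zero; that is an assumption about the spreadsheet, not something the source states. The sketch below reproduces the Z statistic from the problem data and contrasts two ways of computing the upper-tail probability. The variable names are illustrative.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Problem A inputs; the sample standard deviation is used as sigma, as in the solution.
sample_mean, mu0, s, n, alpha = 24.85, 20.0, 3.24, 35, 0.05

standard_error = s / sqrt(n)
z = (sample_mean - mu0) / standard_error
critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value

# Subtracting the CDF from 1 loses the tiny tail probability to rounding.
naive_p = 1.0 - NormalDist().cdf(z)
# Computing the upper tail directly keeps it: P(Z >= z) = erfc(z / sqrt(2)) / 2.
p_value = 0.5 * erfc(z / sqrt(2))

print(f"SE = {standard_error:.6f}, z = {z:.6f}, critical = {critical:.6f}")
print(f"1 - cdf(z) = {naive_p}, direct upper tail = {p_value:.3e}")
print("Reject H0" if z > critical else "Do not reject H0")
```

Either way the decision is the same, since the statistic is far beyond the critical value.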
Problem B. According to the local union president, the mean gross income
of plumbers in the Salt Lake City area is normally distributed, with a mean of
$30,000 and a standard deviation of $3,000. A recent investigative reporter
for KYAK TV found, for a sample of 18 plumbers, the mean gross income was
$30,500. At the .10 significance level, is it reasonable to conclude that the
mean income is not equal to $30,000? Show all 5 steps.
Answer
The null hypothesis tested is
H0: Mean income = $30000
The alternative hypothesis is
H1: Mean income ≠ $30000
Significance level = 0.10
The test statistic used is Student's t (two-tailed):
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
Critical value: ±1.739606716 (df = 17)
Rejection criterion: Reject the null hypothesis if the absolute value of the calculated
test statistic, | t |, is greater than the critical value.
Details
t Test for Hypothesis of the Mean
Data
Null Hypothesis μ =              30000
Level of Significance            0.1
Sample Size                      18
Sample Mean                      30500
Sample Standard Deviation        3000
Intermediate Calculations
Standard Error of the Mean       707.1067812
Degrees of Freedom               17
t Test Statistic                 0.707106781
Two-Tail Test
Lower Critical Value             -1.739606716
Upper Critical Value             1.739606716
p-Value                          0.48908054
Do not reject the null hypothesis
[Distribution plot: t density (df=17) against t; tail areas of 0.05 each beyond t = -1.74 and t = 1.74]
The value of the test statistic, 0.707106781, falls in the acceptance region.
Conclusion: Fail to reject the null hypothesis. The sample does not
support the claim that the mean income is not equal to $30,000.
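A short check of the arithmetic, under stated assumptions: the t critical value is copied from the solution above (the Python standard library has no t quantile function), and the Z comparison is an alternative reading in which the $3,000 figure is taken as the known population standard deviation, as the problem statement suggests. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Problem B inputs.
mu0, sample_mean, sd, n, alpha = 30000.0, 30500.0, 3000.0, 18, 0.10

standard_error = sd / sqrt(n)
statistic = (sample_mean - mu0) / standard_error

# Two-tailed t critical value for df = 17, copied from the solution (not computed here).
t_critical = 1.739606716
# Two-tailed Z critical value, if the standard deviation is treated as known.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_t = abs(statistic) > t_critical
reject_z = abs(statistic) > z_critical

print(f"SE = {standard_error:.4f}, statistic = {statistic:.6f}")
print(f"t critical = ±{t_critical:.4f}, reject H0: {reject_t}")
print(f"z critical = ±{z_critical:.4f}, reject H0: {reject_z}")
```

Both readings lead to the same decision here, because the statistic is well inside either pair of critical values.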
Problem C. Tina Dennis is the comptroller for Meek Industries. She
believes that the current cash-flow problem at Meek is due to the slow
collection of accounts receivable. She believes that more than 60
percent of the accounts are in arrears more than three months. A
random sample of 200 accounts showed that 140 were more than
three months old. At the .05 significance level, can she conclude that
more than 60 percent of the accounts are in arrears for more than
three months?
Answer :
The null hypothesis tested is
H0: The proportion of accounts that are in arrears for more than three
months is less than or equal to 0.60 ( p ≤ 0.60)
The alternative hypothesis is
H1: The proportion of accounts that are in arrears for more than three
months is greater than 0.60 ( p > 0.60)
Significance level: 0.05 (upper-tailed Z test)
The test statistic used is
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}}$$
Rejection Criteria: Reject the null hypothesis, if the calculated value
of test statistic Z is greater than the critical value of Z.
Critical value : 1.645
Details
Z Test of Hypothesis for the Proportion
Data
Null Hypothesis p =              0.6
Level of Significance            0.05
Number of Successes              140
Sample Size                      200
Intermediate Calculations
Sample Proportion                0.7
Standard Error                   0.034641016
Z Test Statistic                 2.886751346
Upper-Tail Test
Upper Critical Value             1.644853627
p-Value                          0.001946209
Reject the null hypothesis
[Distribution plot: standard normal density (Mean=0, StDev=1) against Z; upper-tail area 0.05 to the right of Z = 1.64]
Here the value of the test statistic Z (2.886751346) falls in the critical region.
Conclusion: Reject the null hypothesis. The sample provides enough
evidence to support the claim that the proportion of accounts that are in
arrears for more than three months is greater than 0.60.
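The proportion test can be reproduced in a few lines. This is a sketch using only the numbers given in Problem C; the variable names are illustrative.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Problem C inputs: 140 of 200 sampled accounts are more than three months old.
successes, n, p0, alpha = 140, 200, 0.60, 0.05

p_hat = successes / n
# The standard error uses the hypothesised proportion p0, not the sample proportion.
standard_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

critical_value = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
p_value = 0.5 * erfc(z / sqrt(2))                 # P(Z >= z)
reject_null = z > critical_value

print(f"p_hat = {p_hat}, SE = {standard_error:.6f}, z = {z:.6f}")
print(f"critical = {critical_value:.6f}, p-value = {p_value:.6f}, reject H0: {reject_null}")
```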
SET 4
Problem 1. What is the cutoff/critical F value for an ANOVA problem
where the degrees of freedom in the numerator are equal to 6 and the
degrees of freedom for the denominator are equal to 15? Use the 0.01
significance level, and then use the 1 percent F table.
[Distribution plot: F density (df1=6, df2=15) against X; upper-tail area 0.01 to the right of X = 4.32]
Critical value of F (6,15) at 0.01 significance level = 4.318
Problem 2. For the three treatments shown below, conduct an ANOVA
Test. Use the 0.01 significance level. Use Excel and show all 5 steps of the
hypothesis test. If you do not have the Excel Data Analysis tool, use F=21.9
for this problem. If you have the tool, show me the work, and of course it
should match F=21.9.
Treatment 1    Treatment 2    Treatment 3
8              3              3
6              2              4
10             4              5
9              3              4
Answer
The null hypothesis tested is
H0: There is no significant difference in the means of the three treatments
H1: There is a significant difference in the means of the three treatments
Significance Level α=0.01
Test Statistic:
F Test (ANOVA)
Rejection criterion: Reject the null hypothesis if the calculated value of F is
greater than the critical value of F at the 0.01 significance level.
Details
SUMMARY
Groups         Count    Sum    Average    Variance
Treatment 1    4        33     8.25       2.916667
Treatment 2    4        12     3          0.666667
Treatment 3    4        16     4          0.666667
ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         62.16667    2     31.08333    21.94118    0.000346    8.021517
Within Groups          12.75       9     1.416667
Total                  74.91667    11
Conclusion: Reject the null hypothesis, since F = 21.94118 is greater than F crit = 8.021517. The sample provides enough
evidence to support the claim that the means of the three treatments are different.
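For readers without the Excel Data Analysis tool, the sums of squares and F can be computed by hand. The sketch below assumes only the treatment data given in the problem; the F critical value is copied from the ANOVA table rather than computed, because the Python standard library has no F quantile function.

```python
# One-way ANOVA computed from first principles.
treatments = {
    "Treatment 1": [8, 6, 10, 9],
    "Treatment 2": [3, 2, 4, 3],
    "Treatment 3": [3, 4, 5, 4],
}

all_values = [v for group in treatments.values() for v in group]
grand_mean = sum(all_values) / len(all_values)

# Between-groups SS: group size times squared distance of group mean from grand mean.
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in treatments.values()
)
# Within-groups SS: squared distance of each observation from its own group mean.
ss_within = sum(
    (v - sum(g) / len(g)) ** 2 for g in treatments.values() for v in g
)

df_between = len(treatments) - 1
df_within = len(all_values) - len(treatments)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_statistic = ms_between / ms_within

f_critical = 8.021517  # F(2, 9) at the 0.01 level, copied from the table above

print(f"SS between = {ss_between:.5f} (df {df_between}), SS within = {ss_within:.5f} (df {df_within})")
print(f"F = {f_statistic:.5f}, reject H0: {f_statistic > f_critical}")
```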
Problem 3. The following sample observations were randomly selected.
Determine the coefficient of correlation and the coefficient of determination.
Interpret.
I HAVE THIS ONE BUT WANT TO CHECK IT!
X: 4 5 3 6 10
Y: 4 6 5 7 7
         X     Y     X2     Y2     XY
         4     4     16     16     16
         5     6     25     36     30
         3     5     9      25     15
         6     7     36     49     42
         10    7     100    49     70
Total    28    29    186    175    173
The correlation is given by the formula
$$r = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}}$$
where n = 5
The calculated value of r = 0.752246, a fairly strong positive linear relationship between X and Y.
Coefficient of determination = r2 = 0.752246*0.752246 = 0.565874
Thus about 56.59% of the variation in Y can be explained using X as the independent variable.
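Since the problem asks for the answer to be checked, here is a minimal sketch that recomputes the column totals and applies the same formula. The variable names are illustrative.

```python
from math import sqrt

# Problem 3 data.
x = [4, 5, 3, 6, 10]
y = [4, 6, 5, 7, 7]
n = len(x)

# Column totals, matching the worksheet table.
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
r_squared = r ** 2

print(f"sums: X={sum_x}, Y={sum_y}, X2={sum_x2}, Y2={sum_y2}, XY={sum_xy}")
print(f"r = {r:.6f}, r^2 = {r_squared:.6f}")
```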
Problem 4. Determine the regression formula for this data. Use the .05 significance level.
X: 5 5 4 3 6 10 11
Y: 5 6 5 7 7 9 12
Answer
The general form of simple linear regression is Y = a + bX, where Y is the dependent
variable and X is the independent variable. a and b are known as the regression
coefficients. They are estimated by the method of least squares. The estimates of a and b
are given by
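$$\hat{b} = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}, \qquad \hat{a} = \bar{Y} - \hat{b}\bar{X}$$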
The parameter b measures the impact of unit change in X on the dependent variable Y. It
is the slope of the regression line. The parameter a is the value of Y when X=0. It is
known as the Intercept term
The regression equation can be used to predict the value of Y for a given X. The
predicted value of Y is given by
$$\hat{Y} = \hat{a} + \hat{b}X$$
The square of the correlation between X and Y is known as the coefficient of determination
(R2). R2 gives the proportion of the variation in Y that can be explained using the regression
equation.
         X     Y     X2     Y2     XY
         5     5     25     25     25
         5     6     25     36     30
         4     5     16     25     20
         3     7     9      49     21
         6     7     36     49     42
         10    9     100    81     90
         11    12    121    144    132
Total    44    51    332    409    360
The estimated values of a = 2.8144 and b = 0.7113
The regression line is Y=2.8144+0.7113X
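The least-squares estimates can be reproduced directly from the column totals. This is a sketch under the formulas given above; `predict` is a hypothetical helper added for illustration.

```python
# Problem 4 data.
x = [5, 5, 4, 3, 6, 10, 11]
y = [5, 6, 5, 7, 7, 9, 12]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Corrected sums of squares and cross-products.
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

b_hat = sxy / sxx                          # slope
a_hat = sum_y / n - b_hat * sum_x / n      # intercept: Ybar - b * Xbar
r_squared = sxy ** 2 / (sxx * syy)         # coefficient of determination


def predict(x_new):
    """Predicted Y for a given X from the fitted line (hypothetical helper)."""
    return a_hat + b_hat * x_new


print(f"a = {a_hat:.4f}, b = {b_hat:.4f}, R^2 = {r_squared:.4f}")
print(f"predicted Y at X = 8: {predict(8):.4f}")
```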
[Scatter diagram of Y against X with fitted line y = 0.7113x + 2.8144, R2 = 0.7494]
The significance of the regression coefficients can be tested using Student's t test. The test
statistic used is
$$t = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)} \sim t_{\alpha/2,\, n-2}$$
The null hypothesis H0: $\beta_i = 0$ is rejected when the absolute value of the test statistic is greater than the critical
value. The critical value = 2.57 (df = n - 2 = 5).
             Coefficients    Standard Error    t Stat      P-value
Intercept    2.814433        1.267077          2.221201    0.077012
X            0.71134         0.183985          3.86629     0.011805
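Thus the slope coefficient of X is significant at the 0.05 significance level (t = 3.86629 > 2.57, p = 0.011805), while the intercept is not (t = 2.221201 < 2.57, p = 0.077012).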
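The standard errors and t statistics in the coefficient table can be recomputed from the residuals. The sketch below uses the usual simple-regression formulas, SE(b) = sqrt(MSE / Sxx) and SE(a) = sqrt(MSE (1/n + Xbar² / Sxx)), which the source does not spell out, and it takes the critical value 2.57 from the text rather than computing it.

```python
from math import sqrt

# Problem 4 data.
x = [5, 5, 4, 3, 6, 10, 11]
y = [5, 6, 5, 7, 7, 9, 12]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b_hat = sxy / sxx
a_hat = y_bar - b_hat * x_bar

# Residual mean square with n - 2 degrees of freedom.
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (n - 2)

se_b = sqrt(mse / sxx)
se_a = sqrt(mse * (1 / n + x_bar ** 2 / sxx))
t_b = b_hat / se_b
t_a = a_hat / se_a

t_critical = 2.57  # two-tailed, alpha = 0.05, df = 5, as stated in the text

print(f"slope:     b = {b_hat:.5f}, SE = {se_b:.6f}, t = {t_b:.5f}, significant: {abs(t_b) > t_critical}")
print(f"intercept: a = {a_hat:.6f}, SE = {se_a:.6f}, t = {t_a:.6f}, significant: {abs(t_a) > t_critical}")
```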