Stat 112 – Notes 3
• Homework 1 is due at the beginning of
class next Thursday.
[Scatterplot: Relationship between Y = Time to Bed Last Night (hours past 12 noon) and X = Cups of Coffee Drunk Yesterday. Data from our class.]
Simple Linear Regression Analysis
[Scatterplot with fitted line: Bivariate Fit of Time to bed last night (hours past 12 noon) By Cups of coffee drunk yesterday.]

Linear Fit
Time to bed last night (hours past 12 noon) = 13.136827 + 0.4162801 × Cups of coffee drunk yesterday

Summary of Fit
RSquare                   0.110231
RSquare Adj               0.071546
Root Mean Square Error    1.491114
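These Fit Line quantities could be reproduced outside JMP with ordinary least squares. Below is a rough Python sketch; the coffee and bedtime arrays are hypothetical placeholders, since the actual class data are not reproduced in these notes.

import numpy as np

# Hypothetical placeholder data (the actual class data are not listed in these notes)
coffee  = np.array([0, 1, 2, 0, 3, 1, 2, 4, 0, 1], dtype=float)     # X = cups of coffee drunk yesterday
bedtime = np.array([13, 14, 13.5, 12, 15, 13, 14.5, 16, 12.5, 13])  # Y = hours past 12 noon

n = len(coffee)
b1, b0 = np.polyfit(coffee, bedtime, 1)        # least-squares slope and intercept

residuals = bedtime - (b0 + b1 * coffee)
sse = np.sum(residuals ** 2)                   # sum of squared residuals
sst = np.sum((bedtime - bedtime.mean()) ** 2)  # total sum of squares

r_square = 1 - sse / sst                       # RSquare
rmse = np.sqrt(sse / (n - 2))                  # Root Mean Square Error (n - 2 degrees of freedom)

print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
print(f"RSquare = {r_square:.4f}, RMSE = {rmse:.4f}")

With the actual class data, the printed slope, intercept, RSquare, and RMSE would match the Linear Fit and Summary of Fit above.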
Checking the Assumptions of
Simple Linear Regression Model
Linearity and constant variance assumptions generally look reasonable.

[Residual plot: residuals vs. Cups of coffee drunk yesterday.]
[Histogram: distribution of the residuals of Time to bed last night (hours past 12 noon).]

Normality assumption looks reasonable except for one very negative residual.
Overall, the simple linear regression model looks reasonable.
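As a minimal sketch of how these diagnostic plots could be drawn outside JMP, here is a self-contained Python example that reuses the hypothetical placeholder data from the earlier sketch (matplotlib assumed available):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical placeholder data, as in the earlier sketch
coffee  = np.array([0, 1, 2, 0, 3, 1, 2, 4, 0, 1], dtype=float)
bedtime = np.array([13, 14, 13.5, 12, 15, 13, 14.5, 16, 12.5, 13])

b1, b0 = np.polyfit(coffee, bedtime, 1)
residuals = bedtime - (b0 + b1 * coffee)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residual plot: look for curvature (nonlinearity) and a funnel shape (non-constant variance)
ax1.scatter(coffee, residuals)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Cups of coffee drunk yesterday")
ax1.set_ylabel("Residual")

# Histogram of residuals: look for rough normality and outlying residuals
ax2.hist(residuals, bins=10)
ax2.set_xlabel("Residual")
ax2.set_ylabel("Count")

plt.tight_layout()
plt.show()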
Inference
• What is the relationship between X=cups of coffee drunk
and Y=Time to bed for all Penn students?
• Inference: Drawing conclusions from a sample about a
population.
• We can view our sample as a random sample from the
population of Penn students (Are there any problems
with this assumption?)
• Simple linear regression model
• Inference questions:
– Confidence interval for slope: What is a plausible
range of values for the true slope β1 for Penn
students?
– Hypothesis testing: Is there evidence that cups of
coffee drunk is associated with time to bed, i.e., is
there evidence that β1 ≠ 0? Is there evidence that for
each additional cup of coffee drunk, the mean time
to bed increases by at least half an hour, i.e., is there
evidence that β1 ≥ 0.5?
Confidence Intervals
• Point Estimate: b1
• Confidence interval: range of plausible values for the
true slope β1
• (1 - α)·100% Confidence Interval: b1 ± t_{α/2, n-2} · s_{b1}, where
s_{b1} = se / √((n - 1)·s_x²)
is an estimate of the standard deviation of b1 (se = RMSE).
• Typically we use a 95% CI.
• A 95% CI is approximately b1 ± 2·s_{b1}.
• 95% CIs for a parameter are usually approximately
point estimate ± 2 · standard error(point estimate),
where the standard error of the point estimate is an
estimate of the standard deviation of the point estimate.
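As a quick numerical illustration of the b1 ± 2·s_{b1} approximation, here is a short Python sketch using the slope estimate and standard error reported in the JMP output on the next slide:

b1   = 0.4162801   # estimated slope from the Fit Line output
s_b1 = 0.246608    # standard error of the slope

# Approximate 95% CI: point estimate +/- 2 * standard error
lower, upper = b1 - 2 * s_b1, b1 + 2 * s_b1
print(f"approximate 95% CI for the slope: ({lower:.3f}, {upper:.3f})")
# about (-0.077, 0.909); the notes report (-0.078, 0.910) using the rounded values 0.416 and 0.247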
Computing Confidence Interval with
JMP
In the Fit Line output in JMP, information for computing the confidence interval for β1 is given under Parameter Estimates.

Parameter Estimates
Term                             Estimate    Std Error   t Ratio   Prob>|t|
Intercept                        13.136827   0.352227    37.30     <.0001
Cups of coffee drunk yesterday   0.4162801   0.246608    1.69      0.1049

Std Error of Cups of coffee drunk yesterday = s_{b1}
Approximate 95% confidence interval for β1: b1 ± 2·s_{b1} = 0.416 ± 2(0.247) = (-0.078, 0.910)
The exact 95% confidence interval can be computed by moving the mouse to the Parameter Estimates, right-clicking, clicking Columns, and then clicking Lower 95% and Upper 95%.

Parameter Estimates
Term                             Lower 95%   Upper 95%
Intercept                        12.40819    13.865465
Cups of coffee drunk yesterday   -0.093868   0.9264284

Exact 95% confidence interval for β1: (-0.094, 0.926)
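The exact interval can also be reproduced from the reported estimate and standard error; here is a sketch using scipy, where the residual degrees of freedom n - 2 = 23 is an assumption inferred from the t cutoffs used later in these notes:

from scipy import stats

b1   = 0.4162801   # slope estimate
s_b1 = 0.246608    # standard error of the slope
df   = 23          # n - 2 (assumed; inferred from the cutoff values used later)

t_crit = stats.t.ppf(0.975, df)   # two-sided 95% critical value, about 2.069
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(f"exact 95% CI for the slope: ({lower:.3f}, {upper:.3f})")
# about (-0.094, 0.926), matching JMP's Lower 95% / Upper 95% columns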
Hypothesis testing for slope
• H0: β1 = β1* (or β1 ≥ β1*, or β1 ≤ β1*)
Ha: β1 ≠ β1* (or β1 < β1*, or β1 > β1*)
• Test statistic: t = (b1 - β1*) / s_{b1}
• Reject for (small/large, small, large) values of the test
statistic, depending on H0, Ha. See Figure 3.15 for the
decision rules.
• p-value: Measure of how much evidence there is against
the null hypothesis. Large p-values indicate no evidence
against the null hypothesis, small p-values strong
evidence against null.
• Generally accepted rule is to reject H_0 if p-value < 0.05
and not reject H_0 if p-value >=0.05.
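A small Python sketch of the two-sided test statistic and its p-value, using the slope estimate and standard error from the JMP output and the assumed degrees of freedom n - 2 = 23:

from scipy import stats

def slope_t_test(b1, s_b1, beta1_star, df):
    """Two-sided t test of H0: beta1 = beta1_star for a regression slope."""
    t = (b1 - beta1_star) / s_b1              # test statistic
    p_two_sided = 2 * stats.t.sf(abs(t), df)  # P(|T| >= |t|) under H0
    return t, p_two_sided

t, p = slope_t_test(0.4162801, 0.246608, 0.0, 23)
print(f"t = {t:.2f}, two-sided p-value = {p:.4f}")
# about t = 1.69, p = 0.105, matching JMP's Prob>|t| of 0.1049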
Risks of Hypothesis Testing
• Two types of errors are possible in hypothesis
testing:
– Type I error: Reject the null hypothesis when it is true
– Type II error: Accept the null hypothesis when it is
false.
• Probability of Type I error when H0 is true =
significance level of the test, denoted by α
• Probability of making the correct decision when Ha is
true (= 1 - Prob. of Type II error) = power of the test
Hypothesis Testing in the Courtroom
• Null hypothesis: The defendant is innocent
• Alternative hypothesis: The defendant is guilty
• The goal of the procedure is to determine
whether there is enough evidence to conclude
that the alternative hypothesis is true. The
burden of proof is on the alternative hypothesis.
• Two types of errors:
– Type I error: Reject null hypothesis when null
hypothesis is true (convict an innocent defendant)
– Type II error: Do not reject null hypothesis when null
is false (fail to convict a guilty defendant)
Hypothesis Testing in Statistics
• Use test statistic that summarizes information about
parameter in sample.
• Accept H0 if the test statistic falls in a range of values
that would be plausible if H0 were true.
• Reject H0 if the test statistic falls in a range of values that
would be implausible if H0 were true.
• Choose the rejection region so that the probability of
rejecting H0 if H0 is true equals α (most commonly 0.05).
• p-value: measure of evidence against H0. Small p-values imply more evidence against H0.
• p-value method for hypothesis tests: Reject H0 if the p-value is ≤ α. Do not reject H0 if the p-value is > α.
Scale of Evidence Provided by p-value
p-value       Evidence against null hypothesis
> 0.10        No evidence
0.05 – 0.10   Suggestive, but inconclusive
0.01 – 0.05   Moderate
< 0.01        Convincing
Hypothesis Tests and Associated p-values
1. Two-sided test:
H0: β1 = β1* (or β0 = β0*)
Ha: β1 ≠ β1* (or β0 ≠ β0*)
Reject H0 if t ≥ t_{α/2, n-2} or t ≤ -t_{α/2, n-2}
p-value = Prob>|t| reported in JMP under Parameter Estimates.
2. One-sided test I:
H0: β1 ≥ β1* (or β0 ≥ β0*)
Ha: β1 < β1* (or β0 < β0*)
Reject H0 if t ≤ -t_{α, n-2}
p-value = (Prob>|t|)/2 if t is negative,
          1 - (Prob>|t|)/2 if t is positive.
Hypothesis Tests and Associated p-values, Continued
2. One-sided test II:
H0: β1 ≤ β1* (or β0 ≤ β0*)
Ha: β1 > β1* (or β0 > β0*)
Reject H0 if t ≥ t_{α, n-2}
p-value = (Prob>|t|)/2 if t is positive,
          1 - (Prob>|t|)/2 if t is negative.
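Here is a short Python sketch of these conversion rules; prob_gt_abs_t stands for JMP's reported two-sided Prob>|t|, and the function name is illustrative only:

def one_sided_p(prob_gt_abs_t, t, alternative):
    """Convert JMP's two-sided Prob>|t| into a one-sided p-value.

    alternative = "greater" for Ha: slope > hypothesized value,
                  "less"    for Ha: slope < hypothesized value.
    """
    half = prob_gt_abs_t / 2
    if alternative == "greater":
        return half if t > 0 else 1 - half
    return half if t < 0 else 1 - half

# Example with the Fit Line output: Prob>|t| = 0.1049, t = 1.69
print(one_sided_p(0.1049, 1.69, "greater"))  # about 0.052
print(one_sided_p(0.1049, 1.69, "less"))     # about 0.948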
Hypothesis Testing in JMP
• JMP output from Fit Line displays the point
estimates of the intercept and slope, the
standard errors of the intercept and slope
(s_{b0}, s_{b1}), and p-values from two-tailed tests of
H0: β0 = 0 and H0: β1 = 0.
Is there evidence that cups of coffee drunk yesterday is associated with time to
bed last night?

H0: β1 = 0
Ha: β1 ≠ 0
Test statistic: t = (b1 - 0) / s_{b1} = (0.416 - 0) / 0.247 = 1.69
Cutoff value: t_{0.025, 25-2} = 2.069
Test: |1.69| < 2.069.
We do not reject H0: β1 = 0. There is not strong evidence that β1 ≠ 0.
(Note: This does not mean that β1 = 0, just that there is not strong evidence
against β1 = 0.)
Is there evidence that for each additional cup of coffee drunk, the mean time to bed
increases by at least half an hour, i.e., is there evidence that β1 ≥ 0.5? This is a
one-sided test: we reject H0 only for large values of t.

Test statistic: t = (b1 - 0.5) / s_{b1} = (0.416 - 0.5) / 0.247 = -0.34
Cutoff value: t_{0.05, 25-2} = 1.714
Test: -0.34 < 1.714.
We do not reject H0. There is not strong evidence that β1 ≥ 0.5.
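The arithmetic in both worked examples can be checked with a few lines of Python; the degrees of freedom n - 2 = 23 and significance level α = 0.05 are as used above:

from scipy import stats

b1, s_b1, df, alpha = 0.4162801, 0.246608, 23, 0.05

# Two-sided test of H0: beta1 = 0
t_two = (b1 - 0) / s_b1
cutoff_two = stats.t.ppf(1 - alpha / 2, df)   # about 2.069
print(abs(t_two) >= cutoff_two)               # False -> do not reject H0

# One-sided test of whether beta1 >= 0.5 (reject only for large t)
t_one = (b1 - 0.5) / s_b1
cutoff_one = stats.t.ppf(1 - alpha, df)       # about 1.714
print(t_one >= cutoff_one)                    # False -> do not reject H0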
Summary
• Chapter 3.3: We have developed methods of inference
(confidence intervals, hypothesis tests) for the simple
linear regression model.
• Important Note: These inferences are only correct if the
simple linear regression model assumptions (linearity,
constant variance, normality) are correct. It is important
to check the assumptions. If the assumptions are
approximately correct, then the inferences are
approximately correct.
• Next class: Chapters 3.4-3.5: Assessing the fit of the
regression line and prediction from the regression line.