Download Notes 3 - Wharton Statistics

Stat 112 – Notes 3

• Homework 1 is due at the beginning of class next Thursday.

Relationship between Y = Time to Bed Last Night and X = Cups of Coffee Drunk Yesterday

Data from our class.
[Figure: scatterplot of time to bed last night (hours past 12 noon) against cups of coffee drunk yesterday]

Simple Linear Regression Analysis

[Figure: bivariate fit of time to bed last night (hours past 12 noon) by cups of coffee drunk yesterday, with the fitted line]

Linear Fit
Time to bed last night (hours past 12 noon) = 13.136827 + 0.4162801 Cups of coffee drunk yesterday

Summary of Fit
RSquare                   0.110231
RSquare Adj               0.071546
Root Mean Square Error    1.491114

Checking the Assumptions of Simple Linear Regression Model

Linearity and constant variance assumptions generally look reasonable.
[Figure: residuals against cups of coffee drunk yesterday]

Distributions: Residuals Time to bed last night (hours past 12 noon)
[Figure: distribution of the residuals]
Normality assumption looks reasonable except for one very negative residual.

Overall the simple linear regression model looks reasonable.

Inference
• What is the relationship between X = cups of coffee drunk and Y = time to bed for all Penn students?
• Inference: Drawing conclusions from a sample about a population.
• We can view our sample as a random sample from the population of Penn students (Are there any problems with this assumption?)
• Simple linear regression model
• Inference questions:
  – Confidence interval for slope: What is a plausible range of values for the true slope $\beta_1$ for Penn students?
  – Hypothesis testing: Is there evidence that cups of coffee drunk is associated with time to bed, i.e., is there evidence that $\beta_1 \neq 0$? Is there evidence that for each 1 additional cup of coffee drunk, the mean time to bed increases by at least half an hour, i.e., is there evidence that $\beta_1 > 0.5$?
Confidence Intervals
• Point estimate: $b_1$
• Confidence interval: range of plausible values for the true slope $\beta_1$
• $(1-\alpha)100\%$ confidence interval: $b_1 \pm t_{\alpha/2,\,n-2}\, s_{b_1}$, where $s_{b_1} = s_e \sqrt{\frac{1}{(n-1)\, s_x^2}}$ is an estimate of the standard deviation of $b_1$ ($s_e$ = RMSE). Typically we use a 95% CI.
• 95% CI is approximately $b_1 \pm 2 s_{b_1}$

95% CIs for a parameter are usually approximately
point estimate $\pm$ 2 × Standard Error(point estimate),
where the standard error of the point estimate is an estimate of the standard deviation of the point estimate.

Computing Confidence Interval with JMP

In the Fit Line output in JMP, information for computing the confidence interval for $\beta_1$ is given under Parameter Estimates.

Parameter Estimates
Term                              Estimate     Std Error    t Ratio    Prob>|t|
Intercept                         13.136827    0.352227     37.30      <.0001
Cups of coffee drunk yesterday    0.4162801    0.246608     1.69       0.1049

Std Error of Cups of coffee drunk yesterday = $s_{b_1}$
Approximate 95% confidence interval for $\beta_1$: $b_1 \pm 2 s_{b_1} = 0.416 \pm 2(0.247) = (-0.078,\ 0.910)$

The exact 95% confidence interval can be computed by moving the mouse to the Parameter Estimates, right clicking, clicking Columns and then clicking Lower 95% and Upper 95%.

Parameter Estimates
Term                              Lower 95%    Upper 95%
Intercept                         12.40819     13.865465
Cups of coffee drunk yesterday    -0.093868    0.9264284

Exact 95% confidence interval for $\beta_1$: $(-0.094,\ 0.926)$

Hypothesis testing for slope
• $H_0: \beta_1\ (=, \geq, \leq)\ \beta_1^*$
• $H_1: \beta_1\ (\neq, <, >)\ \beta_1^*$
• Test statistic: $t = \frac{b_1 - \beta_1^*}{s_{b_1}}$
• Reject for (small/large, small, large) values of the test statistic depending on $H_0$, $H_1$. See Figure 3.15 for the decision rules.
• p-value: Measure of how much evidence there is against the null hypothesis. Large p-values indicate no evidence against the null hypothesis; small p-values indicate strong evidence against the null.
• Generally accepted rule is to reject $H_0$ if p-value < 0.05 and not reject $H_0$ if p-value $\geq$ 0.05.
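The confidence interval arithmetic from the JMP output can be reproduced by hand. The following is a minimal sketch, not JMP's own procedure: the helper name `slope_ci` is hypothetical, the estimate and standard error are copied from the Parameter Estimates table, and the cutoff 2.069 assumes n = 25 (so n - 2 = 23 degrees of freedom), the value used in the worked test later in these notes.

```python
# Values copied from the JMP Parameter Estimates table in these notes.
b1 = 0.4162801     # slope estimate
s_b1 = 0.246608    # standard error of the slope

# Assumption: n = 25, so the exact interval uses t_{0.025, 23} = 2.069,
# the cutoff quoted in the worked hypothesis test in these notes.
T_CUTOFF = 2.069


def slope_ci(estimate, std_error, multiplier):
    """Return (lower, upper) for estimate +/- multiplier * std_error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


approx_ci = slope_ci(b1, s_b1, 2.0)        # rule-of-thumb 95% interval
exact_ci = slope_ci(b1, s_b1, T_CUTOFF)    # interval using the t cutoff

print("approximate 95%% CI: (%.3f, %.3f)" % approx_ci)
print("exact 95%% CI:       (%.3f, %.3f)" % exact_ci)
```

Because the slides round $b_1$ and $s_{b_1}$ to three decimals before doing the arithmetic, the approximate interval printed here can differ from the slide's value in the third decimal place.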
Risks of Hypothesis Testing
• Two types of errors are possible in hypothesis testing:
  – Type I error: Reject the null hypothesis when it is true.
  – Type II error: Accept the null hypothesis when it is false.
• Probability of Type I error when $H_0$ is true = significance level of test, denoted by $\alpha$
• Probability of making correct decision when $H_a$ is true (= 1 - Prob. of Type II error) = power of test

Hypothesis Testing in the Courtroom
• Null hypothesis: The defendant is innocent
• Alternative hypothesis: The defendant is guilty
• The goal of the procedure is to determine whether there is enough evidence to conclude that the alternative hypothesis is true. The burden of proof is on the alternative hypothesis.
• Two types of errors:
  – Type I error: Reject null hypothesis when null hypothesis is true (convict an innocent defendant)
  – Type II error: Do not reject null hypothesis when null is false (fail to convict a guilty defendant)

Hypothesis Testing in Statistics
• Use a test statistic that summarizes the information about the parameter in the sample.
• Accept $H_0$ if the test statistic falls in a range of values that would be plausible if $H_0$ were true.
• Reject $H_0$ if the test statistic falls in a range of values that would be implausible if $H_0$ were true.
• Choose the rejection region so that the probability of rejecting $H_0$ if $H_0$ is true equals $\alpha$ (most commonly 0.05)
• p-value: measure of evidence against $H_0$. Small p-values imply more evidence against $H_0$.
• p-value method for hypothesis tests: Reject $H_0$ if the p-value is $< \alpha$. Do not reject $H_0$ if the p-value is $\geq \alpha$.

Scale of Evidence Provided by p-value
p-value        Evidence against null hypothesis
> 0.10         No evidence
0.05 – 0.10    Suggestive, but inconclusive
0.01 – 0.05    Moderate
< 0.01         Convincing

Hypothesis Tests and Associated p-values
1. Two-sided test:
   $H_0: \beta_1 = \beta_1^*$ (or $\beta_0 = \beta_0^*$)
   $H_a: \beta_1 \neq \beta_1^*$ (or $\beta_0 \neq \beta_0^*$)
   Reject $H_0$ if $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
   p-value = Prob>|t| reported in JMP under Parameter Estimates.
2. One-sided test I:
   $H_0: \beta_1 \geq \beta_1^*$ (or $\beta_0 \geq \beta_0^*$)
   $H_a: \beta_1 < \beta_1^*$ (or $\beta_0 < \beta_0^*$)
   Reject $H_0$ if $t < -t_{\alpha,\,n-2}$
   p-value = (Prob>|t|)/2 if t is negative
             1 - (Prob>|t|)/2 if t is positive

Hypothesis Tests and Associated p-values Continued
3. One-sided test II:
   $H_0: \beta_1 \leq \beta_1^*$ (or $\beta_0 \leq \beta_0^*$)
   $H_a: \beta_1 > \beta_1^*$ (or $\beta_0 > \beta_0^*$)
   Reject $H_0$ if $t > t_{\alpha,\,n-2}$
   p-value = (Prob>|t|)/2 if t is positive
             1 - (Prob>|t|)/2 if t is negative

Hypothesis Testing in JMP
• JMP output from Fit Line displays the point estimates of the intercept and slope, the standard errors of the intercept and slope ($s_{b_0}$, $s_{b_1}$), and p-values from two-tailed tests of $H_0: \beta_0 = 0$ and $H_0: \beta_1 = 0$.

Is there evidence that cups of coffee drunk yesterday is associated with time to bed last night?
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test statistic: $t = \frac{b_1 - 0}{s_{b_1}} = \frac{0.416 - 0}{0.247} = 1.69$
Cutoff value: $t_{0.025,\,25-2} = 2.069$
Test: $|1.69| < 2.069$. We do not reject $H_0: \beta_1 = 0$. There is not strong evidence that $\beta_1 \neq 0$. (Note: This does not mean that $\beta_1 = 0$, just that there is not strong evidence against it.)

Is there evidence that for each 1 additional cup of coffee drunk, the mean time to bed increases by at least half an hour, i.e., is there evidence that $\beta_1 > 0.5$?
Test statistic: $t = \frac{b_1 - 0.5}{s_{b_1}} = \frac{0.416 - 0.5}{0.247} = -0.34$
Cutoff value: $t_{0.05,\,25-2} = 1.714$
Test: $-0.34 < 1.714$. There is not strong evidence that $\beta_1 > 0.5$.

Summary
• Chapter 3.3: We have developed methods of inference (confidence intervals, hypothesis tests) for the simple linear regression model.
• Important Note: These inferences are only correct if the simple linear regression model assumptions (linearity, constant variance, normality) are correct. It is important to check the assumptions. If the assumptions are approximately correct, then the inferences are approximately correct.
• Next class: Chapters 3.4-3.5: Assessing the fit of the regression line and prediction from the regression line.
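The two worked tests and the one-sided p-value rules can be checked with a few lines of arithmetic. This is a sketch under stated assumptions rather than a reproduction of JMP: the function names `t_statistic` and `one_sided_p_value` are hypothetical, the cutoffs 2.069 and 1.714 are the table values quoted in these notes for 23 degrees of freedom, and the final example (an alternative of slope greater than 0) is an illustration that does not appear in the notes. The halving rule is applied to JMP's Prob>|t|, which these notes describe as the two-tailed p-value for a hypothesized value of 0.

```python
# Values copied from the JMP Parameter Estimates table in these notes.
b1 = 0.4162801       # slope estimate
s_b1 = 0.246608      # standard error of the slope
prob_abs_t = 0.1049  # JMP's two-tailed Prob>|t| for the slope


def t_statistic(estimate, hypothesized, std_error):
    """Test statistic t = (b1 - beta1*) / s_b1."""
    return (estimate - hypothesized) / std_error


def one_sided_p_value(two_sided_p, t, alternative):
    """Convert a two-tailed p-value to a one-sided one.

    alternative is "less" (one-sided test I) or "greater" (one-sided test II).
    """
    half = two_sided_p / 2.0
    if alternative == "less":
        return half if t < 0 else 1.0 - half
    if alternative == "greater":
        return half if t > 0 else 1.0 - half
    raise ValueError("alternative must be 'less' or 'greater'")


# Two-sided test of slope = 0 against slope != 0, cutoff t_{0.025, 23} = 2.069.
t_two_sided = t_statistic(b1, 0.0, s_b1)
reject_two_sided = abs(t_two_sided) > 2.069

# One-sided test with alternative slope > 0.5, cutoff t_{0.05, 23} = 1.714.
t_upper = t_statistic(b1, 0.5, s_b1)
reject_upper = t_upper > 1.714

# Illustration only (not in the notes): alternative slope > 0.
p_greater_than_zero = one_sided_p_value(prob_abs_t, t_two_sided, "greater")

print("two-sided: t = %.2f, reject H0: %s" % (t_two_sided, reject_two_sided))
print("one-sided: t = %.2f, reject H0: %s" % (t_upper, reject_upper))
print("one-sided p-value for slope > 0: %.4f" % p_greater_than_zero)
```

Both decisions agree with the conclusions in the notes: neither test rejects its null hypothesis.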
