
Lecture 15 – Tues., Oct. 28
• Review example of one-way layout
• Simple Linear Regression:
– Simple Linear Regression Model, 7.2
– Least Squares Regression Estimation, 7.3.1-7.3.2, 7.3.4
– Causation, 7.5.3
• Next time: Inference for simple linear regression, 7.3.3, 7.3.5, 7.4.

Review of One-way layout
• Assumptions of ideal model
– All populations have the same standard deviation.
– Each population is normal.
– Observations are independent.
• Planned comparisons: Usual t-test, but use all groups to estimate $\sigma$. If there are many planned comparisons, use Bonferroni to adjust for multiple comparisons.
• Test of $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$ vs. the alternative that at least two means differ: one-way ANOVA F-test.
• Unplanned comparisons: Use the Tukey-Kramer procedure to adjust for multiple comparisons.

Review Example
• A developmental psychologist is interested in the extent to which children's memory for facts improves as children get older.
• Ten children of ages 4, 6, 8 and 10 are randomly selected to participate in the study.
• Each child is given a 30-item memory test; the scores are recorded in memorytest.JMP.

Regression for memorytest
• Let Y = score, X = age.
• Each age is a subpopulation.
• The regression of Y on X is the mean of Y as a function of the subpopulation X, denoted by $\mu\{Y \mid X\}$.
• Simple linear regression model: $\mu\{Y \mid X\} = \beta_0 + \beta_1 X$
– $\beta_1$ = slope = change in mean number of items remembered for each additional year of age
– $\beta_0$ = intercept = mean number of items remembered at age 0
• Least squares estimates: $\hat{\beta}_0 = 4.74$, $\hat{\beta}_1 = 1.96$

Regression – General Setup
• General setup: We have data $(y_i, x_i)$, $i = 1, \ldots, n$. [Later we will look at the setting where we have multiple x's.]
• Y is called the response variable, X is called the explanatory variable.
• Regression: the mean of Y given X = x, $\mu\{Y \mid X = x\}$
• Regression model: an ideal formula to approximate the regression $\mu\{Y \mid X\}$
• Simple linear regression model: $\mu\{Y \mid X\} = \beta_0 + \beta_1 X$

Uses of Regression Analysis
• Description: Describe the association between Y and X, e.g., case study 7.1.1: What is the relationship between the distance from Earth (Y) and the recession velocity of extragalactic nebulae (X)? The relationship can be used to estimate the age of the universe using the theory of the big bang.
• Passive prediction: Predict y based on x where you do not plan to manipulate x, e.g., predict today's stock price based on yesterday's stock price.
• Control: Predict what y will be if you change x, e.g., predict what your earnings will be if you obtain different levels of education.

Example (Problem 30)
• Studies over the past two decades have shown that activity can affect the reorganization of the human central nervous system.
• Psychologists used magnetic source imaging (MSI) to measure neuronal activity in the brains of nine string players and six controls when the thumb and fifth finger of the left hand were exposed to mild stimulation.
• Research hypothesis: String players, who use the fingers of the left hand extensively, should show different brain behavior (in particular, more neuronal activity).

Example Continued
• Two-sided t-test: p-value = 0.0003, CI = (7.51, 18.92); strong evidence that string players have higher neuron activity than controls.
• More interesting question: How much does the neuron activity index increase per extra year of playing the instrument?
• Y = neuron activity index, X = years playing. Simple linear regression model: $\mu\{Y \mid X\} = \beta_0 + \beta_1 X$
• What is the interpretation of $\beta_0$ and $\beta_1$ here?

Ideal Model
• Assumptions of ideal simple linear regression model
– There is a normally distributed subpopulation of responses for each value of the explanatory variable.
– The means of the subpopulations fall on a straight-line function of the explanatory variable.
– The subpopulation standard deviations are all equal (to $\sigma$).
– The selection of an observation from any of the subpopulations is independent of the selection of any other observation.

Estimating the coefficients
• We want to make the predictions of Y based on X as good as possible. The best prediction of Y based on X is $\mu\{Y \mid X\}$.
• Least Squares Method: Choose coefficients to minimize the sum of squared prediction errors.
• Fitted value for observation i is its estimated mean: $\mathrm{fit}_i = \hat{\mu}\{Y \mid X_i\} = \hat{\beta}_0 + \hat{\beta}_1 X_i$
• Residual for observation i is the prediction error of using X to predict Y: $\mathrm{res}_i = y_i - \mathrm{fit}_i$
• Least squares method: Find estimates that minimize the sum of squared residuals; solution on page 182.

Regression Analysis in JMP
• Use Analyze, Fit Y by X. Put the response variable in Y and the explanatory variable in X (make sure X is continuous).
• Click on Fit Line under the red triangle next to Bivariate Fit of Y by X.

JMP output for example
[Figure: Bivariate Fit of Neuron activity index By Years playing. Scatterplot of Neuron activity index (vertical axis) against Years playing (horizontal axis) with the fitted line.]

Linear Fit
Neuron activity index = 7.9715909 + 1.0268308 Years playing

Summary of Fit
RSquare                       0.866986
RSquare Adj                   0.855902
Root Mean Square Error        3.025101
Mean of Response              15.89286
Observations (or Sum Wgts)    14

Parameter Estimates
Term             Estimate     Std Error    t Ratio    Prob>|t|
Intercept        7.9715909    1.206598     6.61       <.0001
Years playing    1.0268308    0.116105     8.84       <.0001

The standard deviation $\sigma$
• $\sigma$ is the standard deviation in each subpopulation.
• $\sigma$ measures the accuracy of predictions from the regression.
• If the simple linear regression model holds, then approximately
– 68% of the observations will fall within $\sigma$ of the regression line
– 95% of the observations will fall within $2\sigma$ of the regression line

Estimating $\sigma$
• Residuals provide the basis for an estimate of $\sigma$: $\hat{\sigma} = \sqrt{\dfrac{\text{sum of all squared residuals}}{\text{degrees of freedom}}}$
• Degrees of freedom for simple linear regression = n - 2
• If the simple linear regression model holds, then approximately
– 68% of the observations will fall within $\hat{\sigma}$ of the least squares line
– 95% of the observations will fall within $2\hat{\sigma}$ of the least squares line

JMP commands
• $\hat{\sigma}$ is found under Summary of Fit and is labeled "Root Mean Square Error".
• To look at a plot of residuals versus X, click Plot Residuals under the red triangle next to Linear Fit after fitting the line.
• To save the residuals or fitted values (predicted values), click Save Residuals or Save Predicteds under the red triangle next to Linear Fit after fitting the line.

Interpolation and Extrapolation
• The simple linear regression model makes it possible to draw inference about any mean response, $\hat{\mu}\{Y \mid X\} = \hat{\beta}_0 + \hat{\beta}_1 X$
• Interpolation: Drawing inference about the mean response for X within the range of observed X; a strong advantage of the regression model is the ability to interpolate.
• Extrapolation: Drawing inference about the mean response for X outside the range of observed X; dangerous. The straight-line model may hold approximately over the region of observed X but not for all X.

Extrapolation in Memory Test
• Y = score on test of 30 items, X = age.
• Least squares estimates: $\hat{\mu}\{Y \mid X\} = 4.74 + 1.96 X$
• Predicted mean of Y at age 0: 4.7
• Predicted mean of Y at age 20: 43.9
• Predicted mean of Y at age 90: 181.1

Difficulties of extrapolation
• Mark Twain: "In the space of one hundred and seventy-six years, the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year.
Therefore, any calm person, who is not blind or idiotic, can see that in the old Oolitic Silurian period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale return of conjecture out of such a trifling investment of fact."

Cause and Effect?
• The regression summarizes the association between the mean response of Y and the value of the explanatory variable X.
• No cause-and-effect relationship can be inferred unless X is randomly assigned to units in a randomized experiment.
• A researcher measures the number of television sets per person X and the average life expectancy Y for the world's nations. The regression line has a positive slope: nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets?

Brain activity in string players
• Y = neuron activity index, X = years playing string instrument
• Least squares estimates: $\hat{\mu}\{Y \mid X\} = 7.97 + 1.03 X$
• Is this a randomized experiment?
• What is an alternative explanation for the association between Y and X other than that playing string instruments causes an increase in the neuron activity index?
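
Worked sketch: least squares, the estimate of sigma, and extrapolation
The lecture does its fitting in JMP. As a rough companion, the Python sketch below computes the same quantities by hand: the slope and intercept that minimize the sum of squared residuals, the estimate of sigma with n - 2 degrees of freedom, and predicted means from a fitted line. The function names (least_squares, residual_sd, predict_mean) and the four-point data set are made up for illustration; they are not the memorytest.JMP data. The closed-form slope and intercept used here are the standard textbook formulas, which are assumed to match the solution the slides cite. The last step uses only the fitted memory-test line quoted on the slides (4.74 + 1.96 X).

```python
from math import sqrt


def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


def residual_sd(x, y, b0, b1):
    """Estimate sigma: sqrt(sum of squared residuals / (n - 2))."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sqrt(sum(r * r for r in residuals) / (len(x) - 2))


def predict_mean(b0, b1, x_new):
    """Estimated mean response at x_new for a fitted line."""
    return b0 + b1 * x_new


# Hypothetical toy data (NOT the lecture's data): one score per age.
ages = [4, 6, 8, 10]
scores = [12, 17, 20, 25]

b0, b1 = least_squares(ages, scores)
sigma_hat = residual_sd(ages, scores, b0, b1)
print(f"toy fit: intercept {b0:.2f}, slope {b1:.2f}, sigma-hat {sigma_hat:.3f}")

# Extrapolation with the fitted memory-test line quoted on the slides.
# Ages 0, 20 and 90 give 4.7, 43.9 and 181.1, as on the slide, even though
# the test has only 30 items and the observed ages run from 4 to 10.
for age in (0, 20, 90):
    print(f"age {age}: predicted mean score {predict_mean(4.74, 1.96, age):.1f}")
```

The predicted means at ages 20 and 90 exceed the maximum possible score of 30, which is the point of the extrapolation slide: the straight line is only a reasonable description over the observed range of X.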
