Download Normal - Madison Public Schools
AP Statistics Notes 12.1

The population (true) regression line is µy = α + βx, where µy is the mean response for any fixed value of x, α is the intercept, and β is the slope. (In this setting, α and β are in no way related to the probabilities of Type I and Type II errors!)

As in previous chapters, we will construct a confidence interval, this time to estimate the slope β of the population (true) regression line. This is what you need to know:

t Confidence Interval for the Slope of an LSRL: b ± t* SE_b

Standard error of the slope: $SE_b = \dfrac{s}{s_x\sqrt{n-1}}$, where s is the standard deviation of the residuals and s_x is the standard deviation of the x-values.

Conditions* (LINER):

Linear: Examine the scatterplot to check that the overall pattern is roughly linear. Look for curved patterns in the residual plot. Check that the residuals center on the "residual = 0" line at each x-value in the residual plot.

Independent: Look at how the data were produced. Random sampling and random assignment help ensure the independence of individual observations. If sampling is done without replacement, remember to check that 10n < population size (the 10% condition).

Normal: Make a stemplot, histogram, or Normal probability plot of the residuals and check for clear skewness or other major departures from Normality.

Equal variance: Look at the scatter of the residuals above and below the "residual = 0" line in the residual plot. The amount of scatter should be roughly the same from the smallest to the largest x-value.

Random: See if the data were produced by random sampling or a randomized experiment.

Degrees of freedom: n - 2

* Because checking the conditions is so time-consuming, the statement "all conditions for inference have been met" is usually given on the exam.

Examples:

Many people believe that students learn better if they sit closer to the front of the classroom. To investigate, an AP Statistics teacher randomly assigned students to seat locations in his classroom for a particular chapter and recorded the test score for each student at the end of the chapter.
The explanatory variable in this experiment is which row the students were assigned (row 1 is closest to the front and row 7 is the farthest away). Because there are no serious violations of the conditions, we should be safe performing inference about the regression model in this setting. Here is computer output for the least-squares regression analysis on the seating chart data.

Regression Analysis: Score versus Row

Predictor      Coef   SE Coef       T       P
Constant     85.706     4.239   20.22   0.000
Row         -1.1171    0.9472   -1.18   0.248

S = 10.0673   R-Sq = 4.7%   R-Sq(adj) = 1.3%

1. State the equation of the least-squares regression line. Define any variables you use.
2. Interpret the slope, y-intercept (if possible), and standard deviation of the residuals.
3. Identify the standard error of the slope SE_b from the computer output. Interpret this value in context.
4. Calculate the 95% confidence interval for the true slope. Show your work.
5. Based on your interval, is there convincing evidence that seat location affects scores?
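
As a way to check the arithmetic in question 4, the interval b ± t* SE_b can be sketched in a few lines of Python. The slope and its standard error are read from the computer output. The notes do not state the sample size, so the critical value below is an assumption: it takes n = 30 students (df = 28), which is consistent with the reported R-Sq and R-Sq(adj) values, and uses the t-table value t* = 2.048 for 95% confidence. The function name `slope_confidence_interval` is hypothetical and used only for illustration.

```python
# Sketch of the t interval for the slope of an LSRL: b ± t* · SE_b.
# b and se_b are taken from the "Row" line of the computer output.
# ASSUMPTION: n = 30 is not stated in the notes; it is inferred from
# R-Sq and R-Sq(adj). With df = n - 2 = 28, a t-table gives t* = 2.048
# for 95% confidence. Substitute the t* for your own df if n differs.


def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) endpoints of b ± t_star * se_b."""
    margin = t_star * se_b  # margin of error
    return b - margin, b + margin


b = -1.1171      # sample slope (Coef for Row)
se_b = 0.9472    # standard error of the slope (SE Coef for Row)
t_star = 2.048   # assumed critical value: 95% confidence, df = 28

lower, upper = slope_confidence_interval(b, se_b, t_star)
t_stat = b / se_b  # should reproduce the T column of the output

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
print(f"t statistic: {t_stat:.2f}")
```

Under these assumptions the interval runs from about -3.057 to 0.823. Because it contains 0, it agrees with the large P-value (0.248) in the output: a slope of 0, meaning no linear relationship between row and score, remains plausible.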