Lesson 1: Section 12.1 (Part 1)

Objectives
Check conditions for performing inference about the slope β of the population (true) regression line.
Interpret computer output from a least-squares regression analysis.

ACTIVITY: Does Seat Location Matter?
Many people believe that students learn better if they sit closer to the front of the classroom. Does sitting closer cause higher achievement, or do better students simply choose to sit in the front? To investigate, an AP Statistics teacher randomly assigned students to seat locations in the classroom for a particular chapter and recorded the test score for each student at the end of the chapter. The explanatory variable in this experiment is the row to which the student was assigned (Row 1 is closest to the front and Row 7 is farthest away).

Here are the results, including a scatterplot and least-squares regression line:
Row 1: 76, 77, 94, 99
Row 2: 83, 85, 74, 79
Row 3: 90, 88, 68, 78
Row 4: 94, 72, 101, 70, 79
Row 5: 76, 65, 90, 67, 96
Row 6: 88, 79, 90, 83
Row 7: 79, 76, 77, 63

1.) Identify and interpret the slope of the least-squares regression line in this context.
2.) Explain why it is important to randomly assign the students to seats rather than letting each student choose his or her own seat.
3.) Does the negative slope provide convincing evidence that sitting closer causes higher achievement, or is it plausible that the association is due to the chance variation in the random assignment? Complete the simulation and find out!

Share your "P-values" from each group. Can we draw any conclusions from these results?
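The simulation in the activity can also be run by computer. The following is a minimal Python sketch, not necessarily the procedure used in class: it assumes the simulation works by reshuffling the 30 observed scores among the 30 seats, recomputing the least-squares slope each time, and counting how often a shuffled slope is at least as negative as the observed one. The names `scores_by_row`, `slope`, and `simulate_p_value`, the number of trials, and the seed are illustrative choices.

```python
import random

# Test scores by assigned row, copied from the activity (Row 1 is the front).
scores_by_row = {
    1: [76, 77, 94, 99],
    2: [83, 85, 74, 79],
    3: [90, 88, 68, 78],
    4: [94, 72, 101, 70, 79],
    5: [76, 65, 90, 67, 96],
    6: [88, 79, 90, 83],
    7: [79, 76, 77, 63],
}

def slope(xs, ys):
    """Return the slope b of the least-squares line for paired lists xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# One (row, score) pair per student, kept in matching order.
rows = [row for row, row_scores in scores_by_row.items() for _ in row_scores]
scores = [s for row_scores in scores_by_row.values() for s in row_scores]

observed_slope = slope(rows, scores)

def simulate_p_value(trials=10_000, seed=1):
    """Estimate how often chance alone gives a slope <= the observed slope.

    Assumption: if seat location has no effect, every assignment of the
    30 scores to the 30 seats is equally likely, so we shuffle the scores.
    """
    rng = random.Random(seed)
    shuffled = scores[:]
    at_least_as_negative = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if slope(rows, shuffled) <= observed_slope:
            at_least_as_negative += 1
    return at_least_as_negative / trials

p_value = simulate_p_value()
print(f"observed slope: {observed_slope:.4f}")
print(f"simulated P-value: {p_value:.3f}")
```

Different seeds or trial counts give slightly different estimates, which mirrors why each group's "P-value" differs a little.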
Inference for Linear Regression
The least-squares regression line for the population is called the population regression line (or true regression line) and is written in the form µy = α + βx. (PARAMETER)
The least-squares regression line from a sample is called the sample regression line (or estimated regression line) and can be written in the form ŷ = a + bx or ŷ = b0 + b1x. (STATISTIC)
Every sample will have a slightly different slope and y-intercept due to sampling variation.
www.rossmanchance.com/applets

Sampling Distribution of b
We will perform inference about the slope using confidence intervals and significance tests. Both are based on the sampling distribution of b (the slope of the sample regression line). Like any distribution, it can be described by its shape, center, and spread.
SHAPE: Is it roughly symmetric and unimodal? Does a Normal probability plot appear linear?
CENTER: The mean of the sampling distribution of b equals the true slope β, so b is an unbiased estimator of β.
SPREAD: Standard deviation. We will discuss this later.

CONDITIONS for Regression Inference
Remember: LINER
Linear: The actual relationship between x and y is linear. For any fixed value of x, the mean response µy falls on the population (true) regression line µy = α + βx. (α and β are the unknown parameters.)
Independent: Individual observations are independent of each other.
Normal: For any fixed value of x, the response y varies according to a Normal distribution.
Equal variance: The standard deviation of y (call it σ) is the same for all values of x. (σ is usually unknown.)
Random: The data come from a well-designed random sample or randomized experiment.

How to check conditions
Linear: Scatterplot: see if the overall pattern is linear. Residual plot: see if the residuals center on the "residual = 0" line at each x-value.
Independent: Look at how the data were produced and make sure each observation is independent of the others.
If the sampling is done without replacement, check the 10% rule.
Normal: Make a stemplot, histogram, or Normal probability plot of the residuals and check for skewness or other major departures from Normality.
Equal variance: Look at the scatter in the residual plot. The spread of the residuals above and below the "residual = 0" line should be about the same from the smallest to the largest x-value.
Random: See if the data were produced by random sampling or a randomized experiment.

EXAMPLE
Check whether the conditions for performing inference about the regression model are met.

EXAMPLE: Answer
LINEAR: The scatterplot shows a weak linear relationship. The residual plot does not show any obvious leftover pattern that would indicate this condition has been violated.
INDEPENDENT: Students were randomly assigned to seats and were monitored for cheating, so knowing one student's score should give no additional information about another student's score.
NORMAL: The histogram of the residuals is roughly unimodal and symmetric, and the Normal probability plot is roughly linear.
EQUAL VARIANCE: Although there is a different amount of variability in each row in the residual plot, the differences aren't large and there is no systematic pattern.
RANDOM: The students were assigned to seats at random.
Because there are no serious violations of the conditions, we should be safe performing inference about the regression model in this setting.

Back to ACTIVITY: Does Seat Location Matter?
Here is the computer output for the least-squares regression analysis on the seating-chart data from the previous Alternate Activity:

Predictor   Coef      SE Coef   T       P
Constant    85.706    4.239     20.22   0.000
Row         -1.1171   0.9472    -1.18   0.248

Problem:
(a) State the equation of the least-squares regression line. Define any variables you use.
(b) Interpret the slope, y-intercept (if possible), and standard deviation of the residuals.
(c) Preview: If we performed a significance test, would we find convincing evidence of a negative relationship between row number and test score?

Back to ACTIVITY: Does Seat Location Matter?
Here is the computer output for the least-squares regression analysis on the seating-chart data from the previous Alternate Activity:

Predictor   Coef      SE Coef   T       P
Constant    85.706    4.239     20.22   0.000
Row         -1.1171   0.9472    -1.18   0.248

S = 10.0673   R-sq = 4.7%   R-sq(adj) = 1.3%

ANSWER:
(a) ŷ = 85.706 - 1.1171x, where ŷ = predicted test score and x = row number.
(b) Slope: For each additional row farther from the front of the class, the predicted test score goes down by 1.1171 points.
Y-intercept: The y-intercept has no meaningful interpretation here, because x = 0 would correspond to a Row 0, which does not exist.
Standard deviation of the residuals: When we use the least-squares regression line to predict test score from a student's row number, our predictions will typically be off by about 10.0673 points.

Homework
Assigned reading: p. 739-744
Complete HW problems: p. 759 #1-4
Check answers to odd problems.
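
The numbers in the computer output for the seating-chart data can be checked directly from the scores listed in the activity. The sketch below is one way to do so in Python using the standard least-squares formulas. The variable names are illustrative, and the P column is not computed because that would require a t distribution with n - 2 = 28 degrees of freedom.

```python
import math

# Test scores by assigned row, copied from the activity (Row 1 is the front).
scores_by_row = {
    1: [76, 77, 94, 99],
    2: [83, 85, 74, 79],
    3: [90, 88, 68, 78],
    4: [94, 72, 101, 70, 79],
    5: [76, 65, 90, 67, 96],
    6: [88, 79, 90, 83],
    7: [79, 76, 77, 63],
}

xs = [row for row, row_scores in scores_by_row.items() for _ in row_scores]
ys = [s for row_scores in scores_by_row.values() for s in row_scores]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
syy = sum((y - y_bar) ** 2 for y in ys)

# Slope and y-intercept of the sample regression line y-hat = a + bx.
b = sxy / sxx
a = y_bar - b * x_bar

# Standard deviation of the residuals (S in the output), with n - 2 df.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))

# Standard errors and t statistics (SE Coef and T in the output).
se_b = s / math.sqrt(sxx)
se_a = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_b = b / se_b
t_a = a / se_a

# Fraction of the variation in scores explained by the line (R-sq).
r_sq = 1 - sse / syy

print(f"Constant  {a:8.3f}  {se_a:6.3f}  {t_a:6.2f}")
print(f"Row       {b:8.4f}  {se_b:6.4f}  {t_b:6.2f}")
print(f"S = {s:.4f}   R-sq = {100 * r_sq:.1f}%")
```

The T value for Row is simply the slope divided by its standard error, which is the statistic a significance test about β would use.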