Review for the Midterm

What calculations do you need to know about:
• Making a normal approximation to a histogram.
• Making a normal approximation to compute a probability in Binomial trials.
• Computing a regression prediction.
• Computing the residual standard error, also called the residual standard deviation. This is the SD of y for values in a vertical strip (fixed x value): $\sqrt{1 - r^2}\, s_y$. NOTE: not in text. See lectures on regression.
• Computing the percent variation explained: convert $r^2$ to a percentage, $100 r^2\%$.

A couple more:
• Listing the elements of a small sample space.
• Computing very basic probabilities.

What ideas do you need to understand?
• What do mean, median and mode measure?
• What do SD, IQR and range measure?
• Effects of outliers on the above.
• Effect of skewness on the relation between median and mean.
• What does a histogram represent? Areas of bars are proportions of data in the given range.
• When is a normal approximation to a histogram likely to be good or bad?
• What does correlation measure?
• If we add points to a scatter plot what will the effect be? (Depends on where they are added. If they stretch the cloud out the correlation becomes more extreme, for instance.)
• What does a regression line do?
• What is the regression effect?
• How to spot potential lurking variables. Confounding.

Binomial example question

Q: For a fair six-sided die the chance that I roll a three is 1/6. If I roll the die 600 times what is the chance I get fewer than 85 threes?

Solution: You need to recognize Binomial trials: repeat n = 600 times; repetitions are independent; success is getting a three; chance of success is p = 1/6. So the mean number of successes is $\mu = np = 600(1/6) = 100$. The standard deviation of the number of successes is

$$\sigma = \sqrt{np(1-p)} = \sqrt{600 \cdot \frac{1}{6} \cdot \frac{5}{6}} = 9.13.$$

Now compute the chance of fewer than 85 successes. Fewer than 85 means 84 or less, so with the continuity correction the range is 84.5 and below. So get

$$z = \frac{84.5 - 100}{9.13} = -1.70.$$

The area to the left of -1.70 in the normal tables is 0.0446.
This is the desired approximate probability.

A basic probability question

Q: I toss a 6 sided die and a fair coin. List the elements of the sample space and their probabilities. Explain where the probabilities come from.

Solution: For the die there are six possible outcomes: 1, 2, 3, 4, 5, 6. The coin has two possible outcomes: H and T. This gives a total of 12 outcomes in the sample space:

1H 2H 3H 4H 5H 6H
1T 2T 3T 4T 5T 6T

For a fair die each of the six possible results has the same chance. These chances must add up to 1 so they are all 1/6. For the coin the two chances are 1/2.

Coin toss and die throw are independent. So the chance of throwing a 1 and tossing an H is (1/6) × (1/2) = 1/12 by the multiplication rule. All 12 possible outcomes have this same chance 1/12.

Data based questions

Q: A study of 1000 university students shows they have an average GPA of 2.8 with a standard deviation of 0.35, and an average IQ of 115 with a standard deviation of 15. The correlation between these two is 0.4.

Q1: About what percentage of these students have GPAs over 3.6?

Solution: On an exam you need to say: “In order to answer this question I need to assume that a histogram for the GPAs of these students would follow the normal curve.” Or any other words to indicate that the shape of the histogram is like the shape of a normal curve. Then you convert the desired range to standard units; GPA 3.6 converts to

$$z = \frac{3.6 - 2.8}{0.35} = 2.29.$$

Find the area to the right of 2.29. The area to the left is 0.9890 from Table A. The area to the right is 1 - 0.9890 = 0.0110. Convert to a percentage: 100 × 0.0110 = 1.1%.

Q2: Predict the GPA of a student whose IQ is 100, the average for the population at large.

Solution: Use the regression line. GPA is y, IQ is x. The slope is

$$b = r \frac{s_y}{s_x} = 0.4 \times \frac{0.35}{15} = 0.00933.$$

The intercept is

$$a = \bar{y} - b\bar{x} = 2.8 - 0.00933 \times 115 = 1.727.$$

The prediction is

$$\hat{y} = a + bx = 1.727 + 0.00933 \times 100 \approx 2.66.$$

It is also ok on the exam to use the other method. Convert 100 to standard units:

$$\frac{100 - 115}{15} = -1.$$

Multiply by r to predict that GPA will be -0.4 SDs above average (that is, 0.4 SDs below average). That is 2.8 - 0.4 × 0.35 = 2.66.

Q: What is the percent variation in GPA explained by IQ?

Solution: This is $r^2 = (0.4)^2 = 0.16$. Converted to a percentage this becomes $100 r^2\% = 100(0.4)^2\% = 16\%$.

Q: What is the residual standard deviation for GPA?

Solution: This is the SD of y in a group with a single fixed x:

$$\sqrt{1 - r^2}\, s_y = \sqrt{1 - 0.4^2} \times 0.35 \approx 0.32.$$

Q: Looking at the midterm grades and final exam grades in a large class the prof notes that they have similar means and similar standard deviations. He notices however that students who did poorly on the midterm did somewhat better on average on the final than they had done on the midterm. He theorizes that poor midterm marks encourage students to work harder and decides that in future he will make his midterms harder. Criticize this conclusion.

Solution: Another possible explanation is the regression effect. When a correlation is positive but less than 1, those who score below average on the first variable are predicted to be below average on the second variable but not by as much, measured in standard deviations. So the strong students on the midterm will do well but not as well on the final, and the weak students will do poorly but not as poorly on the final. The pattern the prof noticed would therefore be expected even if a poor midterm mark had no effect on how hard students work.

Effects of aggregation and outliers

Q: In a class of 100 students I collect the heights of the students measured in inches. The mean is 64 inches, the median is 65 inches. The standard deviation of heights is 4.5 inches. I then discover that one height was misrecorded as 6 inches when it should have been 60 inches. If I correct this measurement and recalculate the mean, median and SD what happens to them?

Solution: The mean goes up (the total rises by 54 inches, so the mean rises by 54/100 = 0.54 inches), the median is unaffected (both 6 and 60 are below the median of 65) and the SD goes down (the corrected value is much closer to the rest of the data).

Q: One year a university observes a correlation between the high school GPA and first year university GPA of 0.4. The next year the high school GPA cut off for admission to the university is raised.
Will the correlation between high school and university GPA be higher than, lower than, or about the same as 0.4, other things being equal?

Solution: Imagine the scatterplot of high school GPA against university GPA as an upward sloping oval. The new standard cuts off the lower end of the oval (the students with low high school GPAs), decreasing the correlation. But university marks may be affected by the fact that the students have better high school GPAs. That is why I put in the “other things being equal” disclaimer.

Q: In a study of GPA and course load a sample of 200 students is interviewed. A correlation of 0.3 is discovered between GPA and number of courses taken per term. The authors argue that students should be encouraged to take heavier course loads in order to improve their GPA. Is the advice reasonable?

Solution: There are certainly other possible explanations for the correlation: Students who do poorly in their first few terms may adjust their course loads downward. Students who don’t need to work may have more time for school even after taking more courses.
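
Checking the worked calculations by computer

The arithmetic in the Binomial and GPA/IQ examples can be checked with a short script. What follows is only a sketch, not part of the lectures: it assumes Python 3.8 or later (for `statistics.NormalDist`), and the variable names are illustrative. It uses the exact normal curve instead of Table A, so the probabilities can differ from the table answers in the third or fourth decimal place because z is not rounded to two decimals first.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal curve: mean 0, SD 1

# Binomial example: 600 rolls, chance of a three is 1/6,
# chance of fewer than 85 threes by normal approximation.
n, p = 600, 1 / 6
mu = n * p                              # mean number of successes
sigma = sqrt(n * p * (1 - p))           # SD of number of successes
z_binom = (84.5 - mu) / sigma           # range is 84.5 and below
p_fewer_than_85 = STD_NORMAL.cdf(z_binom)

# GPA / IQ example: summary statistics given in the question.
mean_gpa, sd_gpa = 2.8, 0.35
mean_iq, sd_iq = 115, 15
r = 0.4

# Q1: percentage with GPA over 3.6, assuming a normal-shaped histogram.
z_gpa = (3.6 - mean_gpa) / sd_gpa
pct_over_3_6 = 100 * (1 - STD_NORMAL.cdf(z_gpa))

# Q2: regression prediction of GPA (y) for IQ (x) equal to 100.
slope = r * sd_gpa / sd_iq
intercept = mean_gpa - slope * mean_iq
predicted_gpa = intercept + slope * 100

# Percent variation explained and residual standard deviation.
pct_explained = 100 * r ** 2
residual_sd = sqrt(1 - r ** 2) * sd_gpa

print(f"mu = {mu:.0f}, sigma = {sigma:.2f}, z = {z_binom:.2f}")
print(f"P(fewer than 85 threes) is about {p_fewer_than_85:.4f}")
print(f"Percent with GPA over 3.6 is about {pct_over_3_6:.2f}%")
print(f"Predicted GPA at IQ 100: {predicted_gpa:.2f}")
print(f"Percent variation explained: {pct_explained:.0f}%")
print(f"Residual SD: {residual_sd:.2f}")
```

On an exam you still need to show the steps by hand and state the normal assumption; the script is only a way to check the numbers while studying.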