Review for the Midterm
What calculations do you need to know about:
• Making a normal approximation to a histogram.
• Making a normal approximation to compute a probability in Binomial trials.
• Computing a regression prediction.
• Computing the residual standard error, also
called the residual standard deviation: the SD of y
for values in a vertical strip (fixed x value):
$\sqrt{1 - r^2}\, s_y$
NOTE: not in text. See lectures on regression.
• Computing the percent variation explained:
convert $r^2$ to a percentage, $100 r^2\%$.
A couple more
• Listing the elements of a small sample space.
• Computing very basic probabilities.
What ideas do you need to understand?
• What do mean, median and mode measure?
• What do SD, IQR and range measure?
• Effects of outliers on above.
• Effect of skewness on relation between median and mean.
• What does a histogram represent? Areas
of bars are proportions of data in given
range.
• When is a normal approximation to a histogram likely to be good or bad?
• What does correlation measure?
• If we add points to a scatter plot what will
the effect be? (Depends on where they
are added. If they stretch the cloud out
the correlation becomes more extreme, for
instance.)
• What does a regression line do?
• What is the regression effect?
• How to spot potential lurking variables. Confounding.
Binomial example question
Q: For a six sided die which is fair the chance
that I roll a three is 1/6. If I roll the die 600
times what is the chance I get fewer than 85
threes?
Solution: You need to recognize Binomial trials:
repeat n = 600 times;
repetitions are independent;
success is getting a three;
chance of success is p = 1/6.
So the mean number of successes is
$\mu = np = 600 \times \tfrac{1}{6} = 100.$
Standard deviation of number of successes is
$\sigma = \sqrt{np(1 - p)} = \sqrt{600 \times \tfrac{1}{6} \times \tfrac{5}{6}} = 9.13.$
Now compute the chance of fewer than 85 successes.
With the continuity correction the range is 84.5 and below. So get
$z = \frac{84.5 - 100}{9.13} = -1.70.$
The area to the left of -1.70 in the normal
tables is 0.0446.
This is the desired approximate probability.
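The calculation above can be checked with a short Python sketch. The helper name `normal_cdf` is an assumption made for illustration; it computes the area to the left of z from the error function instead of looking it up in the normal tables. The exact Binomial sum is included only as a comparison and is not part of the method in these notes.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 600, 1 / 6                    # 600 rolls, chance of a three is 1/6
mu = n * p                           # mean number of threes
sigma = math.sqrt(n * p * (1 - p))   # SD of the number of threes

# "Fewer than 85" means 84 or less; the range used is 84.5 and below.
z = (84.5 - mu) / sigma
approx = normal_cdf(z)

# Exact Binomial chance of 0, 1, ..., 84 threes, for comparison only.
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(85))

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, z = {z:.2f}")
print(f"normal approximation = {approx:.4f}, exact = {exact:.4f}")
```

The approximation differs slightly from the table value because the code does not round z to two decimals before finding the area.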
A basic probability question
Q: I toss a 6 sided die and a fair coin. List the
elements of the sample space and their probabilities. Explain where the probabilities come
from.
Solution: For the die there are six possible
outcomes: 1, 2, 3, 4, 5, 6. The coin has two
possible outcomes: H and T. This gives a total
of 12 outcomes in the sample space:
1H 2H 3H 4H 5H 6H 1T 2T 3T 4T 5T 6T
For a fair die each of the six possible results
has the same chance.
These chances must add up to 1 so they are
all 1/6.
For the coin the two chances are 1/2.
Coin toss and die throw are independent.
So chance of throwing a 1 and tossing an H is
(1/6) × (1/2) = 1/12
by the multiplication rule.
All 12 possible outcomes have this same chance
1/12.
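As a check on the listing above, here is a minimal Python sketch that builds the sample space and applies the multiplication rule. The dictionary names `die`, `coin` and `sample_space` are illustrative choices, and exact fractions are used so the chances come out as 1/12 and not as rounded decimals.

```python
from fractions import Fraction
from itertools import product

# Fair die: six equally likely faces. Fair coin: two equally likely sides.
die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Independence lets us multiply the two chances for each pair.
sample_space = {
    f"{face}{side}": die[face] * coin[side]
    for face, side in product(die, coin)
}

for outcome, chance in sample_space.items():
    print(outcome, chance)
print("total:", sum(sample_space.values()))
```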
Data based questions
Q: A study of 1000 university students shows
they have an average GPA of 2.8 with a standard deviation of 0.35, and an average IQ of 115
with a standard deviation of 15. The correlation between these two is 0.4.
Q1: About what percentage of these students
have GPAs over 3.6?
Solution: On an exam you need to say: “In
order to answer this question I need to assume
that a histogram for the GPAs of these students would follow the normal curve.”
Or any other words to indicate that the shape
of the histogram is like the shape of a normal
curve.
Then you convert the desired range to standard units; GPA 3.6 converts to
$z = \frac{3.6 - 2.8}{0.35} = 2.29.$
Find the area to the right of 2.29.
Area to the left is 0.9890 from Table A.
Area to the right is 1 − 0.9890 = 0.0110.
Convert to a percentage: 100 × 0.0110 = 1.1%.
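The same steps can be sketched in Python, under the same assumption that the GPA histogram follows the normal curve. The helper `normal_cdf` is a name introduced here for illustration and stands in for Table A.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean_gpa, sd_gpa = 2.8, 0.35

# Convert GPA 3.6 to standard units, rounded to two decimals as in a table lookup.
z = round((3.6 - mean_gpa) / sd_gpa, 2)

area_left = normal_cdf(z)
area_right = 1 - area_left
percent_over = 100 * area_right

print(f"z = {z}, area to the right = {area_right:.4f}, percent = {percent_over:.1f}%")
```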
Q2: Predict the GPA of a student whose IQ is
100, the average for the population at large.
Solution: Use the regression line. GPA is y, IQ is
x. Slope is
$b = r \frac{s_y}{s_x} = 0.4 \times \frac{0.35}{15} = 0.00933.$
Intercept is
$a = \bar{y} - b\bar{x} = 2.8 - 0.00933 \times 115 = 1.727$
Prediction is
$\hat{y} = a + bx = 1.727 + 0.00933 \times 100 \approx 2.66$
It is also OK on the exam to use the other method.
Convert 100 to standard units:
$\frac{100 - 115}{15} = -1.$
Multiply by r to predict that the GPA will be -0.4 SDs
above average (that is, 0.4 SDs below).
That is
$2.8 - 0.4 \times 0.35 = 2.66.$
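Both prediction methods can be compared side by side in a short Python sketch. The function names `predict_line` and `predict_standard_units` are hypothetical labels for the two approaches in this solution; the numbers are the ones given in the question.

```python
r = 0.4                        # correlation between IQ and GPA
mean_x, sd_x = 115, 15         # IQ is x
mean_y, sd_y = 2.8, 0.35       # GPA is y

def predict_line(x):
    """Regression line method: slope b, intercept a, prediction a + b*x."""
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    return a + b * x

def predict_standard_units(x):
    """Standard units method: convert x to z, multiply by r, convert back."""
    z_x = (x - mean_x) / sd_x
    return mean_y + r * z_x * sd_y

print(round(predict_line(100), 2))
print(round(predict_standard_units(100), 2))
```

The two functions are algebraically the same prediction, so they agree up to floating-point rounding.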
Q: What is the percent variation in GPA explained by IQ?
Solution: This is
$r^2 = (0.4)^2 = 0.16;$
converted to a percentage this becomes
$100 r^2\% = 100(0.4)^2\% = 16\%.$
Q: What is the residual standard deviation for
GPA?
Solution: This is the SD of y in a group with
a single fixed x:
$\sqrt{1 - r^2}\, s_y = \sqrt{1 - 0.4^2} \times 0.35 \approx 0.32.$
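A quick arithmetic check of both answers in Python; the variable names are illustrative.

```python
import math

r = 0.4       # correlation between IQ and GPA
sd_y = 0.35   # SD of GPA

# Percent of variation in GPA explained by IQ.
percent_explained = 100 * r**2

# Residual SD: the SD of GPA among students with one fixed IQ.
residual_sd = math.sqrt(1 - r**2) * sd_y

print(f"percent explained = {percent_explained:.0f}%")
print(f"residual SD = {residual_sd:.2f}")
```

Note that the residual SD is only a little smaller than the overall SD of 0.35, because a correlation of 0.4 explains a modest share of the variation.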
Q: Looking at the midterm grades and final
exam grades in a large class the prof notes that
they have similar means and similar standard
deviations. He notices however that students
who did poorly on the midterm did somewhat
better on average on the final than they had
done on the midterm. He theorizes that poor
midterm marks encourage students to work
harder and decides that in future he will make
his midterms harder. Criticize this conclusion.
Solution: Another possible explanation is the
regression effect.
When a correlation is positive but less than
1 those who score below average on the first
variable are predicted to be below average on
the second variable but not as much, measured
in standard deviations.
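So, on average, the strong students on the midterm will do
well on the final but not as well as on the midterm, and the weak
students will do poorly on the final but not as poorly.
This happens with no change in study habits, so the
improvement does not show that poor midterm marks
make students work harder.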
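The regression effect can be illustrated with a small Python sketch. The correlation of 0.6 is a made-up value, since the question does not give one, and `predicted_final_z` is a hypothetical helper that applies the regression prediction in standard units.

```python
r = 0.6  # hypothetical midterm/final correlation; not given in the question

def predicted_final_z(midterm_z, r):
    """Regression prediction in standard units: multiply by r."""
    return r * midterm_z

for midterm_z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    final_z = predicted_final_z(midterm_z, r)
    print(f"midterm {midterm_z:+.1f} SDs -> predicted final {final_z:+.1f} SDs")
```

Since the two exams have similar means and SDs, a student 2 SDs below average on the midterm is predicted to be closer to average on the final, which looks like improvement even though nothing about the students has changed.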
Effects of aggregation and outliers
Q: In a class of 100 students I collect the heights
of the students measured in inches. The mean
is 64 inches, the median is 65 inches. The
standard deviation of heights is 4.5 inches. I
then discover that one height was misrecorded
as 6 inches when it should have been 60 inches.
If I correct this measurement and recalculate
the mean, median and SD what happens to
them?
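Solution: The mean goes up, the median is
unaffected and the SD goes down.
The corrected value is larger, which raises the mean.
Both 6 and 60 are below the median of 65, so the
middle of the ordered data does not change.
The value 60 is much closer to the other heights
than 6 was, so the spread shrinks.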
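The direction of each change can be seen on a small made-up data set. The eleven heights below are hypothetical and are not the class data from the question; only the misrecorded 6 and its correction to 60 are taken from it.

```python
import statistics

# Hypothetical heights in inches, with one value misrecorded as 6.
recorded = [6, 62, 63, 64, 65, 65, 66, 67, 68, 70, 71]
corrected = [60 if h == 6 else h for h in recorded]

def summary(data):
    """Return (mean, median, SD) of the data."""
    return (
        statistics.mean(data),
        statistics.median(data),
        statistics.pstdev(data),
    )

mean_before, median_before, sd_before = summary(recorded)
mean_after, median_after, sd_after = summary(corrected)

print(f"before: mean {mean_before:.2f}, median {median_before}, SD {sd_before:.2f}")
print(f"after:  mean {mean_after:.2f}, median {median_after}, SD {sd_after:.2f}")
```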
Q: One year a university observes a correlation
between the high school GPA and first year
university GPA of 0.4. The next year the high
school GPA cut off for admission to the university is raised. Will the correlation between
high school and university GPA be higher than,
lower than, or about the same as 0.4 other
things being equal?
Solution: Imagine the scatterplot of high school
GPA against university GPA as an upward sloping oval.
The new standard cuts off the bottom of the oval,
decreasing the correlation. So the correlation will be lower than 0.4.
But university marks may be affected by the
fact that the students have better high school
GPAs.
That is why I put in the “other things being
equal” disclaimer.
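The effect of cutting off part of the oval can be seen in a small simulation. This is a sketch under stated assumptions: both GPAs are simulated in standard units from normal curves with correlation 0.4, the cutoff is placed at the average high school GPA, and the seed and sample size are arbitrary. None of these details come from the question.

```python
import math
import random

def pearson_r(xs, ys):
    """Correlation coefficient of paired lists xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
r = 0.4                  # correlation before the cutoff is raised

# Simulated high school and university GPAs in standard units.
hs = [rng.gauss(0, 1) for _ in range(20000)]
uni = [r * h + math.sqrt(1 - r * r) * rng.gauss(0, 1) for h in hs]

full = pearson_r(hs, uni)

# Hypothetical new cutoff: admit only above-average high school GPAs.
admitted = [(h, u) for h, u in zip(hs, uni) if h > 0]
restricted = pearson_r([h for h, _ in admitted], [u for _, u in admitted])

print(f"all students: r = {full:.2f}")
print(f"after cutoff: r = {restricted:.2f}")
```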
Q: In a study of GPA and course load a sample
of 200 students are interviewed. A correlation
of 0.3 is discovered between GPA and number
of courses taken per term. The authors argue
that students should be encouraged to take
heavier course loads in order to improve their
GPA. Is the advice reasonable?
Solution: There are certainly other possibilities:
Students who do poorly in their first few terms
may adjust their course loads downward.
Students who don’t need to work may have
more time for school even after taking more
courses.