A VERY SHORT REVIEW - FEBRUARY 18th
Here’s a quick review of the first part of the course. By no means is this comprehensive!
1.1 Displaying Distributions with Graphs
• Two types of variables: categorical and quantitative.
• Bar graphs or pie charts show behaviour (distribution) of categorical variables.
• Stemplots and histograms show behaviour (distribution) of quantitative variables.
• When examining a distribution (histogram, density ...) look for shape (symmetric,
left/right skewed, how many modes?), centre and spread, and for any major deviations from the overall shape (outliers?).
• Remember that histograms are never perfect looking, so don’t be too quick to say
that it’s skewed if it’s not perfectly symmetric.
1.2 Describing Distributions with Numbers
• Mean and median report the centre of the distribution - how do you choose which one?
• Quartiles, interquartile range and five-number summary (a visual of the five-number
summary is the boxplot).
• Sample variance and standard deviation.
• Which measures are resistant to outliers? Recall the 1.5×IQR rule for identifying an
outlier.
• What happens to these measures under linear transformations?
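On that last point, here is a minimal sketch (with made-up numbers) of how the mean and standard deviation behave under a linear transformation y = a + bx:

```python
import statistics

# Made-up sample; any data would do.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
a, b = 32, 1.8  # e.g. converting Celsius to Fahrenheit

y = [a + b * xi for xi in x]

# The mean shifts and scales with the data: mean(y) = a + b * mean(x).
# The standard deviation only scales: stdev(y) = |b| * stdev(x);
# the additive shift a drops out because it moves every point equally.
print(statistics.mean(y), statistics.stdev(y))
```

The same pattern holds for the median and the IQR: both rescale by |b|, and only the measures of centre pick up the shift a.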
1.3 Density Curves and Normal Distributions
• The density curve is like the “theoretical” or “model” histogram. We learned more
about these in Chapter 4. It also has a mean, median, quartiles and a standard
deviation. We learned how to calculate these in Chapter 4 (for discrete random variables).
• In this section we also learned about the Normal distribution, which hasn’t left us
yet. Don’t forget the 68-95-99.7 rule and the two main types of calculations (find
P(X > x), or find x such that P(X > x) = 0.65).
• Normal quantile plots.
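Both types of Normal calculation can be checked with Python's standard library; the mean and standard deviation below are hypothetical, chosen just for illustration:

```python
from statistics import NormalDist

# Suppose X ~ N(100, 15); these parameters are made up.
X = NormalDist(mu=100, sigma=15)

# Forward calculation: P(X > 110), via one minus the cdf.
p_greater = 1 - X.cdf(110)

# Backward calculation: find x with P(X > x) = 0.65,
# i.e. the 35th percentile of X.
x_cut = X.inv_cdf(1 - 0.65)

print(round(p_greater, 4), round(x_cut, 2))
```

In a table-based course the same answers come from standardizing to z = (x − μ)/σ and reading Table A; `NormalDist` just automates that step.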
2.1 Scatterplots
• A scatterplot displays the relationship between two variables - one is the response and
the other is explanatory.
• When looking at a scatterplot, look for form (linear? clusters?), direction (positive or
negative?), strength and look for potential outliers.
2.2 Correlation
• The correlation r measures the strength and direction of the linear association between two variables. We have a formula, but mostly we either use the computer or
the calculator.
• It’s a number between −1 and 1, and is not resistant to outliers.
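The formula referred to above is an average of products of standardized values; a small sketch with invented data:

```python
import statistics

# Two invented paired samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)

# r = (1 / (n - 1)) * sum of products of standardized x- and y-values.
r = sum((xi - mx) / sx * (yi - my) / sy for xi, yi in zip(x, y)) / (n - 1)

print(round(r, 4))
```

Because each value is standardized first, r has no units and is unchanged by linear rescaling of either variable.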
2.3 Least-Squares Regression
• The least-squares regression line is the straight line ŷ = b̂₀ + b̂₁x that best explains
the y-values observed. Why is it called “least squares”?
• The regression line can be used for prediction, but extrapolation (what is this?) is
risky.
• Given x̄, ȳ, sx, sy, r you can find
b̂₁ = r(sy/sx) and b̂₀ = ȳ − b̂₁x̄.
• The square of the correlation, r², is the fraction of the variance of the response variable that is explained by least-squares regression on the explanatory variable.
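Putting 2.3 together, here is a short sketch (data invented) that fits the line and computes r² as the fraction of explained variance:

```python
import statistics

# Invented paired data; any sample works.
x = [0.5, 1.0, 1.5, 2.0, 2.5]
y = [2.0, 2.6, 3.5, 4.1, 4.4]

mx, my = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

b1 = sxy / sxx    # slope, equivalent to r * (sy / sx)
b0 = my - b1 * mx # intercept; the line always passes through (x-bar, y-bar)

# r^2 = 1 - (residual sum of squares / total sum of squares),
# i.e. the fraction of the variation in y the line accounts for.
ss_tot = sum((yi - my) ** 2 for yi in y)
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - ss_res / ss_tot

print(round(b0, 3), round(b1, 3), round(r2, 3))
```

"Least squares" refers to exactly the quantity `ss_res`: among all straight lines, this choice of b₀ and b₁ makes the sum of squared vertical distances smallest.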
2.4 Cautions about Correlation and Regression
• Always look at the residuals, yᵢ − ŷᵢ, (and their plot) to see if you have a good fit (no
fanning, no pattern); only then can you trust the regression.
• Outliers have a large residual value.
• Look for influential observations (if removed, will change the regression line). These
need not be outliers.
• Correlation does not imply causation! Consider lurking variables! (Remember the
top hat and health example)
[Histogram omitted; vertical axis: Frequency (0 to 7), horizontal axis: −3 to 3.]
Comment on the histogram.
Here’s a normal quantile plot for the same data.
[Normal quantile plot omitted; horizontal axis: Normal Score, −3 to 3.]
Find the five number summary of this data set:
3 2 0 0 1 1 1 0 1 0.
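As a check, the five-number summary can be computed by hand with the "median of each half" rule from 1.2 (software quantile conventions can differ slightly from this textbook rule):

```python
data = [3, 2, 0, 0, 1, 1, 1, 0, 1, 0]
s = sorted(data)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 3]

n = len(s)
# Median of 10 values: average of the 5th and 6th ordered values.
median = (s[n // 2 - 1] + s[n // 2]) / 2

# Q1 is the median of the lower half, Q3 the median of the upper half.
lower, upper = s[: n // 2], s[n // 2 :]
q1 = lower[len(lower) // 2]  # middle of 5 values
q3 = upper[len(upper) // 2]

five_number = (min(s), q1, median, q3, max(s))
print(five_number)
```

A boxplot of this data would draw the box from Q1 to Q3 with a line at the median, and whiskers out to the min and max (subject to the 1.5×IQR rule).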
Comment on the scatterplot and find the LS regression line.
[Scatterplot of y against x omitted; x runs from 0.0 to 2.5, y from roughly 2 to 6.]
x̄ = 0.9623, ȳ = 3.846, sx = 0.7988, sy = 1.281, r = 0.6205.
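With those summary statistics, the slope and intercept follow directly from the formulas in 2.3:

```python
# Summary statistics given for the scatterplot data.
x_bar, y_bar = 0.9623, 3.846
s_x, s_y = 0.7988, 1.281
r = 0.6205

b1 = r * s_y / s_x       # slope: r * (s_y / s_x)
b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x-bar, y-bar)

# The fitted least-squares line is y-hat = b0 + b1 * x.
print(round(b0, 3), round(b1, 3))
```

The slope works out to roughly 0.995 and the intercept to roughly 2.89, so ŷ ≈ 2.89 + 0.995x; and since r² ≈ 0.385, the line explains only about 39% of the variation in y.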
[Second scatterplot (with the fitted line) omitted, together with a plot of the residuals against x; x runs from 0.0 to 2.5, residuals from −3 to 3.]