STAB22 Statistics I
Lecture 7

Example
Newborn babies' weight follows a Normal distribution with mean 3500 grams & SD 500 grams. A baby is defined as "high birth weight" if it is in the top 2% of birth weights. What weight would make a baby "high birth weight"?

Checking Normality
Normal density is a theoretical model; when should we use it to describe real data?
● Can check histogram: look for bell shape (unimodal & symmetric)
● A better check is the Normal probability plot

Normal Probability Plot
Plot data values against their theoretical Normal z-scores (a.k.a. Normal quantile plot)
StatCrunch: Graphics > QQplot
If points lie close to a straight line → data well described by Normal

Example (non-Normal plots)
● Right-skewed distribution: convex plot (U-shaped)
● Left-skewed distribution: concave plot (∩-shaped)

Relationship Between Two Quantitative Variables
Consider the following student data, with two quantitative variables: Weight (in kg) and Height (in cm)

Name      Weight (kg)   Height (cm)
Aubrey    77            188
Ron       75            173
Carl      70            178
⁞         ⁞             ⁞

What is the relationship between weight & height?
First step is to examine the relationship visually using a scatterplot

Scatterplot
Variables are measured along the horizontal (x-) and vertical (y-) axes; each dot represents the combination of the corresponding individual's values, e.g. (Height=170, Weight=61)
StatCrunch: Graphics > Scatterplot

Role of Variables
Usually there is a variable of interest, called the response / dependent variable, and a variable whose effect on the response we want to examine, called the explanatory / independent variable
Response goes on the vertical axis (a.k.a. y-variable) and Explanatory goes on the horizontal axis (a.k.a. x-variable)
E.g. Want to study whether Blood Pressure increases with Age; how would you classify the variables?
Response variable:
Explanatory variable:

Types of Relationships
Overall pattern of scatterplot describes form, direction & strength of relationship
Form:
● Linear relationship
● Non-linear relationship
[Figure: two scatterplots, one showing a linear and one a non-linear relationship]

Types of Relationships
Direction:
● Positive relationship: y-variable increases with x-variable, and vice versa
● Negative relationship: y-variable decreases with x-variable, and vice versa
[Figure: two scatterplots, one showing a positive and one a negative relationship]

Scatterplot Pattern
Strength:
● Strong relationship: data tightly clustered around pattern
● Weak relationship: data loosely spread around pattern, forming a vague cloud
[Figure: two scatterplots, one showing a strong and one a weak relationship]

Outliers
Scatterplots also help identify outliers, i.e. extreme deviations from the overall pattern
[Figure: scatterplot with an outlier]

Example
Describe the relationship of the variables based on the scatterplot & identify any outliers
[Figure: scatterplot]
Form:
Direction:
Strength:

Correlation
Correlation coefficient (r): numerical measure of linear relationship between 2 variables
For variable data (x1,…,xn) & (y1,…,yn), given by

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the sample means
StatCrunch: Stat > Summary Stats > Correlation
● r is always a number between −1 and 1
● r describes strength & direction of linear relationships only

Interpretation of Correlation Coefficient
[Figure: interpretation of correlation coefficient values]

Correlation Properties
● r > 0 → positive & r < 0 → negative relationship
● Magnitude of r, i.e. its distance from 0, describes the strength of the relationship, i.e. how close the data are to a line
● r does not change when the x and/or y variables are shifted or rescaled by a positive factor (e.g. a change of units)
● r is symmetric: doesn't matter which variable is on the x- or y-axis, r is the same in both cases
● r is sensitive to outliers

Example
Choose the corresponding r for each scatterplot; options for each: −0.8, −0.4, 0.0, +0.4, +0.8
[Figure: three scatterplots]

Correlation vs Causation
If two variables are correlated, this does not necessarily imply that x causes y to change
E.g. Height & Weight are positively correlated (r = +0.8762), but Weight does not cause Height, i.e. putting on more weight will not make you taller
Generally, Correlation ⇏ Causation

Correlation vs Causation
Observed correlation/association between two variables can be the result of a third, hidden or lurking variable
[Diagram: weather → ice-cream sales; weather → # people drowning]
E.g. Ice-cream sales are correlated with drowning, but both variables are caused by weather
When the weather is hot, people eat more ice-cream and do more swimming (& therefore drowning)!
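
Two of the computations in this lecture can be sketched in Python: the cutoff for the high-birth-weight example, and the correlation coefficient r computed from its definition. This is only an illustration under stated assumptions. It uses Python's standard `statistics` module rather than StatCrunch, the helper name `correlation` is made up for this sketch, and the data are just the three rows shown in the student table, so the result will not reproduce the r = +0.8762 quoted for the full data set.

```python
from math import sqrt
from statistics import NormalDist, mean

# High-birth-weight cutoff: the 98th percentile of Normal(mean 3500 g, SD 500 g),
# i.e. the weight exceeded by only the top 2% of babies.
birth_weight = NormalDist(mu=3500, sigma=500)
cutoff = birth_weight.inv_cdf(0.98)
z_cutoff = (cutoff - 3500) / 500  # the matching z-score


def correlation(xs, ys):
    """Correlation coefficient r, computed term by term from its definition."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Assumption: only the three rows visible in the table (the rest is elided).
weight = [77, 75, 70]     # kg
height = [188, 173, 178]  # cm

r = correlation(weight, height)
r_swapped = correlation(height, weight)  # symmetry: swap the x and y roles
# Shift and positive rescale of x (roughly kg to lb, plus 1); r should not change.
r_rescaled = correlation([2.2 * w + 1 for w in weight], height)

print(f"high birth weight cutoff: {cutoff:.0f} g (z = {z_cutoff:.2f})")
print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, rescaled = {r_rescaled:.4f}")
```

The three r values printed should agree, which illustrates the symmetry and the shift/rescale properties listed on the Correlation Properties slide. With only three points the value itself says little about the strength of the height and weight relationship.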