STAB22 Statistics I
Lecture 7
Example

Newborn babies’ weights follow a Normal distr. with mean
3500 grams & SD 500 grams. A baby is defined as “high
birth weight” if it is in the top 2% of birth weights. What
weight would make a baby “high birth weight”?
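A worked sketch of this example, assuming Python's standard-library `statistics.NormalDist` as the calculator (the lecture itself relies on Normal tables or StatCrunch). The top 2% starts at the 98th percentile, so the cutoff is the weight with 98% of the distribution below it. The variable names are illustrative.

```python
from statistics import NormalDist

# Birth weights modelled as Normal with mean 3500 g and SD 500 g (from the example).
birth_weight = NormalDist(mu=3500, sigma=500)

# Top 2% of weights <=> 98% of weights lie below the cutoff.
z_cutoff = NormalDist().inv_cdf(0.98)        # standard Normal z-score, about 2.05
cutoff_grams = birth_weight.inv_cdf(0.98)    # same as 3500 + z_cutoff * 500

print(f"z = {z_cutoff:.2f}, cutoff = {cutoff_grams:.0f} grams")
# prints: z = 2.05, cutoff = 4527 grams
```

Under this model, a baby weighing more than roughly 4527 grams would count as “high birth weight”.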
Checking Normality

Normal density is a theoretical model; when
should we use it to describe real data?
Can check histogram: look for bell shape
(unimodal & symmetric)
A better check is a Normal probability plot
Normal Probability Plot

Plot data values against their theoretical
Normal z-scores (a.k.a. Normal quantile plot)
StatCrunch: Graphics > QQplot
If points lie close to a straight line → data well described by Normal
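The construction of the plot can be sketched in a few lines of Python. This is only an illustration: the function name `normal_quantile_pairs`, the sample values, and the plotting position (i + 0.5) / n are assumptions made here, and StatCrunch may use a slightly different convention.

```python
from statistics import NormalDist

def normal_quantile_pairs(data):
    """Return (theoretical z-score, data value) pairs for a Normal probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting position (i + 0.5) / n is one common convention, assumed here.
    z_scores = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(z_scores, ordered))

# Hypothetical sample, for illustration only.
sample = [3120, 3480, 3550, 3900, 2950, 3610, 4100, 3300, 3720, 3450]
pairs = normal_quantile_pairs(sample)
for z, value in pairs:
    print(f"{z:6.2f}  {value}")
```

Plotting the second column against the first gives the Normal probability plot; the closer the points are to a straight line, the better the Normal model describes the data.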
Example (non-Normal plots)

● Right-skewed distr.: convex plot (U-shaped)
● Left-skewed distr.: concave plot (∩-shaped)
Relationship Between Two
Quantitative Variables

Consider following student data
Quantitative variables: Weight (in kg), Height (in cm)

Name     Weight   Height
Aubrey   77       188
Ron      75       173
Carl     70       178
⁞        ⁞        ⁞

What is relationship between weight & height?
First step is to examine relationship visually using
a scatterplot
Scatterplot

Variables measured along horizontal (x-) and
vertical (y-) axes; each dot represents the combination
of the corresponding individual’s values,
e.g. (Height=170, Weight=61)
StatCrunch: Graphics > Scatterplot
Role of Variables

Usually there is a variable of interest, called the
response / dependent variable, and a variable
whose effect on the response we want to
examine, called the explanatory / independent variable
Response goes on vertical axis (a.k.a. y-variable) and
explanatory goes on horizontal axis (a.k.a. x-variable)
E.g. Want to study whether Blood Pressure increases
with Age; how would you classify the variables?
● Response variable: Blood Pressure
● Explanatory variable: Age
Types of Relationships

Overall pattern of scatterplot describes form,
direction & strength of relationship

Form:
● Linear relationship
● Non-linear relationship
[Figure: two scatterplots, one with a linear and one with a non-linear pattern]
Types of Relationships

Direction:
● Positive relationship: Y-var. increases with X-var. and vice-versa
● Negative relationship: Y-var. decreases with X-var. and vice-versa
[Figure: two scatterplots, one with a positive and one with a negative relationship]
Scatterplot Pattern
Strength:
● Strong relationship: data tightly clustered around pattern
● Weak relationship: data loosely spread around pattern, forming vague cloud
[Figure: two scatterplots, one with a strong and one with a weak relationship]
Outliers
Scatterplots also help identify outliers, i.e.
extreme deviations from overall pattern
[Figure: scatterplot with an outlier]
Example
Describe relationship of variables based on
scatterplot & identify any outliers
Form:
Direction:
Strength:
[Figure: scatterplot for this exercise]
Correlation

Correlation coefficient (r): numerical
measure of linear relationship between 2 vars

For variable data (x1,…,xn) & (y1,…,yn), r is given by
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
StatCrunch: Stat > Summary Stats > Correlation
r is always a number between −1 and 1
r describes strength & direction of linear
relationships only
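A minimal sketch of computing r directly from its definition, as an alternative to the StatCrunch menu. The function name `correlation` and the age / blood pressure values are assumptions made up for illustration; they are not data from the lecture.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r for paired data, computed from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired data, for illustration only.
ages = [25, 32, 41, 48, 55, 63]
pressures = [118, 121, 127, 126, 135, 140]
r = correlation(ages, pressures)
print(f"r = {r:.3f}")
```

The function fails with a division by zero if either variable is constant, since r is undefined when one variable has no spread.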
Interpretation of Correlation
Coefficient
Correlation Properties





● r>0 → +ve & r<0 → −ve relationship
● r magnitude, i.e. distance from 0, describes
the strength of the relationship, i.e. how close
the data are to a line
● r does not change when the x and/or y
variables are shifted or rescaled (by a positive factor)
● r is symmetric: doesn’t matter which variable
is on x- or y-axis, r is the same in both cases
● r is sensitive to outliers
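These properties can be checked numerically. The sketch below is an illustration only: the `correlation` helper, the height / weight values, and the added outlier are all made up for the purpose.

```python
from math import sqrt, isclose

def correlation(xs, ys):
    """Correlation coefficient r computed from the definition."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical heights (cm) and weights (kg), for illustration only.
heights_cm = [160, 165, 170, 175, 180, 185]
weights_kg = [55, 61, 63, 70, 74, 80]
r = correlation(heights_cm, weights_kg)

# Rescaling (cm -> inches) and shifting (subtracting 50 kg) leave r unchanged.
heights_in = [h / 2.54 for h in heights_cm]
weights_shifted = [w - 50 for w in weights_kg]
r_transformed = correlation(heights_in, weights_shifted)

# Symmetry: swapping the roles of x and y gives the same r.
r_swapped = correlation(weights_kg, heights_cm)

# Sensitivity to outliers: one unusual point (190 cm, 40 kg) changes r a lot.
r_outlier = correlation(heights_cm + [190], weights_kg + [40])

print(f"r = {r:.3f}, transformed = {r_transformed:.3f}, swapped = {r_swapped:.3f}")
print(f"r with one outlier added = {r_outlier:.3f}")
```

In this made-up data a single outlying point is enough to pull a strong positive correlation down to nearly zero, which is why a scatterplot should be checked before r is interpreted.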
Example
Choose corresponding r for each scatterplot
Options for each plot: −0.8, −0.4, 0.0, +0.4, +0.8
[Figure: scatterplots for this exercise]
Correlation vs Causation

If two variables are correlated this does not
necessarily imply that x causes y to change
E.g. Height & Weight are positively correlated
(r = +0.8762), but Weight does not cause Height, i.e.
putting on more weight will not make you taller
Generally, Correlation ⇏ Causation
Correlation vs Causation

Observed correlation/association between
two variables can be result of a third hidden
or lurking variable
E.g. Ice-cream sales correlated with drowning,
but both variables are caused by weather
[Diagram: weather → ice-cream sales; weather → # people drowning]
● When weather is hot, people eat more ice-creams
and do more swimming (& therefore drowning)!
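A small simulation can illustrate how a lurking variable produces correlation without causation. Everything here is an assumption made for illustration: the numbers are simulated, the coefficients 20 and 0.3 are arbitrary, and the `correlation` helper is defined only for this sketch. Temperature drives both variables, and neither variable affects the other.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r computed from the definition."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

rng = random.Random(22)  # fixed seed so the sketch is repeatable

# Simulated, made-up daily data: temperature is the lurking variable.
temps = [rng.uniform(10, 35) for _ in range(500)]
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temps]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temps]

r_observed = correlation(ice_cream_sales, drownings)

# Remove the (known, simulated) temperature effect from each variable;
# what is left is unrelated noise, so the correlation should be near 0.
ice_resid = [s - 20 * t for s, t in zip(ice_cream_sales, temps)]
drown_resid = [d - 0.3 * t for d, t in zip(drownings, temps)]
r_adjusted = correlation(ice_resid, drown_resid)

print(f"r (ice-cream vs drowning)       = {r_observed:.2f}")
print(f"r after removing weather effect = {r_adjusted:.2f}")
```

The strong observed correlation comes entirely from the shared dependence on temperature; once that is accounted for, almost nothing remains.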
