JOINT AND CONDITIONAL DISTRIBUTIONS
RANDOM VARIABLES,
EXPECTATIONS,
VARIANCES ETC.
1
Variable
• Recall:
• Variable: A characteristic of a population or
sample that is of interest to us.
• Random variable: A function defined on the
sample space S that associates a real number
with each outcome in S.
2
DISCRETE RANDOM VARIABLES
• If the set of all possible values of a r.v. X is a
countable set, then X is called discrete r.v.
• The function f(x)=P(X=x) for x=x1,x2, … that
assigns the probability to each value x is called
probability density function (p.d.f.) or
probability mass function (p.m.f.)
3
Example
• Discrete Uniform distribution:
P(X = x) = 1/N; x = 1, 2, ..., N; N = 1, 2, ...
• Example: throw a fair die.
P(X=1) = ... = P(X=6) = 1/6
4
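As a quick check, the pmf above can be coded directly (the helper name `discrete_uniform_pmf` is mine):

```python
from fractions import Fraction

def discrete_uniform_pmf(x, N):
    """P(X = x) = 1/N for x = 1, 2, ..., N; 0 otherwise."""
    return Fraction(1, N) if 1 <= x <= N else Fraction(0)

# Fair die: N = 6, so P(X=1) = ... = P(X=6) = 1/6.
die = [discrete_uniform_pmf(x, 6) for x in range(1, 7)]
print(die[0], sum(die))   # 1/6 1 -- each face 1/6, probabilities sum to 1
```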
CONTINUOUS RANDOM VARIABLES
• When the sample space is uncountable
(continuous)
• Example: Continuous Uniform(a, b)
f(x) = 1/(b − a), a ≤ x ≤ b.
5
CUMULATIVE DISTRIBUTION FUNCTION
(C.D.F.)
• The CDF of a r.v. X is defined as F(x) = P(X ≤ x).
• Note that P(a < X ≤ b) = F(b) − F(a).
• A function F(x) is a CDF for some r.v. X iff it
satisfies:
lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1,
lim_{h→0+} F(x + h) = F(x), i.e., F(x) is continuous from the right,
a < b implies F(a) ≤ F(b), i.e., F(x) is non-decreasing.
6
Example
• Consider tossing three fair coins.
• Let X = number of heads observed.
• S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}
• P(X=0) = P(X=3) = 1/8; P(X=1) = P(X=2) = 3/8

x          F(x)
(−∞, 0)    0
[0, 1)     1/8
[1, 2)     1/2
[2, 3)     7/8
[3, ∞)     1
7
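The table above can be reproduced by enumerating the eight equally likely outcomes (a small Python sketch; the names are mine):

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three fair coins.
pmf = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")                       # X = number of heads
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 8)

def F(x):
    """CDF: F(x) = P(X <= x)."""
    return sum(p for k, p in pmf.items() if k <= x)

for x in range(4):
    print(F(x))        # 1/8, 1/2, 7/8, 1 -- matching the table
```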
Example
3
f
(
x
)

2
(
1

x
)
for x  0
• Let
 x 2(1  t ) 3 dt  1  (1  x )  2 for x  0

F( x )  P(X  x )   0
0
for x  0
P(0.4  X  0.45)  
0.45
0.4
f ( x )dx  F(0.45)  F(0.4)  0.035
8
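A small numeric sketch confirming the computation: the closed-form CDF and a midpoint-rule integral of the pdf agree.

```python
# f(x) = 2(1+x)^-3 for x > 0; F(x) = 1 - (1+x)^-2 for x > 0.
def f(x):
    return 2 * (1 + x) ** -3 if x > 0 else 0.0

def F(x):
    return 1 - (1 + x) ** -2 if x > 0 else 0.0

# Closed form
p = F(0.45) - F(0.4)

# Numerical integral of f over (0.4, 0.45) with a midpoint rule
n = 10000
h = 0.05 / n
approx = sum(f(0.4 + (i + 0.5) * h) for i in range(n)) * h
print(round(p, 3), round(approx, 3))   # 0.035 0.035
```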
JOINT DISTRIBUTIONS
• In many applications there are more than one
random variables of interest, say X1, X2,…,Xk.
JOINT DISCRETE DISTRIBUTIONS
• The joint probability mass function (joint pmf)
of the k-dimensional discrete rv
X=(X1, X2,…,Xk) is
f x1, x 2 ,..., x k   PX1  x1, X2  x 2 ,..., Xk  x k 
 x1, x 2 ,..., x k  of X .
9
JOINT DISCRETE DISTRIBUTIONS
• A function f(x1, x2,…, xk) is the joint pmf for
some vector valued rv X=(X1, X2,…,Xk) iff the
following properties are satisfied:
f(x1, x2, …, xk) ≥ 0 for all (x1, x2, …, xk)
and
Σ_{x1} … Σ_{xk} f(x1, x2, …, xk) = 1.
10
Example
• Tossing two fair dice → 36 possible sample
points
• Let X: sum of the two dice;
Y: |difference of the two dice|
• For e.g.:
– For (3,3), X=6 and Y=0.
– For both (4,1) and (1,4), X=5, Y=3.
11
Example
• Joint pmf of (X, Y)

y \ x    2      3      4      5      6      7      8      9      10     11     12
0       1/36          1/36          1/36          1/36          1/36          1/36
1              1/18          1/18          1/18          1/18          1/18
2                     1/18          1/18          1/18          1/18
3                            1/18          1/18          1/18
4                                   1/18          1/18
5                                          1/18

Empty cells are equal to 0.
e.g. P(X=7, Y≤4) = f(7,0) + f(7,1) + f(7,2) + f(7,3) + f(7,4) = 0 + 1/18 + 0 + 1/18 + 0 = 1/9
12
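The joint pmf table and the probabilities it yields can be checked by enumerating all 36 equally likely ordered rolls (a Python sketch; the names are mine):

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = sum of the two dice and Y = |difference|.
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1 + d2, abs(d1 - d2))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

# P(X=7, Y<=4), as computed on the slide
p_74 = sum(pr for (x, y), pr in joint.items() if x == 7 and y <= 4)

# Marginals: P(X=2) and P(Y=2)
p_x2 = sum(pr for (x, y), pr in joint.items() if x == 2)
p_y2 = sum(pr for (x, y), pr in joint.items() if y == 2)
print(p_74, p_x2, p_y2)   # 1/9 1/36 2/9
```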
MARGINAL DISCRETE
DISTRIBUTIONS
• If the pair (X1,X2) of discrete random variables
has the joint pmf f(x1,x2), then the marginal
pmfs of X1 and X2 are
f1(x1) = Σ_{x2} f(x1, x2)  and  f2(x2) = Σ_{x1} f(x1, x2)
13
Example
• In the previous example,
– P(X = 2) = Σ_{y=0}^{5} P(X = 2, Y = y) = P(X = 2, Y = 0) + … + P(X = 2, Y = 5) = 1/36
– P(Y = 2) = Σ_{x=2}^{12} P(X = x, Y = 2) = 4/18
14
JOINT DISCRETE DISTRIBUTIONS
• JOINT CDF:
F(x1, x2, …, xk) = P(X1 ≤ x1, …, Xk ≤ xk).
• F(x1, x2) is a cdf iff
lim_{x1→−∞} F(x1, x2) = F(−∞, x2) = 0 for all x2,
lim_{x2→−∞} F(x1, x2) = F(x1, −∞) = 0 for all x1,
lim_{x1→∞, x2→∞} F(x1, x2) = F(∞, ∞) = 1,
P(a < X1 ≤ b, c < X2 ≤ d) = F(b, d) − F(b, c) − F(a, d) + F(a, c) ≥ 0 for all a ≤ b and c ≤ d,
lim_{h→0+} F(x1 + h, x2) = lim_{h→0+} F(x1, x2 + h) = F(x1, x2) for all x1 and x2.
15
JOINT CONTINUOUS DISTRIBUTIONS
• A k-dimensional vector valued rv X=(X1,
X2,…,Xk) is said to be continuous if there is a
function f(x1, x2,…, xk), called the joint
probability density function (joint pdf), of X,
such that the joint cdf can be given as
Fx1, x 2 ,..., x k  
x1 x 2
xk
  ...  f t1, t 2 ,..., t k dt1dt 2 ...dt k
  
16
JOINT CONTINUOUS DISTRIBUTIONS
• A function f(x1, x2,…, xk) is the joint pdf for
some vector valued rv X=(X1, X2,…,Xk) iff the
following properties are satisfied:
f(x1, x2, …, xk) ≥ 0 for all (x1, x2, …, xk)
and
∫_{−∞}^{∞} … ∫_{−∞}^{∞} f(x1, x2, …, xk) dx1 dx2 … dxk = 1.
17
JOINT CONTINUOUS DISTRIBUTIONS
• If the pair (X1, X2) of continuous random variables
has the joint pdf f(x1, x2), then the marginal
pdfs of X1 and X2 are
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2  and  f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1.
18
JOINT DISTRIBUTIONS
• If X1, X2, …, Xk are independent of each
other, then the joint pdf can be written as
f(x1, x2, …, xk) = f(x1) f(x2) … f(xk)
and the joint cdf can be written as
F(x1, x2, …, xk) = F(x1) F(x2) … F(xk)
19
CONDITIONAL DISTRIBUTIONS
• If X1 and X2 are discrete or continuous random
variables with joint pdf f(x1,x2), then the
conditional pdf of X2 given X1=x1 is defined by
f x1, x 2 
f x 2 x1  
, x1 such that f x1   0, 0 elsewhere.
f x1 
• For independent rvs,
f x2 x1   f  x2 .
f x1 x2   f  x1 .
20
Example
Statistical Analysis of Employment Discrimination Data (Example
from Dudewicz & Mishra, 1988; data from Dawson, Hankey
and Myers, 1982)
% promoted (number of employees)

Pay grade   Affected class   Others
5           100 (6)          84 (80)
7           88 (8)           87 (195)
9           93 (29)          88 (335)
10          7 (102)          8 (695)
11          7 (15)           11 (185)
12          10 (10)          7 (165)
13          0 (2)            9 (81)
14          0 (1)            7 (41)

The affected class might be a minority group or, e.g., women.
21
Example, cont.
• Does this data indicate discrimination against the
affected class in promotions in this company?
• Let X=(X1,X2,X3) where X1 is pay grade of an
employee; X2 is an indicator of whether the
employee is in the affected class or not; X3 is an
indicator of whether the employee was promoted or
not
• x1={5,7,9,10,11,12,13,14}; x2={0,1}; x3={0,1}
22
Example, cont.
Pay grade   Affected class   Others
10          7 (102)          8 (695)
• E.g., in pay grade 10 of this occupation (X1=10) there
were 102 members of the affected class and 695
members of the other classes. Seven percent of the
affected class in pay grade 10 had been promoted,
that is (102)(0.07)=7 individuals out of 102 had been
promoted.
• Out of 1950 employees, only 173 are in the affected
class; this is not atypical in such studies.
23
Example, cont.
Pay grade   Affected class   Others
10          7 (102)          8 (695)
• E.g. probability of a randomly selected employee
being in pay grade 10, being in the affected class, and
promoted: P(X1=10,X2=1,X3=1)=7/1950=0.0036
(Probability function of a discrete 3 dimensional r.v.)
• E.g. probability of a randomly selected employee
being in pay grade 10 and promoted:
P(X1=10, X3=1) = (7+56)/1950 = 0.0323 (note: 8% of 695 ≈ 56) (marginal probability function of X1 and X3)
24
Example, cont.
• E.g. probability that an employee is in the other class
(X2=0) given that the employee is in pay grade 10
(X1=10) and was promoted (X3=1):
P(X2=0| X1=10, X3=1)= P(X1=10,X2=0,X3=1)/P(X1=10, X3=1)
=(56/1950)/(63/1950)=0.89 (conditional probability)
• probability that an employee is in the affected class
(X2=1) given that the employee is in pay grade 10
(X1=10) and was promoted (X3=1):
P(X2=1| X1=10, X3=1)=(7/1950)/(63/1950)=0.11
25
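These conditional probabilities can be verified directly from the counts on the slide (7 promoted in the affected class, 56 ≈ 8% of 695 among the others, out of 1950 employees):

```python
from fractions import Fraction

total = 1950
n_affected_promoted = 7     # pay grade 10, affected class: 7% of 102
n_others_promoted = 56      # pay grade 10, others: 8% of 695 ≈ 56

p_joint = Fraction(n_affected_promoted, total)      # P(X1=10, X2=1, X3=1)
p_marginal = Fraction(n_affected_promoted + n_others_promoted, total)  # P(X1=10, X3=1)

# P(X2=1 | X1=10, X3=1) = (7/1950) / (63/1950)
p_cond = p_joint / p_marginal
print(round(float(p_cond), 2))        # 0.11
print(round(float(1 - p_cond), 2))    # 0.89 = P(X2=0 | X1=10, X3=1)
```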
Describing the Population
• We’re interested in describing the population by
computing various parameters.
• For instance, we calculate the population mean
and population variance.
26
EXPECTED VALUES
Let X be a rv with pdf fX(x) and g(X) be a
function of X. Then the expected value (or
the mean, or the mathematical expectation) of
g(X) is
E[g(X)] = Σ_x g(x) fX(x), if X is discrete
E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx, if X is continuous
provided the sum or the integral exists, i.e.,
−∞ < E[g(X)] < ∞.
27
EXPECTED VALUES
• E[g(X)] is finite if E[| g(X) |] is finite:
Σ_x |g(x)| fX(x) < ∞, if X is discrete
∫_{−∞}^{∞} |g(x)| fX(x) dx < ∞, if X is continuous
28
Population Mean (Expected Value)
• Given a discrete random variable X with
values xi, that occur with probabilities p(xi),
the population mean of X is
E(X) = μ = Σ_{all xi} xi p(xi)
29
Population Variance
– Let X be a discrete random variable with
possible values xi that occur with
probabilities p(xi), and let E(X) = μ. The
variance of X is defined by
V(X) = σ² = E[(X − μ)²] = Σ_{all xi} (xi − μ)² p(xi)   (units: unit²)
The standard deviation is
σ = √σ²   (units: unit)
30
EXPECTED VALUE
• The expected value or mean value of a
continuous random variable X with pdf f(x) is
μ = E(X) = ∫_{all x} x f(x) dx
• The variance of a continuous random
variable X with pdf f(x) is
σ² = Var(X) = E[(X − μ)²] = ∫_{all x} (x − μ)² f(x) dx
= E(X²) − μ² = ∫_{all x} x² f(x) dx − μ²
31
EXAMPLE
• The pmf for the number of defective items in
a lot is as follows
p(x) = 0.35 for x = 0; 0.39 for x = 1; 0.19 for x = 2; 0.06 for x = 3; 0.01 for x = 4
Find the expected number and the variance of
defective items.
Results: E(X)=0.99, Var(X)=0.8699
32
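The stated results can be checked in a few lines:

```python
# pmf of the number of defective items, from the slide
pmf = {0: 0.35, 1: 0.39, 2: 0.19, 3: 0.06, 4: 0.01}

mean = sum(x * p for x, p in pmf.items())            # E(X)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X)
print(round(mean, 2), round(var, 4))   # 0.99 0.8699
```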
EXAMPLE
• Let X be a random variable. Its pdf is
f(x)=2(1-x), 0< x < 1
Find E(X) and Var(X).
33
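Since the slide leaves the computation as an exercise, here is a numeric sketch (midpoint-rule integration) whose output matches the exact integrals E(X) = ∫₀¹ x · 2(1 − x) dx = 1/3 and Var(X) = E(X²) − μ² = 1/6 − 1/9 = 1/18:

```python
# Numerical check for f(x) = 2(1-x) on (0, 1).
def f(x):
    return 2 * (1 - x)

n = 200000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]     # midpoint rule
mean = sum(x * f(x) for x in xs) * h       # E(X)
ex2 = sum(x * x * f(x) for x in xs) * h    # E(X^2)
var = ex2 - mean ** 2                      # Var(X)
print(round(mean, 4), round(var, 4))       # 0.3333 0.0556
```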
Laws of Expected Value
• Let X be a rv and a, b, and c be constants.
Then, for any two functions g1(x) and g2(x)
whose expectations exist,
a) E[a g1(X) + b g2(X) + c] = a E[g1(X)] + b E[g2(X)] + c
b) If g1(x) ≥ 0 for all x, then E[g1(X)] ≥ 0.
c) If g1(x) ≤ g2(x) for all x, then E[g1(X)] ≤ E[g2(X)].
d) If a ≤ g1(x) ≤ b for all x, then a ≤ E[g1(X)] ≤ b.
34
Laws of Expected Value and Variance
Let X be a rv and c be a constant.
Laws of Expected Value
• E(c) = c
• E(X + c) = E(X) + c
• E(cX) = cE(X)
Laws of Variance
• V(c) = 0
• V(X + c) = V(X)
• V(cX) = c²V(X)
35
EXPECTED VALUE
E[Σ_{i=1}^{k} ai Xi] = Σ_{i=1}^{k} ai E[Xi]
If X and Y are independent,
E[g(X) h(Y)] = E[g(X)] E[h(Y)]
The covariance of X and Y is defined as
Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
= E(XY) − E(X)E(Y)
36
EXPECTED VALUE
If X and Y are independent,
Cov X ,Y   0
The reverse is usually not correct! It is only correct
under normal distribution.
If (X,Y)~Normal, then X and Y are independent
iff
Cov(X,Y)=0
37
EXPECTED VALUE
Var(X1 + X2) = Var(X1) + Var(X2) + 2 Cov(X1, X2)
If X1 and X2 are independent,
Var(X1 + X2) = Var(X1) + Var(X2)
38
CONDITIONAL EXPECTATION AND
VARIANCE
E(Y | x) = Σ_y y f(y | x), if X and Y are discrete
E(Y | x) = ∫_{−∞}^{∞} y f(y | x) dy, if X and Y are continuous
Var(Y | x) = E(Y² | x) − [E(Y | x)]²
39
CONDITIONAL EXPECTATION AND
VARIANCE
E E Y X   E Y 
Var (Y)  EX (Var (Y | X))  VarX (E(Y | X))
(EVVE rule)
Proofs available in Casella & Berger (1990), pgs. 154 & 158
40
Example
• An insect lays a large number of eggs, each
surviving with probability p. Consider a large
number of mothers. X: number of survivors in
a litter; Y: number of eggs laid
• Assume:
X | Y ~ Binomial(Y, p)
Y | Λ ~ Poisson(Λ)
Λ ~ Exponential(β)
• Find: expected number of survivors, i.e. E(X)
41
Example - solution
E(X)=E(E(X|Y))
=E(Yp)
=p E(Y)
=p E(E(Y|Λ))
=p E(Λ)
=pβ
42
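The solution can be checked by Monte Carlo simulation, a sketch under the stated model (the Poisson sampler uses Knuth's method; all names are mine):

```python
import math
import random

random.seed(0)

def poisson(lam):
    """Sample from Poisson(lam) via Knuth's method."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

beta, prob = 2.0, 0.5          # E(X) should be p * beta = 1.0
trials = 100_000
total = 0
for _ in range(trials):
    lam = random.expovariate(1 / beta)                  # Lambda ~ Exponential(mean beta)
    y = poisson(lam)                                    # Y | Lambda ~ Poisson(Lambda)
    x = sum(random.random() < prob for _ in range(y))   # X | Y ~ Binomial(Y, p)
    total += x
print(round(total / trials, 2))   # ≈ 1.0 = p * beta
```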
SOME MATHEMATICAL EXPECTATIONS
• Population Mean: μ = E(X)
• Population Variance:
σ² = Var(X) = E[(X − μ)²] = E(X²) − μ² ≥ 0
(measure of the deviation from the population mean)
• Population Standard Deviation: σ = √σ² ≥ 0
• Moments:
μ*_k = E(X^k): the k-th moment
μ_k = E[(X − μ)^k]: the k-th central moment
43
SKEWNESS
• Measure of lack of symmetry in the pdf.
Skewness = E[(X − μ)³] / σ³ = μ₃ / μ₂^(3/2)
If the distribution of X is symmetric around its
mean μ,
μ₃ = 0 ⇒ Skewness = 0
44
KURTOSIS
• Measure of the peakedness of the pdf. Describes the
shape of the distribution.
Kurtosis = E[(X − μ)⁴] / σ⁴ = μ₄ / μ₂²
Kurtosis = 3 → Normal
Kurtosis > 3 → Leptokurtic
(peaked and fat tails)
Kurtosis < 3 → Platykurtic
(less peaked and thinner tails)
45
Measures of Central Location
• Usually, we focus our attention on two
types of measures when describing
population characteristics:
– Central location
– Variability or spread
46
Measures of Central Location
• The measure of central location reflects
the locations of all the data points.
• How?
With one data point, clearly the central
location is at the point itself.
With two data points, the central location
should fall in the middle between them (in
order to reflect the location of both of them).
But if the third data point appears on the
left-hand side of the midrange, it should “pull”
the central location to the left.
47
The Arithmetic Mean
• This is the most popular measure of central
location
Mean = (Sum of the observations) / (Number of observations)
48
The Arithmetic Mean
Sample mean: x̄ = Σ_{i=1}^{n} xi / n   (n = sample size)
Population mean: μ = Σ_{i=1}^{N} xi / N   (N = population size)
49
The Arithmetic Mean
• Example
The reported times on the Internet of 10 adults are 0, 7, 12, 5, 33,
14, 8, 0, 9, 22 hours. Find the mean time on the Internet.
x̄ = Σ_{i=1}^{10} xi / 10 = (0 + 7 + 12 + 5 + 33 + 14 + 8 + 0 + 9 + 22) / 10 = 11.0
50
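The mean of this data set, along with the median and mode discussed on the following slides, can be computed with Python's statistics module:

```python
import statistics

# Hours on the Internet reported by the 10 adults
times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

print(statistics.mean(times))     # mean: 11 hours
print(statistics.median(times))   # median: 8.5
print(statistics.mode(times))     # mode: 0 (occurs twice; every other value once)
```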
The Arithmetic Mean
• Drawback of the mean:
It can be influenced by unusual
observations, because it uses all the
information in the data set.
51
The Median
• The Median of a set of observations is the value
that falls in the middle when the observations are
arranged in order of magnitude. It divides the
data in half.
Example: Find the median of the time on the Internet
for the 10 adults of the previous example.
Even number of observations:
0, 0, 5, 7, 8, 9, 12, 14, 22, 33 → median = (8 + 9)/2 = 8.5
Comment: Suppose only 9 adults were sampled
(exclude, say, the longest time, 33).
Odd number of observations:
0, 0, 5, 7, 8, 9, 12, 14, 22 → median = 8 (the middle value)
52
The Mode
• The Mode of a set of observations is the value that
occurs most frequently.
• Set of data may have one mode (or modal class), or
two or more modes.
[Figure: histogram with the modal class highlighted]
53
The Mode
• Find the mode for the data in the Example. Here are
the data again: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Solution
• All observations except “0” occur once; “0” occurs twice. Thus,
the mode is zero.
• Is this a good measure of central location?
• The value “0” does not reside at the center of this set
(compare with the mean = 11.0 and the median = 8.5).
54
Relationship among Mean, Median, and Mode
• If a distribution is bell shaped and symmetrical,
then the mean, median and mode coincide:
Mean = Median = Mode
• If a distribution is asymmetrical, skewed to the
left or to the right, the three measures differ.
A positively skewed distribution
(“skewed to the right”):
Mode < Median < Mean
55
Relationship among Mean, Median, and
Mode
• If a distribution is non-symmetrical, skewed to the
left or to the right, the three measures differ.
A positively skewed distribution
(“skewed to the right”):
Mode < Median < Mean
A negatively skewed distribution
(“skewed to the left”):
Mean < Median < Mode
56
Measures of variability
• Measures of central location fail to tell the
whole story about the distribution.
• A question of interest still remains unanswered:
How much are the observations spread out
around the mean value?
57
Measures of variability
Observe two hypothetical
data sets:
Small variability
The average value provides
a good representation of the
observations in the data set.
This data set is now
changing to...
58
Measures of Variability
Larger variability
The same average value does not
provide as good representation of the
observations in the data set as before.
59
The Range
– The range of a set of observations is the difference
between the largest and smallest observations.
– Its major advantage is the ease with which it can be
computed.
– Its major shortcoming is its failure to provide information
on the dispersion of the observations between the two
end points.
But, how do all the observations spread out?
The range cannot assist in answering this question
[Figure: the range spans from the smallest observation to the largest observation; the dispersion in between is unknown]
60
The Variance
• This measure reflects the dispersion of all the
observations.
• The variance of a population of size N, x1, x2, …, xN,
whose mean is μ, is defined as
σ² = Σ_{i=1}^{N} (xi − μ)² / N
• The variance of a sample of n observations
x1, x2, …, xn, whose mean is x̄, is defined as
s² = Σ_{i=1}^{n} (xi − x̄)² / (n − 1)
61
Why not use the sum of deviations?
Consider two small populations. The mean of both populations is 10.
Population A: 8, 9, 10, 11, 12
Deviations: 8−10 = −2, 9−10 = −1, 11−10 = +1, 12−10 = +2; Sum = 0
Population B: 4, 7, 10, 13, 16
Deviations: 4−10 = −6, 7−10 = −3, 13−10 = +3, 16−10 = +6; Sum = 0
The measurements in B are more dispersed than those in A, yet the
sum of deviations is zero for both populations. A good measure of
dispersion should agree with this observation, so the sum of
deviations is not a good measure of dispersion.
62
The Variance
Let us calculate the variance of the two populations:
σ²_A = [(8−10)² + (9−10)² + (10−10)² + (11−10)² + (12−10)²] / 5 = 2
σ²_B = [(4−10)² + (7−10)² + (10−10)² + (13−10)² + (16−10)²] / 5 = 18
Why is the variance defined as the average squared deviation?
Why not use the sum of squared deviations as a measure of
variation instead? After all, the sum of squared deviations
increases in magnitude when the variation of a data set increases!
63
The Variance
Let us calculate the sum of squared deviations for both data sets.
Which data set has a larger dispersion? Data set B is more
dispersed around the mean.
A: 1, 1, 1, 1, 1, 3, 3, 3, 3, 3 (ten observations, mean 2)
B: 1, 5 (two observations, mean 3)
64
The Variance
SumA = (1−2)² + … + (1−2)² + (3−2)² + … + (3−2)² = 10
SumB = (1−3)² + (5−3)² = 8
SumA > SumB. This is inconsistent with the
observation that set B is more dispersed.
65
The Variance
However, when calculated on a “per observation”
basis (variance), the data set dispersions are
properly ranked:
σ²_A = SumA/N = 10/10 = 1
σ²_B = SumB/N = 8/2 = 4
66
The Variance
• Example
– The following sample consists of the number of
jobs six students applied for: 17, 15, 23, 7, 9,
13. Find its mean and variance
• Solution
x̄ = Σ_{i=1}^{6} xi / 6 = (17 + 15 + 23 + 7 + 9 + 13) / 6 = 84/6 = 14 jobs
s² = Σ_{i=1}^{n} (xi − x̄)² / (n − 1)
= [(17−14)² + (15−14)² + … + (13−14)²] / (6 − 1)
= 33.2 jobs²
67
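Both the definitional formula and the algebraically equivalent shortcut form s² = (Σxi² − (Σxi)²/n) / (n − 1) can be checked in a few lines:

```python
import statistics

jobs = [17, 15, 23, 7, 9, 13]   # jobs applied for by six students

mean = statistics.mean(jobs)    # 84 / 6 = 14
s2 = statistics.variance(jobs)  # sample variance, n-1 in the denominator

# Shortcut form: s^2 = (sum(x^2) - (sum x)^2 / n) / (n - 1)
n = len(jobs)
s2_short = (sum(x * x for x in jobs) - sum(jobs) ** 2 / n) / (n - 1)
print(mean, s2, round(s2_short, 1))   # mean = 14, s2 = 33.2 by both methods
```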
The Variance – Shortcut method
s² = (1/(n − 1)) [Σ_{i=1}^{n} xi² − (Σ_{i=1}^{n} xi)² / n]
= (1/(6 − 1)) [17² + 15² + … + 13² − (17 + 15 + … + 13)² / 6]
= 33.2 jobs²
68
Standard Deviation
• The standard deviation of a set of
observations is the square root of the
variance.
Sample standard deviation: s = √s²
Population standard deviation: σ = √σ²
69
The Coefficient of Variation
• The coefficient of variation of a set of measurements
is the standard deviation divided by the mean value.
Sample coefficient of variation: cv = s / x̄
Population coefficient of variation: CV = σ / μ
• This coefficient provides a proportionate measure of
variation: a standard deviation of 10 may be perceived as
large when the mean value is 100, but only moderately
large when the mean value is 500.
70
Percentiles
• Example from http://www.ehow.com/how_2310404_calculate-percentiles.html
• Your test score, e.g. 70%, tells you how many
questions you answered correctly. However, it
doesn’t tell how well you did compared to the other
people who took the same test.
• If the percentile of your score is 75, then you scored
higher than 75% of other people who took the test.
71
Sample Percentiles
• Percentile
– The pth percentile of a set of measurements is the
value for which
• p percent of the observations are less than that value, and
• (100 − p) percent of the observations are greater than
that value.
72
Sample Percentiles
• Find the 10th percentile of 6, 8, 3, 6, 2, 8, 1.
• Order the data: 1, 2, 3, 6, 6, 8, 8
• 7 × (0.10) = 0.70; round up to 1.
The first observation, 1, is the 10th percentile.
73
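The rounding rule used above can be sketched as a small function (the function name is mine; note this is one common convention, and statistical packages use several different interpolation rules):

```python
import math

def pth_percentile(data, p):
    """Percentile by the slide's rule: position = ceil(n * p/100),
    then take the order statistic at that (1-based) position."""
    xs = sorted(data)
    pos = max(1, math.ceil(len(xs) * p / 100))
    return xs[pos - 1]

print(pth_percentile([6, 8, 3, 6, 2, 8, 1], 10))   # 1
```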
• Commonly used percentiles
– First (lower) quartile, Q1 = 25th percentile
– Second (middle) quartile, Q2 = 50th percentile
– Third quartile, Q3 = 75th percentile
– Fourth quartile, Q4 = 100th percentile
– First (lower) decile = 10th percentile
– Ninth (upper) decile = 90th percentile
74
Quartiles and Variability
• Quartiles can provide an idea about the shape
of a histogram
[Figure: Q1, Q2, Q3 marked on a positively skewed histogram and on a negatively skewed histogram]
75
Interquartile Range
• Large value indicates a large spread of the
observations
Interquartile range = Q3 – Q1
76
Paired Data Sets and the Sample
Correlation Coefficient
• The covariance and the coefficient of
correlation are used to measure the direction
and strength of the linear relationship
between two variables.
– Covariance - is there any pattern to the way two
variables move together?
– Coefficient of correlation - how strong is the linear
relationship between two variables?
77
Covariance
Population covariance: COV(X, Y) = Σ (xi − μx)(yi − μy) / N
μx (μy) is the population mean of the variable X (Y).
N is the population size.
Sample covariance: cov(x, y) = Σ (xi − x̄)(yi − ȳ) / (n − 1)
x̄ (ȳ) is the sample mean of the variable X (Y).
n is the sample size.
78
Covariance
• If the two variables move in the same
direction, (both increase or both decrease),
the covariance is a large positive number.
• If the two variables move in opposite
directions, (one increases when the other
one decreases), the covariance is a large
negative number.
• If the two variables are unrelated, the
covariance will be close to zero.
79
Covariance
• Compare the following three sets:

Set 1: xi: 2, 6, 7   yi: 13, 20, 27   (x̄ = 5, ȳ = 20)
(xi − x̄): −3, 1, 2   (yi − ȳ): −7, 0, 7   products: 21, 0, 14
Cov(x, y) = 17.5

Set 2: xi: 2, 6, 7   yi: 27, 20, 13   (x̄ = 5, ȳ = 20)
(xi − x̄): −3, 1, 2   (yi − ȳ): 7, 0, −7   products: −21, 0, −14
Cov(x, y) = −17.5

Set 3: xi: 2, 6, 7   yi: 20, 27, 13   (x̄ = 5, ȳ = 20)
Cov(x, y) = −3.5
80
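The three covariances can be reproduced with a small helper (the function name is mine):

```python
def sample_cov(xs, ys):
    """Sample covariance: sum((x - xbar)(y - ybar)) / (n - 1)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

x = [2, 6, 7]
print(sample_cov(x, [13, 20, 27]))   # 17.5
print(sample_cov(x, [27, 20, 13]))   # -17.5
print(sample_cov(x, [20, 27, 13]))   # -3.5
```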
The coefficient of correlation
Population coefficient of correlation: ρ = COV(X, Y) / (σx σy)
Sample coefficient of correlation: r = cov(X, Y) / (sx sy)
– This coefficient answers the question: how
strong is the association between X and Y?
81
The coefficient of correlation
ρ or r = +1: strong positive linear relationship (COV(X, Y) > 0)
ρ or r = 0: no linear relationship (COV(X, Y) = 0)
ρ or r = −1: strong negative linear relationship (COV(X, Y) < 0)
82
The Coefficient of Correlation
• If the two variables are very strongly positively
related, the coefficient value is close to +1
(strong positive linear relationship).
• If the two variables are very strongly negatively
related, the coefficient value is close to -1
(strong negative linear relationship).
• No straight line relationship is indicated by a
coefficient close to zero.
83
The Coefficient of Correlation
84
Correlation and causation
• Recognize the difference between correlation and
causation — just because two things occur together,
that does not necessarily mean that one causes the
other.
• For random processes, causation means that if A
occurs, that causes a change in the probability that B
occurs.
85
Correlation and causation
• Existence of a statistical relationship, no matter how strong it
is, does not imply a cause-and-effect relationship between X
and Y. For example, let X be the size of vocabulary, and Y be
writing speed, for a group of children. There will most probably
be a positive relationship, but this does not imply that an
increase in vocabulary causes an increase in the speed of
writing. Other variables, such as age and education, will affect
both X and Y.
• Even if there is a causal relationship between X and Y, it might
be in the opposite direction, i.e. from Y to X. For example, let X
be a thermometer reading and let Y be the actual temperature.
Here Y will affect X.
86