Introduction to
Biostatistics (ZJU 2008)
Wenjiang Fu, Ph.D
Associate Professor
Division of Biostatistics, Department of
Epidemiology
Michigan State University
East Lansing, Michigan 48824, USA
Email: [email protected]
www: http://www.msu.edu/~fuw
Chapters 4-5 Probability distribution





Random Variable (rv)
Definition 1. A random variable (r.v.) is a numerical quantity that
takes different values with specified probability.
Definition 2. A discrete r.v. is a r.v. whose possible values form a
discrete (finite or countably infinite) set, each value occurring with a specified probability.
Examples of discrete r.v.'s: number of episodes of a
disease/symptom (e.g., heart attacks, diarrhea), blood cell counts.
Definition 3. A continuous r.v. is a r.v. whose possible values form a
continuum; probabilities are specified for ranges (intervals) of values
rather than for individual values.
Examples: height, weight, temperature, FEV, blood pressure, etc.
Probability distribution

Definition 4. A probability mass function (PMF)
is a mathematical relationship/rule that assigns
a probability to each possible outcome of a discrete r.v.
Notation: Pr(X = r), the probability distribution of X.

Example: Hepatitis A in a household of 4 people.

r: number of people in the household who contracted hepatitis A

r         | 0  | 1  | 2  | 3  | 4
Pr(X = r) | .3 | .1 | .1 | .3 | .2
Probability distribution
Properties of Probability
1. $0 \le \Pr(X = r) \le 1$
2. Total probability: $\sum_r \Pr(X = r) = 1$

Expected value of a discrete r.v.
$$E(X) = \mu = \sum_{i=1}^{k} x_i \Pr(X = x_i)$$
where the $x_i$'s are the values X assumes with positive probability.
Example: $\sum_{i=1}^{k} x_i \Pr(X = x_i) = 0 \times .3 + 1 \times .1 + 2 \times .1 + 3 \times .3 + 4 \times .2 = 2.0$
Interpretation: on average, 2 people in a household of 4 contract hepatitis A; that is, two people are expected to contract hepatitis A in a household of 4.
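The expected-value calculation above can be reproduced in a few lines. This is a minimal Python sketch (standard library only); the dictionary `pmf` is just an assumed encoding of the hepatitis A table, and the sketch also checks the two properties of probability listed earlier.

```python
# PMF of X = number of people (out of 4) in a household who contracted hepatitis A
pmf = {0: 0.3, 1: 0.1, 2: 0.1, 3: 0.3, 4: 0.2}

# Properties of probability: each Pr(X = r) lies in [0, 1] and they sum to 1
assert all(0 <= pr <= 1 for pr in pmf.values())
assert abs(sum(pmf.values()) - 1) < 1e-9

# E(X) = sum of x_i * Pr(X = x_i)
mean = sum(x * pr for x, pr in pmf.items())
print(round(mean, 4))  # 2.0
```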
Probability distribution

Variance of a discrete r.v. X
$$\operatorname{var}(X) = \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \Pr(X = x_i)$$
where the $x_i$ are the values that X takes with positive
probabilities.
Variance is the expected value of $(X-\mu)^2$, or
$$\operatorname{Var}(X) = E[(X-\mu)^2],$$
the expected value of the squared distance from the
mean.
A short version of var(X) is given by
$$\operatorname{var}(X) = \left[\sum_{i=1}^{k} x_i^2 \Pr(X = x_i)\right] - \mu^2$$
$$\mathrm{SD} = \sigma = \sqrt{\operatorname{var}(X)}$$
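Both variance formulas should agree. The following Python sketch (standard library only) applies them to the hepatitis A distribution, then computes the SD and the interval mean ± 2 SD used by the rule of thumb; the variable names are illustrative.

```python
import math

pmf = {0: 0.3, 1: 0.1, 2: 0.1, 3: 0.3, 4: 0.2}  # hepatitis A example
mu = sum(x * pr for x, pr in pmf.items())

# Definition: Var(X) = E[(X - mu)^2]
var_def = sum((x - mu) ** 2 * pr for x, pr in pmf.items())
# Short version: Var(X) = E(X^2) - mu^2
var_short = sum(x ** 2 * pr for x, pr in pmf.items()) - mu ** 2
sd = math.sqrt(var_def)

# Rule of thumb: roughly 95% of the mass lies in mu +/- 2 SD
low, high = mu - 2 * sd, mu + 2 * sd
print(round(var_def, 4), round(var_short, 4), round(sd, 2))  # 2.4 2.4 1.55
print(round(low, 1), round(high, 1))  # -1.1 5.1
```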
Probability distribution
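Example:
$\sigma^2 = (0-2)^2 \times .3 + (1-2)^2 \times .1 + (2-2)^2 \times .1 + (3-2)^2 \times .3 + (4-2)^2 \times .2 = 2.4$
or $\sigma^2 = 0^2 \times .3 + 1^2 \times .1 + 2^2 \times .1 + 3^2 \times .3 + 4^2 \times .2 - 2^2 = 2.4$
SD(X) = σ = √2.4 ≈ 1.55

A rule of thumb that holds approximately in most cases:
approximately 95% of the probability mass falls within
two SDs of the mean (expected value) of a r.v.
$\mu \pm 2\sigma = 2.0 \pm 2 \times 1.55 = 2.0 \pm 3.1 = [-1.1, 5.1]$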
Probability distribution

Cumulative distribution function (CDF) of a discrete r.v.
$F(x) = \Pr(X \le x)$
Example
F(0) = Pr(X ≤ 0) = Pr(X = 0) = .3
F(0.8) = Pr(X ≤ 0.8) = Pr(X = 0) = .3
F(1) = Pr(X ≤ 1) = Pr(X < 1) + Pr(X = 1) = .3 + .1 = .4
F(2) = Pr(X ≤ 2) = Pr(X < 2) + Pr(X = 2) = .4 + .1 = .5
F(3) = Pr(X ≤ 3) = Pr(X < 3) + Pr(X = 3) = .5 + .3 = .8
F(4) = Pr(X ≤ 4) = Pr(X < 4) + Pr(X = 4) = .8 + .2 = 1
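The CDF of a discrete r.v. is a step function: it stays flat between the possible values and jumps at each of them. A minimal Python sketch for the hepatitis A distribution (the helper name `cdf` is illustrative):

```python
pmf = {0: 0.3, 1: 0.1, 2: 0.1, 3: 0.3, 4: 0.2}  # hepatitis A example

def cdf(x):
    """F(x) = Pr(X <= x): add up the probabilities of all values not exceeding x."""
    return sum(pr for value, pr in pmf.items() if value <= x)

# F is a step function: it only jumps at the values 0, 1, 2, 3, 4
for x in [0, 0.8, 1, 1.8, 2, 3, 4]:
    print(x, round(cdf(x), 2))
```

For example, F(0.8) equals F(0) = .3 and F(1.8) equals F(1) = .4, because no probability lies strictly between consecutive integers.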
Probability distribution


Properties of mean and variance of distribution
Let X, Y and Z be random variables.
If Y = aX + b with constants a and b,
then E(Y) = aE(X) + b,
Var(Y) = a² Var(X),
SD(Y) = |a| SD(X).
If Z = aX + bY,
then E(Z) = aE(X) + bE(Y).
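The rules for Y = aX + b can be checked numerically by transforming the values of a known distribution. This Python sketch reuses the hepatitis A PMF; the constants a = -2 and b = 1 are hypothetical, chosen negative so that |a| matters for the SD.

```python
import math

pmf_x = {0: 0.3, 1: 0.1, 2: 0.1, 3: 0.3, 4: 0.2}  # hepatitis A example

def mean_var(pmf):
    """Mean and variance of a discrete r.v. given as {value: probability}."""
    mu = sum(x * pr for x, pr in pmf.items())
    var = sum((x - mu) ** 2 * pr for x, pr in pmf.items())
    return mu, var

a, b = -2.0, 1.0  # hypothetical constants; Y = aX + b
pmf_y = {a * x + b: pr for x, pr in pmf_x.items()}  # Y takes a*x + b with the same probability

mu_x, var_x = mean_var(pmf_x)
mu_y, var_y = mean_var(pmf_y)
print(mu_y, a * mu_x + b)                           # E(Y) = aE(X) + b
print(var_y, a ** 2 * var_x)                        # Var(Y) = a^2 Var(X)
print(math.sqrt(var_y), abs(a) * math.sqrt(var_x))  # SD(Y) = |a| SD(X)
```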
Binomial distribution


Permutation: the number of permutations of n
things taken k at a time is
nPk = n (n-1) (n-2) … (n-k+1)
It is the number of ways of selecting k items
out of n when the order of selection matters.
8P5 = 8 x 7 x 6 x 5 x 4 = 6720
nPn = n (n-1) (n-2) … x 2 x 1 = n!
n! = n factorial = n x (n-1) x … x 2 x 1
Definition: 0! = 1.
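The product definition of nPk translates directly into a loop. Below is a Python sketch with a hypothetical helper `n_p_k`, cross-checked against the standard library's `math.perm` (available since Python 3.8).

```python
import math

def n_p_k(n, k):
    """nPk = n (n-1) ... (n-k+1): ordered selections of k items out of n."""
    result = 1
    for factor in range(n, n - k, -1):
        result *= factor
    return result

print(n_p_k(8, 5))                       # 6720
print(math.perm(8, 5))                   # 6720, standard library (Python 3.8+)
print(n_p_k(5, 5) == math.factorial(5))  # True: nPn = n!
print(math.factorial(0))                 # 1: 0! = 1 by definition
```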
Binomial distribution

Combination: the number of combinations of
n things taken k at a time is
$${}_nC_k = \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}$$
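The two forms of the combination formula can be checked against each other numerically. This Python sketch uses n = 8 and k = 5, as in the doctors example, and the standard-library functions `math.perm`, `math.factorial` and `math.comb` (Python 3.8+).

```python
import math

n, k = 8, 5  # choose 5 doctors from 8
# nCk = nPk / k! = n! / ((n - k)! k!)
via_perm = math.perm(n, k) // math.factorial(k)
via_factorials = math.factorial(n) // (math.factorial(n - k) * math.factorial(k))
print(via_perm, via_factorials, math.comb(n, k))  # 56 56 56
```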
The number of ways to choose 5 doctors from 8
doctors is
$${}_8C_5 = \binom{8}{5} = \frac{8!}{(8-5)!\,5!} = \frac{8 \times 7 \times 6 \times 5 \times 4}{5 \times 4 \times 3 \times 2 \times 1} = 56$$
Binomial distribution




The Binomial Distribution
n independent trials; each trial has two outcomes,
success (1) or failure (0), with constant probability
Pr(success) = p and Pr(failure) = 1 – p = q.
Example. Flu infection. 5 independent individuals were
together.
Pr(an individual contracts flu) = .6
Pr(2 out of 5 contract flu) = ?
First question: does order matter, i.e., who becomes the 1st case,
who the 2nd?
No. So use combinations.
Binomial distribution




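Each possibility: let $F_i$ = {i-th subject got flu} and $\bar F_i$ its complement (i-th subject did not get flu).
$\Pr(F_1 F_2 \bar F_3 \bar F_4 \bar F_5) = p \times p \times q \times q \times q = p^2 (1-p)^3 = .6^2 \times (1-.6)^3 = .02304$
$\Pr(\text{2 out of 5 flu}) = {}_5C_2\, p^2 (1-p)^3 = 10 \times .02304 = .2304$
The combination number represents the total
number of different events, each with the same probability $p^2 q^3$:
$F_1 F_2 \bar F_3 \bar F_4 \bar F_5$, $F_1 \bar F_2 F_3 \bar F_4 \bar F_5$, $\bar F_1 F_2 F_3 \bar F_4 \bar F_5$, etc.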
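The counting argument above can be made concrete by listing every pattern of 2 flu cases among 5 people. A small Python sketch using `itertools.combinations` from the standard library:

```python
import math
from itertools import combinations

n, k, p = 5, 2, 0.6  # 5 people, 2 flu cases, Pr(flu) = .6
q = 1 - p

# Probability of one specific pattern, e.g. subjects 1 and 2 get flu, 3 to 5 do not
one_pattern = p ** k * q ** (n - k)

# List every set of 2 subjects who could be the flu cases; order does not matter
patterns = list(combinations(range(1, n + 1), k))
total = len(patterns) * one_pattern  # every pattern has the same probability

print(len(patterns), math.comb(n, k))          # 10 10
print(round(one_pattern, 5), round(total, 4))  # 0.02304 0.2304
```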
Binomial distribution

Binomial distribution B(n, p)
The distribution of the number of successes in n statistically
independent trials, with probability of success p on each trial, is a binomial
distribution with probability mass function
$$\Pr(X = k) = {}_nC_k\, p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n.$$
Example. The ratio of boys to girls is 2:3 in one classroom.
Pr(having 3 boys out of 5 children) = ?
p = # boys / # children = 2 / (2 + 3) = .4
$\Pr(\text{3 boys out of 5}) = {}_5C_3\, p^3 (1-p)^{5-3} = 10 \times .4^3 (1-.4)^{5-3} = .2304$
Pr(having at least 3 boys out of 5 children) = ?
Pr(3 boys out of 5) + Pr(4 boys out of 5) + Pr(5 boys out of 5)
$= .2304 + 5 \times .0256 \times .6 + 1 \times .01024 \times 1 = .3174$
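The binomial PMF is a one-line function. This Python sketch (the helper name `binom_pmf` is hypothetical) reproduces both answers of the classroom example, including the "at least 3 boys" sum over k = 3, 4, 5.

```python
import math

def binom_pmf(k, n, p):
    """Pr(X = k) = nCk p^k (1 - p)^(n - k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 5, 0.4  # 5 children, Pr(boy) = 2 / (2 + 3)
exactly_3 = binom_pmf(3, n, p)
at_least_3 = sum(binom_pmf(k, n, p) for k in range(3, n + 1))
print(round(exactly_3, 4), round(at_least_3, 4))  # 0.2304 0.3174
```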
Binomial distribution



Binomial distribution B(n, p)
Binomial Table: n = 2, 3, …, 20; p = .05, .10, .15, …, .50.
Recursion rule: used to simplify the calculation of binomial probabilities in
the old days.
$$\Pr(X = k+1) = \frac{n-k}{k+1} \cdot \frac{p}{q} \cdot \Pr(X = k), \quad k = 0, 1, 2, \ldots$$
If Pr(X = 0) = qⁿ is known, then Pr(X = 1) is known, then
Pr(X = 2), and so on.

New approach: computer programs such as Splus, R, SAS, etc.
In R: dbinom(k, n, p) gives Pr(X = k), pbinom(k, n, p) gives Pr(X ≤ k), and qbinom(u, n, p) gives the u-th quantile.
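The recursion rule builds every binomial probability from Pr(X = 0) = qⁿ. A Python sketch of that idea (the function name is hypothetical; it assumes 0 < p < 1 so that q is nonzero), checked against the direct formula:

```python
import math

def binom_probs_recursive(n, p):
    """Pr(X = k) for k = 0..n via Pr(X=k+1) = (n-k)/(k+1) * p/q * Pr(X=k).
    Assumes 0 < p < 1 so that q = 1 - p is nonzero."""
    q = 1 - p
    probs = [q ** n]  # start from Pr(X = 0) = q^n
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * p / q)
    return probs

# Compare with the direct formula nCk p^k q^(n-k)
n, p = 5, 0.4
recursive = binom_probs_recursive(n, p)
direct = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
print(all(abs(r - d) < 1e-12 for r, d in zip(recursive, direct)))  # True
```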
Binomial distribution

Expected value and variance of binomial dist B(n, p)
$$E(X) = \sum_{i} x_i \Pr(X = x_i) = \sum_{k=0}^{n} k\, {}_nC_k\, p^k (1-p)^{n-k} = np$$
$$\operatorname{Var}(X) = np(1-p)$$
Some important points of B(n, p)
1. The mean and variance depend on n and p.
2. For fixed p, the larger the number of trials n, the larger the mean and
variance.
3. For fixed n, the larger the probability of success p, the larger the mean.
4. Var(X) is small when p is close to 0 or close to 1. It attains its
maximum value at p = .5, with Var(X) = n/4.
p = .5: success and failure are equally likely to occur, as when tossing a coin.
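The closed forms np and np(1 - p) can be confirmed by summing over the PMF directly. This Python sketch uses a hypothetical n = 20 and several values of p; at p = .5 the variance reaches n/4 = 5.

```python
import math

def binom_mean_var(n, p):
    """Mean and variance of B(n, p), computed directly from its PMF."""
    pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pr for k, pr in enumerate(pmf))
    var = sum(k ** 2 * pr for k, pr in enumerate(pmf)) - mean ** 2
    return mean, var

n = 20  # hypothetical number of trials
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    mean, var = binom_mean_var(n, p)
    # compare with the closed forms np and np(1 - p); var peaks at p = .5 with n/4 = 5
    print(p, round(mean, 6), round(n * p, 6), round(var, 6), round(n * p * (1 - p), 6))
```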
Binomial distribution






Example. Disease: asthma, suspected to be caused by pollution from a nearby
industry.
p = Pr(having asthma nationwide) = .03 (reference level).
In a small community of n = 100, we observe 8 cases. Is this
alarming evidence of excess asthma?
Observed rate in the community: 8/100 = .08 > reference
level.
Is this much higher? Usually or unusually high? Is it an event with a small
probability?
Is {having 8 cases} a small-probability event?
If {8 cases} is unusually high, then {9 cases}, {10 cases}, … are all
unusually high too.
Criterion to use:
Is {having at least 8 cases} a small-probability event?
Binomial distribution


Under the reference level, X ~ B(100, .03), so we need Pr(at least 8 cases) = Pr(X ≥ 8).
Calculating it directly requires 100 – 8 + 1 = 93 probabilities; the complement is easier:
$$\Pr(\text{at least 8 cases}) = 1 - \Pr(\text{at most 7 cases}) = 1 - \sum_{k=0}^{7} \Pr(X = k) \approx 0.011,$$
a small-probability event (< 0.05) relative to the reference level.
If the community's true rate equaled the reference level, such a small-probability event would be very
unlikely.
Interpretation: since we have nevertheless observed it, we regard this as an unusual
observation: 8 cases out of 100 in the community is alarmingly high.
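The tail probability can be computed both ways, via the complement (8 terms) and by summing the 93 terms directly, to confirm they agree. A Python sketch under the reference assumption X ~ B(100, .03); it gives a tail probability of about 0.011.

```python
import math

n, p = 100, 0.03  # community size and national (reference) asthma rate

def binom_pmf(k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Complement rule: Pr(X >= 8) = 1 - sum_{k=0}^{7} Pr(X = k), only 8 terms instead of 93
tail = 1 - sum(binom_pmf(k) for k in range(8))
# The same quantity summed directly over k = 8, ..., 100
tail_direct = sum(binom_pmf(k) for k in range(8, n + 1))
print(round(tail, 4), round(tail_direct, 4))
```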
Poisson distribution

Three assumptions for the Poisson distribution
1. The event is rare: in a short time interval Δt, Pr(observing 1 event) ≈ λΔt and
Pr(observing > 1 event) ≈ 0, so Pr(0 events) ≈ 1 – λΔt.
2. Stationary process. The expected number of events per unit time (the intensity λ)
remains the same during the entire time period.
3. Independence. The number of events in one time interval does not
affect the probability of events in any other, non-overlapping time interval.

Poisson distribution
$$\Pr(X = k) = \frac{e^{-\mu} \mu^k}{k!}, \quad k = 0, 1, 2, \ldots$$
where μ = λt, t is the time duration (or area), and λ is the intensity per
unit time (or area).
Notice that k has no upper bound or ceiling.

Poisson distribution





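Mean and variance of Poisson distribution
$$E(X) = \sum_{k} k \Pr(X = k) = \sum_{k} k\, \frac{e^{-\mu}\mu^k}{k!} = \mu$$
$$\operatorname{Var}(X) = \sum_{k} k^2 \Pr(X = k) - [E(X)]^2 = \mu$$
The mean and the variance are both equal to μ, the single parameter of the Poisson distribution.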
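Because k has no upper bound, the sums for the mean and variance are infinite; in practice they can be truncated once the terms become negligible. A Python sketch with a hypothetical μ = 3 and truncation at k = 59:

```python
import math

def poisson_pmf(k, mu):
    """Pr(X = k) = e^(-mu) mu^k / k! for X ~ Pois(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu = 3.0  # hypothetical parameter
# k has no upper bound, so the infinite sums are truncated where terms are negligible
ks = range(60)
mean = sum(k * poisson_pmf(k, mu) for k in ks)
var = sum(k ** 2 * poisson_pmf(k, mu) for k in ks) - mean ** 2
print(round(mean, 6), round(var, 6))  # 3.0 3.0
```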
Example.
Assume 3 traffic accidents are expected per day in the city of Detroit
in winter (Nov – Feb), while only 1 accident is expected
per day during the rest of the year, due to the weather conditions. If 5
accidents were observed on one day, was this an alarming event?
Distribution? There is no cap or upper bound (n) on the number of events, so use the
Poisson distribution.
Warning: only use this distribution within a time period where the
intensity λ remains constant; do not use it for the whole year! Why?
Poisson distribution
Notice the 2 different intensities: winter (λ = 3) and the rest of the year
(λ = 1).
- If the day was in winter, λ = 3:
Pr(X ≥ 5) = 1 – [Pr(X = 0) + … + Pr(X = 4)] = 1 – 0.815 = 0.185
- If the day was in summer, λ = 1:
Pr(X ≥ 5) = 1 – [Pr(X = 0) + … + Pr(X = 4)] = 1 – 0.996 = 0.004,
a small-probability event.
- Conclusion: if 5 accidents were observed on a summer day, this was alarmingly high, but not on a winter day.
- How would you compute the probability of observing 20 accidents in 2
consecutive months (Feb & March) together?
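The winter and summer tail probabilities for a single day can be reproduced with a short Python sketch (the helper name `poisson_tail` is hypothetical):

```python
import math

def poisson_tail(k, mu):
    """Pr(X >= k) = 1 - [Pr(X = 0) + ... + Pr(X = k - 1)] for X ~ Pois(mu)."""
    return 1 - sum(math.exp(-mu) * mu ** j / math.factorial(j) for j in range(k))

winter = poisson_tail(5, 3.0)  # lambda = 3 accidents per day, Nov - Feb
summer = poisson_tail(5, 1.0)  # lambda = 1 accident per day, rest of the year
print(round(winter, 3), round(summer, 3))  # 0.185 0.004
```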

Gaussian (Normal) distribution

Continuous r.v. X ~ Gaussian distribution N(μ, σ²)
Probability density function (PDF):
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < \infty$$
for some parameters μ, σ with σ > 0.
The 2 parameters μ and σ² determine the distribution:
the mean μ for location and the variance σ² for spread (scale).
X may take any real number, either > 0 or < 0.
f is symmetric about μ.
F: the cumulative distribution function (CDF)
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(t-\mu)^2}{2\sigma^2}\right\} dt$$
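The CDF is the integral of the PDF, which can be illustrated by integrating the density numerically and comparing with the standard library's `statistics.NormalDist`. This Python sketch uses the trapezoid rule with a lower cutoff of μ - 10σ (an assumption: the area below it is negligible) and hypothetical parameters μ = 80, σ = 12.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = exp{-(x - mu)^2 / (2 sigma^2)} / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf_numeric(x, mu, sigma, steps=100_000):
    """F(x) = integral of f from -infinity to x, approximated with the trapezoid rule
    on [mu - 10 sigma, x]; the area below mu - 10 sigma is negligible. Assumes x > mu - 10 sigma."""
    lo = mu - 10 * sigma
    h = (x - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(x, mu, sigma))
    total += sum(normal_pdf(lo + i * h, mu, sigma) for i in range(1, steps))
    return total * h

mu, sigma = 80.0, 12.0  # hypothetical parameters
print(round(normal_cdf_numeric(95.0, mu, sigma), 4), round(NormalDist(mu, sigma).cdf(95.0), 4))
print(normal_pdf(mu - 5, mu, sigma) == normal_pdf(mu + 5, mu, sigma))  # symmetric about mu
```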
Standard normal distribution

X ~ standard normal distribution N(0, 1)
Probability density function (PDF):
$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{x^2}{2}\right\}, \quad -\infty < x < \infty$$
X may take any real number, either > 0 or < 0.
f is symmetric about 0.
Φ: the cumulative distribution function (CDF)
$$\Phi(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{t^2}{2}\right\} dt$$
A very useful function in statistics.
$\Phi(-x) = 1 - \Phi(x)$, frequently used for the
calculation of p-values.
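Φ has no elementary closed form, but it can be written with the error function, Φ(x) = [1 + erf(x/√2)]/2. This Python sketch uses `math.erf`, checks the symmetry rule Φ(-x) = 1 - Φ(x), and compares with `statistics.NormalDist` from the standard library; the helper name `phi` is illustrative.

```python
import math
from statistics import NormalDist

def phi(x):
    """Standard normal CDF via the error function: Phi(x) = [1 + erf(x / sqrt(2))] / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

z = NormalDist()  # N(0, 1) from the standard library, for comparison
for x in [0.5, 1.0, 1.96, 2.576]:
    # symmetry of f about 0 gives Phi(-x) = 1 - Phi(x)
    print(x, round(phi(-x), 6), round(1 - phi(x), 6), round(z.cdf(-x), 6))
```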
Properties of N (0, 1)



Properties of standard normal N (0, 1)
PDF f(x), –∞ < x < ∞
1) Symmetric about 0: f(x) = f(–x)
2) Pr(–1 ≤ X ≤ 1) = .6827, or
about 68% (more than 2/3) of the area lies in [–1, 1].
Pr(–1.96 ≤ X ≤ 1.96) = .95, or
about 95% of the area lies between –1.96 and 1.96.
Pr(–2.576 ≤ X ≤ 2.576) = .99, or
about 99% of the area lies between –2.576 and 2.576.
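The central areas Pr(-c ≤ X ≤ c) = Φ(c) - Φ(-c) can be computed with `statistics.NormalDist` (Python standard library). The value c = 2.5 is included only for comparison: it covers about 98.8% of the area, which is why the 99% cutoff is 2.576.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal N(0, 1)
for c in [1.0, 1.96, 2.5, 2.576]:
    # Pr(-c <= X <= c) = Phi(c) - Phi(-c)
    print(c, round(z.cdf(c) - z.cdf(-c), 4))
```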
[Figure: Illustration of N(μ, σ²); areas labeled 68%, 95% (95.4%) and 99.7%]
Special notation for N(0,1)


Definition: The 100 × u-th percentile of N(0,1) is
denoted by $Z_u$:
$\Pr(X \le Z_u) = u$, where X ~ N(0,1); then
$\Phi(Z_u) = \Pr(X \le Z_u) = u$.
Frequently used quantiles: Z.975, Z.95, Z.5, Z.05, Z.025
Φ(1.96) = .975, Φ(1.645) = .95, Φ(0) = .5
Φ(–1.645) = 1 – Φ(1.645) = 1 – .95 = .05
Φ(–1.96) = 1 – Φ(1.96) = 1 – .975 = .025
Z.975 = 1.96, Z.95 = 1.645
Z.5 = 0
Z.05 = –1.645
Z.025 = –1.96
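Percentiles are obtained by inverting Φ. In the Python standard library, `statistics.NormalDist().inv_cdf(u)` returns Z_u; a minimal sketch for the frequently used quantiles:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal N(0, 1)
for u in [0.975, 0.95, 0.5, 0.05, 0.025]:
    # inv_cdf(u) returns Z_u, the value with Phi(Z_u) = u
    print(u, round(z.inv_cdf(u), 3))
```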
Calculation of probability




X ~ N(μ, σ²). Calculate Pr(a < X < b).
Let Z = (X – μ)/σ; then Z ~ N(0,1), so use Φ.
$$\Pr(a < X < b) = \Pr\left\{\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right\}$$
$$= \Pr\left\{\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right\} = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$$
Example: Hypertension. SBP X ~ N(80, 144), so σ = 12.
Pr(90 < X < 95) = Pr[(90 – 80)/12 < Z < (95 – 80)/12]
= Pr(.83 < Z < 1.25) = Φ(1.25) – Φ(.83)
= .8944 – .7967 = .098
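Standardizing and then using Φ gives the same answer as evaluating the N(μ, σ²) CDF directly. A Python sketch with `statistics.NormalDist`; without rounding z = 10/12 to .83, the result is about .097 rather than .098.

```python
from statistics import NormalDist

mu, sigma = 80.0, 12.0  # X ~ N(80, 144), so sigma = sqrt(144) = 12
a, b = 90.0, 95.0
std = NormalDist()  # Z = (X - mu) / sigma ~ N(0, 1)

via_z = std.cdf((b - mu) / sigma) - std.cdf((a - mu) / sigma)  # Phi[(b-mu)/sigma] - Phi[(a-mu)/sigma]
direct = NormalDist(mu, sigma).cdf(b) - NormalDist(mu, sigma).cdf(a)
print(round(via_z, 4), round(direct, 4))
```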
Calculation of probability




Example: Cerebrovascular disease.
To diagnose stroke, cerebral blood flow (CBF) is used:
patients with CBF < 40 are clinically diagnosed as at risk.
Assume normal people's CBF has a normal distribution with mean
75 and SD = 17. Find the percentage of normal people mistakenly
classified by CBF as stroke patients.
Let X be CBF in a normal person. Then X ~ N(75, 17²).
We need Pr(X < 40).
Pr(X < 40) = Pr(Z < (40 – 75)/17) = Pr(Z < –2.06)
= Φ(–2.06) = 1 – Φ(2.06) = 1 – .9803 = .0197 ≈ 2%
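The same misclassification rate can be obtained in one call with the standard library's `statistics.NormalDist`; a minimal Python sketch:

```python
from statistics import NormalDist

cbf_normal = NormalDist(mu=75, sigma=17)  # CBF among normal people
misclassified = cbf_normal.cdf(40)        # Pr(X < 40): normal people flagged as at risk
print(f"{misclassified:.2%}")
```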
Approximation of distribution




Bin(n, p) probabilities can be tedious to calculate (e.g., for large n),
and so can Pois(μ) probabilities.
Probabilities for N(0,1) are easy to calculate (Φ is tabulated).
Need: use the normal distribution to approximate the
others, so that computation is much easier
while the accuracy of the probabilities is not
compromised much.
Pois (μ) Approximation to Bin (n, p)




Basic judgment: the two distributions must be close enough
to each other, i.e., their parameters must be close:
μ1 and μ2 close, and σ1² and σ2² close.
Bin(n, p): mean np, var npq
Pois(μ): mean μ, var μ.
So we need np ≈ npq (i.e., q ≈ 1), with np moderate.
When n is large, p is small (q = 1 – p close to 1) and
np is moderate, we can approximate X ~ Bin(n, p)
with Y ~ Pois(np).
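To see how close the Poisson approximation is, this Python sketch compares Pr(X ≥ 8) under B(100, .03) (the asthma example's setting, where n is large, p small and np = 3 moderate) with the same tail under Pois(3).

```python
import math

n, p = 100, 0.03  # n large, p small, np = 3 moderate
mu = n * p

def binom_pmf(k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def pois_pmf(k):
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Pr(X >= 8) under Bin(100, .03) and under its Pois(np) approximation
exact = 1 - sum(binom_pmf(k) for k in range(8))
approx = 1 - sum(pois_pmf(k) for k in range(8))
print(round(exact, 4), round(approx, 4))
```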
Normal Approximation to Bin (n, p)




Bin(n, p): mean np, var npq
N(μ, σ²): mean μ, var σ².
Bin(n, p) needs to be roughly symmetric.
When npq ≥ 5, we can approximate X ~ Bin(n, p) with Y
~ N(np, npq), using the continuity correction:
Pr(X = k) ≈ Pr(k – .5 < Y < k + .5);
Pr(X ≥ k) ≈ Pr(Y > k – .5)
Pr(X ≤ k) ≈ Pr(Y < k + .5)

If the normal approximation is used, n ≥ 20. Why?

If p is so small that npq < 5, do not use the normal
approximation; use the Poisson approximation instead.
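A numerical comparison makes the continuity correction concrete. This Python sketch uses hypothetical values n = 100, p = .4 (so npq = 24 ≥ 5) and k = 45, and compares the exact binomial Pr(X ≥ k) with Pr(Y > k - .5) for Y ~ N(np, npq).

```python
import math
from statistics import NormalDist

n, p, k = 100, 0.4, 45  # hypothetical values; npq = 24 >= 5
q = 1 - p

# Exact binomial Pr(X >= k)
exact = sum(math.comb(n, j) * p ** j * q ** (n - j) for j in range(k, n + 1))

# Normal approximation Y ~ N(np, npq) with the continuity correction Pr(X >= k) ~ Pr(Y > k - .5)
y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
approx = 1 - y.cdf(k - 0.5)
print(round(exact, 4), round(approx, 4))
```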
Normal Approximation to Pois (μ)




Pois(μ): mean μ, var μ
N(μ, σ²): mean μ, var σ².
Pois(μ) needs to be roughly symmetric.
When μ ≥ 10, we can approximate X ~ Pois(μ)
with Y ~ N(μ, μ), using the continuity correction:
Pr(X = k) ≈ Pr(k – .5 < Y < k + .5);
Pr(X ≥ k) ≈ Pr(Y > k – .5)
Pr(X ≤ k) ≈ Pr(Y < k + .5)
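The same check works for the Poisson case. This Python sketch uses hypothetical values μ = 20 (≥ 10) and k = 15, comparing the exact Poisson Pr(X ≤ k) with Pr(Y < k + .5) for Y ~ N(μ, μ).

```python
import math
from statistics import NormalDist

mu, k = 20.0, 15  # hypothetical values; mu >= 10

# Exact Poisson Pr(X <= k)
exact = sum(math.exp(-mu) * mu ** j / math.factorial(j) for j in range(k + 1))

# Normal approximation Y ~ N(mu, mu) with the continuity correction Pr(X <= k) ~ Pr(Y < k + .5)
approx = NormalDist(mu=mu, sigma=math.sqrt(mu)).cdf(k + 0.5)
print(round(exact, 4), round(approx, 4))
```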