Basic Probability and Statistics
• Random variables
• Distribution functions
• Various probability distributions
Definitions
• An experiment is a process whose output is not known
with certainty.
• The set of all possible outcomes of an experiment is called
the sample space (S).
• The outcomes are called sample points in S.
• A random variable is a function that assigns a real number
to each point in S.
• A distribution function F(x) of the random variable X is
defined for each real number x as follows:
$F(x) = \Pr(X \le x).$
Properties of distribution function
• $0 \le F(x) \le 1$ for all $x$.
• $F(x)$ is non-decreasing: for $x_1 < x_2$, $F(x_1) \le F(x_2)$.
• $\lim_{x \to \infty} F(x) = 1$; $\lim_{x \to -\infty} F(x) = 0.$
Random Variables
• A random variable (r.v.) X is discrete if it can take on at
most a countable number of values x1, x2, x3,…
• The probability that the discrete r.v. X takes on a value xi
is given by: p(xi)=Pr(X= xi).
• p(x) is called the probability mass function.
$\sum_{i=1}^{\infty} p(x_i) = 1.$
$F(x) = \sum_{x_i \le x} p(x_i).$
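As an illustration (not from the slides), here is a minimal Python sketch of a discrete r.v. given by a pmf table; the four sample points and their probabilities are made-up example values:

```python
# Hypothetical pmf: p(x_i) for each sample point x_i.
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

# The pmf must sum to 1 (up to float rounding).
assert abs(sum(pmf.values()) - 1.0) < 1e-12

def cdf(x):
    """F(x) = Pr(X <= x): sum p(x_i) over all sample points x_i <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

print(cdf(2))   # 0.1 + 0.2 = 0.3 (up to rounding)
print(cdf(10))  # 1.0: all sample points are <= 10
```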
Random Variables
• A r.v. is said to be continuous if there exists a nonnegative
function f(x) such that for any set of real numbers B,
$\Pr(X \in B) = \int_B f(x)\,dx$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1.$
• f(x) is called the probability density function.
$F(x) = \Pr(X \le x) = \Pr\{X \in (-\infty, x]\} = \int_{-\infty}^{x} f(y)\,dy.$
Random Variables
• Mean or expected value of a r.v. X is denoted by E[X] or
µ, and given by:
$E[X] = \begin{cases} \sum_{j=1}^{\infty} x_j\, p(x_j) & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{\infty} x f(x)\,dx & \text{if } X \text{ is continuous.} \end{cases}$
• Variance of a r.v. X is denoted by Var(X) or σ², and given
by:
$\sigma^2 = E\big[(X - \mu)^2\big] = E[X^2] - \mu^2.$
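A minimal sketch, reusing a made-up discrete pmf, that computes E[X] and checks that both forms of the variance agree:

```python
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}  # hypothetical p(x_i)

mu = sum(x * p for x, p in pmf.items())              # E[X]
ex2 = sum(x**2 * p for x, p in pmf.items())          # E[X^2]
var = sum((x - mu)**2 * p for x, p in pmf.items())   # E[(X - mu)^2]

# E[(X - mu)^2] and E[X^2] - mu^2 agree (both 1.0 here, up to rounding).
print(mu, var, ex2 - mu**2)
```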
Properties of mean
• If X is a discrete random variable having pmf p(x), then:
Eg ( x)   g ( x) p( x).
x
• If X is continuous with pdf f(x), then:
E g ( x) 

 g ( x) f ( x)dx.

• Hence, for constants a and b,
EaX  b  aEX   b.
7
Property of variance
• For constants a and b,
Var aX  b  a 2Var  X .
8
Joint Distribution
• If X and Y are discrete r.v.'s, then
$p(x, y) = \Pr(X = x, Y = y) \quad \forall x, y$
is called the joint probability mass function of X and Y.
• Marginal probability mass functions of X and Y:
$p_X(x) = \sum_{y} p(x, y),$
$p_Y(y) = \sum_{x} p(x, y).$
• X and Y are independent if $p(x, y) = p_X(x)\, p_Y(y).$
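A minimal sketch with a made-up joint pmf: computing the marginals and checking the independence condition p(x, y) = pX(x) pY(y):

```python
# Hypothetical joint pmf p(x, y) over a 2x2 support.
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.3}

xs = {x for x, _ in joint}
ys = {y for _, y in joint}
px = {x: sum(joint[(x, y)] for y in ys) for x in xs}  # marginal p_X(x)
py = {y: sum(joint[(x, y)] for x in xs) for y in ys}  # marginal p_Y(y)

independent = all(abs(joint[(x, y)] - px[x] * py[y]) < 1e-12
                  for x in xs for y in ys)
print(px, py, independent)  # this joint pmf factorizes, so True
```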
Conditional probability
• Let A and B be two events.
• Pr(A|B) is the conditional probability of event A happening
given that B has already occurred.
• Bayes' theorem:
$\Pr(A \mid B) = \dfrac{\Pr(A \cap B)}{\Pr(B)}.$
• If events A and B are independent, then Pr(A|B) = Pr(A).
• Hence, from Bayes' theorem:
$\Pr(A \cap B) = \Pr(AB) = \Pr(A)\,\Pr(B).$
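A small numeric sketch (all probabilities are invented): conditional probability from the formula above, and the product rule for independent events:

```python
pr_b = 0.5          # Pr(B)
pr_a_and_b = 0.2    # Pr(A and B)

pr_a_given_b = pr_a_and_b / pr_b   # Pr(A|B) = Pr(A and B) / Pr(B)
print(pr_a_given_b)                # 0.4

# If A and B are independent, Pr(A and B) = Pr(A) * Pr(B):
pr_a = 0.4
print(abs(pr_a_and_b - pr_a * pr_b) < 1e-12)  # True for these numbers
```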
Dependency
• Covariance is a measure of linear dependence and is
denoted by Cij or Cov(Xi, Xj)
$C_{ij} = E\big[(X_i - \mu_i)(X_j - \mu_j)\big] = E[X_i X_j] - \mu_i \mu_j, \quad i = 1..n,\ j = 1..n.$
• Another measure of linear dependency is the correlation
factor:
$\rho_{ij} = \dfrac{C_{ij}}{\sqrt{\sigma_i^2 \sigma_j^2}}, \quad i = 1..n,\ j = 1..n.$
• The correlation factor is dimensionless, but covariance is not.
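A minimal NumPy sketch (the data-generating model is made up): estimating the covariance and the dimensionless correlation between two linearly dependent samples:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)   # y depends linearly on x

cov = np.cov(x, y)[0, 1]              # sample Cov(X, Y); carries units
rho = np.corrcoef(x, y)[0, 1]         # dimensionless, always in [-1, 1]
print(cov, rho)                       # rho comes out close to +1 here
```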
Two random variables in a simulation experiment
• Let X and Y be two random variates in a given simulation
experiment that are not independent.
• Our performance parameter is X+Y.
E X  Y   E X   E Y .
Var  X  Y   Var  X   Var Y   2Cov X , Y .
• However, if the two r.v.’s are independent:
Cov X , Y   0.
Var  X  Y   Var  X   Var Y .
12
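A minimal NumPy sketch with invented dependent data, verifying Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) numerically:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=5000)
y = 0.5 * x + rng.normal(size=5000)   # X and Y are not independent

lhs = np.var(x + y)                   # sample Var(X + Y)
# bias=True makes np.cov use the same 1/N normalization as np.var:
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)                       # the two sides agree up to rounding
```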
Bernoulli trial
• An experiment with only two outcomes – “Success” and
"Failure" – where the chance of success is known a priori.
• Characterized by the chance of success "p" (this is a parameter of
the distribution).
• Example: Tossing a “fair” coin.
• Let us define a variable Xi such that:
$X_i = \begin{cases} 1 & \text{if trial } i \text{ is a success,} \\ 0 & \text{otherwise.} \end{cases}$
• Then, $E[X_i] = p$ and $\mathrm{Var}(X_i) = p(1 - p).$
Binomial r.v.
• A series of n independent Bernoulli trials.
• If X is the number of successes that occur in the n trials, then
X is said to be a Binomial r.v. with parameters (n, p). Its
probability mass function is:
$P_x = \Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n,$
where
$\binom{n}{x} = \dfrac{n!}{x!\,(n - x)!}.$
Binomial r.v.
$X = \sum_{i=1}^{n} X_i, \quad X_i = \begin{cases} 1 & \text{if trial } i \text{ is a success,} \\ 0 & \text{otherwise.} \end{cases}$
$E[X] = \sum_{i=1}^{n} E[X_i] = np,$
$\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = np(1 - p).$
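A minimal sketch: the binomial pmf via math.comb, plus a simulation check that n Bernoulli trials have mean np (n = 10 and p = 0.3 are example values):

```python
import math, random

def binom_pmf(x, n, p):
    """Pr(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
print(sum(binom_pmf(x, n, p) for x in range(n + 1)))  # ~1.0: pmf sums to 1

random.seed(0)
# Each trial: count successes among n independent Bernoulli(p) outcomes.
trials = [sum(random.random() < p for _ in range(n)) for _ in range(10000)]
print(sum(trials) / len(trials), n * p)  # sample mean close to np = 3.0
```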
Poisson r.v.
• A r.v. X which can take values 0, 1, 2, … is said to have a
Poisson distribution with parameter λ (λ > 0) if the pmf is
given by:
$p_i = \Pr(X = i) = e^{-\lambda} \dfrac{\lambda^i}{i!}, \quad i = 0, 1, 2, \ldots$
• For a Poisson r.v., $E[X] = \mathrm{Var}(X) = \lambda.$
• The probabilities can be found recursively (sketched in code below):
$p_{i+1} = \dfrac{\lambda}{i + 1}\, p_i, \quad i \ge 0.$
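A minimal sketch of the recursion above, which builds the Poisson probabilities without computing factorials (λ = 4 is an example value):

```python
import math

lam = 4.0
p = math.exp(-lam)          # p_0 = e^(-lambda)
probs = [p]
for i in range(40):
    p *= lam / (i + 1)      # p_{i+1} = lambda / (i + 1) * p_i
    probs.append(p)

print(sum(probs))                               # close to 1
print(sum(i * q for i, q in enumerate(probs)))  # close to E[X] = lambda
```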
Uniform r.v.
• A r.v. X is said to be uniformly distributed over the
interval (a, b) when its pdf is:
 1

f ( x)   b  a
0
if a  x  b
otherwise.
• Expected value:
1
b2  a 2 a  b
E X  
xdx 

.

ba a
2(b  a )
2
b
 
E X2
3
3
2
2
1
b

a
a

b
 ab
2

x dx 

.

ba a
3(b  a )
3
b
17
Uniform r.v.
• Variance:
$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \dfrac{(b - a)^2}{12}.$
• Distribution function F(x) for a given x, a < x < b:
$F(x) = \Pr(X \le x) = \int_a^x \dfrac{1}{b - a}\,dy = \dfrac{x - a}{b - a}.$
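A minimal sketch: sampling Uniform(a, b) and comparing the sample moments to the closed forms (a + b)/2 and (b − a)²/12 (a = 2, b = 5 are example values):

```python
import random, statistics

random.seed(1)
a, b = 2.0, 5.0
xs = [random.uniform(a, b) for _ in range(100000)]

print(statistics.fmean(xs), (a + b) / 2)          # both close to 3.5
print(statistics.pvariance(xs), (b - a)**2 / 12)  # both close to 0.75
```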
Normal r.v.
pdf:
$f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}, \quad -\infty < x < \infty.$
The normal density is a bell-shaped curve that is symmetric
about µ.
It can be shown that for a normal r.v. X with parameters (µ, σ²),
$E[X] = \mu, \quad \mathrm{Var}(X) = \sigma^2.$
Normal r.v.
• If X ~ N(µ, σ²), then $Z = \dfrac{X - \mu}{\sigma}$ is N(0, 1).
• The probability distribution function of the "Standard Normal" is
given by:
$\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy, \quad -\infty < x < \infty.$
• If X ~ N(µ, σ²), then:
$F(x) = \Phi\!\left(\dfrac{x - \mu}{\sigma}\right).$
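A minimal sketch using the standard library's NormalDist: F(x) for X ~ N(µ, σ²) equals Φ((x − µ)/σ) (µ = 5, σ = 2, x = 7 are example values):

```python
from statistics import NormalDist

mu, sigma, x = 5.0, 2.0, 7.0

z = (x - mu) / sigma                   # standardize: Z = (X - mu) / sigma
print(NormalDist().cdf(z))             # Phi((x - mu) / sigma)
print(NormalDist(mu, sigma).cdf(x))    # F(x): the same value
```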
Central Limit Theorem
• Let X1, X2, X3…Xn be a sequence of IID random variables
having a finite mean µ and finite variance σ². Then:
$\lim_{n \to \infty} \Pr\!\left( \dfrac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma \sqrt{n}} \le x \right) = \Phi(x).$
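A minimal simulation sketch of the theorem (sample size and replication count are arbitrary): standardized sums of IID Uniform(0, 1) variables behave like N(0, 1):

```python
import random, math

random.seed(2)
n, reps = 30, 20000
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and std dev of Uniform(0, 1)

z = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
     for _ in range(reps)]

# The empirical Pr(Z <= 1) should be close to Phi(1) ~ 0.8413.
print(sum(v <= 1.0 for v in z) / reps)
```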
Exponential r.v.
pdf:
$f(x) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty.$
cdf:
$F(x) = \int_0^x f(y)\,dy = \int_0^x \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda x}.$
$E[X] = \dfrac{1}{\lambda}; \quad \mathrm{Var}(X) = \dfrac{1}{\lambda^2}.$
Exponential r.v.
• When multiplied by a constant, it still remains an exponential
r.v.:
$\Pr(cX \le x) = \Pr\!\left(X \le \dfrac{x}{c}\right) = 1 - e^{-\lambda x / c}, \quad \text{so } cX \sim \mathrm{Expo}\!\left(\dfrac{\lambda}{c}\right).$
• Most useful property: it is memoryless!
$\Pr(X > s + t \mid X > t) = \Pr(X > s), \quad t, s \ge 0.$
• Analytical simplicity: if $X_1 \sim \mathrm{Expo}(\lambda_1)$ and $X_2 \sim \mathrm{Expo}(\lambda_2)$, then
$\Pr(X_1 < X_2) = \dfrac{\lambda_1}{\lambda_1 + \lambda_2}.$
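A minimal simulation sketch of the memoryless property (λ, s, t are invented values): the two estimated probabilities should be close to each other:

```python
import random

random.seed(3)
lam, s, t = 1.5, 0.4, 0.7
xs = [random.expovariate(lam) for _ in range(200000)]

# Pr(X > s + t | X > t) estimated as a ratio of counts:
print(sum(x > s + t for x in xs) / sum(x > t for x in xs))
# Pr(X > s): by memorylessness, close to the value above.
print(sum(x > s for x in xs) / len(xs))
```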
Poisson process
A counting process $\{N(t), t \ge 0\}$ is said to be a
Poisson process if:
• N(0) = 0.
• The process has independent increments.
• The number of events in any interval of length t is Poisson
distributed with mean λt. That is, for all s, t ≥ 0,
$\Pr\{N(t + s) - N(s) = n\} = e^{-\lambda t} \dfrac{(\lambda t)^n}{n!}, \quad n = 0, 1, 2, \ldots$
If $T_n$, n ≥ 1, is the time between the (n−1)st and nth events,
then this interarrival time has an exponential distribution.
Useful property of Poisson process
• Let $S_1^1$ denote the time of the first event of the first Poisson
process (with rate λ1), and $S_1^2$ denote the time of the first
event of the second Poisson process (with rate λ2). Then:
$\Pr(S_1^1 < S_1^2) = \dfrac{\lambda_1}{\lambda_1 + \lambda_2}.$
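A minimal simulation sketch: the first event times are exponential, so the probability that process 1 fires first can be checked directly (rates 2 and 3 are example values):

```python
import random

random.seed(4)
lam1, lam2 = 2.0, 3.0
reps = 100000

# First event time of each process is Expo(rate); count how often 1 wins.
wins = sum(random.expovariate(lam1) < random.expovariate(lam2)
           for _ in range(reps))
print(wins / reps, lam1 / (lam1 + lam2))  # both close to 0.4
```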
Covariance stationary processes
• Covariance between two observations Xi and Xi+j depends
only on j and not on i.
• Let Cj be the covariance for this process.
• So the correlation factor is given by:
$\rho_j = \dfrac{C_{i,i+j}}{\sqrt{\sigma_i^2 \sigma_{i+j}^2}} = \dfrac{C_j}{\sigma^2}, \quad j = 1, 2, \ldots$
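A minimal NumPy sketch (the AR(1)-style model and its coefficient are invented): estimating ρj = Cj/σ² for a covariance stationary series, where the lag-j correlation should come out near φ^j:

```python
import numpy as np

rng = np.random.default_rng(8)
n, phi = 10000, 0.6
x = np.zeros(n)
for i in range(1, n):            # X_i depends only on X_{i-1}: stationary
    x[i] = phi * x[i - 1] + rng.normal()

xc = x - x.mean()
for j in (1, 2, 3):
    rho_j = (xc[:-j] * xc[j:]).mean() / xc.var()   # C_j / sigma^2 estimate
    print(j, rho_j)                                # close to phi**j
```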
Point Estimation
• Let X1, X2, X3…Xn be a sequence of IID random variables
(observations) having a finite population mean µ and finite
population variance σ².
• We are interested in estimating these population parameters
from the sample values:
$\bar{X}_n = \dfrac{\sum_{i=1}^{n} X_i}{n}.$
• This sample mean is an unbiased point estimator of µ.
• That is to say: $E[\bar{X}_n] = \mu.$
Point Estimation
• The sample variance:
$S^2(n) = \dfrac{\sum_{i=1}^{n} \big(X_i - \bar{X}_n\big)^2}{n - 1}$
is an unbiased point estimator of σ².
• Variance of the mean: $\mathrm{Var}(\bar{X}_n) = \dfrac{\sigma^2}{n}.$
• We can estimate this variance of the mean by: $\widehat{\mathrm{Var}}(\bar{X}_n) = \dfrac{S^2(n)}{n}.$
• This is true only if X1, X2, X3…Xn are IID.
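A minimal sketch with made-up IID data: the unbiased sample mean and sample variance, and the estimated variance of the mean S²(n)/n:

```python
import random, statistics

random.seed(5)
xs = [random.gauss(10.0, 2.0) for _ in range(500)]  # IID observations
n = len(xs)

xbar = statistics.fmean(xs)       # point estimate of mu
s2 = statistics.variance(xs)      # divides by n - 1, so it is unbiased
print(xbar, s2, s2 / n)           # mean, S^2(n), estimated Var of the mean
```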
Point Estimation
• However, most often in a simulation experiment, the data are
correlated.
• In that case, estimation using the sample variance is dangerous,
because it underestimates the actual population variance:
$E[S^2(n)] \ne \sigma^2$, and
$E\!\left[\dfrac{S^2(n)}{n}\right] \ne \mathrm{Var}(\bar{X}_n).$
Interval Estimation
• Let X1, X2, X3…Xn be a sequence of IID random variables
(observations) having a finite population mean µ and finite
population variance σ² (> 0).
• We want to construct a confidence interval for the mean µ.
• Let Zn be a random variable with probability distribution Fn(z):
$Z_n = \dfrac{\bar{X}_n - \mu}{\sqrt{\sigma^2 / n}}, \quad F_n(z) = \Pr(Z_n \le z).$
Interval Estimation
• Central Limit Theorem states that:
$F_n(z) \to \Phi(z) \text{ as } n \to \infty,$
where Φ is the standard normal distribution function with mean 0 and
variance 1.
• Often, we don't know the population variance σ².
• It can be shown that the CLT still applies if we replace σ² by the
sample variance S²(n):
$t_n = \dfrac{\bar{X}_n - \mu}{\sqrt{S^2(n) / n}}.$
• The variable tn is approximately standard normal as n increases.
Standard Normal distribution
• Standard Normal distribution is N(0,1).
• The cumulative distribution function (CDF) at any given
value z can be found using standard statistical tables.
• Conversely, if we know the probability, we can compute the
corresponding value z1 such that
$F(z_1) = \Pr(Z \le z_1) = 1 - \dfrac{\alpha}{2}.$
• This value z1 = z1−α/2 is called the critical point for N(0,1).
• Similarly, the other critical point (z2 = −z1−α/2) is such that:
$F(z_2) = \Pr(Z \le z_2) = \dfrac{\alpha}{2}.$
Interval Estimation
• It follows for large n:
$\Pr\!\left(-z_{1-\alpha/2} \le Z_n \le z_{1-\alpha/2}\right) = \Pr\!\left(-z_{1-\alpha/2} \le \dfrac{\bar{X}_n - \mu}{\sqrt{S^2(n)/n}} \le z_{1-\alpha/2}\right)$
$= \Pr\!\left(\bar{X}_n - z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}} \le \mu \le \bar{X}_n + z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}\right) \approx 1 - \alpha.$
Interval Estimation
• Therefore, if n is sufficiently large, an approximate
100(1- α) percent confidence interval of µ is given by:
X n  z1
2
S 2 ( n)
.
n
• If we construct a large number of independent 100(1- α)
percent confidence intervals each based on n different
observations (n sufficiently large), the proportion of
these confidence intervals that contain µ should be 1- α.
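A minimal stdlib-only sketch of the interval above (data, n, and α are invented): a 90% confidence interval for µ using the normal critical point:

```python
import random, statistics
from statistics import NormalDist

random.seed(6)
xs = [random.gauss(10.0, 2.0) for _ in range(100)]
n, alpha = len(xs), 0.10

xbar = statistics.fmean(xs)
z = NormalDist().inv_cdf(1 - alpha / 2)            # z_{1 - alpha/2}
half = z * (statistics.variance(xs) / n) ** 0.5    # z * sqrt(S^2(n) / n)
print(xbar - half, xbar + half)  # should contain mu ~90% of the time
```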
Interval Estimation
• What if n is not "sufficiently large"?
• If Xi’s are normal random variables, the random variable tn
has a t-distribution with n-1 degrees of freedom.
• In this case, the 100(1-α) percent confidence interval for µ is
given by:
X n  tn 1,1
2
S 2 ( n)
.
n
35
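The same idea with the t critical point for small n; a minimal sketch assuming SciPy is installed (the sample values are invented):

```python
import statistics
from scipy.stats import t

xs = [10.2, 9.1, 11.3, 10.8, 9.7, 10.0, 10.5, 9.9]  # hypothetical sample
n, alpha = len(xs), 0.05

xbar = statistics.fmean(xs)
crit = t.ppf(1 - alpha / 2, df=n - 1)               # t_{n-1, 1-alpha/2}
half = crit * (statistics.variance(xs) / n) ** 0.5
print(xbar - half, xbar + half)  # 95% CI for mu with n - 1 df
```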
Interval Estimation
• In practice, the distribution of Xi’s is rarely normal and the
confidence interval (with t-distribution) will be approximate.
• Also, since $t_{n-1,\,1-\alpha/2} > z_{1-\alpha/2}$, the CI formed with "t" is larger
than the one formed with "z".
• Hence, it is recommended that we use the CI with "t". Why?
• However,
$t_{n-1,\,1-\alpha/2} \to z_{1-\alpha/2} \quad \text{as } n \to \infty.$
Interval Estimation
• The confidence level has a long-run relative frequency
interpretation.
• The unknown population mean µ is a fixed number.
• A confidence interval constructed from any particular
sample either does or does not contain µ.
• However, if we repeatedly select random samples of that
size and each time construct a confidence interval, with
say 95% confidence, then in the long run, 95% of the CIs
would contain µ.
• This happens because 95% of the time the sample mean
$\bar{Y}$ falls within $1.96\,\sigma_{\bar{Y}}$ of µ.
• So 95% of the time, the inference about µ is correct.
Interval Estimation
• Every time we take a new sample of the same size, the
confidence interval is going to be a little different from the
previous one.
• This is because the sample mean $\bar{Y}$ varies from sample to
sample.
• In practice, however, we select just one sample of fixed size n
and construct one confidence interval using the observations
in that sample.
• We do not know whether any particular CI truly contains μ.
• Our 95% confidence in that interval is based on long-term
properties of the procedure.
Hypothesis testing
• Assume that X1, X2, X3…Xn are normally distributed (or
approximately normal) and that we would like to test
whether µ = µ0, where µ0 is a fixed, hypothesized value of µ.
• If $|\bar{X}_n - \mu_0|$ is large, then our hypothesis is probably not true.
• To conduct such test (whether the hypothesis is true or not),
we need a statistical parameter whose distribution is known
when the hypothesis is true.
• Turns out, if our hypothesis is true (µ = µ0), then the statistic
tn has a t-distribution with n-1 df.
Hypothesis testing
• We form our two-tailed test of the hypothesis H0: µ = µ0 as:
If $|t_n| > t_{n-1,\,1-\alpha/2}$: reject H0;
if $|t_n| \le t_{n-1,\,1-\alpha/2}$: "accept" H0.
• The portion of the real line that corresponds to the rejection of
H0 is called the critical region for the test.
• The probability that the statistic tn falls in the critical region
given that H0 is true, which is equal to α, is called the
level of the test.
• Typically, if tn doesn't fall in the rejection region, we "do
not reject" H0.
Hypothesis testing
• Type I error: rejecting H0 when it is true. Its probability equals α,
and this error is under the experimenter's control.
• Type II error: accepting H0 when it is false. Its probability is
denoted by β.
• We call δ = 1 − β the power of the test: the probability of
rejecting H0 when it is false.
• For a fixed α, the power of the test can only be increased by
increasing n.
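A minimal sketch of the two-tailed test, assuming SciPy is installed (the sample and µ0 are invented): compute tn and compare it with the critical point:

```python
import statistics
from scipy.stats import t

xs = [10.2, 9.1, 11.3, 10.8, 9.7, 10.0, 10.5, 9.9]  # hypothetical sample
mu0, alpha = 10.0, 0.05
n = len(xs)

# t_n = (Xbar_n - mu0) / sqrt(S^2(n) / n)
tn = (statistics.fmean(xs) - mu0) / (statistics.variance(xs) / n) ** 0.5
crit = t.ppf(1 - alpha / 2, df=n - 1)                # t_{n-1, 1-alpha/2}
print("reject H0" if abs(tn) > crit else "do not reject H0")
```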