Computational Statistics
3. The Most Important Distributions
Computational Statistics - Important
Distributions
3.1 Discrete Distributions

The Discrete Uniform Distribution



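This is the simplest discrete distribution: the random variable takes a finite number of possible values x1, x2, …, xn, each with equal probability 1/n.
Mean
$$E(X) = \mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
which reduces to (x1 + xn)/2 when the values are equally spaced.
Variance
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$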
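As a quick check of these formulas, the sketch below computes the mean and variance of a discrete uniform distribution directly from its values using exact fractions. The example values (the faces of a fair die, 1 to 6) are an illustrative assumption, not taken from the slides.

```python
from fractions import Fraction

def discrete_uniform_moments(values):
    """Mean and variance of a discrete uniform distribution on `values`."""
    n = len(values)
    mean = Fraction(sum(values), n)                            # (1/n) * sum of x_i
    var = sum((Fraction(x) - mean) ** 2 for x in values) / n   # (1/n) * sum of (x_i - mu)^2
    return mean, var

die = [1, 2, 3, 4, 5, 6]
mu, var = discrete_uniform_moments(die)
print(mu, var)  # 7/2 35/12

# The values are equally spaced, so mu also equals (x_1 + x_n) / 2
assert mu == Fraction(die[0] + die[-1], 2)
```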

Bernoulli Distribution: X ~ Bernoulli (p)

Let A be an event (the outcome of a trial) with probability P(A) = p, 0 < p < 1, and let the random variable X be the indicator of A:
$$X = \begin{cases} 1, & \text{if } A \text{ happens} \\ 0, & \text{if } A \text{ does not happen} \end{cases}$$

A Bernoulli process must possess the following properties:
1. The experiment consists of n repeated trials.
2. Each trial results in an outcome that is either a success or a failure.
3. The probability of success remains constant from trial to trial.
4. The repeated trials are independent.
$$P(X = k) = p^k (1-p)^{1-k}, \qquad k = 0, 1$$
$$E(X) = p, \qquad \sigma^2 = p(1-p)$$
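A small sketch, assuming exact rational arithmetic via Python's `fractions` module, that evaluates this pmf and recovers E(X) = p and σ² = p(1 - p) by summing over k = 0, 1. The value p = 3/10 is illustrative.

```python
from fractions import Fraction

def bernoulli_pmf(k, p):
    """P(X = k) = p^k (1 - p)^(1 - k) for k in {0, 1}; 0 for any other k."""
    if k not in (0, 1):
        return 0
    return p ** k * (1 - p) ** (1 - k)

p = Fraction(3, 10)
mean = sum(k * bernoulli_pmf(k, p) for k in (0, 1))                 # should equal p
var = sum((k - mean) ** 2 * bernoulli_pmf(k, p) for k in (0, 1))    # should equal p(1 - p)
print(mean, var)  # 3/10 21/100
```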

Binomial Distribution: X ~ Bin (n, p)
This distribution gives the number of successes in n Bernoulli trials. The probability mass function is
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \dots, n$$
The mean and variance of the binomial distribution are given by
$$E(X) = np, \qquad \sigma^2 = np(1-p)$$
[Figure: probability mass function of Bin(30, 0.4)]
[Figure: probability mass function of Bin(100, 0.12)]
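The binomial pmf and its moments can be checked numerically. This sketch uses `math.comb` for the binomial coefficient and the Bin(30, 0.4) case from the first figure; the moments are obtained by direct summation over k = 0, …, n and should match np = 12 and np(1 - p) = 7.2.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 30, 0.4  # the Bin(30, 0.4) case plotted above
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(round(sum(probs), 10), round(mean, 10), round(var, 10))  # 1.0 12.0 7.2
```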


This distribution is used in situations involving sampling with replacement, so that the success probability p stays the same from trial to trial.
When each trial has more than two possible outcomes, we obtain the multinomial distribution.

Multinomial Distribution: X ~Mu(n, p)

If a given trial can result in the k outcomes E1, E2, …, Ek with probabilities p1, p2, …, pk, then the joint probability distribution of the random variables X1, X2, …, Xk, representing the numbers of occurrences of E1, E2, …, Ek in n independent trials, is
$$P(X_1 = x_1, X_2 = x_2, \dots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
where
$$\sum_{i=1}^{k} x_i = n, \qquad \sum_{i=1}^{k} p_i = 1$$
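A minimal sketch of the multinomial pmf, assuming the counts `xs` and probabilities `ps` are passed as equal-length lists (a hypothetical interface). The multinomial coefficient is built with exact integer division before multiplying by the probabilities; with k = 2 outcomes the formula reduces to the binomial pmf.

```python
from math import factorial, prod

def multinomial_pmf(xs, ps):
    """n! / (x1! ... xk!) * p1^x1 ... pk^xk, with n = sum(xs)."""
    if abs(sum(ps) - 1) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)  # each partial quotient is an integer, so // is exact
    return coef * prod(p ** x for p, x in zip(ps, xs))

# Illustrative example: n = 3 trials, three outcomes, one occurrence of each
print(multinomial_pmf([1, 1, 1], [0.5, 0.3, 0.2]))  # approximately 0.18
```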

Geometric Distribution: X ~ Geom(p)

In a series of Bernoulli trials with success probability P(A) = p, 0 < p < 1, let the random variable X denote the number of trials up to and including the first success. Then X is a geometric random variable.
Its probability mass function is
$$P(X = k) = p(1-p)^{k-1}, \qquad k = 1, 2, 3, \dots$$
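The definition translates directly into a simulation: repeat Bernoulli(p) trials until the first success and record how many trials were needed. This sketch compares the empirical frequencies with the pmf for p = 0.5 (the Geom(0.5) case in the figure); the seed and number of repetitions are arbitrary choices.

```python
import random

def geometric_pmf(k, p):
    """P(X = k) = p (1 - p)^(k - 1) for k = 1, 2, ...; 0 otherwise."""
    return p * (1 - p) ** (k - 1) if k >= 1 else 0.0

def trials_until_first_success(p, rng):
    """Run Bernoulli(p) trials and return the number of the first successful trial."""
    k = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        k += 1
    return k

rng = random.Random(0)  # fixed seed so the run is reproducible
p, reps = 0.5, 100_000
samples = [trials_until_first_success(p, rng) for _ in range(reps)]
for k in range(1, 5):
    print(k, geometric_pmf(k, p), samples.count(k) / reps)  # theory vs. simulation
```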
[Figure: probability mass function of Geom(0.5)]

Negative Binomial Distribution:
X ~ Negbin(r, p)


This is a generalization of the geometric distribution (the geometric distribution is the special case r = 1). Trials are repeated until r successes occur, so X takes the values k = r, r+1, r+2, …
The probability mass function is
$$P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \qquad k = r, r+1, r+2, \dots$$
$$E(X) = \frac{r}{p}, \qquad \sigma^2 = \frac{r(1-p)}{p^2}$$
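A sketch that evaluates the negative binomial pmf and checks the mean and variance by summation. The infinite support is truncated at k = 500, which is an assumption that the remaining tail is negligible (it is for the illustrative values r = 3, p = 0.4 used here).

```python
from math import comb

def negbin_pmf(k, r, p):
    """P(X = k) = C(k - 1, r - 1) p^r (1 - p)^(k - r) for k = r, r + 1, ...; 0 otherwise."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

r, p = 3, 0.4
ks = range(r, 500)  # truncated support; the tail beyond 500 is negligible for these values
mean = sum(k * negbin_pmf(k, r, p) for k in ks)
var = sum((k - mean) ** 2 * negbin_pmf(k, r, p) for k in ks)
print(mean, r / p)                 # both close to 7.5
print(var, r * (1 - p) / p ** 2)   # both close to 11.25
```

With r = 1 the function reproduces the geometric pmf p(1 - p)^(k - 1), matching the remark that the geometric distribution is the case r = 1.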

Hypergeometric distribution:
X ~Hypergeom(n, N, r)


A set of N objects contains r objects classified as failures (defectives). A random sample of n items is drawn without replacement and inspected. Let the random variable X be the number of defectives in the sample.
The probability mass function, mean and variance are
$$P(X = k) = \frac{\binom{r}{k}\binom{N-r}{n-k}}{\binom{N}{n}}$$
$$E(X) = \frac{nr}{N}$$
$$\sigma^2 = \frac{nr}{N}\cdot\frac{N-r}{N}\cdot\frac{N-n}{N-1}$$
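Because every quantity here is a ratio of integers, the hypergeometric pmf can be evaluated exactly with `fractions.Fraction`. This sketch, with illustrative values N = 20, r = 5, n = 4 (not from the slides), sums the pmf over its support and compares the resulting mean and variance with the closed forms above.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, n, N, r):
    """P(X = k) = C(r, k) C(N - r, n - k) / C(N, n), as an exact fraction."""
    return Fraction(comb(r, k) * comb(N - r, n - k), comb(N, n))

# Illustrative numbers: N = 20 items, r = 5 defectives, sample of n = 4
n, N, r = 4, 20, 5
support = range(0, min(n, r) + 1)
probs = {k: hypergeom_pmf(k, n, N, r) for k in support}
mean = sum(k * q for k, q in probs.items())
var = sum((k - mean) ** 2 * q for k, q in probs.items())
print(sum(probs.values()), mean, var)  # 1 1 12/19
print(Fraction(n * r, N), Fraction(n * r, N) * Fraction(N - r, N) * Fraction(N - n, N - 1))  # 1 12/19
```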

Poisson distribution: X ~ Poisson (λ)


Experiments yielding numerical values of a random variable X, the number of outcomes occurring during a given time interval or in a specified region, are called Poisson experiments.
Given an interval of real numbers, assume that events occur randomly throughout the interval.

If the interval can be partitioned into subintervals of small enough length such that
1. the probability of more than one event in a subinterval is zero,
2. the probability of one event in a subinterval is the same for all subintervals and is proportional to the length of the subinterval, and
3. the event in each subinterval is independent of the other subintervals,


then the random experiment is called a Poisson process.
The random variable X that equals the number of events in the interval is a Poisson random variable with parameter λ > 0, and the probability mass function of X is
$$P(X = k) = \frac{\lambda^k}{k!}\, e^{-\lambda}, \qquad k = 0, 1, 2, \dots$$
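In practice the Poisson pmf is often evaluated with the recursion P(X = k) = P(X = k - 1) · λ/k, which avoids forming λ^k and k! separately. This sketch uses that recursion for λ = 10 (the Poisson(10) case in the figure below), truncating at k = 100 on the assumption that the remaining mass is negligible.

```python
import math

def poisson_pmf_table(lam, kmax):
    """P(X = k) = lam^k e^(-lam) / k! for k = 0..kmax, computed via the
    recursion P(k) = P(k - 1) * lam / k starting from P(0) = e^(-lam)."""
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs

lam = 10.0
probs = poisson_pmf_table(lam, 100)  # truncated support
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(sum(probs), mean, var)  # close to 1, 10 and 10
```

For very large λ (roughly above 745) `math.exp(-lam)` underflows to zero in double precision, so a log-scale computation would be needed there.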


The Poisson distribution takes infinitely many values (k = 0, 1, 2, …). It can be proved that the Poisson distribution is a limiting case of the binomial distribution, obtained by letting p → 0 and n → ∞ in such a way that the binomial mean np approaches a finite value λ.
The mean and variance of a Poisson distribution are
$$E(X) = \lambda, \qquad \sigma^2 = \lambda$$
[Figure: probability mass function of Poisson(10)]
[Figure: probability mass function of Poisson(50)]



The differences between the geometric, binomial, negative binomial and Poisson distributions are worth stressing:
Geometric: the random variable denotes the number of trials until the first success.
Binomial: the random variable denotes the number of successful trials out of a predetermined number n of trials.


Negative binomial: the random variable denotes the number of trials required to obtain a given number r of successes.
Poisson: the random variable denotes the number of events in a given time interval or other region.
3.2 Continuous Distributions

The Uniform Distribution:

The probability density function is given by
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
and the mean and variance are
$$E(X) = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}$$
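These formulas can be checked by simulation. The sketch draws from Uniform(a, b) with Python's `random.uniform` and compares the sample mean and variance with (a + b)/2 and (b - a)²/12. The interval [2, 5], the seed and the sample size are arbitrary choices.

```python
import random
import statistics

a, b = 2.0, 5.0  # illustrative interval
rng = random.Random(1)
xs = [rng.uniform(a, b) for _ in range(200_000)]

mean_hat = statistics.fmean(xs)
var_hat = statistics.pvariance(xs)
print(mean_hat, (a + b) / 2)          # sample mean vs. (a + b) / 2 = 3.5
print(var_hat, (b - a) ** 2 / 12)     # sample variance vs. (b - a)^2 / 12 = 0.75
```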

The Exponential Distribution: X ~Exp(λ)



Suppose events occur according to a Poisson process, so that the number of events in an interval has a Poisson distribution. Then the time between successive events is a random variable that follows an exponential distribution.
The exponential distribution is a special case of the gamma distribution (introduced below).
The exponential probability density function is given by
$$f(t) = \begin{cases} \lambda e^{-\lambda t}, & t \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
The mean and variance are given by
$$E(X) = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}$$
where λ > 0.
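A standard way to generate exponential variates is inverse transform sampling: the cdf is F(t) = 1 - e^(-λt), so t = -ln(1 - U)/λ with U uniform on [0, 1) has the Exp(λ) distribution. The sketch below implements this (the function name is hypothetical) and compares sample moments with 1/λ and 1/λ² for an illustrative λ = 2. Python's standard library also offers `random.expovariate` for the same purpose.

```python
import math
import random
import statistics

def exponential_inverse_transform(lam, rng):
    """Sample Exp(lam) by inverting F(t) = 1 - exp(-lam t)."""
    u = rng.random()                    # u lies in [0, 1)
    return -math.log(1.0 - u) / lam     # 1 - u lies in (0, 1], so the log is defined

lam = 2.0
rng = random.Random(42)
xs = [exponential_inverse_transform(lam, rng) for _ in range(200_000)]
mean_hat = statistics.fmean(xs)
var_hat = statistics.pvariance(xs)
print(mean_hat, 1 / lam)         # close to 0.5
print(var_hat, 1 / lam ** 2)     # close to 0.25
```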

The Gamma Distribution: X ~Gamma(α, β)

First we define the gamma function Γ(α) for α > 0:
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\, dx$$


It can be shown that
Γ(α) = (α - 1) Γ(α - 1) for α > 1,
Γ(1) = 0! = 1,
and if α is a positive integer, then
Γ(α) = (α - 1)!.
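These identities can be checked with Python's `math.gamma`, and the integral definition can be approximated numerically. The quadrature helper below is a hypothetical sketch using the composite Simpson rule on [0, 60], assuming α ≥ 1 (so the integrand is finite at 0) and that the tail beyond 60 is negligible.

```python
import math

def gamma_by_quadrature(alpha, upper=60.0, n=20_000):
    """Approximate the integral of x^(alpha-1) e^(-x) over [0, upper]
    with the composite Simpson rule (n even); assumes alpha >= 1."""
    def integrand(x):
        return x ** (alpha - 1) * math.exp(-x)
    h = upper / n
    s = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * integrand(i * h)
    return s * h / 3

# Gamma(alpha) = (alpha - 1)! for positive integers alpha
for alpha in range(1, 6):
    print(alpha, math.gamma(alpha), math.factorial(alpha - 1))

# The recursion Gamma(alpha) = (alpha - 1) Gamma(alpha - 1) also holds for non-integers
print(math.gamma(4.5), 3.5 * math.gamma(3.5))
print(gamma_by_quadrature(2.5), math.gamma(2.5))  # quadrature vs. library value
```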
The random variable X has a gamma distribution with parameters α, β > 0 if its probability density function for x > 0 is given by
$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \qquad x > 0$$
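A sketch of the gamma density, computed on the log scale with `math.lgamma` so that large α does not overflow Γ(α). It checks that α = 1 gives the exponential density with λ = 1/β, and draws samples with Python's `random.gammavariate(alpha, beta)`, which uses the same shape and scale parameterization. The comparison with the mean αβ relies on a standard fact about the gamma distribution that the slides do not state.

```python
import math
import random
import statistics

def gamma_pdf(x, alpha, beta):
    """f(x) = x^(alpha-1) e^(-x/beta) / (beta^alpha Gamma(alpha)) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    # Work on the log scale so that large alpha does not overflow Gamma(alpha)
    log_f = (alpha - 1) * math.log(x) - x / beta - alpha * math.log(beta) - math.lgamma(alpha)
    return math.exp(log_f)

# With alpha = 1 the gamma density reduces to the exponential density with lambda = 1 / beta
beta = 0.5
lam = 1 / beta
for x in (0.1, 1.0, 3.0):
    print(gamma_pdf(x, 1.0, beta), lam * math.exp(-lam * x))

rng = random.Random(7)
xs = [rng.gammavariate(3.0, 2.0) for _ in range(100_000)]
mean_hat = statistics.fmean(xs)
print(mean_hat)  # close to alpha * beta = 6
```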

The Normal distribution: X ~ N(μ, σ²)

A random variable X with probability density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty$$
with parameters -∞ < μ < ∞ and σ > 0 is a normal random variable.
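A direct transcription of the normal density, checked against `statistics.NormalDist(mu, sigma).pdf` from the Python standard library. The parameters μ = 10, σ = 3.25 are illustrative; the "Normal(10, 3.25)" figure label does not say whether 3.25 is σ or σ². The points 4 and 16 are symmetric about μ, so the density values there coincide.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-((x - mu) / sigma)^2 / 2) / (sigma sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 10.0, 3.25  # illustrative parameters
for x in (4.0, 10.0, 16.0):
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```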


When E(X) = μ = 0 and σ² = 1, the normal random variable is called a standard normal variable, and its distribution the standard normal distribution.
Some properties of the normal distribution:
1. The mode, which is the point on the horizontal axis where the curve attains its maximum, occurs at x = μ.
2. The curve is symmetric about the mean μ.
3. The total area under the curve and above the x-axis is equal to 1.
[Figure: probability density function of Normal(10, 3.25)]

Recall that the cumulative distribution function (cdf) F(x) of a continuous random variable X with density function f(x) is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt$$
In the case of the normal distribution this integral has no closed-form expression in elementary functions, so its values have to be calculated numerically.
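To illustrate the numerical calculation, this sketch approximates the standard normal cdf with the composite Simpson rule, replacing the lower limit -∞ by -10 (the mass below -10 is about 7.6e-24). As a reference it uses the identity F(x) = (1 + erf(x/√2))/2 with Python's `math.erf`; the function names are hypothetical.

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

def std_normal_cdf_numeric(x, lower=-10.0, n=10_000):
    """Simpson's rule for the integral of the standard normal pdf over [lower, x]."""
    if x <= lower:
        return 0.0
    h = (x - lower) / n  # n must be even for Simpson's rule
    s = std_normal_pdf(lower) + std_normal_pdf(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * std_normal_pdf(lower + i * h)
    return s * h / 3

def std_normal_cdf_erf(x):
    """Reference value via the error function: (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in (-1.96, 0.0, 1.0, 1.96):
    print(x, round(std_normal_cdf_numeric(x), 6), round(std_normal_cdf_erf(x), 6))
```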



That is why some of these values are given in tabular form.
Moreover, since it is impossible to tabulate the values of every possible normal distribution, only standard normal values are given in tabular form.
Any other normal distribution is transformed to standard values by
$$Z = \frac{X - \mu}{\sigma}$$
[Figure: standard normal distribution]
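The standardization step can be sketched with `statistics.NormalDist`: P(X ≤ x) for X ~ N(μ, σ²) equals the standard normal cdf evaluated at z = (x - μ)/σ, which is the value a standard normal table supplies. The parameters μ = 10, σ = 3 and the point x = 13 are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 10.0, 3.0   # hypothetical parameters
x = 13.0
z = (x - mu) / sigma    # standardized value, here 1.0
std = NormalDist()      # standard normal: mu = 0, sigma = 1

p_via_z = std.cdf(z)                      # the value a table would give for z = 1.00
p_direct = NormalDist(mu, sigma).cdf(x)   # same probability without standardizing
print(z, round(p_via_z, 4), round(p_direct, 4))  # 1.0 0.8413 0.8413
```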