Discrete distributions and their use
Poisson and binomial distributions
Discrete distribution
• They can take only selected discrete values.
• The most common kind can take only integer values (usually 0, 1, 2, etc.).
• The most common are the Poisson and binomial distributions.
• Others are the negative binomial and Neyman (types A and B) distributions (called "contagious" distributions).
Poisson distribution
Model: I have many cups and I throw a marble (taw) many times. On every throw the marble lands in some cup. Every throw is independent of the previous ones (that is, the presence of a marble in a cup does not influence whether I hit that cup again). The number of marbles in a cup then has a Poisson distribution. Its only parameter, λ, is the ratio of the total number of marbles to the number of cups (and so the mean number of marbles per cup).

$$P(x = X) = \frac{e^{-\lambda}\,\lambda^{X}}{X!}$$
The mean and the variance of the distribution both equal λ.
The distribution is always positively skewed, but the higher λ, the smaller the skew. If λ is large (say > 12), the distribution is close to normal.
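The cups-and-marbles model can be checked with a small simulation. The sketch below is only an illustration: the numbers of cups and marbles (taws), the seed, and the helper names `poisson_pmf` and `throw_marbles` are assumptions, not part of the lecture. It compares the observed proportions of cups holding 0, 1, 2, ... marbles with the Poisson probabilities for λ = marbles / cups.

```python
import math
import random
from collections import Counter

def poisson_pmf(k, lam):
    """P(x = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def throw_marbles(n_cups, n_marbles, rng):
    """Throw each marble into an independently, uniformly chosen cup."""
    counts = [0] * n_cups
    for _ in range(n_marbles):
        counts[rng.randrange(n_cups)] += 1
    return counts

rng = random.Random(1)            # fixed seed, only for reproducibility
n_cups, n_marbles = 10_000, 20_000
lam = n_marbles / n_cups          # lambda = mean number of marbles per cup
counts = throw_marbles(n_cups, n_marbles, rng)

mean = sum(counts) / n_cups
var = sum((c - mean) ** 2 for c in counts) / (n_cups - 1)
observed = Counter(counts)

print(f"mean = {mean:.3f}, variance = {var:.3f}")
for k in range(7):
    # observed proportion of cups with k marbles vs. Poisson probability
    print(k, observed[k] / n_cups, round(poisson_pmf(k, lam), 4))
```

With many cups, the sample mean and variance should both come out close to λ, and the observed proportions close to the formula.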
Where to use it
• Poisson variable as the dependent variable (in ANOVA, regression):
• I either use Generalized Linear Models (one of their options is the Poisson distribution),
• or I use the square-root transformation prior to ANOVA:
$$X' = \sqrt{X} \quad \text{or} \quad X' = \sqrt{X + \tfrac{3}{8}}$$
• It should make the distribution more symmetrical and stabilize the variance at the same time (i.e. improve normality and homoscedasticity).
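The variance-stabilizing effect of the square-root transformation can be seen numerically. The following is a minimal sketch under stated assumptions: the values of λ, the sample size, and the helper names are illustrative, and Poisson values are drawn with Knuth's multiplication method so that only the standard library is needed. For raw counts the variance grows with λ, while for √(X + 3/8) it stays near 1/4.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(7)  # fixed seed, only for reproducibility
results = {}
for lam in (4, 9, 16, 25):
    xs = [poisson_sample(lam, rng) for _ in range(20_000)]
    raw_var = sample_variance(xs)
    sqrt_var = sample_variance([math.sqrt(x + 3 / 8) for x in xs])
    results[lam] = (raw_var, sqrt_var)
    print(f"lambda={lam:2d}  var(X)={raw_var:6.2f}  var(sqrt(X+3/8))={sqrt_var:.3f}")
```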
Evaluation of randomness
in continuous space (e.g. individuals of a plant species in a meadow)
Fig. Types of distribution in continuous space: clustered (A), random (B), regular (uniform) (C), totally regular in a square grid (D).
or in discrete units (e.g. parasites on a carp or recombination nodules on a chromosome)
If individuals are distributed randomly
Numbers of individuals in randomly placed squares have a Poisson distribution, so $\bar{X} = s^2$.
The occurrence of an individual in a unit does not change the probability of occurrence of another individual in this unit.
If individuals are distributed in clusters
Numbers of individuals in randomly placed squares have $\bar{X} < s^2$.
The occurrence of an individual increases the probability of finding another individual in the same unit.
Boletes grow in clusters, so if I find one, I start looking for its neighbours.
If there is a tendency to regularity, or total regularity
$\bar{X} > s^2$
The occurrence of an individual decreases the probability of finding another individual in the same unit.
The ratio $s^2 / \bar{X}$ (for the number of individuals per unit) is then considered to be a characteristic of the distribution of individuals.
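To see how the variance-to-mean ratio separates the three situations, here is a small simulated sketch. All three generating mechanisms are assumptions chosen for illustration (independent placement, whole clusters of 5 landing in one unit, and units with 4 sites each occupied with probability 0.5); they are not taken from the lecture.

```python
import random

def variance_to_mean(counts):
    """Ratio s^2 / mean for a list of counts per unit."""
    n = len(counts)
    mean = sum(counts) / n
    s2 = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return s2 / mean

rng = random.Random(3)  # fixed seed, only for reproducibility
n_units = 5_000

# Random: 10 000 individuals, each landing in an independently chosen unit.
random_counts = [0] * n_units
for _ in range(10_000):
    random_counts[rng.randrange(n_units)] += 1

# Clustered: 2 000 clusters of 5 individuals; a whole cluster lands in one unit.
clustered_counts = [0] * n_units
for _ in range(2_000):
    clustered_counts[rng.randrange(n_units)] += 5

# Tendency to regularity: every unit has 4 sites, each occupied with
# probability 0.5, so counts per unit are binomial.
regular_counts = [sum(rng.random() < 0.5 for _ in range(4)) for _ in range(n_units)]

ratio_random = variance_to_mean(random_counts)
ratio_clustered = variance_to_mean(clustered_counts)
ratio_regular = variance_to_mean(regular_counts)
print(f"random: {ratio_random:.2f}  clustered: {ratio_clustered:.2f}  regular: {ratio_regular:.2f}")
```

Under these assumptions the ratio should be close to 1 for the random pattern, well above 1 for the clustered one, and below 1 for the regular one.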
The Lloyd index is also popular; it does not change if individuals die out at random:
$$L = \bar{X} + \frac{s^2}{\bar{X}} - 1$$
Deviation from randomness can be tested:
1. The classic goodness-of-fit test for the Poisson distribution
2. The statistic $\frac{s^2}{\bar{X}}(n - 1)$ has approximately a $\chi^2$ distribution with n − 1 degrees of freedom for the Poisson distribution.
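The second test can be sketched in a few lines. The two lists of counts below are hypothetical data invented for illustration. To stay within the standard library, the chi-square tail probability is approximated with the Wilson-Hilferty cube-root normal approximation rather than computed exactly, so the p-values are only approximate.

```python
import math
from statistics import NormalDist

def dispersion_test(counts):
    """Return (statistic, df, approximate upper-tail p-value).

    statistic = s^2 / mean * (n - 1), compared with chi-square on n - 1 df.
    A small upper-tail p-value suggests clustering (variance > mean).
    """
    n = len(counts)
    mean = sum(counts) / n
    s2 = sum((c - mean) ** 2 for c in counts) / (n - 1)
    stat = s2 / mean * (n - 1)
    df = n - 1
    # Wilson-Hilferty: (chi2/df)^(1/3) is roughly normal with
    # mean 1 - 2/(9 df) and variance 2/(9 df).
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    p_upper = 1 - NormalDist().cdf(z)
    return stat, df, p_upper

# Hypothetical counts of individuals in 20 sampling units.
roughly_random = [2, 1, 3, 0, 2, 4, 1, 2, 3, 2, 1, 0, 2, 3, 1, 2, 2, 4, 1, 3]
clustered = [0, 0, 0, 0, 7, 0, 1, 0, 9, 0, 0, 5, 0, 0, 8, 0, 0, 0, 6, 0]

stat_r, df_r, p_r = dispersion_test(roughly_random)
stat_c, df_c, p_c = dispersion_test(clustered)
print(f"roughly random: stat={stat_r:.2f}, df={df_r}, p~{p_r:.3f}")
print(f"clustered:      stat={stat_c:.2f}, df={df_c}, p~{p_c:.2g}")
```

A tendency to regularity would show up in the opposite tail, that is, as an unusually small statistic.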
Many methods describe the distribution of objects in continuous space
• So-called spatial pattern analysis; the K-function is popular
• Size of clusters
• Intensity of clustering
• Change of spatial pattern in time
• Spatial relation of objects of different categories (e.g. live and dead trees, two species of trees, etc.)
| Distribution of population | Corresponding distribution of number of individuals in experimental unit | Ratio of variance and mean ($\sigma^2/\mu$) of distribution | Relationship of presence of individuals | The commonest ecological reasons for differences from randomness |
|---|---|---|---|---|
| regular (uniform) | e.g. binomial | < 1 | Occurrence of one individual in the unit decreases the probability of occurrence of another individual | Intraspecific competition, territorial behaviour |
| random | Poisson | 1 | Occurrences of individuals are independent of each other | |
| clustered | contagious (e.g. negative binomial, Neyman) | > 1 | Occurrence of one individual in the unit increases the probability of occurrence of another individual | Type of reproduction, heterogeneity of environment |
Binomial distribution
The number of successes in n independent attempts, where every attempt has the same probability of success.
Model: I throw at every cup n times (e.g. 5 times), and in every attempt I have probability p of hitting inside. The number of hits per cup then has a binomial distribution. There are two parameters: n, the number of attempts, and p, the probability of success in every attempt. The probability of failure, q, is not an additional parameter, because q = 1 − p.
It holds that
$$P(x = X) = \frac{n!}{X!\,(n - X)!}\, p^{X} q^{\,n - X}$$
$$\mu_x = np, \qquad \sigma_x^2 = npq$$
The distribution is symmetric (and, for a given n, closest to normal) for p = q = 1/2.
The distribution approaches the normal distribution with growing n.
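As a quick check of the formulas for the binomial probabilities, mean and variance, the sketch below evaluates them directly. The values n = 5 and p = 0.3 and the function name `binomial_pmf` are illustrative choices.

```python
import math

def binomial_pmf(x, n, p):
    """P(x successes in n attempts), each with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the probabilities themselves.
mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print("P(x = X):", [round(px, 4) for px in pmf])
print(f"mean = {mean:.4f} (np = {n * p:.4f})")
print(f"variance = {var:.4f} (npq = {n * p * (1 - p):.4f})")

# For p = q = 1/2 the probabilities are symmetric around n/2.
pmf_half = [binomial_pmf(x, n, 0.5) for x in range(n + 1)]
```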
Most common use: estimation of the parameter p
• Estimating the percentage of worm-eaten apples (in a whole orchard)
• Estimating the cover of a species (point quadrat method)
• Estimating the proportion (percentage) of people having antibodies against borreliosis
• What percentage of people will vote for the Party of Moderate Progress Within the Bounds of the Law
We have n experiments
• 100 (randomly chosen) examined apples
• 50 (randomly chosen) people examined for borreliosis
• 800 (randomly chosen) respondents in a survey
$$\hat{p} = \frac{X}{n}, \qquad \sigma^2_{\hat{p}} = \frac{pq}{n}, \qquad s^2_{\hat{p}} = \frac{\hat{p}\hat{q}}{n - 1}$$
Of 100 apples, 15 were worm-eaten, so we estimate p as 0.15 (15%).
But we do not know p; we know only its estimate.
By the normal approximation, the (1 − α) confidence interval is
$$\hat{p} \pm \left( Z_{(1-\alpha/2)}\, s_{\hat{p}} + \frac{1}{2n} \right)$$
$Z_{(1-\alpha/2)}$ is the (1 − α/2)·100 percent quantile of the standard normal distribution.

| $\hat{p}$ | n |
|---|---|
| 0.5 | ≥ 30 |
| 0.4 or 0.6 | ≥ 50 |
| 0.3 or 0.7 | ≥ 80 |
| 0.2 or 0.8 | ≥ 200 |
| 0.1 or 0.9 | ≥ 600 |

Table A. Applicability of the normal approximation for the binomial distribution.
Caution: if you go outside the recommended range of p and n, the confidence interval will often extend beyond the interval [0, 1].
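The normal-approximation interval, including the 1/(2n) continuity correction, can be sketched as follows. The counts used (320 of 800 respondents, and 1 success in 20 trials) are hypothetical, and the function name `normal_approx_ci` is an assumption. The second call shows how the interval can fall outside [0, 1] when n is far too small for the given proportion.

```python
import math
from statistics import NormalDist

def normal_approx_ci(x, n, alpha=0.05):
    """(1 - alpha) confidence interval for p by the normal approximation."""
    p_hat = x / n
    s = math.sqrt(p_hat * (1 - p_hat) / (n - 1))   # s of p-hat, divisor n - 1
    z = NormalDist().inv_cdf(1 - alpha / 2)        # standard normal quantile
    half = z * s + 1 / (2 * n)                     # with continuity correction
    return p_hat - half, p_hat + half

# Hypothetical survey: p-hat = 0.4 with n = 800, inside the recommended range.
low, high = normal_approx_ci(320, 800)
print(f"320/800: ({low:.4f}, {high:.4f})")

# Hypothetical small sample: p-hat = 0.05 with n = 20, far outside the range.
low_bad, high_bad = normal_approx_ci(1, 20)
print(f"1/20:    ({low_bad:.4f}, {high_bad:.4f})")
```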
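Outside the range of the normal approximation it is appropriate to use
$$\text{Lower limit} = \frac{X}{X + (n - X + 1)\, F_{(1-\alpha/2),\,\nu_1,\,\nu_2}}$$
$F_{(1-\alpha/2),\,\nu_1,\,\nu_2}$ is the (1 − α/2)·100% quantile of the F distribution with the relevant degrees of freedom: $\nu_1 = 2(n - X + 1)$ and $\nu_2 = 2X$.
$$\text{Upper limit} = \frac{(X + 1)\, F_{(1-\alpha/2),\,\nu'_1,\,\nu'_2}}{n - X + (X + 1)\, F_{(1-\alpha/2),\,\nu'_1,\,\nu'_2}}$$
with $\nu'_1 = \nu_2 + 2 = 2(X + 1)$ and $\nu'_2 = \nu_1 - 2 = 2(n - X)$.
Even with zero observed effectiveness (X = 0) the upper limit is non-zero, and even with hundred-percent observed effectiveness (X = n) the lower limit is < 1.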
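The F-based limits are a closed form of the exact (Clopper-Pearson) interval, whose limits are defined by binomial tail probabilities of α/2. As a sketch that avoids needing an F-quantile function, the same limits can be found numerically by bisection on those tail probabilities. The function names below are assumptions; this is an alternative route to the same interval, not the computation shown on the slide.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iters=60):
    """Find p in [lo, hi] with f(p) = 0, assuming f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Exact (1 - alpha) confidence interval for p from x successes in n."""
    # Lower limit: the p for which P(X >= x) equals alpha/2 (0 if x == 0).
    if x == 0:
        lower = 0.0
    else:
        lower = bisect_root(lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit: the p for which P(X <= x) equals alpha/2 (1 if x == n).
    if x == n:
        upper = 1.0
    else:
        upper = bisect_root(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

lower, upper = exact_ci(15, 100)          # 15 worm-eaten apples out of 100
print(f"15/100: ({lower:.4f}, {upper:.4f})")
print("0/10:  ", exact_ci(0, 10))         # upper limit is non-zero
print("10/10: ", exact_ci(10, 10))        # lower limit is below 1
```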
Accuracy of the estimate of p grows with n
• How many observations do I need to have a standard error of approximately w?
$$n = \frac{pq}{w^2}$$
If we presume that approximately 20% of individuals carry a mutation of a certain type and we want to determine their proportion with a standard error of 1% (the 95% confidence interval will then be approximately ±2%), we need to investigate
n = (0.2 × 0.8) / 0.01² = 1600 individuals.
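The sample-size arithmetic is easy to wrap in a helper. This is a minimal sketch; the function name `required_n` is an assumption, and the second call (p = 0.5, the least favourable case, where pq is largest) is added only for comparison.

```python
import math

def required_n(p, w):
    """Observations needed for the standard error of p-hat to be about w."""
    n = p * (1 - p) / w**2
    # Subtract a tiny tolerance so floating-point noise does not push an
    # exact integer result up by one when rounding up.
    return math.ceil(n - 1e-9)

print(required_n(0.2, 0.01))   # 1600, the example from the text
print(required_n(0.5, 0.01))   # pq is largest at p = 0.5
```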