Discrete distributions and their use
Poisson and binomial distributions

Discrete distributions
• They can take only selected discrete values.
• The most common are those that take only integer values (usually 0, 1, 2, etc.).
• The most common are the Poisson and binomial distributions.
• Others are the negative binomial and the Neyman (type A and B) distributions (called "contagious" distributions).

Poisson distribution
Model: I have many cups and I throw a taw (a marble) many times. Every throw lands in some cup. Every throw is independent of the previous ones (that is, the presence of a taw in a cup does not influence whether I hit that cup again). The number of taws in a cup then has a Poisson distribution. Its only parameter, λ, is the ratio of the number of taws to the number of cups (and so the mean number of taws per cup).

Poisson distribution
$$P(x = X) = \frac{e^{-\lambda}\,\lambda^{X}}{X!}$$
The mean and the variance of the distribution both equal λ.
The distribution is always positively skewed, but the higher λ, the smaller the skew. If λ is large (say > 12), the distribution is close to normal.

Where to use it
• Poisson variable as the dependent variable (in ANOVA, regression):
• Either use Generalized Linear Models (the Poisson distribution is one of their options)
• Or use the square-root transformation prior to ANOVA: $X' = \sqrt{X}$ or $X' = \sqrt{X + 3/8}$
• The transformation should make the distribution more symmetrical and stabilize the variance at the same time (i.e. improve normality and homoscedasticity).

Evaluation of randomness in continuous space (e.g. individuals of a plant species in a meadow)
Fig. Types of distribution in continuous space: clustered (A), random (B), regular, i.e. uniform (C), totally regular in a square grid (D).
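The cup-and-taw model of the Poisson distribution described above can be checked with a short simulation. The sketch below is only an illustration: the function names (`poisson_pmf`, `simulate_cups`, `mean_and_variance`), the number of cups and taws, and the random seed are assumptions made for this example, not part of the original slides. It throws taws independently into equally likely cups and compares the observed frequencies with the Poisson probabilities; the variance-to-mean ratio should come out close to 1.

```python
import math
import random


def poisson_pmf(k, lam):
    """P(x = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def simulate_cups(n_cups, n_taws, seed=0):
    """Throw n_taws taws independently into n_cups equally likely cups.

    Returns a list with the number of taws that ended up in each cup.
    The seed is fixed only to make the illustration reproducible.
    """
    rng = random.Random(seed)
    counts = [0] * n_cups
    for _ in range(n_taws):
        counts[rng.randrange(n_cups)] += 1
    return counts


def mean_and_variance(values):
    """Sample mean and sample variance (denominator n - 1)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


# Hypothetical example: 6000 taws into 2000 cups, so lambda = 3.
counts = simulate_cups(n_cups=2000, n_taws=6000)
mean, var = mean_and_variance(counts)
print(f"mean = {mean:.3f}, variance = {var:.3f}, ratio = {var / mean:.3f}")

# Observed share of cups holding k taws vs. the Poisson probability.
for k in range(7):
    observed = counts.count(k) / len(counts)
    print(f"k = {k}: observed {observed:.3f}, Poisson {poisson_pmf(k, mean):.3f}")
```

With a finite number of cups the counts are, strictly speaking, binomial; the Poisson distribution is the approximation that holds when there are many cups, which is why the example uses 2000 of them.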
Randomness can also be evaluated in discrete units (e.g. parasites on a carp or recombination nodules on a chromosome).

If individuals are distributed randomly
Numbers of individuals in randomly placed squares have a Poisson distribution, so $\bar{X} = s^2$.
The occurrence of an individual in a unit does not change the probability of occurrence of another individual in that unit.

If individuals are distributed in clusters
Numbers of individuals in randomly placed squares have $\bar{X} < s^2$.
The occurrence of an individual increases the probability of finding another individual in the same unit.
Boletes grow in clusters, so if I find one, I start looking for its neighbours.

If there is a tendency to regularity, or total regularity
$\bar{X} > s^2$
The occurrence of an individual decreases the probability of finding another individual in the same unit.

The ratio $s^2 / \bar{X}$ (for numbers of individuals) is therefore used as a characteristic of the distribution of individuals.
The Lloyd index is also popular; it does not change if individuals die out at random:
$$L = \frac{s^2 / \bar{X} - 1}{\bar{X}}$$

Deviation from randomness can be tested:
1. The classic goodness-of-fit test for the Poisson distribution.
2. For a Poisson distribution, the statistic $\dfrac{s^2 (n - 1)}{\bar{X}}$ has approximately a $\chi^2$ distribution with n − 1 degrees of freedom.

Many methods describe the distribution of objects in continuous space
• So-called spatial pattern analysis; the K-function is popular
• Size of clusters
• Intensity of clustering
• Change of spatial pattern in time
• Spatial relation of objects of different categories (e.g. live and dead trees, two species of trees, etc.)

| Distribution of population | Corresponding distribution of number of individuals in experimental unit | Ratio of variance and mean ($\sigma^2/\mu$) of distribution | Relationship of presence of individuals | The commonest ecological reasons for differences from randomness |
| regular (uniform) | e.g. binomial | < 1 | Occurrence of one individual in the unit decreases the probability of occurrence of another individual | Intraspecific competition, territorial behaviour |
| random | Poisson | 1 | Occurrences of individuals are independent of each other | |
| clustered | contagious (e.g. negative binomial, Neyman) | > 1 | Occurrence of one individual in the unit increases the probability of occurrence of another individual | Type of reproduction, heterogeneity of environment |

Binomial distribution
Number of successes in a number of independent attempts, where every attempt has the same probability of success.
Model: I throw at every cup n times (e.g. 5 times); in every attempt I have probability p of hitting inside. The number of hits per cup then has a binomial distribution. There are two parameters: n, the number of attempts, and p, the probability of success in every attempt. The probability of failure, q, is not an additional parameter, because q = 1 − p.

Binomial distribution
It holds that
$$P(x = X) = \frac{n!}{X!\,(n - X)!}\; p^{X} q^{\,n - X}$$
$$\mu_x = np, \qquad \sigma_x^2 = npq$$
The distribution is symmetric for p = q = 1/2 (and, for a given n, is then closest to normal); it approaches the normal distribution as n grows.

Most common use: estimation of the parameter p
• Estimate of the percentage of worm-eaten apples (in a whole orchard)
• Estimates of the cover of a species (point quadrat)
• Estimate of the proportion (percentage) of people having antibodies against borreliosis
• What percentage of people will vote for the Party of Moderate Progress Within the Bounds of the Law

We have n experiments
• 100 (randomly chosen) examined apples
• 50 (randomly chosen) people examined for borreliosis
• 800 (randomly chosen) respondents in a survey

$$\hat{p} = \frac{X}{n}, \qquad \sigma_{\hat{p}}^2 = \frac{pq}{n}, \qquad s_{\hat{p}}^2 = \frac{\hat{p}\,\hat{q}}{n - 1}$$
If 15 of 100 apples were worm-eaten, we estimate $\hat{p} = 0.15$ (15%).
But we do not know p; we know only its estimate.
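The binomial formulas and the worm-eaten apple example above can be worked through numerically. This is a minimal sketch using only the Python standard library; the function names `binomial_pmf` and `estimate_p` are hypothetical and chosen for this illustration. The standard error uses the n − 1 denominator given in the text.

```python
import math


def binomial_pmf(x, n, p):
    """P(x successes in n independent attempts, each with probability p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def estimate_p(x, n):
    """Point estimate of p and its estimated standard error s_p.

    Uses p_hat = X / n and s_p^2 = p_hat * q_hat / (n - 1).
    """
    p_hat = x / n
    q_hat = 1 - p_hat
    s_p = math.sqrt(p_hat * q_hat / (n - 1))
    return p_hat, s_p


# Check mean = n*p and variance = n*p*q for the cup model with n = 5 throws.
n, p = 5, 0.3  # p = 0.3 is an arbitrary value chosen for the illustration
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
print(f"mean = {mean:.4f} (n*p = {n * p:.4f})")
print(f"variance = {var:.4f} (n*p*q = {n * p * (1 - p):.4f})")

# The apple example from the text: 15 worm-eaten apples out of 100.
p_hat, s_p = estimate_p(15, 100)
print(f"p_hat = {p_hat:.2f}, standard error = {s_p:.4f}")
```

The standard error printed for the apples is what the confidence-interval calculation builds on.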
Using the normal approximation, the (1 − α) confidence interval is
$$\hat{p} \pm \left( Z_{(1-\alpha/2)}\, s_{\hat{p}} + \frac{1}{2n} \right)$$
$Z_{(1-\alpha/2)}$ is the (1 − α/2)·100% quantile of the standard normal distribution.

| $\hat{p}$ | n |
| 0.5 | 30 |
| 0.4 or 0.6 | 50 |
| 0.3 or 0.7 | 80 |
| 0.2 or 0.8 | 200 |
| 0.1 or 0.9 | 600 |
Table A. Applicability of the normal approximation to the binomial distribution.

Caution: outside the recommended range of $\hat{p}$, the confidence interval will often extend beyond the interval [0, 1].

Outside the range of the normal approximation, it is appropriate to use
$$\text{Lower limit} = \frac{X}{X + (n - X + 1)\, F_{(1-\alpha/2),\,\nu_1,\,\nu_2}}$$
where $F_{(1-\alpha/2),\,\nu_1,\,\nu_2}$ is the (1 − α/2)·100% quantile of the F distribution with degrees of freedom $\nu_1 = 2(n - X + 1)$ and $\nu_2 = 2X$, and
$$\text{Upper limit} = \frac{(X + 1)\, F_{(1-\alpha/2),\,\nu'_1,\,\nu'_2}}{n - X + (X + 1)\, F_{(1-\alpha/2),\,\nu'_1,\,\nu'_2}}$$
where the degrees of freedom for the upper limit are $\nu'_1 = 2(X + 1)$ and $\nu'_2 = 2(n - X)$.
Even with zero successes in the experiment the upper limit is non-zero, and even with 100% success the lower limit is < 1.

The accuracy of the estimate of p grows with n
• How many observations do I need for the standard error to be approximately w?
$$n = \frac{pq}{w^2}$$
If we presume that approximately 20% of individuals carry a mutation of a certain type and we want to determine their proportion with a standard error of 1% (the 95% confidence interval will then be approximately ±2%), we need to examine n = (0.2 × 0.8) / 0.01² = 1600 individuals.
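The confidence-interval and sample-size formulas above can be tried out with the standard library alone. The sketch below rests on stated assumptions: the function names are hypothetical, and because the standard library has no F quantiles, the exact limits are not computed from the F formula but from the equivalent tail-probability definition (the lower limit is the p for which observing X or more successes has probability α/2, the upper limit the p for which observing X or fewer has probability α/2), solved by bisection.

```python
import math
from statistics import NormalDist


def normal_approx_ci(x, n, alpha=0.05):
    """Normal-approximation interval with the 1/(2n) continuity correction."""
    p_hat = x / n
    s_p = math.sqrt(p_hat * (1 - p_hat) / (n - 1))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s_p + 1 / (2 * n)
    return p_hat - half_width, p_hat + half_width


def binom_cdf(x, n, p):
    """P(at most x successes in n attempts)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def solve_increasing(func, target, tol=1e-12):
    """Find p in [0, 1] with func(p) = target, for func increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Exact limits via binomial tail probabilities (bisection, not F tables)."""
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x | p) = alpha/2; this tail probability increases with p.
        lower = solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x | p) = alpha/2, i.e. P(X > x | p) = 1 - alpha/2.
        upper = solve_increasing(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def sample_size(p, w):
    """Number of observations needed for a standard error of about w."""
    return p * (1 - p) / w ** 2


print("normal approximation, 15/100:", normal_approx_ci(15, 100))
print("exact limits, 15/100:", exact_ci(15, 100))
print("exact limits, 0/10:", exact_ci(0, 10))
print("sample size for p = 0.2, w = 0.01:", round(sample_size(0.2, 0.01)))
```

The 0/10 case illustrates the remark in the text that the upper limit stays above zero even when no success was observed.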