Statistics

Large Systems
Macroscopic systems involve large numbers of particles. The basis of macroscopic phenomena lies in the mechanics, classical or quantum, of the individual molecules: microscopic determinism underlies macroscopic behavior.

Consider 1 g of He as an ideal gas: N = 1.5 × 10^23 atoms. Using only position and momentum, each atom needs 3 + 3 = 6 coordinates, for a total of about 9 × 10^23 variables, requiring about 4 × 10^9 PB of storage. To find the total kinetic energy, one would sum over every atom:

    K = (p_x^2 + p_y^2 + p_z^2) / 2m

At about 100 operations per collision and 100 GFlops, one set of collisions would take 9 × 10^14 s, about 3 × 10^7 yr. Statistical thermodynamics provides the bridge between the microscopic and macroscopic levels.

Ensemble
Computing time averages for large systems is infeasible. Instead, imagine a large number of similar systems, prepared identically and independent of one another. This ensemble of systems can be used to derive the theoretical properties of a single system.

Probability
Probability is often stated before the fact: an a priori, theoretical assertion, e.g. a 50% probability of heads on a coin toss. Probability can also reflect the statistics of many events, e.g. a 25% probability that 10 coins show exactly 5 heads, with fluctuations in which the fraction of heads is not 50%. Probability can also be used after the fact to describe a measurement: an a posteriori, experimental assertion, e.g. the fraction of coins that were heads in a series of samples.

Head Count
    trial  #heads    trial  #heads
      1      5        11      5
      2      8        12      1
      3      6        13      5
      4      5        14      5
      5      6        15      6
      6      6        16      6
      7      1        17      2
      8      5        18      4
      9      7        19      6
     10      4        20      6

Take a set of experimental trials:
    N = number of trials
    n = number of values (bins)
    i = a specific trial (1 … N)
    j = a specific value (1 … n)
Here we use 10 coins and 20 trials.

Distribution f(x)
Sorting trials by value forms a distribution. The distribution function f counts the occurrences in each bin. [Figure: histogram of f(x) for the coin data, x = 0 … 10.] The mean is a measure of the center of the distribution.
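The binning and averaging described above can be sketched in a few lines of Python (a minimal illustration using the head-count data; the variable names are my own):

```python
from collections import Counter

# Head counts from the 20-trial, 10-coin experiment above.
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]

N = len(heads)                       # number of trials
f = Counter(heads)                   # distribution function f(x): occurrences per bin
mean = sum(heads) / N                # <x> = (1/N) * sum_i x_i

# The same mean via the distribution function: (1/N) * sum_j f(x_j) * x_j
mean_from_f = sum(x * count for x, count in f.items()) / N

print(f"f(6) = {f[6]}")              # prints "f(6) = 7" (the most frequent bin)
print(f"mean = {mean}")              # prints "mean = 4.95"
```

Both routes to the mean agree, since summing over trials and summing over bins weighted by occupancy are the same operation.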
The mean is the mathematical average:

    \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{N}\sum_{j=1}^{n} f(x_j)\, x_j

For the coin distribution, <x> = 4.95. The median is the midway value: coin median = 5. The mode is the most frequent value: coin mode = 6.

Probability Distribution P(x)
The distribution function has a sum equal to the number of trials N. A probability distribution P normalizes the distribution function by N, so its sum is 1:

    P_j = f(x_j)/N, \qquad \sum_{j=1}^{n} P_j = 1

The mean can be expressed in terms of the probability:

    \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \sum_{j=1}^{n} P_j\, x_j

[Figure: probability distribution P(x) for the coin data, x = 0 … 10.]

Subsample
Subsamples of the data may differ in their central value. For the first five trials of the head-count data: mean 6.0, median 6, mode 5 and 6 (not unique). Experimental probability depends on the sample; theoretical probability predicts the result for an infinitely large sample.

Deviation
Individual trials differ from the mean. The deviation is the difference of a trial from the mean:

    \Delta x_i = x_i - \bar{x}

The mean deviation is zero:

    \overline{\Delta x} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = \bar{x} - \bar{x} = 0

The fluctuation is the mean of the squared deviations. The fluctuation is the variance, the square of the standard deviation:

    \sigma^2 = \overline{\Delta x^2} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \overline{x^2} - \bar{x}^2

Correlation
Events may not be random, but related to other events, with "time" measured by trial number. The correlation function measures the mean of the product of related deviations:

    C_k = \frac{1}{N-k}\sum_{i=1}^{N-k} \Delta x_i\, \Delta x_{i+k}

The k = 0 term is the autocorrelation, C_0. Different variables can also be correlated:

    C_{xy} = \frac{1}{N}\sum_{i=1}^{N} \Delta x_i\, \Delta y_i = \overline{xy} - \bar{x}\,\bar{y}

Independent Trials
The autocorrelation within a sample is the variance: for the coin experiment, C_0 = 3.147. The nearest-neighbor correlation tests for randomness: for the coin experiment, C_1 = -0.345, much less than C_0, with ratio C_1 / C_0 = -0.11. Periodic systems have a peak in C_t at some period t.

Correlation Measure
Independent trials should have a correlation that peaks strongly at k = 0.
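The variance and correlation function defined above can be checked numerically. A minimal sketch (the `corr` helper is my own; note that the nearest-neighbor value depends on the exact normalization convention used, so only the k = 0 variance is compared against the quoted 3.147):

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)
mean = sum(heads) / N
dev = [x - mean for x in heads]      # deviations: Delta x_i = x_i - xbar

def corr(dev, k):
    """C_k = 1/(N-k) * sum_{i=1}^{N-k} Delta x_i * Delta x_{i+k}."""
    return sum(dev[i] * dev[i + k] for i in range(len(dev) - k)) / (len(dev) - k)

C0 = corr(dev, 0)                    # autocorrelation at k = 0: the variance, 3.1475
C1 = corr(dev, 1)                    # nearest-neighbor correlation
print(C0, C1, C1 / C0)
```

The mean of the deviations themselves is zero by construction, so `C0` reduces to the mean squared deviation.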
For independent trials there is no connection to subsequent events and no periodic behavior. By contrast: "This sample autocorrelation plot shows that the time series is not random, but rather has a high degree of autocorrelation between adjacent and near-adjacent observations." (nist.gov)

Continuous Distribution
Data that is continuously distributed is treated with an integral; the probability is still normalized to 1:

    N = \int dx\, f(x), \qquad P(x) = f(x)/N

The mean and variance are given by the moments, the first moment being the mean and the second moment the variance:

    \bar{x} = \int dx\, P(x)\, x, \qquad C_0 = \int dx\, P(x)\,(\Delta x)^2 = \sigma^2

Correlation uses a time integral:

    C(t') = \int dt\, \Delta x(t)\, \Delta x(t + t')

Joint Probability
The probabilities of two systems may be related. The intersection A ∩ B indicates that both conditions are true; for independent probabilities, P(A ∩ B) = P(A)P(B). The union A ∪ B indicates that either condition is true:

    P(A \cup B) = P(A) + P(B) - P(A \cap B)

which reduces to P(A) + P(B) if the events are exclusive. [Figure: Venn diagram, C = A ∩ B.]

Joint Tosses
Take the probabilities from the coin toss experiment:

    x    P(x)
    0    0
    1    0.10
    2    0.05
    3    0
    4    0.10
    5    0.30
    6    0.35
    7    0.05
    8    0.05
    9    0
    10   0

Define two classes: A = {x < 5} and B = {2 < x < 8}. The individual probabilities are unions of discrete bins: P(A) = 0.25, P(B) = 0.80, and P(A ∪ B) = 0.95. Dependent sets don't follow the product rule: P(A ∩ B) = 0.10 ≠ P(A)P(B) = 0.20.

Conditional Probability
The probability of an occurrence restricted to a subset is a conditional probability, the probability with respect to that subset:

    P(A \mid B) = P(A \cap B) / P(B)

For the same subsets of the coin toss example: P(A | B) = 0.10 / 0.80 ≈ 0.13. [Figure: Venn diagram, C = A | B.]

Combinatorics
The probability that n specific occurrences happen is the product of the individual probabilities; other events don't matter, and the negative events carry a separate probability. An arbitrary choice of events requires counting permutations:

    exactly n specific events happen, each with probability p:   p^n
    no events happen except the specific events:                 q^{N-n}, with q = 1 - p
    ways to select n arbitrary events from a pool of N identical types:

    \binom{N}{n} = \frac{N!}{n!\,(N-n)!}

Binomial Distribution
Treat events as a Bernoulli process with discrete trials.
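The joint and conditional probabilities in the coin-toss example above can be verified directly from the table (a small sketch; the set-based class definitions are my own encoding of A and B):

```python
# Probability table P(x) from the joint-toss example (x = number of heads).
P = {0: 0, 1: 0.10, 2: 0.05, 3: 0, 4: 0.10, 5: 0.30,
     6: 0.35, 7: 0.05, 8: 0.05, 9: 0, 10: 0}

A = {x for x in P if x < 5}          # class A = {x < 5}
B = {x for x in P if 2 < x < 8}      # class B = {2 < x < 8}

P_A = sum(P[x] for x in A)                  # 0.25
P_B = sum(P[x] for x in B)                  # 0.80
P_and = sum(P[x] for x in A & B)            # intersection: 0.10
P_or = P_A + P_B - P_and                    # union: 0.95
P_A_given_B = P_and / P_B                   # conditional: 0.125

print(P_A, P_B, P_and, P_or, P_A_given_B)
```

Note that P(A)P(B) = 0.20 ≠ 0.10 = P(A ∩ B), confirming that the classes are not independent.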
A Bernoulli process has (mathworld.wolfram.com):
    N separate trials
    independent trials
    a binary outcome for each trial
    the same probability for all trials

The general form is the binomial distribution, with terms the same as in the binomial expansion and the probabilities normalized:

    P_n = \binom{N}{n} p^n q^{N-n}, \qquad \sum_{n=0}^{N} P_n = (p+q)^N = 1

Mean and Standard Deviation
The mean m of the binomial distribution:

    m = \sum_{n=0}^{N} n P_n = \sum_{n=0}^{N} n \binom{N}{n} p^n q^{N-n}

Consider an arbitrary x, differentiate, and set x = 1:

    (px + q)^N = \sum_{n=0}^{N} \binom{N}{n} p^n x^n q^{N-n}

    Np(px + q)^{N-1} = \sum_{n=0}^{N} n x^{n-1} P_n

Setting x = 1 gives m = \sum n P_n = Np.

The standard deviation s of the binomial distribution:

    s^2 = \sum_{n=0}^{N} (n - m)^2 P_n
        = \sum n^2 P_n - 2m \sum n P_n + m^2 \sum P_n
        = \sum n^2 P_n - m^2

A second differentiation gives \sum n^2 P_n = N(N-1)p^2 + m, so

    s^2 = [N(N-1)p^2 + m] - m^2 = N^2 p^2 - Np^2 + Np - N^2 p^2 = Np(1-p)

    s = \sqrt{Np(1-p)} = \sqrt{Npq}

Poisson Distribution
Many processes are marked by rare occurrences: large N, small n, small p. Then

    \binom{N}{n} = \frac{N!}{n!\,(N-n)!} \approx \frac{N^n}{n!}

    q^{N-n} \approx q^N = (1-p)^N = 1 - Np + \frac{N(N-1)}{2!}p^2 - \cdots \approx 1 - Np + \frac{(Np)^2}{2!} - \cdots = e^{-Np}

This is the Poisson distribution. The probability depends on only one parameter, Np, and it is normalized when summed from n = 0 to ∞:

    P_n = \binom{N}{n} p^n q^{N-n} \approx \frac{(Np)^n}{n!} e^{-Np}

Poisson Properties
The mean and standard deviation are simply related: mean m = Np, standard deviation s^2 = m, s = \sqrt{m}. Unlike the binomial distribution, the Poisson function has values for n > N.

Poisson Away From Zero
The Poisson distribution is based on the mean m = Np, assuming N >> 1 and N >> n. Now assume in addition that n >> 1, m is large, and P_n >> 0 only over a narrow range. This generates a normal or Gaussian distribution. Let x = n - m:

    P_x = \frac{m^{m+x} e^{-m}}{(m+x)!} = \frac{m^x\, m^m e^{-m}}{m!\,[(m+x)!/m!]}

Use Stirling's formula, m! \approx \sqrt{2\pi m}\, m^m e^{-m}:

    P_x \approx \frac{m^x}{\sqrt{2\pi m}\,(m+1)(m+2)\cdots(m+x)}
        = \frac{1}{\sqrt{2\pi m}\,(1 + 1/m)\cdots(1 + x/m)}
        \approx \frac{1}{\sqrt{2\pi m}\, e^{1/m} \cdots e^{x/m}}
        \approx \frac{1}{\sqrt{2\pi m}}\, e^{-x^2/2m}

Normal Distribution
The full normal distribution separates the mean m and standard deviation s as parameters:

    f(x) = \frac{1}{\sqrt{2\pi}\, s}\, e^{-(x-m)^2 / 2s^2}

Tables provide the integral of the distribution function. Useful benchmarks:

    P(|x - m| < 1s) = 0.683
    P(|x - m| < 2s) = 0.954
    P(|x - m| < 3s) = 0.997
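The chain of results above, binomial moments, the Poisson limit, and the normal benchmarks, can all be checked numerically with the standard library. A sketch under my own choice of parameters (N = 50, p = 0.3 for the moments; N = 10000, p = 0.0005 for the rare-event limit):

```python
from math import comb, exp, factorial, erf, sqrt

def binomial_pmf(n, N, p):
    """P_n = C(N, n) p^n q^(N-n)."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

def poisson_pmf(n, mu):
    """P_n = mu^n e^(-mu) / n!."""
    return mu**n * exp(-mu) / factorial(n)

# Binomial moments: mean m = Np, variance s^2 = Npq.
N, p = 50, 0.3
mean = sum(n * binomial_pmf(n, N, p) for n in range(N + 1))
var = sum((n - mean)**2 * binomial_pmf(n, N, p) for n in range(N + 1))
print(mean, N * p)           # both approximately 15.0
print(var, N * p * (1 - p))  # both approximately 10.5

# Poisson limit: large N, small p, fixed mu = Np = 5.
N, p = 10_000, 0.0005
diff = max(abs(binomial_pmf(n, N, p) - poisson_pmf(n, N * p)) for n in range(20))
print(diff)                  # small: the two distributions nearly coincide

# Normal benchmarks: P(|x - m| < k s) = erf(k / sqrt(2)).
for k in (1, 2, 3):
    print(k, round(erf(k / sqrt(2)), 3))  # 0.683, 0.954, 0.997
```

The `erf` identity follows from integrating the normal density over [m - ks, m + ks] after substituting u = (x - m)/(s√2).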