Statistics
Large Systems
- Macroscopic systems involve large numbers of particles.
  - Microscopic determinism
  - Macroscopic phenomena
- The basis is the mechanics of individual molecules.
  - Classical and quantum
- Consider 1 g of He as an ideal gas and find the total kinetic energy, K = (p_x^2 + p_y^2 + p_z^2)/2m.
  - N = 1.5 × 10^23 atoms
  - Use only position and momentum: 3 + 3 = 6 coordinates/atom
  - Total 9 × 10^23 variables, requiring about 4 × 10^9 PB
  - About 100 ops/collision; at 100 GFLOPS, 9 × 10^14 s: 1 set of collisions in 3 × 10^7 yr
- Statistical thermodynamics provides the bridge between levels.
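The estimates above are quick arithmetic; a sketch that reproduces them, where the 4 g/mol molar mass, 4-byte floats, and ops-per-variable count are assumptions chosen to match the slide's figures:

```python
# Back-of-envelope check of the slide's estimates.
# Assumptions: He molar mass 4 g/mol, 4 bytes per variable,
# ~100 ops per variable per set of collisions.
N_A = 6.022e23                    # Avogadro's number
n_atoms = N_A / 4.0               # 1 g of He ~ 1.5e23 atoms
n_vars = 6 * n_atoms              # 3 position + 3 momentum per atom
storage_PB = n_vars * 4 / 1e15    # ~4e9 PB
seconds = 100 * n_vars / 100e9    # ~9e14 s at 100 GFLOPS
years = seconds / 3.15e7          # ~3e7 yr (3.15e7 s per year)
```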
Ensemble
- Computing time averages for large systems is infeasible.
- Imagine a large number of similar systems.
  - Prepared identically
  - Independent
- This ensemble of systems can be used to derive theoretical properties of a single system.
Probability
- Probability is often stated before the fact.
  - A priori assertion - theoretical
  - 50% probability for heads on a coin
- Probability can also reflect the statistics of many events.
  - 25% probability that 10 coins show 5 heads
  - Fluctuations where 50% are not heads
- Probability can also be used after the fact to describe a measurement.
  - A posteriori assertion - experimental
  - Fraction of coins that were heads in a series of samples
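The a priori "about 25%" figure for 5 heads out of 10 comes straight from counting outcomes; a minimal check:

```python
from math import comb

# A priori: C(10,5) favorable outcomes out of 2**10 equally likely toss sequences.
p_five_heads = comb(10, 5) / 2**10
print(p_five_heads)  # 0.24609375 -- the ~25% quoted above
```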
Head Count

  trial  #heads    trial  #heads
    1      5        11      5
    2      8        12      1
    3      6        13      5
    4      5        14      5
    5      6        15      6
    6      6        16      6
    7      1        17      2
    8      5        18      4
    9      7        19      6
   10      4        20      6

- Take a set of experimental trials.
  - N = number of trials
  - n = number of values (bins)
  - i = a specific trial (1 … N)
  - j = a specific value (1 … n)
- Use 10 coins and 20 trials.
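For later reference, the table can be encoded directly; a short sketch that also builds the per-bin counts:

```python
# The 20-trial coin data (x_i = number of heads in trial i).
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)                              # N = 20 trials
f = [heads.count(j) for j in range(11)]     # occurrences in each bin j = 0..10
print(f)  # [0, 2, 1, 0, 2, 6, 7, 1, 1, 0, 0]
```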
Distribution
[Figure: histogram of the distribution function f(x) for the coin data, x = 0 … 10]
- Sorting trials by value forms a distribution.
- The distribution function f counts occurrences in a bin:
    f(x) = Σ_{i=1}^{N} δ(x_i − x)
- The mean is a measure of the center of the distribution.
  - Mathematical average:
    ⟨x⟩ = (1/N) Σ_{i=1}^{N} x_i
  - Coin distribution ⟨x⟩ = 4.95
- Median - midway value
  - Coin median = 5
- Mode - most frequent value
  - Coin mode = 6
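The three central measures quoted above can be reproduced with the standard library:

```python
from statistics import mean, median, mode

# The 20-trial coin data from the Head Count table.
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
print(mean(heads))    # 4.95
print(median(heads))  # 5.0
print(mode(heads))    # 6
```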
Probability Distribution
[Figure: probability distribution P(x) for the coin data, x = 0 … 10]
- The distribution function has a sum equal to the number of trials N.
- A probability distribution P normalizes the distribution function by N:
    P_j = f_j / N,  Σ_{j=1}^{n} P_j = 1
- The mean can be expressed in terms of the probability:
    ⟨x⟩ = (1/N) Σ_{i=1}^{N} x_i = (1/N) Σ_{i=1}^{N} Σ_{j=1}^{n} x_j δ(x_i − x_j)
    ⟨x⟩ = Σ_{j=1}^{n} P_j x_j
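Normalization and the probability-weighted mean can be verified numerically with the coin data:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)
P = [heads.count(j) / N for j in range(11)]   # P_j = f_j / N
assert abs(sum(P) - 1.0) < 1e-12              # sum is 1
mean_P = sum(j * P[j] for j in range(11))     # <x> = sum_j P_j x_j
assert abs(mean_P - 4.95) < 1e-12             # agrees with the direct mean
```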
Subsample
(same 20-trial coin-toss data as in Head Count)
- Subsamples of the data may differ in their central value.
  - First five trials: mean 6.0, median 6, mode 5 and 6 (not unique)
- Experimental probability depends on the sample.
- Theoretical probability predicts for an infinitely large sample.
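The first-five-trials numbers can be checked directly; note the mode really is ambiguous:

```python
from statistics import mean, median
from collections import Counter

first_five = [5, 8, 6, 5, 6]          # trials 1-5 from the table
print(mean(first_five))               # 6
print(median(first_five))             # 6
counts = Counter(first_five)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top)
print(modes)                          # [5, 6] -- not unique
```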
Deviation
- Individual trials differ from the mean.
- The deviation is the difference of a trial from the mean:
    Δx_i = x_i − ⟨x⟩
- The mean deviation is zero:
    ⟨Δx⟩ = (1/N) Σ_{i=1}^{N} (x_i − ⟨x⟩) = ⟨x⟩ − ⟨x⟩ = 0
- The fluctuation is the mean of the squared deviations:
    ⟨Δx^2⟩ = (1/N) Σ_{i=1}^{N} (x_i − ⟨x⟩)^2 = ⟨x^2⟩ − ⟨x⟩^2
  - The fluctuation is the variance, the standard deviation squared.
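Both expressions for the fluctuation give the same variance for the coin data (the 3.147 quoted later); a quick check:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)
xbar = sum(heads) / N                              # <x> = 4.95
dev = [x - xbar for x in heads]                    # deviations dx_i
assert abs(sum(dev) / N) < 1e-12                   # mean deviation is zero
var = sum(d * d for d in dev) / N                  # <dx^2>
var2 = sum(x * x for x in heads) / N - xbar ** 2   # <x^2> - <x>^2
assert abs(var - 3.1475) < 1e-9
assert abs(var - var2) < 1e-9
```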
Correlation
- Events may not be random, but related to other events.
  - Time measured by trial
- The correlation function measures the mean of the product of related deviations:
    C_k = (1/(N−k)) Σ_{i=1}^{N−k} (x_{i+k} − ⟨x⟩)(x_i − ⟨x⟩) = ⟨Δx_i Δx_{i+k}⟩
    C_k = (1/(N−k)) Σ_{i=1}^{N−k} x_i x_{i+k} − ⟨x⟩^2
  - Autocorrelation C_0
- Different variables can be correlated:
    C_xy = (1/N) Σ_{i=1}^{N} (x_i − ⟨x⟩)(y_i − ⟨y⟩)
    C_xy = ⟨xy⟩ − ⟨x⟩⟨y⟩
Independent Trials
(same 20-trial coin-toss data as in Head Count)
- The autocorrelation at lag zero within a sample is the variance.
  - Coin experiment C_0 = 3.147
- The nearest-neighbor correlation tests for randomness.
  - Coin experiment C_1 = −0.345
  - Much smaller in magnitude than C_0: ratio C_1/C_0 = −0.11
- Periodic systems have C_t peak at some period t.
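The C_0 and C_1 values quoted above follow from the ⟨x_i x_{i+k}⟩ − ⟨x⟩^2 form of the correlation function:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)
xbar = sum(heads) / N

def C(k):
    """Autocorrelation C_k = (1/(N-k)) sum_i x_i x_{i+k} - <x>^2."""
    return sum(heads[i] * heads[i + k] for i in range(N - k)) / (N - k) - xbar ** 2

print(round(C(0), 3))         # 3.147  (the variance)
print(round(C(1), 3))         # -0.345 (nearest neighbor)
print(round(C(1) / C(0), 2))  # -0.11
```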
Correlation Measure
- For independent trials, the autocorrelation should peak strongly at lag 0.
  - No connection to subsequent events
  - No periodic behavior
- "This sample autocorrelation plot shows that the time series is not random, but rather has a high degree of autocorrelation between adjacent and near-adjacent observations." (nist.gov)
Continuous Distribution
- Data that is continuously distributed is treated with an integral:
    N = ∫ dx f(x)
  - Probability still normalized to 1:
    P(x) = f(x) / N
- The mean and variance are given as the moments.
  - First moment, the mean:
    ⟨x⟩ = ∫ dx P(x) x
  - Second moment, the variance:
    C_0 = ∫ dx P(x) x^2 − ⟨x⟩^2
- Correlation uses a time integral:
    C(τ) = ∫ dt Δx(t) Δx(t + τ)
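A sketch of the continuous-case formulas, using an assumed exponential distribution P(x) = λ e^(−λx) (known mean 1/λ and variance 1/λ^2) and a simple midpoint-rule integral:

```python
from math import exp

lam = 2.0  # assumed rate parameter for the test distribution

def P(x):
    return lam * exp(-lam * x)    # P(x) = lam * e^(-lam x), normalized on [0, inf)

dx = 1e-4
xs = [(i + 0.5) * dx for i in range(200000)]          # midpoint grid on [0, 20]
norm = sum(P(x) for x in xs) * dx                     # integral of P  -> 1
mean = sum(x * P(x) for x in xs) * dx                 # first moment   -> 1/lam
C0 = sum(x * x * P(x) for x in xs) * dx - mean ** 2   # variance       -> 1/lam^2
```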
Joint Probability
[Venn diagram: C = A ∩ B]
- The probabilities of two systems may be related.
- The intersection A ∩ B indicates that both conditions are true.
  - Independent probability → P(A ∩ B) = P(A)P(B)
- The union A ∪ B indicates that either condition is true.
  - P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  - P(A) + P(B), if exclusive
Joint Tosses

  x    P(x)
  0    0
  1    0.10
  2    0.05
  3    0
  4    0.10
  5    0.30
  6    0.35
  7    0.05
  8    0.05
  9    0
  10   0

- Define two classes from the coin toss experiment.
  - A = {x < 5}
  - B = {2 < x < 8}
- Individual probabilities are a union of discrete bins.
  - P(A) = 0.25, P(B) = 0.80
  - P(A ∪ B) = 0.95
- Dependent sets don't follow the product rule.
  - P(A ∩ B) = 0.10 ≠ P(A)P(B)
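The class probabilities above follow from summing the table's bins over the sets:

```python
# P(x) for the coin-toss experiment, from the Joint Tosses table.
Px = {0: 0, 1: 0.10, 2: 0.05, 3: 0, 4: 0.10, 5: 0.30,
      6: 0.35, 7: 0.05, 8: 0.05, 9: 0, 10: 0}
A = {x for x in Px if x < 5}          # A = {x < 5}
B = {x for x in Px if 2 < x < 8}      # B = {2 < x < 8}

PA = sum(Px[x] for x in A)            # P(A) = 0.25
PB = sum(Px[x] for x in B)            # P(B) = 0.80
P_and = sum(Px[x] for x in A & B)     # P(A ∩ B) = 0.10
P_or = sum(Px[x] for x in A | B)      # P(A ∪ B) = 0.95
# Dependent sets: P(A ∩ B) = 0.10 differs from P(A)P(B) = 0.20
```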
Conditional Probability
[Venn diagram: C = A | B]
- The probability of an occurrence on a subset is a conditional probability.
  - Probability with respect to the subset:
    P(A | B) = P(A ∩ B) / P(B)
- Use the same subsets as the coin toss example:
  - P(A | B) = 0.10 / 0.80 ≈ 0.13
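The conditional probability is just the ratio of the two joint-toss numbers:

```python
# Values from the coin-toss example above.
P_B = 0.80
P_A_and_B = 0.10
P_A_given_B = P_A_and_B / P_B   # = 0.125, the ~0.13 quoted
```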
Combinatorics
- The probability that n specific occurrences happen is the product of the individual probabilities.
  - Other events don't matter.
  - Separate probability for negative events
  - Exactly n specific events happen, each with probability p: P = p^n
  - No events happen except the specific events: P = q^(N−n)
- An arbitrary choice of events requires counting the possible arrangements.
  - Select n arbitrary events from a pool of N identical types:
    C(N, n) = N! / (n!(N−n)!)
Binomial Distribution
- Treat events as a Bernoulli process with discrete trials (mathworld.wolfram.com):
  - N separate trials
  - Trials independent
  - Binary outcome of each trial
  - Probability the same for all trials
- The general form is the binomial distribution:
    P_n = C(N, n) p^n q^(N−n)
  - Terms are the same as in the binomial expansion.
  - Probabilities are normalized:
    Σ_{n=0}^{N} P_n = (p + q)^N = 1
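A minimal binomial pmf, checking normalization and the 10-coin example:

```python
from math import comb

def binom_pmf(n, N, p):
    """P_n = C(N, n) p^n q^(N-n) for a Bernoulli process."""
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

N, p = 10, 0.5
total = sum(binom_pmf(n, N, p) for n in range(N + 1))
print(total)               # 1.0 -- probabilities are normalized
print(binom_pmf(5, N, p))  # 0.24609375 -- 5 heads from 10 coins
```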
Mean and Standard Deviation
- The mean m of the binomial distribution:
    m = Σ_{n=0}^{N} n P_n = Σ_{n=0}^{N} n C(N, n) p^n q^(N−n)
- Consider an arbitrary x, differentiate, and set x = 1:
    (px + q)^N = Σ_{n=0}^{N} C(N, n) p^n x^n q^(N−n)
    Np(px + q)^(N−1) = Σ_{n=0}^{N} n x^(n−1) P_n
  - At x = 1, with p + q = 1:
    m = Σ_{n=0}^{N} n P_n = Np
- The standard deviation s of the binomial distribution:
    s^2 = Σ_{n=0}^{N} (n − m)^2 P_n
    s^2 = Σ (n^2 P_n − 2mn P_n + m^2 P_n)
    s^2 = Σ n^2 P_n − 2m Σ n P_n + m^2 Σ P_n
  - Differentiating twice gives Σ n(n−1) P_n = N(N−1)p^2, so Σ n^2 P_n = N(N−1)p^2 + m:
    s^2 = [N(N−1)p^2 + m] − 2m^2 + m^2
    s^2 = N^2 p^2 − Np^2 + Np − N^2 p^2
    s^2 = Np(1 − p) = Npq
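The closed forms m = Np and s^2 = Npq can be checked against direct sums over the distribution (assumed example N = 10, p = 0.3, so m = 3 and s^2 = 2.1):

```python
from math import comb

def binom_pmf(n, N, p):
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

N, p = 10, 0.3
m = sum(n * binom_pmf(n, N, p) for n in range(N + 1))              # mean
s2 = sum((n - m) ** 2 * binom_pmf(n, N, p) for n in range(N + 1))  # variance
assert abs(m - N * p) < 1e-9             # m = Np = 3.0
assert abs(s2 - N * p * (1 - p)) < 1e-9  # s^2 = Npq = 2.1
```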
Poisson Distribution
- Many processes are marked by rare occurrences.
  - Large N, small n, small p
- The binomial coefficient simplifies:
    C(N, n) = N! / (n!(N−n)!) ≈ N^n / n!
- So does the factor q^(N−n):
    q^(N−n) ≈ q^N = (1 − p)^N = 1 − Np + N(N−1)p^2/2! − …
      ≈ 1 − Np + (Np)^2/2! − … = e^(−Np)
- This is the Poisson distribution:
    P_n = C(N, n) p^n q^(N−n) ≈ (Np)^n e^(−Np) / n!
  - Probability depends on only one parameter, Np.
  - Normalized when summed from n = 0 to ∞.
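The limit can be seen numerically: for large N and small p the binomial pmf collapses onto the one-parameter Poisson form (assumed example N = 1000, p = 0.002, so Np = 2):

```python
from math import comb, exp, factorial

def poisson_pmf(n, mu):
    """P_n = mu^n e^(-mu) / n! with mu = Np."""
    return mu ** n * exp(-mu) / factorial(n)

N, p = 1000, 0.002          # rare events: Np = 2
for n in range(8):
    binom = comb(N, n) * p ** n * (1 - p) ** (N - n)
    # the two pmfs agree to ~1e-3 in this regime
    assert abs(binom - poisson_pmf(n, N * p)) < 1e-3
```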
Poisson Properties
- The mean and standard deviation are simply related.
  - Mean m = Np; variance s^2 = m, so s = √m.
- Unlike the binomial distribution, the Poisson distribution has nonzero values for n > N.
Poisson Away From Zero
- The Poisson distribution is based on the mean m = Np.
  - Assumed N ≫ 1, N ≫ n.
- Now assume that n ≫ 1, m is large, and P_n ≫ 0 only over a narrow range.
- This generates a normal or Gaussian distribution. Let x = n − m:
    P_x = m^(m+x) e^(−m) / (m+x)! = m^m m^x e^(−m) / (m! [(m+x)!/m!])
- Use Stirling's formula, m! ≈ √(2πm) m^m e^(−m):
    P_x = m^x / (√(2πm) (m+1)(m+2)…(m+x))
    P_x = 1 / (√(2πm) (1 + 1/m)(1 + 2/m)…(1 + x/m))
    P_x ≈ 1 / (√(2πm) e^(1/m) e^(2/m) … e^(x/m))
    P_x = e^(−x^2/2m) / √(2πm)
  - since 1 + 2 + … + x = x(x+1)/2 ≈ x^2/2
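The approximation can be checked numerically: near the mean, a Poisson distribution with large m (assumed m = 100 here) tracks the Gaussian e^(−x^2/2m)/√(2πm) to a few percent:

```python
from math import exp, factorial, pi, sqrt

m = 100  # assumed large mean

def poisson(n):
    return m ** n * exp(-m) / factorial(n)

def gauss(x):
    return exp(-x * x / (2 * m)) / sqrt(2 * pi * m)

for x in (-10, 0, 10):   # x = n - m, within the narrow peak
    # the two distributions agree to within ~5% here
    assert abs(poisson(m + x) / gauss(x) - 1) < 0.05
```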
Normal Distribution
[Figure: normal distribution P(x), centered at x = m]
- The full normal distribution separates the mean m and standard deviation s parameters:
    f(x) = (1 / (√(2π) s)) e^(−(x − m)^2 / 2s^2)
- Tables provide the integral of the distribution function.
- Useful benchmarks:
  - P(|x − m| < 1s) = 0.683
  - P(|x − m| < 2s) = 0.954
  - P(|x − m| < 3s) = 0.997
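The benchmark integrals correspond to erf(k/√2); a quick check with the standard library:

```python
from math import erf, sqrt

# P(|x - m| < k s) for a normal distribution equals erf(k / sqrt(2)).
for k, target in [(1, 0.683), (2, 0.954), (3, 0.997)]:
    assert abs(erf(k / sqrt(2)) - target) < 1e-3
```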