Statistics
Large Systems
- Macroscopic systems involve large numbers of particles.
  - Microscopic determinism
  - Macroscopic phenomena
- The basis is the mechanics of individual molecules.
  - Classical and quantum
- Consider 1 g of He as an ideal gas and find the total kinetic energy, K = (p_x^2 + p_y^2 + p_z^2)/2m.
  - N = 1.5 × 10^23 atoms
  - Use only position and momentum: 3 + 3 = 6 coordinates/atom
  - Total 9 × 10^23 variables, requiring about 4 × 10^9 PB
  - About 100 ops/collision; at 100 GFLOPS, 9 × 10^14 s: 1 set of collisions in 3 × 10^7 yr
- Statistical thermodynamics provides the bridge between levels.
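The estimates above are quick arithmetic; a sketch that reproduces them, where the 4 g/mol molar mass, 4-byte floats, and ops-per-variable count are assumptions chosen to match the slide's figures:

```python
# Back-of-envelope check of the slide's estimates.
# Assumptions: He molar mass 4 g/mol, 4 bytes per variable,
# ~100 ops per variable per set of collisions.
N_A = 6.022e23                    # Avogadro's number
n_atoms = N_A / 4.0               # 1 g of He ~ 1.5e23 atoms
n_vars = 6 * n_atoms              # 3 position + 3 momentum per atom
storage_PB = n_vars * 4 / 1e15    # ~4e9 PB
seconds = 100 * n_vars / 100e9    # ~9e14 s at 100 GFLOPS
years = seconds / 3.15e7          # ~3e7 yr (3.15e7 s per year)
```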
Ensemble
- Computing time averages for large systems is infeasible.
- Imagine a large number of similar systems.
  - Prepared identically
  - Independent
- This ensemble of systems can be used to derive theoretical properties of a single system.
Probability
- Probability is often stated before the fact.
  - A priori assertion - theoretical
  - 50% probability for heads on a coin
- Probability can also reflect the statistics of many events.
  - 25% probability that 10 coins show 5 heads
  - Fluctuations where 50% are not heads
- Probability can also be used after the fact to describe a measurement.
  - A posteriori assertion - experimental
  - Fraction of coins that were heads in a series of samples
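The a priori "about 25%" figure for 5 heads out of 10 comes straight from counting outcomes; a minimal check:

```python
from math import comb

# A priori: C(10,5) favorable outcomes out of 2**10 equally likely toss sequences.
p_five_heads = comb(10, 5) / 2**10
print(p_five_heads)  # 0.24609375 -- the ~25% quoted above
```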
Head Count

  trial  #heads    trial  #heads
    1      5        11      5
    2      8        12      1
    3      6        13      5
    4      5        14      5
    5      6        15      6
    6      6        16      6
    7      1        17      2
    8      5        18      4
    9      7        19      6
   10      4        20      6

- Take a set of experimental trials.
  - N = number of trials
  - n = number of values (bins)
  - i = a specific trial (1 … N)
  - j = a specific value (1 … n)
- Use 10 coins and 20 trials.
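For later reference, the table can be encoded directly; a short sketch that also builds the per-bin counts:

```python
# The 20-trial coin data (x_i = number of heads in trial i).
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)                              # N = 20 trials
f = [heads.count(j) for j in range(11)]     # occurrences in each bin j = 0..10
print(f)  # [0, 2, 1, 0, 2, 6, 7, 1, 1, 0, 0]
```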
Distribution
[Figure: histogram of the distribution function f(x) for the coin data, x = 0 … 10]
- Sorting trials by value forms a distribution.
- The distribution function f counts occurrences in a bin:
    f(x) = Σ_{i=1}^{N} δ(x_i − x)
- The mean is a measure of the center of the distribution.
  - Mathematical average:
    ⟨x⟩ = (1/N) Σ_{i=1}^{N} x_i
  - Coin distribution ⟨x⟩ = 4.95
- Median - midway value
  - Coin median = 5
- Mode - most frequent value
  - Coin mode = 6
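The three central measures quoted above can be reproduced with the standard library:

```python
from statistics import mean, median, mode

# The 20-trial coin data from the Head Count table.
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
print(mean(heads))    # 4.95
print(median(heads))  # 5.0
print(mode(heads))    # 6
```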
Probability Distribution
[Figure: probability distribution P(x) for the coin data, x = 0 … 10]
- The distribution function has a sum equal to the number of trials N.
- A probability distribution P normalizes the distribution function by N:
    P_j = f_j / N,  Σ_{j=1}^{n} P_j = 1
- The mean can be expressed in terms of the probability:
    ⟨x⟩ = (1/N) Σ_{i=1}^{N} x_i = (1/N) Σ_{i=1}^{N} Σ_{j=1}^{n} x_j δ(x_i − x_j)
    ⟨x⟩ = Σ_{j=1}^{n} P_j x_j
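Normalization and the probability-weighted mean can be verified numerically with the coin data:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)
P = [heads.count(j) / N for j in range(11)]   # P_j = f_j / N
assert abs(sum(P) - 1.0) < 1e-12              # sum is 1
mean_P = sum(j * P[j] for j in range(11))     # <x> = sum_j P_j x_j
assert abs(mean_P - 4.95) < 1e-12             # agrees with the direct mean
```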
Subsample
(same 20-trial coin-toss data as in Head Count)
- Subsamples of the data may differ in their central value.
  - First five trials: mean 6.0, median 6, mode 5 and 6 (not unique)
- Experimental probability depends on the sample.
- Theoretical probability predicts for an infinitely large sample.
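The first-five-trials numbers can be checked directly; note the mode really is ambiguous:

```python
from statistics import mean, median
from collections import Counter

first_five = [5, 8, 6, 5, 6]          # trials 1-5 from the table
print(mean(first_five))               # 6
print(median(first_five))             # 6
counts = Counter(first_five)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top)
print(modes)                          # [5, 6] -- not unique
```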
Deviation
- Individual trials differ from the mean.
- The deviation is the difference of a trial from the mean:
    Δx_i = x_i − ⟨x⟩
- The mean deviation is zero:
    ⟨Δx⟩ = (1/N) Σ_{i=1}^{N} (x_i − ⟨x⟩) = ⟨x⟩ − ⟨x⟩ = 0
- The fluctuation is the mean of the squared deviations:
    ⟨Δx^2⟩ = (1/N) Σ_{i=1}^{N} (x_i − ⟨x⟩)^2 = ⟨x^2⟩ − ⟨x⟩^2
  - The fluctuation is the variance, the standard deviation squared.
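Both expressions for the fluctuation give the same variance for the coin data (the 3.147 quoted later); a quick check:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)
xbar = sum(heads) / N                              # <x> = 4.95
dev = [x - xbar for x in heads]                    # deviations dx_i
assert abs(sum(dev) / N) < 1e-12                   # mean deviation is zero
var = sum(d * d for d in dev) / N                  # <dx^2>
var2 = sum(x * x for x in heads) / N - xbar ** 2   # <x^2> - <x>^2
assert abs(var - 3.1475) < 1e-9
assert abs(var - var2) < 1e-9
```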
Correlation
- Events may not be random, but related to other events.
  - Time measured by trial
- The correlation function measures the mean of the product of related deviations:
    C_k = (1/(N−k)) Σ_{i=1}^{N−k} (x_{i+k} − ⟨x⟩)(x_i − ⟨x⟩) = ⟨Δx_i Δx_{i+k}⟩
    C_k = (1/(N−k)) Σ_{i=1}^{N−k} x_i x_{i+k} − ⟨x⟩^2
  - Autocorrelation C_0
- Different variables can be correlated:
    C_xy = (1/N) Σ_{i=1}^{N} (x_i − ⟨x⟩)(y_i − ⟨y⟩)
    C_xy = ⟨xy⟩ − ⟨x⟩⟨y⟩
Independent Trials
(same 20-trial coin-toss data as in Head Count)
- The autocorrelation at lag zero within a sample is the variance.
  - Coin experiment C_0 = 3.147
- The nearest-neighbor correlation tests for randomness.
  - Coin experiment C_1 = −0.345
  - Much smaller in magnitude than C_0: ratio C_1/C_0 = −0.11
- Periodic systems have C_t peak at some period t.
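The C_0 and C_1 values quoted above follow from the ⟨x_i x_{i+k}⟩ − ⟨x⟩^2 form of the correlation function:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)
xbar = sum(heads) / N

def C(k):
    """Autocorrelation C_k = (1/(N-k)) sum_i x_i x_{i+k} - <x>^2."""
    return sum(heads[i] * heads[i + k] for i in range(N - k)) / (N - k) - xbar ** 2

print(round(C(0), 3))         # 3.147  (the variance)
print(round(C(1), 3))         # -0.345 (nearest neighbor)
print(round(C(1) / C(0), 2))  # -0.11
```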
Correlation Measure
- For independent trials, the autocorrelation should peak strongly at lag 0.
  - No connection to subsequent events
  - No periodic behavior
- "This sample autocorrelation plot shows that the time series is not random, but rather has a high degree of autocorrelation between adjacent and near-adjacent observations." (nist.gov)
Continuous Distribution
- Data that is continuously distributed is treated with an integral:
    N = ∫ dx f(x)
  - Probability still normalized to 1:
    P(x) = f(x) / N
- The mean and variance are given as the moments.
  - First moment, the mean:
    ⟨x⟩ = ∫ dx P(x) x
  - Second moment, the variance:
    C_0 = ∫ dx P(x) x^2 − ⟨x⟩^2
- Correlation uses a time integral:
    C(τ) = ∫ dt Δx(t) Δx(t + τ)
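A sketch of the continuous-case formulas, using an assumed exponential distribution P(x) = λ e^(−λx) (known mean 1/λ and variance 1/λ^2) and a simple midpoint-rule integral:

```python
from math import exp

lam = 2.0  # assumed rate parameter for the test distribution

def P(x):
    return lam * exp(-lam * x)    # P(x) = lam * e^(-lam x), normalized on [0, inf)

dx = 1e-4
xs = [(i + 0.5) * dx for i in range(200000)]          # midpoint grid on [0, 20]
norm = sum(P(x) for x in xs) * dx                     # integral of P  -> 1
mean = sum(x * P(x) for x in xs) * dx                 # first moment   -> 1/lam
C0 = sum(x * x * P(x) for x in xs) * dx - mean ** 2   # variance       -> 1/lam^2
```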
Joint Probability
[Venn diagram: C = A ∩ B]
- The probabilities of two systems may be related.
- The intersection A ∩ B indicates that both conditions are true.
  - Independent probability → P(A ∩ B) = P(A)P(B)
- The union A ∪ B indicates that either condition is true.
  - P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  - P(A) + P(B), if exclusive
Joint Tosses

  x    P(x)
  0    0
  1    0.10
  2    0.05
  3    0
  4    0.10
  5    0.30
  6    0.35
  7    0.05
  8    0.05
  9    0
  10   0

- Define two classes from the coin toss experiment.
  - A = {x < 5}
  - B = {2 < x < 8}
- Individual probabilities are a union of discrete bins.
  - P(A) = 0.25, P(B) = 0.80
  - P(A ∪ B) = 0.95
- Dependent sets don't follow the product rule.
  - P(A ∩ B) = 0.10 ≠ P(A)P(B)
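The class probabilities above follow from summing the table's bins over the sets:

```python
# P(x) for the coin-toss experiment, from the Joint Tosses table.
Px = {0: 0, 1: 0.10, 2: 0.05, 3: 0, 4: 0.10, 5: 0.30,
      6: 0.35, 7: 0.05, 8: 0.05, 9: 0, 10: 0}
A = {x for x in Px if x < 5}          # A = {x < 5}
B = {x for x in Px if 2 < x < 8}      # B = {2 < x < 8}

PA = sum(Px[x] for x in A)            # P(A) = 0.25
PB = sum(Px[x] for x in B)            # P(B) = 0.80
P_and = sum(Px[x] for x in A & B)     # P(A ∩ B) = 0.10
P_or = sum(Px[x] for x in A | B)      # P(A ∪ B) = 0.95
# Dependent sets: P(A ∩ B) = 0.10 differs from P(A)P(B) = 0.20
```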
Conditional Probability
[Venn diagram: C = A | B]
- The probability of an occurrence on a subset is a conditional probability.
  - Probability with respect to the subset:
    P(A | B) = P(A ∩ B) / P(B)
- Use the same subsets as the coin toss example:
  - P(A | B) = 0.10 / 0.80 ≈ 0.13
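The conditional probability is just the ratio of the two joint-toss numbers:

```python
# Values from the coin-toss example above.
P_B = 0.80
P_A_and_B = 0.10
P_A_given_B = P_A_and_B / P_B   # = 0.125, the ~0.13 quoted
```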
Combinatorics
- The probability that n specific occurrences happen is the product of the individual probabilities.
  - Other events don't matter.
  - Separate probability for negative events
  - Exactly n specific events happen, each with probability p: P = p^n
  - No events happen except the specific events: P = q^(N−n)
- An arbitrary choice of events requires counting the possible arrangements.
  - Select n arbitrary events from a pool of N identical types:
    C(N, n) = N! / (n!(N−n)!)
Binomial Distribution
- Treat events as a Bernoulli process with discrete trials (mathworld.wolfram.com):
  - N separate trials
  - Trials independent
  - Binary outcome of each trial
  - Probability the same for all trials
- The general form is the binomial distribution:
    P_n = C(N, n) p^n q^(N−n)
  - Terms are the same as in the binomial expansion.
  - Probabilities are normalized:
    Σ_{n=0}^{N} P_n = (p + q)^N = 1
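A minimal binomial pmf, checking normalization and the 10-coin example:

```python
from math import comb

def binom_pmf(n, N, p):
    """P_n = C(N, n) p^n q^(N-n) for a Bernoulli process."""
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

N, p = 10, 0.5
total = sum(binom_pmf(n, N, p) for n in range(N + 1))
print(total)               # 1.0 -- probabilities are normalized
print(binom_pmf(5, N, p))  # 0.24609375 -- 5 heads from 10 coins
```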
Mean and Standard Deviation
- The mean m of the binomial distribution:
    m = Σ_{n=0}^{N} n P_n = Σ_{n=0}^{N} n C(N, n) p^n q^(N−n)
- Consider an arbitrary x, differentiate, and set x = 1:
    (px + q)^N = Σ_{n=0}^{N} C(N, n) p^n x^n q^(N−n)
    Np(px + q)^(N−1) = Σ_{n=0}^{N} n x^(n−1) P_n
  - At x = 1, with p + q = 1:
    m = Σ_{n=0}^{N} n P_n = Np
- The standard deviation s of the binomial distribution:
    s^2 = Σ_{n=0}^{N} (n − m)^2 P_n
    s^2 = Σ (n^2 P_n − 2mn P_n + m^2 P_n)
    s^2 = Σ n^2 P_n − 2m Σ n P_n + m^2 Σ P_n
  - Differentiating twice gives Σ n(n−1) P_n = N(N−1)p^2, so Σ n^2 P_n = N(N−1)p^2 + m:
    s^2 = [N(N−1)p^2 + m] − 2m^2 + m^2
    s^2 = N^2 p^2 − Np^2 + Np − N^2 p^2
    s^2 = Np(1 − p) = Npq
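The closed forms m = Np and s^2 = Npq can be checked against direct sums over the distribution (assumed example N = 10, p = 0.3, so m = 3 and s^2 = 2.1):

```python
from math import comb

def binom_pmf(n, N, p):
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

N, p = 10, 0.3
m = sum(n * binom_pmf(n, N, p) for n in range(N + 1))              # mean
s2 = sum((n - m) ** 2 * binom_pmf(n, N, p) for n in range(N + 1))  # variance
assert abs(m - N * p) < 1e-9             # m = Np = 3.0
assert abs(s2 - N * p * (1 - p)) < 1e-9  # s^2 = Npq = 2.1
```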
Poisson Distribution
- Many processes are marked by rare occurrences.
  - Large N, small n, small p
- The binomial coefficient simplifies:
    C(N, n) = N! / (n!(N−n)!) ≈ N^n / n!
- So does the factor q^(N−n):
    q^(N−n) ≈ q^N = (1 − p)^N = 1 − Np + N(N−1)p^2/2! − …
      ≈ 1 − Np + (Np)^2/2! − … = e^(−Np)
- This is the Poisson distribution:
    P_n = C(N, n) p^n q^(N−n) ≈ (Np)^n e^(−Np) / n!
  - Probability depends on only one parameter, Np.
  - Normalized when summed from n = 0 to ∞.
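The limit can be seen numerically: for large N and small p the binomial pmf collapses onto the one-parameter Poisson form (assumed example N = 1000, p = 0.002, so Np = 2):

```python
from math import comb, exp, factorial

def poisson_pmf(n, mu):
    """P_n = mu^n e^(-mu) / n! with mu = Np."""
    return mu ** n * exp(-mu) / factorial(n)

N, p = 1000, 0.002          # rare events: Np = 2
for n in range(8):
    binom = comb(N, n) * p ** n * (1 - p) ** (N - n)
    # the two pmfs agree to ~1e-3 in this regime
    assert abs(binom - poisson_pmf(n, N * p)) < 1e-3
```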
Poisson Properties
- The mean and standard deviation are simply related.
  - Mean m = Np; variance s^2 = m, so s = √m.
- Unlike the binomial distribution, the Poisson distribution has nonzero values for n > N.
Poisson Away From Zero
- The Poisson distribution is based on the mean m = Np.
  - Assumed N ≫ 1, N ≫ n.
- Now assume that n ≫ 1, m is large, and P_n ≫ 0 only over a narrow range.
- This generates a normal or Gaussian distribution. Let x = n − m:
    P_x = m^(m+x) e^(−m) / (m+x)! = m^m m^x e^(−m) / (m! [(m+x)!/m!])
- Use Stirling's formula, m! ≈ √(2πm) m^m e^(−m):
    P_x = m^x / (√(2πm) (m+1)(m+2)…(m+x))
    P_x = 1 / (√(2πm) (1 + 1/m)(1 + 2/m)…(1 + x/m))
    P_x ≈ 1 / (√(2πm) e^(1/m) e^(2/m) … e^(x/m))
    P_x = e^(−x^2/2m) / √(2πm)
  - since 1 + 2 + … + x = x(x+1)/2 ≈ x^2/2
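The approximation can be checked numerically: near the mean, a Poisson distribution with large m (assumed m = 100 here) tracks the Gaussian e^(−x^2/2m)/√(2πm) to a few percent:

```python
from math import exp, factorial, pi, sqrt

m = 100  # assumed large mean

def poisson(n):
    return m ** n * exp(-m) / factorial(n)

def gauss(x):
    return exp(-x * x / (2 * m)) / sqrt(2 * pi * m)

for x in (-10, 0, 10):   # x = n - m, within the narrow peak
    # the two distributions agree to within ~5% here
    assert abs(poisson(m + x) / gauss(x) - 1) < 0.05
```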
Normal Distribution
[Figure: normal distribution P(x), centered at x = m]
- The full normal distribution separates the mean m and standard deviation s parameters:
    f(x) = (1 / (√(2π) s)) e^(−(x − m)^2 / 2s^2)
- Tables provide the integral of the distribution function.
- Useful benchmarks:
  - P(|x − m| < 1s) = 0.683
  - P(|x − m| < 2s) = 0.954
  - P(|x − m| < 3s) = 0.997
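The benchmark integrals correspond to erf(k/√2); a quick check with the standard library:

```python
from math import erf, sqrt

# P(|x - m| < k s) for a normal distribution equals erf(k / sqrt(2)).
for k, target in [(1, 0.683), (2, 0.954), (3, 0.997)]:
    assert abs(erf(k / sqrt(2)) - target) < 1e-3
```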