Lecture 12
Objectives:
1. Introduction to Statistics
(a) Be able to compute expectation values from discrete and continuous distributions
(b) Be able to define three properties of expectation values
(c) Be able to compute moments of a distribution
(d) Be able to compute permutations and combinations and to distinguish when each is
applicable to a given problem
(e) Be able to use Stirling’s approximation to compute factorials of large numbers
2. Ensembles: Vocabulary. Be able to define the following terms:
(a) Ensemble
(b) Time average
(c) Ensemble average
(d) Phase space
(e) Ergodic Hypothesis
(f) Equal A Priori Probabilities
(g) Partition Function
1 Statistics
We are dealing with astronomical numbers of variables and equations. If we consider a gas of atoms
there are 3N positions and 3N momenta to integrate over, where N is the number of atoms. For a
small quantity of gas this means that we need to integrate over about $10^{23}$ positions and momenta,
which is obviously hopeless. This is where statistics comes to the rescue.
1. Discrete Distributions. Averages are given by
$$\langle x \rangle = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Let $P(x_i)$ be the probability of observing $x_i$:
$$P(x_j) = \frac{\text{Number of ways } x_j \text{ can occur}}{\text{All possible events}}, \qquad \sum_i P(x_i) = 1$$
Then you can find the expectation value of some variable that depends on x as follows:
$$\langle F \rangle = \sum_i P(x_i)\, F(x_i)$$
Example: What is the average value you expect to see from rolling a die? Let xi be the value
shown (number of dots) on side i of the die.
$$\langle x \rangle = \sum_{i=1}^{6} x_i P(x_i) = \frac{1}{6} \sum_{i=1}^{6} i = 21/6 = 3.5$$
Example: What is the average value of the square of the number you get from rolling a die?
$$\langle x^2 \rangle = \sum_{i=1}^{6} x_i^2 P(x_i) = \frac{1}{6} \sum_{i=1}^{6} i^2 = 91/6 = 15.16667$$
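As a quick numerical check (a sketch, not part of the original notes), both expectation values can be computed directly from the definition $\langle F \rangle = \sum_i P(x_i) F(x_i)$:

```python
# Expectation values for a fair six-sided die, P(x_i) = 1/6 for x_i = 1..6.
faces = range(1, 7)
p = 1 / 6

mean = sum(p * x for x in faces)          # <x>   = 21/6 = 3.5
mean_sq = sum(p * x**2 for x in faces)    # <x^2> = 91/6 ≈ 15.1667

print(mean, mean_sq)
```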
Properties of expectation values:
$$\langle cx \rangle = c\,\langle x \rangle$$
$$\langle x + y \rangle = \langle x \rangle + \langle y \rangle$$
$$\langle xy \rangle = \langle x \rangle \langle y \rangle \quad \text{ONLY FOR } x, y \text{ INDEPENDENT}$$
Example: What is the average value you expect to see from rolling a pair of dice? 2 = 1 + 1,
3 = 2 + 1 = 1 + 2, 4 = 1 + 3 = 3 + 1 = 2 + 2, 5 = 1 + 4 = 4 + 1 = 2 + 3 = 3 + 2, etc.
$$\langle x + y \rangle = \frac{1}{36}\bigl(0 \times 1 + 1 \times 2 + 2 \times 3 + 3 \times 4 + 4 \times 5 + 5 \times 6 + 6 \times 7 + 5 \times 8 + 4 \times 9 + 3 \times 10 + 2 \times 11 + 1 \times 12\bigr) = 7 = \langle x \rangle + \langle y \rangle = 3.5 + 3.5$$
Example: What is the expected value of the product of rolling a pair of dice?
$$\langle xy \rangle = \langle x \rangle \langle y \rangle = \langle x \rangle^2 = (3.5)^2 = 12.25$$
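Both results for the pair of dice can be verified by brute force over the 36 equally likely outcomes (again only a sketch, not part of the notes):

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

mean_sum = sum(x + y for x, y in outcomes) / 36      # <x + y> = 7.0
mean_prod = sum(x * y for x, y in outcomes) / 36     # <xy>    = 12.25

print(mean_sum, mean_prod)
```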
You can also have joint probabilities, e.g., a distribution of weight and height. The expectation
value of a multivariate distribution is given by
$$\langle F \rangle = \sum_{x_1} \cdots \sum_{x_n} F(x_1, \dots, x_n)\, P(x_1, \dots, x_n)$$
where $P(x_1, \dots, x_n)$ is the joint probability of all $x_i$ occurring simultaneously. For example,
suppose you want a bivariate distribution, $P(x, y)$. This is the probability of x and y both
occurring, but that is the same as the product of the probability of x occurring given that y
has already occurred, and the probability of y occurring,
$$P(x, y) = P_{x|y}\, P_y$$
where $P_{x|y}$ is the conditional probability of x given y. If x and y are independent, then
$P_{x|y} = P_x$.
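As an illustration of the relation $P(x, y) = P_{x|y} P_y$ (a sketch using an assumed example, not from the lecture), take two dice and let x be the first die and y be the sum of both:

```python
from itertools import product
from fractions import Fraction

# Two fair dice: x = value of the first die, y = sum of both dice.
outcomes = [(a, a + b) for a, b in product(range(1, 7), repeat=2)]
n = Fraction(len(outcomes))          # 36 equally likely outcomes

x0, y0 = 6, 8                        # check P(x=6, y=8) = P(x=6 | y=8) * P(y=8)

P_xy = Fraction(sum(1 for x, y in outcomes if x == x0 and y == y0)) / n
P_y = Fraction(sum(1 for _, y in outcomes if y == y0)) / n
P_x_given_y = P_xy / P_y             # conditional probability of x given y

print(P_xy, P_x_given_y * P_y)       # both 1/36
```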
Example: Consider the probability of observing a given number resulting from simultaneously
rolling a fixed number of dice. The probability for rolling one die is uniform, but for two
dice, shown in the top graph of Figure 1, it is peaked at the average value of 7. Note that
the distribution is not very sharply peaked. If you roll ten dice at a time, shown as the
middle graph in Figure 1, then the distribution is more sharply peaked and clearly resembles
a Gaussian distribution with a mean of 35. However, the distribution is obviously discrete,
not continuous. As the number of dice is increased, the probability distribution becomes
continuous and ever more sharp. The bottom graph in Figure 1 gives the distribution for
rolling 1000 dice. To a very good approximation, the entire distribution may be replaced by
the single most probable value of 3500. Imagine rolling Avogadro's number of dice: you could,
with complete certitude, tell someone the number that would be observed to very many significant
digits without ever having to roll the dice! That is the idea of statistical
mechanics. Summarizing, (1) discrete distributions become continuous as the spacing between
the states decreases and (2) probability distributions may be replaced by their single most
probable value. This last point is a manifestation of the central limit theorem.
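To see this sharpening numerically, the exact distribution of the sum of n dice can be built up by repeated convolution of the single-die distribution (a small sketch, not part of the original notes):

```python
import numpy as np

def dice_sum_distribution(n_dice):
    """Exact probability distribution of the sum of n fair six-sided dice."""
    one_die = np.full(6, 1 / 6)            # P = 1/6 for faces 1..6
    dist = one_die.copy()
    for _ in range(n_dice - 1):
        dist = np.convolve(dist, one_die)  # distribution of the running sum
    values = np.arange(n_dice, 6 * n_dice + 1)
    return values, dist

for n in (2, 10, 1000):
    values, dist = dice_sum_distribution(n)
    mean = np.sum(values * dist)
    sigma = np.sqrt(np.sum((values - mean) ** 2 * dist))
    print(f"{n:5d} dice: mean = {mean:.1f}, relative width sigma/mean = {sigma / mean:.4f}")
```

The relative width shrinks roughly as $1/\sqrt{n}$, which is why the 1000-die distribution can be replaced by its most probable value.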
2. Continuous Distributions. Consider some function f (x). Averages are given by
$$\langle J \rangle = \frac{\int J(x)\, f(x)\, dx}{\int f(x)\, dx}$$
Define the probability density P(x) as
$$P(x)\, dx = \frac{f(x)\, dx}{\int f(x)\, dx}, \qquad \int P(x)\, dx = 1$$
Then the expectation value of some variable F is
$$\langle F \rangle = \int F(x)\, P(x)\, dx$$
The rth moment about the mean is given by
$$\mu_r = \langle (x - \langle x \rangle)^r \rangle$$
The variance is the second (r = 2) moment. This is an important quantity in statistics and statistical
mechanics. The positive square root of the variance is the standard deviation.
$$V(x) = \sigma(x)^2 = \left\langle (x - \langle x \rangle)^2 \right\rangle$$
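As a numerical sketch (the weight function $f(x) = e^{-x}$ on $[0, \infty)$ is just an assumed example, not from the lecture), these formulas give $\langle x \rangle = 1$ and $\sigma^2 = 1$:

```python
import numpy as np

# Unnormalized weight f(x) = exp(-x); exact results: <x> = 1, variance = 1.
dx = 0.001
x = np.arange(0.0, 50.0, dx)              # large cutoff approximates the [0, inf) range
f = np.exp(-x)

P = f / (f.sum() * dx)                    # probability density, so that sum(P) * dx = 1

mean = (x * P).sum() * dx                 # <x>   = ∫ x P(x) dx
var = ((x - mean) ** 2 * P).sum() * dx    # mu_2  = <(x - <x>)^2>

print(mean, var, np.sqrt(var))            # ≈ 1.0, 1.0, 1.0 (standard deviation)
```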
3. Combinatorics
(a) Permutations of N things taken X at a time:
$$_N P_X = \frac{N!}{(N - X)!}$$
Used when the order of the things is significant. You have 10 scrabble tiles, each with
different letters. How many different 3 letter words can you make?
$$_{10}P_3 = \frac{10!}{7!} = 10 \times 9 \times 8 = 720$$
[Plots omitted: Probability vs. Value for Two Dice (top), Ten Dice (middle), and 1000 Dice (bottom).]
Figure 1: The probability of observing a given number from rolling two (top), ten (middle), and
1000 (bottom) dice.
(b) Combinations, used when the ordering is not significant. Combinations of N things
taken X at a time are
$$_N C_X = \frac{N!}{X!\,(N - X)!}$$
Example: If you ignore the different ordering, how many different groups of 3 letters can
you get from your 10 scrabble tiles?
$$_{10}C_3 = \frac{10!}{3!\,7!} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120$$
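Both counting formulas are easy to check numerically (a sketch; Python's math.perm and math.comb, available in Python 3.8+, compute the same quantities):

```python
from math import factorial, perm, comb

N, X = 10, 3

# Permutations: order matters, N! / (N - X)!
n_perm = factorial(N) // factorial(N - X)
# Combinations: order does not matter, N! / (X! (N - X)!)
n_comb = factorial(N) // (factorial(X) * factorial(N - X))

print(n_perm, perm(N, X))   # 720 720
print(n_comb, comb(N, X))   # 120 120
```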
The $_N C_X$'s are also known as the binomial coefficients. Recall the binomial theorem:
$$(a + b)^n = \sum_{r=0}^{n} {}_n C_r\, a^r b^{n-r}$$
This is related to the binomial distribution:
$$P(x) = {}_n C_x\, p^x (1 - p)^{n-x}$$
The binomial distribution is for situations with two possible outcomes, e.g., success or
failure, as in a Bernoulli trial. Each observation, or trial, must be independent. Let the
number of successes in n Bernoulli trials be x. Then P(x) is the probability of observing
x successes in n trials. Example: Your odds of winning the jackpot on a slot machine are
1 in $10^6$, or $p = 10^{-6}$. You play the slot machine 10 times; what is the probability that you
hit the jackpot exactly once?
$$P(1) = {}_{10}C_1\, (10^{-6})^1 (1 - 10^{-6})^9 = 10 \times 10^{-6} (1 - 10^{-6})^9 = 9.99991 \times 10^{-6} \approx 10^{-5}$$
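A quick numerical check of this probability (a sketch, not part of the notes):

```python
from math import comb

n, p = 10, 1e-6

# Probability of exactly one jackpot in n independent plays (binomial distribution).
P1 = comb(n, 1) * p**1 * (1 - p)**(n - 1)
print(P1)   # ≈ 9.9999e-06, i.e. about 1e-5
```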
Likewise there is a multinomial distribution
$$f(N_1, N_2, \dots) = \frac{N!}{\prod_j N_j!}$$
where $N = \sum_j N_j$.
(c) Stirling’s Approximation
$$\ln N! \approx N \ln N - N$$
Example: $5! = 120$, $\exp(5 \ln 5 - 5) = 21.0$. $50! = 3.0414 \times 10^{64}$, $\exp(50 \ln 50 - 50) = 1.71 \times 10^{63}$. $150! = 5.71 \times 10^{262}$, $\exp(150 \ln 150 - 150) = 1.86 \times 10^{261}$.
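The approximation improves as N grows, which can be confirmed with a few lines of code (a sketch; lgamma(N + 1) returns the exact $\ln N!$):

```python
from math import lgamma, log

# Compare the exact ln N! (via the log-gamma function) with Stirling's approximation.
for N in (5, 50, 150, 1000):
    exact = lgamma(N + 1)            # ln N!
    stirling = N * log(N) - N        # N ln N - N
    print(f"N = {N:5d}: ln N! = {exact:10.2f}, Stirling = {stirling:10.2f}, "
          f"relative error = {(exact - stirling) / exact:.3%}")
```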
2 Ensembles
Introduction: The object of statistical mechanics is to calculate observable macroscopic thermodynamic
properties, such as pressure, energy, etc., from the microscopic states and interactions
among molecules. We formalize the relationship between microscopic states and macroscopic thermodynamics
in this lecture by writing down the equations for A, P, U, H, µ, etc. in terms of quantum
mechanical energy levels and probability distribution functions.
Definitions:
1. Ensemble: A virtual collection of a very large number of macroscopic systems, all related to
each other by the ensemble constraints.
2. Time average: The observed value of some property J of a given system as measured by a
macroscopic device. E.g., the pressure from a barometer, the temperature from a thermocouple,
the entropy from an entropy meter. This is actually a time average of the form
$$\langle J \rangle = \lim_{t \to \infty} \frac{1}{t} \int_0^t J\, dt$$
3. Ensemble average: The average of the instantaneous values of J over each member of an
ensemble.
$$\langle J \rangle = \frac{1}{M} \sum_{i=1}^{M} J_i = \sum_{i=1}^{M} P_i J_i$$
4. Phase space: The hyperspace that completely defines the microscopic state of a system. For
N free particles there are 3N coordinates and 3N momenta required to specify the system.
Hence one point in 6N dimensional hyperspace defines the state of the system at some instant.
For quantum systems there are no points in phase space, only volume elements.
5. Ergodic Hypothesis. This is a postulate (i.e., not proved) that states that ensemble averages
are equivalent to time averages,
$$\langle J \rangle = \lim_{t \to \infty} \frac{1}{t} \int_0^t J\, dt = \sum_{i=1}^{M} P_i J_i$$
(A numerical sketch of this equivalence is given after the partition function below.)
6. Equal a Priori Probabilities. A priori is from the Latin, meaning "conceived beforehand". Quantum
states that have the same energy have the same probability. Therefore, the probability
that a given quantum state is occupied depends only on its energy and not on anything else.
As an analogy, consider a set of tuning forks, some tuned to a frequency ν1 , others tuned to
frequency ν2 , and so forth. Now if you expose the tuning forks to sound vibrations, some of
the tuning forks will start to vibrate. The ones that vibrate are those that have the same
frequency as the sound to which they are exposed. The probability that a given tuning fork
is vibrating depends only on the frequency to which it is tuned, not on other factors, such as
the color of the tuning forks or the orientation in space (as long as they are not damped),
etc. So it is with quantum states. The probability of a state being occupied depends only on
its energy (frequency).
7. Partition Function: Normalizing factor for the probability of states. A partition function is
a sum over all the quantum states partitioned by their energy level.
$$P_i = \frac{e^{-E_i/kT}}{Q}, \qquad Q = \sum_i e^{-E_i/kT}$$
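As a closing numerical sketch (the two-level system below, with energies 0 and kT, is an assumed example rather than anything from the lecture), the partition function gives the Boltzmann probabilities and the ensemble-average energy, and a long Metropolis-style random walk over the states gives a time average that agrees with it, as the ergodic hypothesis asserts:

```python
import random
from math import exp

kT = 1.0
energies = [0.0, 1.0]                 # assumed two-level system, energies in units of kT

# Partition function and Boltzmann probabilities (ensemble average).
Q = sum(exp(-E / kT) for E in energies)
P = [exp(-E / kT) / Q for E in energies]
ensemble_avg = sum(p * E for p, E in zip(P, energies))

# Time average from a long Metropolis random walk over the two states.
random.seed(0)
state, total, n_steps = 0, 0.0, 200_000
for _ in range(n_steps):
    trial = 1 - state                 # propose the other state
    dE = energies[trial] - energies[state]
    if dE <= 0 or random.random() < exp(-dE / kT):
        state = trial                 # accept with the Metropolis criterion
    total += energies[state]
time_avg = total / n_steps

print(f"sum of P_i = {sum(P):.4f}, ensemble <E> = {ensemble_avg:.4f}, time <E> = {time_avg:.4f}")
```

Both averages come out near 0.269 kT, illustrating how the partition function normalizes the state probabilities and how ensemble and time averages agree for this simple model.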