Lecture #12

Lecture 12 Objectives:

1. Introduction to Statistics
   (a) Be able to compute expectation values from discrete and continuous distributions
   (b) Be able to define three properties of expectation values
   (c) Be able to compute moments of a distribution
   (d) Be able to compute permutations and combinations and to distinguish when each is applicable to a given problem
   (e) Be able to use Stirling's approximation to compute factorials of large numbers

2. Ensembles: Vocabulary. Be able to define the following terms:
   (a) Ensemble
   (b) Time average
   (c) Ensemble average
   (d) Phase space
   (e) Ergodic Hypothesis
   (f) Equal A Priori Probabilities
   (g) Partition Function

1 Statistics

We are dealing with astronomical numbers of variables and equations. If we consider a gas of atoms, there are 3N positions and 3N momenta to integrate over, where N is the number of atoms. For even a small quantity of gas this means we would need to integrate over about $10^{23}$ positions and momenta. Obviously hopeless. This is where statistics comes to the rescue.

1. Discrete Distributions. Averages are given by

   $\langle x \rangle = \frac{1}{N} \sum_{i=1}^{N} x_i$

Let $P(x_i)$ be the probability of observing $x_i$:

   $P(x_j) = \frac{\text{number of ways } x_j \text{ can occur}}{\text{all possible events}}, \qquad \sum_i P(x_i) = 1$

Then you can find the expectation value of some variable that depends on $x$ as follows:

   $\langle F \rangle = \sum_i P(x_i) F(x_i)$

Example: What is the average value you expect to see from rolling a die? Let $x_i$ be the value shown (number of dots) on side $i$ of the die.

   $\langle x \rangle = \sum_{i=1}^{6} x_i P(x_i) = \frac{1}{6} \sum_{i=1}^{6} i = \frac{21}{6} = 3.5$

Example: What is the average value of the square of the number you get from rolling a die?

   $\langle x^2 \rangle = \sum_{i=1}^{6} x_i^2 P(x_i) = \frac{1}{6} \sum_{i=1}^{6} i^2 = \frac{91}{6} \approx 15.167$

Properties of expectation values:

   $\langle cx \rangle = c \langle x \rangle$
   $\langle x + y \rangle = \langle x \rangle + \langle y \rangle$
   $\langle xy \rangle = \langle x \rangle \langle y \rangle$   ONLY FOR $x$, $y$ INDEPENDENT

Example: What is the average value you expect to see from rolling a pair of dice? Count the number of ways each sum can occur: 2 = 1+1; 3 = 2+1 = 1+2; 4 = 1+3 = 3+1 = 2+2; 5 = 1+4 = 4+1 = 2+3 = 3+2; etc.
   $\langle x + y \rangle = \frac{1}{36}\,(0 \times 1 + 1 \times 2 + 2 \times 3 + 3 \times 4 + 4 \times 5 + 5 \times 6 + 6 \times 7 + 5 \times 8 + 4 \times 9 + 3 \times 10 + 2 \times 11 + 1 \times 12) = 7 = \langle x \rangle + \langle y \rangle = 3.5 + 3.5$

Example: What is the expected value of the product of rolling a pair of dice?

   $\langle x \cdot y \rangle = \langle x \rangle \langle y \rangle = \langle x \rangle^2 = (3.5)^2 = 12.25$

You can also have joint probabilities, e.g., a distribution of weight and height. The expectation value of a multivariate distribution is given by

   $\langle F \rangle = \sum_{x_1} \cdots \sum_{x_n} F(x_1, \ldots, x_n)\, P(x_1, \ldots, x_n)$

where $P(x_1, \ldots, x_n)$ is the joint probability of all $x_i$ occurring simultaneously. For example, suppose you want a bivariate distribution, $P(x, y)$. This is the probability of $x$ and $y$ both occurring, but that is the same as the product of the probability of $x$ occurring given that $y$ has already occurred and the probability of $y$ occurring,

   $P(x, y) = P_{x|y} P_y$

where $P_{x|y}$ is the conditional probability of $x$ given $y$. If $x$ and $y$ are independent, then $P_{x|y} = P_x$.

Example: Consider the probability of observing a given number resulting from simultaneously rolling a fixed number of dice. The probability for rolling one die is uniform, but the distribution for two dice, shown as the top graph in Figure 1, is peaked at the average value of 7. Note that the distribution is not very sharply peaked. If you roll ten dice at a time, shown as the middle graph in Figure 1, then the distribution is more sharply peaked and clearly resembles a Gaussian distribution with a mean of 35. However, the distribution is obviously discrete, not continuous. As the number of dice increases, the probability distribution becomes continuous and ever more sharp. The bottom graph in Figure 1 gives the distribution for rolling 1000 dice. To a very good approximation, the entire distribution may be replaced by the single most probable value of 3500.
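The single-die expectation values and the sharpening of the dice-sum distribution can be checked numerically. Here is a minimal sketch (not part of the original notes; the sample size and function names are illustrative):

```python
import random
from statistics import mean, stdev

random.seed(0)  # reproducible rolls

def roll_sum(n_dice):
    """Sum of n_dice fair six-sided dice."""
    return sum(random.randint(1, 6) for _ in range(n_dice))

# Exact single-die expectation values from the lecture:
avg = sum(range(1, 7)) / 6                    # <x>   = 21/6 = 3.5
avg_sq = sum(i * i for i in range(1, 7)) / 6  # <x^2> = 91/6
print(avg, avg_sq)

# Relative width (standard deviation / mean) of the sum of n dice.
# It shrinks roughly as 1/sqrt(n), so the distribution collapses
# onto its single most probable value (central limit theorem).
rel_width = {}
for n in (2, 10, 100):
    sums = [roll_sum(n) for _ in range(5000)]
    rel_width[n] = stdev(sums) / mean(sums)
    print(n, mean(sums), rel_width[n])
```

For a fair die the exact relative width of the sum is $\sqrt{35/12}\,/\,(3.5\sqrt{n}) \approx 0.49/\sqrt{n}$, and the simulated values track this as $n$ grows.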
Imagine that if you were to roll Avogadro's number of dice, you could with complete certitude tell someone the number that would be observed, to very many significant digits, without ever having to roll the dice! That is the idea of statistical mechanics. Summarizing: (1) discrete distributions become continuous as the spacing between the states decreases, and (2) probability distributions may be replaced by their single most probable value. This last point is a manifestation of the central limit theorem.

2. Continuous Distributions. Consider some function $f(x)$. Averages are given by

   $\langle J \rangle = \frac{\int J(x) f(x)\,dx}{\int f(x)\,dx}$

Define the probability density $P(x)$ as

   $P(x)\,dx = \frac{f(x)\,dx}{\int f(x)\,dx}, \qquad \int P(x)\,dx = 1$

Then the expectation value of some variable $F$ is

   $\langle F \rangle = \int F(x) P(x)\,dx$

The $r$th moment about the mean is given by

   $\mu_r = \langle (x - \langle x \rangle)^r \rangle$

The variance is the $r = 2$ moment. This is an important quantity in statistics and statistical mechanics. The positive square root of the variance is the standard deviation:

   $V(x) = \sigma(x)^2 = \langle (x - \langle x \rangle)^2 \rangle$

3. Combinatorics

(a) Permutations of $N$ things taken $X$ at a time:

   $_N P_X = \frac{N!}{(N - X)!}$

Used when the order of the things is significant. Example: You have 10 scrabble tiles, each with a different letter. How many different 3-letter words can you make?

   $_{10}P_3 = \frac{10!}{7!} = 10 \times 9 \times 8 = 720$

[Figure 1: The probability of observing a given number from rolling two (top), ten (middle), and 1000 (bottom) dice.]

(b) Combinations, used when the ordering is not significant. Combinations of $N$ things taken $X$ at a time are

   $_N C_X = \frac{N!}{X!\,(N - X)!}$

Example: If you ignore the different orderings, how many different groups of 3 letters can you get from your 10 scrabble tiles?

   $_{10}C_3 = \frac{10!}{3!\,7!} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120$

The $_N C_X$ are also known as the binomial coefficients. Recall the binomial theorem:

   $(a + b)^n = \sum_{r=0}^{n} {}_n C_r\, a^r b^{n-r}$

This is related to the binomial distribution:

   $P(x) = {}_n C_x\, p^x (1 - p)^{n-x}$

The binomial distribution is for situations with two possible outcomes, e.g., success or failure, as in a Bernoulli trial. Each observation, or trial, must be independent. Let the number of successes in $n$ Bernoulli trials be $x$. Then $P(x)$ is the probability of observing $x$ successes in $n$ trials.

Example: Your odds of winning the jackpot on a slot machine are 1 in $10^6$, or $p = 10^{-6}$. You play the slot 10 times; what is the probability that you hit the jackpot exactly once?

   $P(1) = {}_{10}C_1\, (10^{-6})^1 (1 - 10^{-6})^9 = 10 \times 10^{-6}\,(1 - 10^{-6})^9 = 9.99991 \times 10^{-6} \approx 10^{-5}$

Likewise there is a multinomial distribution,

   $f(N_1, N_2, \ldots) = \frac{N!}{\prod_j N_j!}$

where $N = \sum_j N_j$.

(c) Stirling's Approximation:

   $\ln N! \approx N \ln N - N$

Examples: $5! = 120$, while $\exp(5 \ln 5 - 5) = 21.0$. $50! = 3.0414 \times 10^{64}$, while $\exp(50 \ln 50 - 50) = 1.71 \times 10^{63}$. $150! = 5.71 \times 10^{262}$, while $\exp(150 \ln 150 - 150) = 1.86 \times 10^{261}$. Although the absolute error in $N!$ is large, the relative error in $\ln N!$ (the quantity that actually appears in statistical mechanics) shrinks as $N$ grows.

2 Ensembles

Introduction: The object of statistical mechanics is to calculate observable macroscopic thermodynamic properties, such as pressure, energy, etc., from the microscopic states and interactions among molecules. We formalize the relationship between microscopic states and macroscopic thermodynamics in this lecture by writing down the equations for $A$, $P$, $U$, $H$, $\mu$, etc. in terms of quantum mechanical energy levels and probability distribution functions.

Definitions:

1. Ensemble: A virtual collection of a very large number of macroscopic systems, all related to each other by the ensemble constraints.

2. Time average: The observed value of some property $J$ of a given system as measured by a macroscopic device, e.g., the pressure from a barometer, the temperature from a thermocouple, the entropy from an entropy meter. This is actually a time average of the form

   $\langle J \rangle = \lim_{t \to \infty} \frac{1}{t} \int_0^t J\,dt$
3. Ensemble average: The average of the instantaneous values of $J$ over the members of an ensemble,

   $\langle J \rangle = \frac{1}{M} \sum_{i=1}^{M} J_i = \sum_{i=1}^{M} P_i J_i$

4. Phase space: The hyperspace that completely defines the microscopic state of a system. For $N$ free particles there are $3N$ coordinates and $3N$ momenta required to specify the system. Hence one point in a $6N$-dimensional hyperspace defines the state of the system at some instant. For quantum systems there are no points in phase space, only volume elements.

5. Ergodic Hypothesis: This is a postulate (i.e., not proved) that states that ensemble averages are equivalent to time averages,

   $\langle J \rangle = \lim_{t \to \infty} \frac{1}{t} \int_0^t J\,dt = \sum_{i=1}^{M} P_i J_i$

6. Equal a Priori Probabilities: a priori is from the Latin, meaning "conceived beforehand". Quantum states that have the same energy have the same probability. Therefore, the probability that a given quantum state is occupied depends only on its energy and not on anything else. As an analogy, consider a set of tuning forks, some tuned to a frequency $\nu_1$, others tuned to a frequency $\nu_2$, and so forth. Now if you expose the tuning forks to sound vibrations, some of the tuning forks will start to vibrate. The ones that vibrate are those that have the same frequency as the sound to which they are exposed. The probability that a given tuning fork is vibrating depends only on the frequency to which it is tuned, not on other factors such as the color of the tuning fork or its orientation in space (as long as they are not damped). So it is with quantum states: the probability of a state being occupied depends only on its energy (frequency).

7. Partition Function: The normalizing factor for the probability of states. A partition function is a sum over all the quantum states, partitioned by their energy level:

   $P_i = \frac{e^{-E_i/kT}}{Q}, \qquad Q = \sum_i e^{-E_i/kT}$
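As a concrete illustration of the last definition, the sketch below computes Boltzmann probabilities from a partition function. The three energy levels and the temperature are hypothetical values chosen only for illustration, not from the lecture:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant in J/K

def boltzmann_probabilities(energies, T):
    """Return P_i = exp(-E_i/kT) / Q, where Q = sum_i exp(-E_i/kT)."""
    weights = [math.exp(-E / (k_B * T)) for E in energies]
    Q = sum(weights)  # the partition function: the normalizing factor
    return [w / Q for w in weights]

# Three hypothetical energy levels (in joules) at room temperature.
levels = [0.0, 1.0e-21, 2.0e-21]
probs = boltzmann_probabilities(levels, T=300.0)
print(probs)
# The probabilities sum to 1, and lower-energy states are more probable;
# states of equal energy would receive equal probability, consistent
# with the principle of equal a priori probabilities (definition 6).
```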