Concept of Probability
AS3105 Astrophysical Processes 1
Dhani Herdiwijaya

Probability in Everyday Life
• Rainfall
• Traffic jams
• Crossing the street
• A catastrophic meteoroid
• Airplane travel. Is it safe to fly?

Laplace (1819): "Probability theory is nothing but common sense reduced to calculation."
Maxwell (1850): "The true logic of this world is the calculus of probabilities." That is, probability is a natural language for describing real-world phenomena.

A mathematical formulation of games of chance began in the middle of the 17th century. Some of the important contributors over the following 150 years include Pascal, Fermat, Descartes, Leibniz, Newton, Bernoulli, and Laplace.

Development
It is remarkable that the theory of probability took so long to develop. An understanding of probability is elusive in part because the probability depends on the status of the information that we have (a fact well known to poker players). Although the rules of probability are defined by simple mathematical rules, an understanding of probability is greatly aided by experience with real data and concrete problems.

Probability
To calculate the probability of a particular outcome, count the number of all possible results, then count the number that give the desired outcome. The probability of the desired outcome is the number of ways it can occur divided by the total number of outcomes. Hence, 1/6 for one face of a die.

Rules of Probability
In 1933 the Russian mathematician A. N. Kolmogorov formulated a complete set of axioms for the mathematical definition of probability. For each event i, we assign a probability P(i) that satisfies the conditions
P(i) ≥ 0
P(i) = 0 means that the event cannot occur
P(i) = 1 means that the event must occur
The normalization condition says that the sum of the probabilities of all possible mutually exclusive outcomes is unity.

Example. Let x be the number of points on the face of a die. What is the sample space of x?
Solution. The sample space, or set of possible events, is xi = {1, 2, 3, 4, 5, 6}. These six outcomes are mutually exclusive.

There are many different interpretations of probability, because any interpretation that satisfies the rules of probability may be regarded as a kind of probability. An interpretation of probability that is relatively easy to understand is based on symmetry.

Addition rule
For an actual die, we can estimate the probability a posteriori, that is, by observing the outcome of many throws. Suppose that we know that the probability of rolling any face of a die in one throw is 1/6, and we want to find the probability of rolling face 3 or face 6 in one throw. For two distinct outcomes i and j,
P(i or j) = P(i) + P(j). (addition rule)
This relation generalizes to more than two events. An important consequence is that if P(i) is the probability of event i, then the probability of event i not occurring is 1 − P(i).

Combining Probabilities
• If a given outcome can be reached in two (or more) mutually exclusive ways whose probabilities are pA and pB, then the probability of that outcome is pA + pB.
• This is the probability of having either A or B.

Example. Paint two faces of a die red. When the die is thrown, what is the probability of a red face coming up?
p = 1/6 + 1/6 = 1/3
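To make the counting definition and the addition rule concrete, here is a minimal Python sketch (not part of the original notes; the helper name `prob` is purely illustrative) that enumerates the sample space of a single die:

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]                     # sample space of one die

def prob(event):
    """Probability = (number of favorable outcomes) / (number of all outcomes)."""
    favorable = [f for f in faces if event(f)]
    return Fraction(len(favorable), len(faces))

# Addition rule for mutually exclusive outcomes: P(3 or 6) = P(3) + P(6)
print(prob(lambda f: f in (3, 6)))                      # 1/3
print(prob(lambda f: f == 3) + prob(lambda f: f == 6))  # 1/3, same answer
print(1 - prob(lambda f: f == 6))                       # P(not 6) = 5/6
```

Enumerating outcomes this way generalizes directly to the later examples with two or more dice.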
Example. What is the probability of throwing a three or a six with one throw of a die?
Solution. The probability that the face shows either 3 or 6 is 1/6 + 1/6 = 1/3.

Example. What is the probability of not throwing a six with one throw of a die?
Solution. The answer is the probability of "1 or 2 or 3 or 4 or 5". The addition rule gives
P(not six) = P(1) + P(2) + P(3) + P(4) + P(5) = 1 − P(6) = 5/6,
because the sum of the probabilities of all outcomes is unity. It is very useful to take advantage of this property when solving many probability problems.

Multiplication rule
• Another simple rule gives the probability of the joint occurrence of independent events, for example the probability of throwing a 3 on one die and a 4 on a second die. If two events are independent, then the probability of both events occurring is the product of their probabilities:
P(i and j) = P(i) P(j) (multiplication rule)
• Events are independent if the occurrence of one event does not change the probability of the occurrence of the other.

Combining Probabilities
• If a given outcome represents the combination of two independent events, whose individual probabilities are pA and pB, then the probability of that outcome is pA × pB.
• This is the probability of having both A and B.

Example. Throw two normal dice. What is the probability of two sixes coming up?
p(2) = 1/6 × 1/6 = 1/36

Example. Consider the probability that a person chosen at random is female and was born on September 6. We can reasonably assume equal likelihood of birthdays for all days of the year, and it is correct to conclude that this probability is 1/2 × 1/365. Being a woman and being born on September 6 are independent events.

• Example. What is the probability of throwing an even number with one throw of a die?
Solution. We can use the addition rule to find that
P(even) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2.

• Example. What is the probability of the same face appearing on two successive throws of a die?
Solution. The probability of any specific combination of outcomes, for example (1,1), (2,2), ..., (6,6), is 1/6 × 1/6 = 1/36, so
P(same face) = P(1,1) + P(2,2) + ... + P(6,6) = 6 × 1/36 = 1/6.

• Example. What is the probability that in two throws of a die at least one six appears?
Solution. We have already established that P(6) = 1/6 and P(not 6) = 5/6. In two throws, there are four possible outcomes (6, 6), (6, not 6), (not 6, 6), (not 6, not 6) with the probabilities
P(6, 6) = 1/6 × 1/6 = 1/36
P(6, not 6) = P(not 6, 6) = 1/6 × 5/6 = 5/36
P(not 6, not 6) = 5/6 × 5/6 = 25/36
All outcomes except the last have at least one six. Hence, the probability of obtaining at least one six is
P(at least one 6) = P(6, 6) + P(6, not 6) + P(not 6, 6) = 1/36 + 5/36 + 5/36 = 11/36.
A more direct way of obtaining this result is to use the normalization condition:
P(at least one six) = 1 − P(not 6, not 6) = 1 − (5/6)² = 1 − 25/36 = 11/36 ≈ 0.306.
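The shortcut P(at least one six) = 1 − (5/6)² can be checked both exactly and by random sampling; a small sketch (the sample size is arbitrary):

```python
import random
from fractions import Fraction

# Exact result via the normalization condition
exact = 1 - Fraction(5, 6) ** 2                  # 11/36

# Monte Carlo estimate of the same probability
random.seed(0)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if 6 in (random.randint(1, 6), random.randint(1, 6)))
print(exact, float(exact), hits / trials)        # 11/36, ~0.3056, ~0.306
```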
Example. What is the probability of obtaining at least one six in four throws of a die?
Solution. We know that in one throw of a die there are two relevant outcomes, with P(6) = 1/6 and P(not 6) = 5/6. Hence, if we classify each of the four throws only as "6" or "not 6", there are 2⁴ = 16 possible outcomes, only one of which has no six; each of the other fifteen mutually exclusive outcomes contains at least one six. We can use the multiplication rule to find that
P(not 6, not 6, not 6, not 6) = P(not 6)⁴ = (5/6)⁴,
and hence
P(at least one six) = 1 − P(not 6, not 6, not 6, not 6) = 1 − (5/6)⁴ = 671/1296 ≈ 0.518.

Complications
p is the probability of success (1/6 for one die); q is the probability of failure (5/6 for one die).
• p + q = 1, or q = 1 − p.
When two dice are thrown, what is the probability of getting only one six?
• The probability of a six on the first die and not the second is pq = 1/6 × 5/6 = 5/36.
• The probability of a six on the second die and not the first is the same, so
p(1) = 2pq = 10/36 = 5/18.

Simplification
• The probability of no sixes coming up is p(0) = q·q = 5/6 × 5/6 = 25/36.
• The sum of all three probabilities is p(2) + p(1) + p(0) = 1.
• p² + 2pq + q² = 1
• (p + q)² = 1
The exponent is the number of dice (or tries). Is this general?

Three Dice
• (p + q)³ = 1
• p³ + 3p²q + 3pq² + q³ = 1
• p(3) + p(2) + p(1) + p(0) = 1
• It works! It must be general: (p + q)^N = 1.

Renormalization
Suppose we know that P(i) is proportional to f(i), where f(i) is a known function. To obtain the normalized probabilities, we divide each f(i) by the sum of all the unnormalized probabilities. That is, if P(i) ∝ f(i) and Z = ∑ f(i), then P(i) = f(i)/Z. This procedure is called normalization.

• Example. Suppose that in a given class it is three times as likely to receive a C as an A, twice as likely to obtain a B as an A, one-fourth as likely to be assigned a D as an A, and nobody fails the class. What are the probabilities of getting each grade?
Solution. We first assign the unnormalized probability of receiving an A as f(A) = 1. Then f(B) = 2, f(C) = 3, and f(D) = 0.25, so Z = ∑ f(i) = 1 + 2 + 3 + 0.25 = 6.25. Hence P(A) = f(A)/Z = 1/6.25 = 0.16, P(B) = 2/6.25 = 0.32, P(C) = 3/6.25 = 0.48, and P(D) = 0.25/6.25 = 0.04.
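The renormalization recipe P(i) = f(i)/Z is easy to verify for the grade example; a short sketch (the dictionary layout is just one way to organize it):

```python
# Unnormalized weights f(i) for the grade example
f = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 0.25}
Z = sum(f.values())                       # normalization constant Z = 6.25
P = {grade: w / Z for grade, w in f.items()}
print(P)                                  # {'A': 0.16, 'B': 0.32, 'C': 0.48, 'D': 0.04}
print(sum(P.values()))                    # 1.0, as required by normalization
```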
Meaning of Probability
• How can we assign the probabilities of the various events? If we say that event E1 is more probable than event E2 (P(E1) > P(E2)), we mean that E1 is more likely to occur than E2. This statement of our intuitive understanding of probability illustrates that probability is a way of classifying the plausibility of events under conditions of uncertainty. Probability is related to our degree of belief in the occurrence of an event.
• Probability assessments depend on who does the evaluation and the status of the information the evaluator has at the moment of the assessment. We always evaluate the conditional probability, that is, the probability of an event E given the information I, P(E | I). Consequently, several people can simultaneously hold different degrees of belief about the same event, as is well known to investors in the stock market (for example, the IHSG index).
If rational people have access to the same information, they should come to the same conclusion about the probability of an event. The idea of a coherent bet forces us to make probability assessments that correspond to our belief in the occurrence of an event. Probability assessments should be kept separate from decision issues. Decisions depend not only on the probability of the event, but also on the subjective importance of, say, a given amount of money.

Probability and Knowledge
• Probability as a measure of the degree of belief in the occurrence of an outcome implies that probability depends on our prior knowledge, because belief depends on prior knowledge.
• Probability depends on what knowledge we bring to the problem. If we have no knowledge other than the possible outcomes, then the best estimate is to assume equal probability for all events. However, this assumption is not a definition, but an example of belief. As an example of the importance of prior knowledge, consider the following problem.

Large numbers
We can estimate probabilities empirically by sampling, that is, by making repeated measurements of the outcome of independent events. Intuitively we believe that if we perform more and more measurements, the calculated average will approach the exact mean of the quantity of interest. This idea is called the law of large numbers. We can use a computer to generate the random numbers; the applet/application at <stp.clarku.edu/simulations/cointoss> simulates multiple tosses of a single coin.

Mean Value
• Consider the probability distribution P(1), P(2), ..., P(n) for the n possible values of the variable x. In many cases it is more convenient to describe the distribution of the possible values of x in a less detailed way. The most familiar way is to specify the average or mean value of x, which we will denote as <x>. The definition of the mean value of x is
<x> ≡ x1 P(1) + x2 P(2) + ... + xn P(n),
where P(i) is the probability of xi. If f(x) is a function of x, then the mean value of f(x) is defined by
<f(x)> ≡ ∑ f(xi) P(i).

Example. Suppose you are offered a choice between a certain $50, or $100 if you flip a coin and get a head and $0 if you get a tail. The mean value of the second choice is
mean value = ∑ Pi × (value of i),
where the sum is over the possible outcomes and Pi is the probability of outcome i. In this case the mean value is 1/2 × $100 + 1/2 × $0 = $50. We see that the two choices have the same mean value. (Most people prefer the first choice because its outcome is certain.)

If f(x) and g(x) are any two functions of x, then
<f(x) + g(x)> = ∑ [f(xi) + g(xi)] P(i) = ∑ f(xi) P(i) + ∑ g(xi) P(i),
or
<f(x) + g(x)> = <f(x)> + <g(x)>.
If c is a constant, then <c f(x)> = c <f(x)>.

In general, we can define the mth moment of the probability distribution P as
<x^m> ≡ ∑ xi^m P(i),
where we have let f(x) = x^m. The mean of x is the first moment of the probability distribution.

The mean value of x is a measure of the central value of x about which the various values of xi are distributed. If we measure x from its mean, we have
Δx ≡ x − <x>,
<Δx> = <(x − <x>)> = <x> − <x> = 0.
That is, the average value of the deviation of x from its mean vanishes.

If only one outcome j were possible, we would have P(i) = 1 for i = j and zero otherwise; that is, the probability distribution would have zero width. In general, there is more than one outcome, and a possible measure of the width of the probability distribution is
<Δx²> ≡ <(x − <x>)²>.
The quantity <Δx²> is known as the dispersion or variance, and its square root is called the standard deviation. The larger the spread of values of x about <x>, the larger the variance. The use of the square of x − <x> ensures that values of x smaller and larger than <x> contribute with the same sign. A useful form for the variance can be found by noting that
<(x − <x>)²> = <x² − 2x<x> + <x>²> = <x²> − 2<x><x> + <x>² = <x²> − <x>².
Because <Δx²> is always nonnegative, it follows that <x²> ≥ <x>². It is useful to interpret the width of the probability distribution in terms of the standard deviation σ, which is defined as the square root of the variance:
σx = sqrt(<Δx²>) = sqrt(<x²> − <x>²).
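The definitions <x> = ∑ xi P(i) and <Δx²> = <x²> − <x>² translate directly into code. The sketch below (function names are illustrative) applies them to the $50-versus-coin-flip choice: the means are equal, but only the gamble has a nonzero width:

```python
def mean(xs, probs):
    """<x> = sum_i x_i P(i)"""
    return sum(x * p for x, p in zip(xs, probs))

def variance(xs, probs):
    """<(x - <x>)^2> = <x^2> - <x>^2"""
    m = mean(xs, probs)
    return mean([x * x for x in xs], probs) - m * m

print(mean([50], [1.0]))                  # 50.0 : the certain choice
print(mean([100, 0], [0.5, 0.5]))         # 50.0 : same mean value
print(variance([50], [1.0]))              # 0.0  : zero width
print(variance([100, 0], [0.5, 0.5]))     # 2500.0 -> standard deviation of $50
```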
Example. Find the mean value <x>, the variance <Δx²>, and the standard deviation σx for the value of a single throw of a die.
Solution. Because P(i) = 1/6 for i = 1, ..., 6, we have
<x> = 1/6 (1 + 2 + 3 + 4 + 5 + 6) = 7/2
<x²> = 1/6 (1 + 4 + 9 + 16 + 25 + 36) = 91/6
<Δx²> = <x²> − <x>² = 91/6 − 49/4 = 35/12 ≈ 2.92
σx = sqrt(2.92) ≈ 1.71

Homework
Consider a one-dimensional lattice with lattice constant a, as shown in Fig. 1. An atom moves from a site to a nearest-neighbor site every τ seconds. The probabilities of moving to the right and to the left are p and q = 1 − p, respectively.
(a) Calculate the average position <x> of the atom at the time t = Nτ, where N >> 1.
(b) Calculate the mean square deviation <(x − <x>)²> at the time t.

Ensemble
Another way of estimating the probability is to perform a single measurement on many copies or replicas of the system of interest. For example, instead of flipping a single coin 100 times in succession, we collect 100 coins and flip all of them at the same time. The fraction of coins that show heads is an estimate of the probability of that event. The collection of identically prepared systems is called an ensemble, and the probability of occurrence of a single event is estimated with respect to this ensemble. The ensemble consists of a large number M of identical systems, that is, systems that satisfy the same known conditions.
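A quick numerical check of the die example above, together with a sampling illustration of the law of large numbers (the sample sizes are arbitrary):

```python
import random
from fractions import Fraction

faces = range(1, 7)
P = Fraction(1, 6)
mean = sum(x * P for x in faces)                   # 7/2
var = sum(x * x * P for x in faces) - mean ** 2    # 91/6 - 49/4 = 35/12
print(mean, var, float(var) ** 0.5)                # 7/2  35/12  ~1.71

# Law of large numbers: the sample average approaches <x> = 3.5 as N grows
random.seed(1)
for n in (100, 10_000, 1_000_000):
    throws = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(throws) / n)
```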
Information and Uncertainty
Let us define the uncertainty function S(P1, P2, ..., Pi, ...), where Pi is the probability of event i. Consider first the case where all the probabilities Pi are equal. Then P1 = P2 = ... = Pi = 1/Ω, where Ω is the total number of outcomes. In this case we have S = S(1/Ω, 1/Ω, ...), or simply S(Ω). For only one outcome, Ω = 1 and there is no uncertainty, so
S(Ω = 1) = 0, and S(Ω1) > S(Ω2) if Ω1 > Ω2.
That is, S(Ω) is an increasing function of Ω.

We next consider multiple events. For example, suppose that we throw a die with Ω1 outcomes and flip a coin with Ω2 equally probable outcomes. The total number of outcomes is Ω = Ω1 Ω2. If the result of the die is known, the uncertainty associated with the die is reduced to zero, but there still is uncertainty associated with the toss of the coin. Similarly, we can reduce the uncertainty in the reverse order, but the total uncertainty is still nonzero. These considerations suggest that
S(Ω1 Ω2) = S(Ω1) + S(Ω2), or S(xy) = S(x) + S(y).
This generalization is consistent with S(Ω) being an increasing function of Ω.

First we take the partial derivative of S(xy) with respect to x and then with respect to y. We let z = xy and obtain
∂S(z)/∂x = y dS(z)/dz and ∂S(z)/∂y = x dS(z)/dz.
From S(xy) = S(x) + S(y) we also have
∂S(z)/∂x = dS(x)/dx and ∂S(z)/∂y = dS(y)/dy.
By comparing the right-hand sides, and multiplying the first relation by x and the second by y, we obtain
x dS(x)/dx = y dS(y)/dy = z dS(z)/dz.
The first term depends only on x and the second term depends only on y. Because x and y are independent variables, the three terms must be equal to a constant A. Hence we have the desired condition
x dS(x)/dx = A,
which can be integrated to give
S(x) = A ln x + B.
The integration constant B must be equal to zero to satisfy the condition S(Ω = 1) = 0. The constant A is arbitrary, so we choose A = 1. Hence, for equal probabilities we have S(Ω) = ln Ω.

What if the probabilities of the various events are unequal? The general form of the uncertainty S is
S = −∑ Pi ln Pi.
Note that if all the probabilities are equal, then Pi = 1/Ω for all i, and in this case
S = −∑ (1/Ω) ln(1/Ω) = ln Ω.
We also see that if outcome j is certain, Pj = 1 and Pi = 0 for i ≠ j, and S = −1 ln 1 = 0. That is, if the outcome is certain, the uncertainty is zero and there is no missing information.

We have shown that if the Pi are known, then the uncertainty or missing information S can be calculated. Usually the problem is to determine the probabilities. Suppose we flip a perfect coin, for which there are two possibilities. We know intuitively that P1(heads) = P2(tails) = 1/2. That is, we would not assign a different probability to each outcome unless we had information to justify it. Intuitively we have adopted the principle of least bias, or maximum uncertainty.

Let us reconsider the toss of a coin. In this case S is given by
S = −[P1 ln P1 + (1 − P1) ln(1 − P1)],
where we have used the fact that P1 + P2 = 1. To maximize S we take the derivative with respect to P1, using d(ln x)/dx = 1/x:
dS/dP1 = −ln P1 + ln(1 − P1) = 0.
The solution satisfies ln[(1 − P1)/P1] = 0, which is satisfied by P1 = 1/2. We can check that this solution is a maximum by calculating the second derivative,
d²S/dP1² = −[1/P1 + 1/(1 − P1)] = −4,
which is less than zero, as expected for a maximum.

Example. The toss of a three-sided die yields events E1, E2, and E3 with a face of one, two, and three points. As a result of tossing many dice, we learn that the mean number of points is f = 1.9, but we do not know the individual probabilities. What are the values of P1, P2, and P3 that maximize the uncertainty?
Solution. We have
S = −[P1 ln P1 + P2 ln P2 + P3 ln P3].
We also know that f = 1P1 + 2P2 + 3P3 and P1 + P2 + P3 = 1. We use the latter condition to eliminate P3 using P3 = 1 − P1 − P2, and rewrite the constraint as f = P1 + 2P2 + 3(1 − P1 − P2) = 3 − 2P1 − P2. We then eliminate P2 and P3 from S using P2 = 3 − f − 2P1 and P3 = f − 2 + P1:
S = −[P1 ln P1 + (3 − f − 2P1) ln(3 − f − 2P1) + (f − 2 + P1) ln(f − 2 + P1)].
Because S now depends only on P1, we can differentiate S with respect to P1 and set dS/dP1 = 0 to find its maximum; for f = 1.9 the maximum occurs at approximately P1 ≈ 0.38, P2 ≈ 0.33, P3 ≈ 0.28 (a numerical check is sketched below).

Microstates and Macrostates
Each possible outcome is called a "microstate". The combination of all microstates that give the same number of spots is called a "macrostate". The macrostate that contains the most microstates is the most probable to occur.

The evolution of a system can be represented by a trajectory in the multidimensional (configuration, phase) space of its microscopic parameters. Each point in this space represents a microstate. During its evolution, the system will only pass through accessible microstates – the ones that do not violate the conservation laws: e.g., for an isolated system, the total internal energy must be conserved.

Microstate: the state of a system specified by describing the quantum state of each molecule in the system. For a classical particle – 6 parameters (xi, yi, zi, pxi, pyi, pzi); for a macro system – 6N parameters. The statistical approach is to connect the macroscopic observables (averages) to the probability for a certain microstate to appear along the system's trajectory in configuration space.

Macrostate: the state of a macro system specified by its macroscopic parameters. Two systems with the same values of macroscopic parameters are thermodynamically indistinguishable. A macrostate tells us nothing about the state of an individual particle. For a given set of constraints (conservation laws), a system can be in many macrostates.

The Phase Space vs. the Space of Macroparameters
[Figure: numerous microstates in a multi-dimensional configuration (phase) space correspond to the same macrostate, a single point on the surface defined by an equation of state in (P, V, T) space.]
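Returning to the three-sided-die example above: once the constraints eliminate P2 and P3, S depends on P1 alone, so a simple grid search (the step size is arbitrary) locates the maximum numerically:

```python
from math import log

f = 1.9                       # given mean number of points
best = None
for i in range(1, 10_000):    # scan P1 on a fine grid
    P1 = i / 10_000
    P2 = 3 - f - 2 * P1       # from the two constraints
    P3 = f - 2 + P1
    if P2 <= 0 or P3 <= 0:
        continue              # probabilities must be positive
    S = -(P1 * log(P1) + P2 * log(P2) + P3 * log(P3))
    if best is None or S > best[0]:
        best = (S, P1, P2, P3)
print(best)                   # maximum near P1 ~ 0.38, P2 ~ 0.33, P3 ~ 0.28
```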
Examples: Two-Dimensional Configuration Space
[Figure: motion of a particle in a one-dimensional box of length 2L with kinetic energy K = K0; the "macrostate" is characterized by a single parameter, the kinetic energy K0. Another example: a one-dimensional harmonic oscillator with potential U(x), for which K + U = const defines a closed curve in the (x, px) plane.]
Each "macrostate" corresponds to a continuum of microstates, which are characterized by specifying the position and momentum.

The Fundamental Assumption of Statistical Mechanics
The ergodic hypothesis: an isolated system in an equilibrium state, evolving in time, will pass through all the accessible microstates at the same recurrence rate, i.e. all accessible microstates are equally probable. The ensemble of all equi-energetic states (microstates which correspond to the same energy) is called a microcanonical ensemble.
Note that the assumption that the system is isolated is important. If a system is coupled to a heat reservoir and is able to exchange energy, then in order to replace the system's trajectory by an ensemble, we must determine the relative occurrence of states with different energies. For example, an ensemble whose states' recurrence rate is given by their Boltzmann factor (e^(−E/kBT)) is called a canonical ensemble.
The average over long times will equal the average over the ensemble of all equi-energetic microstates: if we take a snapshot of a system with N microstates, we will find the system in any of these microstates with the same probability.
Probability for a stationary system: many identical measurements on a single system are equivalent to a single measurement on many copies of the system.

Probability of a Macrostate, Multiplicity
The probability of a particular microstate of a microcanonical ensemble = 1 / (number of all accessible microstates).
The probability of a certain macrostate is determined by how many microstates correspond to this macrostate – the multiplicity Ω of the given macrostate:
probability of a particular macrostate = (number of microstates that correspond to the given macrostate) / (number of all accessible microstates).
This approach will help us to understand why some macrostates are more probable than others, and eventually, by considering interacting systems, we will understand the irreversibility of processes in macroscopic systems.

Probability
"Probability theory is nothing but common sense reduced to calculations" – Laplace (1819)
An event (very loosely defined) – any possible outcome of some measurement. An event is a statistical (random) quantity if the probability of its occurrence, P, in the process of measurement is < 1.
The "sum" of two events: in the process of measurement, we observe either one of the events. Addition rule for mutually exclusive events: P(i or j) = P(i) + P(j).
The "product" of two events: in the process of measurement, we observe both events. Multiplication rule for independent events: P(i and j) = P(i) × P(j) (independent events – one event does not change the probability of the occurrence of the other).
Example: What is the probability of the same face appearing on two successive throws of a die? The probability of any specific combination, e.g. (1,1), is 1/6 × 1/6 = 1/36 (multiplication rule). Hence, by the addition rule, P(same face) = P(1,1) + P(2,2) + ... + P(6,6) = 6 × 1/36 = 1/6.

A macroscopic observable A (averaged over all accessible microstates):
<A> = ∑ P(microstate) A(microstate), where the sum runs over all accessible microstates.

Two Interacting Einstein Solids, Macropartitions
Suppose that we bring two Einstein solids A and B (two sub-systems with NA, UA and NB, UB) into thermal contact, to form a larger isolated system.
What happens to these solids (macroscopically) after they have been brought into contact? The combined system has N = NA + NB and U = UA + UB.
Macropartition: a given pair of macrostates for sub-systems A and B that are consistent with conservation of the total energy U = UA + UB. Different macropartitions amount to different ways that the energy can be macroscopically divided between the sub-systems. Example: the pair of macrostates where UA = 2 and UB = 4 is one possible macropartition of the combined system with U = 6.
As time passes, the system of two solids will randomly shift between different microstates consistent with the constraint that U = const.
Question: what would be the most probable macropartition for given NA, NB, and U?

Problem: Consider the system consisting of two Einstein solids in thermal contact. A certain macropartition has a multiplicity of 6×10^1024, while the total number of microstates available to the system in all macropartitions is 3×10^1034. What is the probability of finding the system in this macropartition? Imagine that the system is initially in the macropartition with multiplicity 6×10^1024. Consider another macropartition of the same system with a multiplicity of 6×10^1026. If we look at the system a short time later, how many times more likely is it to have moved to the second macropartition than to have stayed with the first?

The Multiplicity of Two Sub-Systems Combined
The probability of a macropartition is proportional to its multiplicity:
ΩAB = ΩA × ΩB (macropartition of A+B; sub-system A; sub-system B)
Example: bring two one-atom "solids" into thermal contact, with the total U = 6. The possible macropartitions for NA = NB = 3, U = qA + qB = 6 are:

Macropartition  UA  UB  ΩA  ΩB  ΩAB
0:6              0   6   1  28   28
1:5              1   5   3  21   63
2:4              2   4   6  15   90
3:3              3   3  10  10  100
4:2              4   2  15   6   90
5:1              5   1  21   3   63
6:0              6   0  28   1   28

Grand total number of microstates:
Ω_total = (U + N − 1)! / (U! (N − 1)!) = 11! / (6! 5!) = 462.

Where is the Maximum? The Average Energy per Atom
Let's explore how the macropartition multiplicity for two sub-systems A and B (NA, NB, εA = εB = ε) in thermal contact depends on the energy of one of the sub-systems. In the high-T limit (q >> N) the multiplicity of an Einstein solid is approximately
Ω(N, U) ≈ (eU/Nε)^N,
so
ΩAB = ΩA(NA, UA) ΩB(NB, UB) ≈ (eUA/NAε)^NA (eUB/NBε)^NB, with UB = U − UA.
Setting dΩAB/dUA = 0 gives NA/UA − NB/(U − UA) = 0, that is,
UA/NA = UB/NB.
For two systems in thermal contact, the equilibrium (most probable) macropartition of the combined system is the one where the average energy per atom in each system is the same (the basis for introducing the temperature). For two identical sub-systems (NA = NB), ΩAB(UA) is peaked at UA = UB = ½U.
[Figure: ΩA decreases and ΩB increases with UA; their product ΩAB is sharply peaked at UA = U/2.]
At home: find the position of the maximum of ΩAB(UA) for NA = 200, NB = 100, U = 180.

Sharpness of the Multiplicity Function
How sharp is the peak? Let's consider small deviations from the maximum for two identical sub-systems. Write
UA = (U/2)(1 + x), UB = (U/2)(1 − x), with x << 1.
In the high-T limit ΩAB ∝ (UA UB)^N = (U/2)^(2N) (1 − x²)^N ≈ (U/2)^(2N) e^(−Nx²).
Example: N = 100,000, x = 0.01: (1 − x²)^N = (0.9999)^100,000 ≈ 4.5·10^−5.
More rigorously (p. 65),
ΩAB ≈ ΩAB(max) exp[−N (ΔUA/(U/2))²], where ΔUA = UA − U/2,
which is a Gaussian function of ΔUA. The peak width is
δUA ≈ (U/2)/√N, i.e. the relative width is δUA/(U/2) ≈ 1/√N.
When the system becomes large, the probability as a function of UA (the macropartition) becomes very sharply peaked.
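The macropartition table above follows from the Einstein-solid multiplicity Ω(N, q) = (q + N − 1)!/(q!(N − 1)!); the sketch below reproduces it and can be reused for the at-home problem (NA = 200, NB = 100, U = 180):

```python
from math import comb

def omega(N, q):
    """Multiplicity of an Einstein solid: (q + N - 1)! / (q! (N - 1)!)."""
    return comb(q + N - 1, q)

NA = NB = 3
U = 6
total = omega(NA + NB, U)                 # 462 accessible microstates
print("UA UB  omegaA omegaB omegaAB  P")
for UA in range(U + 1):
    UB = U - UA
    wA, wB = omega(NA, UA), omega(NB, UB)
    print(UA, UB, wA, wB, wA * wB, round(wA * wB / total, 3))
```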
Problem: Consider the system consisting of two Einstein solids P and Q in thermal equilibrium. Assume that we know the number of atoms in each solid and … What do we know if we also know (a) the quantum state of each atom in each solid? (b) the total energy of each of the two solids? (c) the total energy of the combined system?
(a) determines the system's microstate, and therefore also its macropartition and macrostate;
(b) determines the system's macropartition, but not its microstate;
(c) determines only the macrostate of the combined system – the macropartition is known only to within the (small) fluctuations about the most probable one, and the microstate not at all.

Implications? Irreversibility!
The vast majority of microstates are in macropartitions close to the most probable one (in other words, the macropartition probability graph is very narrow). Thus:
(a) If the system is not in the most probable macropartition, it will rapidly and inevitably move toward that macropartition. The reason for this "directionality" (irreversibility) is that there are far more microstates in that direction than away from it. This is why energy flows from "hot" to "cold" and not vice versa.
(b) It will subsequently stay at that macropartition (or very near to it), in spite of the random shuffling of energy back and forth between the two solids.
When two macroscopic solids are in thermal equilibrium with each other, completely random and reversible microscopic processes (leading to random shuffling between microstates) tend, at the macroscopic level, to push the solids inevitably toward an equilibrium macropartition (an irreversible macro behavior). Any random fluctuations away from the most likely macropartition are extremely small!

Problem: Imagine that you discover a strange substance whose multiplicity is always 1, no matter how much energy you put into it. If you put an object made of this substance (sub-system A) into thermal contact with an Einstein solid having the same number of atoms but much more energy (sub-system B), what will happen to the energies of these sub-systems?
A. Energy flows from B to A until they have the same energy.
B. Energy flows from A to B until A has no energy.
C. No energy will flow from B to A at all.

Two model systems with fixed positions of particles and discrete energy levels. These models are attractive because they can be described in terms of discrete microstates which can be easily counted (for a continuum of microstates, as in the example with a freely moving particle, we still need to learn how to do this). This simplifies the calculation of the multiplicity Ω. On the other hand, the results will be applicable to many other, more complicated models. Despite the simplicity of the models, they describe a number of experimental systems in a surprisingly precise manner:
– the two-state paramagnet ("limited" energy spectrum)
– the Einstein model of a solid ("unlimited" energy spectrum)

The Two-State Paramagnet
A system of non-interacting magnetic dipoles in an external magnetic field B; each dipole can have only two possible orientations along the field, either parallel or anti-parallel to this axis (e.g., a particle with spin ½). There are no "quadratic" degrees of freedom (unlike in an ideal gas, where the kinetic energies of molecules are unlimited); the energy spectrum of the particles is confined within a finite interval of E (just two allowed energy levels).
A particular microstate (↑↓↓↑...) is specified if the directions of all spins are specified. A macrostate is specified by the total number of dipoles that point "up", N↑ (the number of dipoles that point "down" is N↓ = N − N↑).
The two energy levels of a dipole are E2 = +μB and E1 = −μB (an arbitrary choice of zero energy), where
N↑ – the number of "up" spins
N↓ – the number of "down" spins
μ – the magnetic moment of an individual dipole (spin)
The total magnetic moment (a macroscopic observable) is
M = μ(N↑ − N↓) = μ[N↑ − (N − N↑)] = μ(2N↑ − N).
The energy of a single dipole in the external magnetic field is εi = −μi·B: −μB for a dipole parallel to B, +μB for one anti-parallel to B. The energy of a macrostate is
U = −M·B = −μB(N↑ − N↓) = μB(N − 2N↑).

Example. Consider two spins. There are four possible configurations of microstates, with M = 2μ, 0, 0, −2μ. In zero field, all these microstates have the same energy (degeneracy). Note that the two microstates with M = 0 have the same energy even when B ≠ 0: they belong to the same macrostate, which has multiplicity Ω = 2. The macrostates can be classified by their moment M and multiplicity Ω:
M = 2μ, 0, −2μ with Ω = 1, 2, 1.
For three spins the macrostates are M = 3μ, μ, −μ, −3μ with Ω = 1, 3, 3, 1.

The Multiplicity of the Two-State Paramagnet
Each of the microstates is characterized by N numbers; the number of equally probable microstates is 2^N, and the probability of being in a particular microstate is 1/2^N.
For a two-state paramagnet in zero field, the energy of all macrostates is the same (0). A macrostate is specified by (N, N↑). Its multiplicity is the number of ways of choosing N↑ objects out of N:
Ω(N, 0) = 1, Ω(N, 1) = N, Ω(N, 2) = N(N − 1)/2, Ω(N, 3) = N(N − 1)(N − 2)/(3·2·1), ...
Ω(N, n) = N(N − 1)(N − 2)···(N − n + 1)/n! = N!/(n!(N − n)!)
(n! = 1·2·...·n; 0! = 1 – there is exactly one way to arrange zero objects.)
The multiplicity of a macrostate of a two-state paramagnet with (N, N↑) is therefore
Ω(N, N↑) = N!/(N↑! N↓!) = N!/(N↑!(N − N↑)!).

The Probability of Macrostates of a Two-State Paramagnet (B = 0)
P(N, N↑) = Ω(N, N↑)/(number of all microstates) = Ω(N, N↑)/2^N = N!/(N↑!(N − N↑)! 2^N).
Using Stirling's approximation N! ≈ N^N e^(−N),
P(N, N↑) ≈ N^N e^(−N)/[N↑^N↑ e^(−N↑) (N − N↑)^(N−N↑) e^(−(N−N↑)) 2^N] = N^N/[N↑^N↑ (N − N↑)^(N−N↑) 2^N].
As the system becomes larger, the P(N, N↑) graph becomes more sharply peaked. For N = 1: Ω(1, N↑) = 1, 2^N = 2, P(1, N↑) = 0.5.
[Figure: P(1, N↑), P(15, N↑), and P(10^23, N↑) versus N↑ – the peak at N↑ = N/2 becomes relatively narrower as N grows; random orientation of the spins in B = 0 is overwhelmingly more probable.]
(http://stat-www.berkeley.edu/~stark/Java/Html/)
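A small numerical check of the macrostate probability P(N, N↑) = Ω(N, N↑)/2^N discussed above, showing that the relative width of the peak shrinks as 1/√N (the chosen values of N are arbitrary):

```python
from math import comb, sqrt

def P_macro(N, n_up):
    """P(N, N_up) = Omega(N, N_up) / 2^N for a two-state paramagnet at B = 0."""
    return comb(N, n_up) / 2.0 ** N

for N in (1, 15, 101, 1001):
    probs = [P_macro(N, n) for n in range(N + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum(n * n * p for n, p in enumerate(probs)) - mean ** 2
    # relative width sqrt(var)/N compared with the expected 1/(2 sqrt(N))
    print(N, round(sqrt(var) / N, 4), round(0.5 / sqrt(N), 4))
```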
Bernoulli Processes and the Binomial Distribution
Because most physicists spend little time gambling, we will have to develop our intuitive understanding of probability in other ways. Our strategy will be to first consider some physical systems, e.g. magnetic moments or spins, for which we can calculate the probability distribution by analytical methods. Then we will use the computer to generate more data to analyze.
• Consider a system of N noninteracting magnetic dipoles, each having a magnetic moment µ and an associated spin, in an external magnetic field B. The field B is in the up (+z) direction. According to quantum mechanics the component of the magnetic dipole moment along a given axis is limited to certain discrete values. Spin ½ implies that a spin can either point up (parallel to B) or down (antiparallel to B). The energy of interaction of each spin with the magnetic field is E = −µB if the spin is up and +µB if the spin is down. This model is a simplification of more realistic magnetic systems.
Take p to be the probability that the spin (magnetic moment) is up and q the probability that the spin is down. Because there are no other possible outcomes, we have p + q = 1, or q = 1 − p. If B = 0, there is no preferred spatial direction and p = q = 1/2. For B ≠ 0 we do not yet know how to calculate p, and for now we will assume that p is a known parameter. We associate with each spin a random variable si which has the values ±1 with probability p and q, respectively.

One of the quantities of interest is the magnetization M, which is the net magnetic moment of the system. For a system of N spins the magnetization is given by
M = µ(s1 + s2 + ... + sN) = µ ∑ si.
In the following we take µ = 1. We first calculate the mean value of M, then its variance, and finally the probability distribution P(M) that the system has magnetization M. To compute the mean value of M, we take the mean values of both sides:
<M> = <∑ si> = ∑ <si>.
Because the probability that any spin has the value ±1 is the same for each spin, the mean value of each spin is the same, that is, <s1> = <s2> = ... = <sN> ≡ <s>. Therefore the sum consists of N equal terms and can be written as
<M> = N<s>.
The meaning of this equation is that the mean magnetization is N times the mean magnetization of a single spin. Because <s> = (1 × p) + (−1 × q) = p − q, we have
<M> = N(p − q).
Let us calculate the variance of M, that is, <(M − <M>)²>. We write
ΔM = M − <M> = ∑ Δsi, where Δsi ≡ si − <s>.

• Example: let us calculate <(ΔM)²> for N = 3 spins.
• Solution:
(ΔM)² = (Δs1 + Δs2 + Δs3)(Δs1 + Δs2 + Δs3) = [(Δs1)² + (Δs2)² + (Δs3)²] + 2[Δs1Δs2 + Δs1Δs3 + Δs2Δs3].
We take the mean value, interchange the order of the sums and averages, and write
<(ΔM)²> = [<(Δs1)²> + <(Δs2)²> + <(Δs3)²>] + 2[<Δs1Δs2> + <Δs1Δs3> + <Δs2Δs3>].
Because different spins are statistically independent (the spins do not interact) and <Δsi> = 0, we have <ΔsiΔsj> = <Δsi><Δsj> = 0 for i ≠ j, so each cross term vanishes on the average. Then
<(ΔM)²> = <(Δs1)²> + <(Δs2)²> + <(Δs3)²>.
Because each spin is equivalent on the average, each term is equal. Hence, we obtain the desired result
<(ΔM)²> = 3<(Δs1)²>.
The variance of M is 3 times the variance of a single spin, that is, the variance is additive. We can evaluate <(ΔM)²> further by finding an explicit expression for <(Δs)²>. We have <s²> = [1² × p] + [(−1)² × q] = p + q = 1. Hence
<(Δs)²> = <s²> − <s>² = 1 − (p − q)² = 1 − (2p − 1)² = 1 − 4p² + 4p − 1 = 4p(1 − p) = 4pq,
so <(ΔM)²> = 3(4pq), and for N noninteracting spins
<(ΔM)²> = N(4pq).

Because of the simplicity of a system of noninteracting spins, we can calculate the probability distribution itself. Let us consider the statistical properties of a system of N = 3 noninteracting spins. Because each spin can be in one of two states, there are 2^N = 2³ = 8 distinct outcomes. Because each spin is independent of the other spins, we can use the multiplication rule to calculate the probabilities of each outcome. Although each outcome is distinct, several of the configurations have the same number of up spins. One quantity of interest is the probability PN(n) that n spins are up out of a total of N spins. For example, there are three states with n = 2, each with probability p²q, so the probability that two spins are up is 3p²q. For N = 3 we find
P3(n = 3) = p³
P3(n = 2) = 3p²q
P3(n = 1) = 3pq²
P3(n = 0) = q³

Example: Find the first two moments of P3(n).
Solution. The first moment <n> of the distribution is given by
<n> = 0 × q³ + 1 × 3pq² + 2 × 3p²q + 3 × p³ = 3p(q² + 2pq + p²) = 3p(q + p)² = 3p.
Similarly, the second moment <n²> of the distribution is given by
<n²> = 0 × q³ + 1² × 3pq² + 2² × 3p²q + 3² × p³ = 3p(q² + 4pq + 3p²) = 3p(q + 3p)(q + p) = 3p(q + 3p) = (3p)² + 3pq.
Hence
<(n − <n>)²> = <n²> − <n>² = 3pq.
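The results <M> = N(p − q) and <(ΔM)²> = 4Npq can be checked by brute-force enumeration of all 2^N microstates; a sketch for N = 3 (the value p = 0.6 is only an illustration):

```python
from itertools import product

p = 0.6                        # illustrative value of the up-spin probability
q = 1 - p
N = 3

mean_M = mean_M2 = 0.0
for spins in product((+1, -1), repeat=N):        # all 2^N microstates
    weight = 1.0
    for s in spins:
        weight *= p if s == +1 else q            # multiplication rule
    M = sum(spins)                               # magnetization with mu = 1
    mean_M += weight * M
    mean_M2 += weight * M * M

print(mean_M, N * (p - q))                       # 0.6  0.6   -> <M> = N(p - q)
print(mean_M2 - mean_M ** 2, 4 * N * p * q)      # 2.88 2.88  -> <(dM)^2> = 4Npq
```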
First, in each trial there are only two outcomes, for example, up or down, heads or tails, or right or left. Second, the result of each trial is independent of all previous trials; for example, the drunken sailor has no memory of his or her previous steps. This type of process is called a Bernoulli process (after the mathematician Jacob Bernoulli, 1654–1705).
• Because of the importance of magnetic systems, we will cast our discussion of Bernoulli processes in terms of the noninteracting magnetic moments of spin ½.
• The main quantity of interest is the probability PN(n), which we now calculate for arbitrary N and n. We know that a particular outcome with n up spins and n′ down spins occurs with probability p^n q^n′. We write the probability PN(n) as
PN(n) = WN(n, n′) p^n q^n′,
where n′ = N − n and WN(n, n′) is the number of distinct configurations of N spins with n up spins and n′ down spins. From our discussion of N = 3 noninteracting spins, we already know the first several values of WN(n, n′).
• We can determine the general form of WN(n, n′) by obtaining a recursion relation between WN and WN−1. A total of n up spins and n′ down spins out of N total spins can be found by adding one spin to N − 1 spins. The additional spin is either (a) up if there are (n − 1) up spins and n′ down spins, or (b) down if there are n up spins and (n′ − 1) down spins.
• Because there are WN−1(n − 1, n′) ways of reaching the first case and WN−1(n, n′ − 1) ways of reaching the second, we obtain the recursion relation
WN(n, n′) = WN−1(n − 1, n′) + WN−1(n, n′ − 1).
• If we begin with the known values W0(0, 0) = 1, W1(1, 0) = W1(0, 1) = 1, we can use the recursion relation to construct WN(n, n′) for any desired N. For example,
W2(2, 0) = W1(1, 0) + W1(2, −1) = 1 + 0 = 1
W2(1, 1) = W1(0, 1) + W1(1, 0) = 1 + 1 = 2
W2(0, 2) = W1(−1, 2) + W1(0, 1) = 0 + 1 = 1
The coefficients WN(n, n′) form a pyramid in which each number is the sum of the two numbers to the left and right above it. This construction is called a Pascal triangle. It is straightforward to show by induction that
WN(n, n′) = N!/(n! n′!) = N!/(n!(N − n)!).
Note the convention 0! = 1.

Binomial Distribution
• Probability of n successes in N attempts, with (p + q)^N = 1:
P(n) = [N!/(n!(N − n)!)] p^n q^(N−n),
where q = 1 − p. Note that for p = q = 1/2, PN(n) reduces to
PN(n) = [N!/(n!(N − n)!)] (1/2)^N.
[Figure: the binomial distribution P16(n) for p = q = 1/2 and N = 16.]

Thermodynamic Probability
• The term with all the factorials in the previous equation is the number of microstates that lead to the particular macrostate. It is called the "thermodynamic probability", wn:
wn = N!/(n!(N − n)!)
Microstates
• The total number of microstates is w = ∑ wn, and the true probability is P(n) = wn/w. For a very large number of particles, w ≈ wmax.

Mean of the Binomial Distribution
<n> = ∑ n P(n), where P(n) = [N!/(n!(N − n)!)] p^n q^(N−n).
Notice that n p^n = p ∂(p^n)/∂p, so
<n> = ∑ n P(n) = p ∂/∂p [∑ P(n)] = p ∂/∂p (p + q)^N = pN(p + q)^(N−1) = pN(1)^(N−1),
i.e. <n> = pN.

Standard Deviation (σ)
σ² = <(n − <n>)²> = <n² − 2n<n> + <n>²> = <n²> − 2<n>² + <n>² = <n²> − <n>².
Using the same trick twice,
<n²> = ∑ n² P(n) = p ∂/∂p [p ∂/∂p ∑ P(n)] = p ∂/∂p [pN(p + q)^(N−1)] = pN(p + q)^(N−1) + p²N(N − 1)(p + q)^(N−2) = pN + p²N(N − 1) = pNq + (pN)²,
so
σ² = <n²> − <n>² = pNq + (pN)² − (pN)² = Npq, and σ = sqrt(Npq).
For a binomial distribution: <n> = pN, σ = sqrt(Npq), and σ/<n> = sqrt(q/(Np)).

Coins
• Toss 6 coins. The probability of n heads is
P(n) = [N!/(n!(N − n)!)] p^n q^(N−n) = [6!/(n!(6 − n)!)] (1/2)^n (1/2)^(6−n) = [6!/(n!(6 − n)!)] (1/2)^6.
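A sketch that builds WN(n, n′) from the recursion relation, compares it with N!/(n!(N − n)!), and checks <n> = pN and σ = √(Npq) for the six-coin case:

```python
from math import comb, sqrt

def W(N, n):
    """Number of configurations with n up spins out of N, via the recursion
    W_N(n, n') = W_{N-1}(n-1, n') + W_{N-1}(n, n'-1)."""
    if N == 0:
        return 1 if n == 0 else 0
    if n < 0 or n > N:
        return 0
    return W(N - 1, n - 1) + W(N - 1, n)

N, p = 6, 0.5
q = 1 - p
print([W(N, n) for n in range(N + 1)])       # 1 6 15 20 15 6 1
print([comb(N, n) for n in range(N + 1)])    # same Pascal-triangle row

P = [W(N, n) * p**n * q**(N - n) for n in range(N + 1)]
mean = sum(n * Pn for n, Pn in enumerate(P))
var = sum(n * n * Pn for n, Pn in enumerate(P)) - mean**2
print(mean, p * N)                           # 3.0  3.0
print(sqrt(var), sqrt(N * p * q))            # 1.2247...  1.2247...
```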
[Figure: the binomial distribution of the number of successes for 6 coins (p = q = 1/2).]
[Figure: the binomial distribution for 100 coins.]
[Figure: the binomial distribution for 1000 coins – as N grows, the peak becomes relatively narrower.]

Math required to bridge the gap between 1 and 10^23
Typically, N is huge for macroscopic systems, and the multiplicity is unmanageably large – for an Einstein solid with 10^23 atoms, Ω ~ 10^(10^23). One way to deal with such numbers is to take their logarithm [in fact, the entropy S of a macrostate is kB ln Ω of the macrostate]; thus, we need to learn how to deal with logarithms of huge numbers:
e^(ln x) = x, ln(xy) = ln x + ln y, ln(x/y) = ln x − ln y, ln(x^y) = y ln x, e^x = 10^(x/ln 10) ≈ 10^(0.43x).

Stirling's Approximation for N! (N >> 1)
The multiplicity depends on N!, and we need an approximation for ln(N!):
ln N! = ln 1 + ln 2 + ln 3 + ··· + ln N ≈ ∫₁^N ln x dx = [x ln x − x]₁^N = N ln N − N + 1 ≈ N ln N − N,
so
N! ≈ (N/e)^N.
More accurately, N! ≈ (N/e)^N sqrt(2πN). Check:
ln N! ≈ N ln N − N + ½ ln N + ½ ln 2π ≈ N ln N − N,
because ln N << N for large N.

Multiple outcomes for large N
For large N, ln N! ≈ N ln N − N. For N objects distributed over several outcomes with occupation numbers Ni (with ∑ Ni = N), the multiplicity is
w = N!/(N1! N2! N3! ···) = N!/∏ Ni!,
so
ln w = ln N! − ∑ ln Ni! ≈ (N ln N − N) − ∑ (Ni ln Ni − Ni) = N ln N − ∑ Ni ln Ni.

Number Expected
• Toss 6 coins N times. The probability of n heads is
P(n) = [6!/(n!(6 − n)!)] (1/2)^6.
• The expected number of times that n heads occurs is N P(n).

The Gaussian Distribution as a Limit of the Binomial Distribution
For N >> 1, PN(n) is a rapidly varying function of n near n = pN, and for this reason we do not want to approximate PN(n) directly. Because the logarithm of PN(n) is a slowly varying function, we expect that the power series expansion of ln PN(n) will converge. Hence, we expand ln PN(n) in a Taylor series about the value n = ñ at which ln PN(n) reaches its maximum value. We will write p(n) instead of PN(n) because we will treat n as a continuous variable, and hence p(n) is a probability density. We find
ln p(n) = ln p(ñ) + (n − ñ) [d ln p(n)/dn]_(n=ñ) + ½ (n − ñ)² [d² ln p(n)/dn²]_(n=ñ) + ...
• Because we have assumed that the expansion is about the maximum n = ñ, the first derivative [d ln p(n)/dn]_(n=ñ) must be zero. For the same reason the second derivative must be negative. We assume that the higher terms can be neglected and adopt the notation ln A = ln p(n = ñ) and B = −[d² ln p(n)/dn²]_(n=ñ). Then
ln p(n) ≈ ln A − ½ B (n − ñ)², that is, p(n) ≈ A e^(−½ B (n − ñ)²).
We next use Stirling's approximation to evaluate the first two derivatives of ln p(n) and the value of ln p(n) at its maximum, and thus find the parameters A, B, and ñ. Taking the logarithm of the binomial distribution gives
ln p(n) = ln N! − ln n! − ln(N − n)! + n ln p + (N − n) ln q.
Using the relation d(ln n!)/dn ≈ ln n (from Stirling's approximation), we have
d(ln p(n))/dn = −ln n + ln(N − n) + ln p − ln q.
The most probable value of n is found from the condition d ln p/dn = 0, which gives ln[(N − n)p/(nq)] = 0, or (N − ñ)p = ñq. If we use the relation p + q = 1, we obtain
ñ = pN.
Note that ñ = <n>, that is, the value of n for which p(n) is a maximum is also the mean value of n. The second derivative follows from d(ln p(n))/dn = −ln n + ln(N − n) + ln p − ln q:
d²(ln p(n))/dn² = −1/n − 1/(N − n),
so at n = ñ = pN,
B = 1/(pN) + 1/(qN) = 1/(Npq), i.e. B = 1/σ²,
where σ² = Npq is the variance of n. With the normalization A = 1/sqrt(2πσ²), we obtain
p(n) = [1/sqrt(2πσ²)] e^(−(n − ñ)²/(2σ²)) (Gaussian probability density).
• The Gaussian probability density is valid for large values of N and for values of n near <n>. Even for relatively small values of N, the Gaussian approximation is a good approximation for most values of n.
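A quick numerical look at Stirling's approximation and at how well the Gaussian density approximates the binomial distribution near its peak (the chosen values of N and n are arbitrary):

```python
from math import lgamma, log, pi, exp, sqrt, comb

def ln_factorial(N):
    return lgamma(N + 1)                     # exact ln N!

for N in (10, 100, 1000):
    simple = N * log(N) - N                  # N ln N - N
    better = simple + 0.5 * log(2 * pi * N)  # with the sqrt(2 pi N) correction
    print(N, round(ln_factorial(N), 2), round(simple, 2), round(better, 2))

# Gaussian limit of the binomial for N = 100, p = q = 1/2
N, p = 100, 0.5
sigma2 = N * p * (1 - p)                     # variance Npq
for n in (50, 55, 60):
    binom = comb(N, n) * 0.5 ** N            # exact P_N(n) for p = q = 1/2
    gauss = exp(-(n - p * N) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)
    print(n, round(binom, 4), round(gauss, 4))
```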
• The most important feature of the Gaussian probability distribution is that its relative width, σn/<n>, decreases as N^(−1/2).

The Poisson Distribution, and Should You Fly in Airplanes?
• We now return to the question of whether or not it is safe to fly. If the probability of a plane crashing is p = 10^−5, then 1 − p is the probability of surviving a single flight. The probability of surviving N flights is then PN = (1 − p)^N. For N = 400, PN ≈ 0.996, and for N = 10^5, PN ≈ 0.368. Thus, our intuition is verified that if we took 400 flights, we would have only a small chance of crashing.
This type of reasoning is typical when the probability of an individual event is small but there are very many attempts. Suppose we are interested in the probability of the occurrence of n events out of N attempts, given that the probability p of the event in each attempt is very small. The resulting probability is called the Poisson distribution, a distribution that is important in the analysis of experimental data. To derive the Poisson distribution, we begin with the binomial distribution
PN(n) = [N!/(n!(N − n)!)] p^n (1 − p)^(N−n).
We first use Stirling's approximation to write, for n << N,
N!/(N − n)! ≈ N^n.
For p << 1, we have ln(1 − p) ≈ −p, so e^(ln(1−p)) = 1 − p ≈ e^−p, and (1 − p)^(N−n) ≈ e^(−p(N−n)) ≈ e^(−pN). If we use the above approximations, we find
PN(n) ≈ (N^n/n!) p^n e^(−pN) = (<n>^n/n!) e^(−<n>), with <n> = pN (Poisson distribution).
Let us apply the Poisson distribution to the airplane survival problem. We want to know the probability of never crashing, that is, P(n = 0). The mean <n> = pN equals 10^−5 × 400 = 0.004 for N = 400 flights and <n> = 1 for N = 10^5 flights. Thus, the survival probability is P(0) = e^(−<n>) ≈ 0.996 for N = 400 and P(0) ≈ 0.368 for N = 10^5, as we calculated previously. We see that if we fly 100,000 times, we have a much larger probability of dying in a plane crash.

Traffic Flow and the Exponential Distribution
The Poisson distribution is closely related to the exponential distribution, as we will see in the following. Consider a sequence of similar random events and let t1, t2, ... be the times at which each successive event occurs. Examples of such sequences are the successive times when a phone call is received and the times when a Geiger counter registers a decay of a radioactive nucleus. Suppose that we determine the sequence over a very long time T that is much greater than any of the intervals ti − ti−1. We also suppose that the average number of events is λ per unit time, so that in a time interval t the mean number of events is λt. Assume that the events occur at random and are independent of each other. Given λ, the mean number of events per unit time, we wish to find the probability distribution w(t) of the interval t between events. We know that if an event occurred at time t = 0, the probability that another event occurs within the interval [0, t] is
∫₀^t w(t′) dt′,
and the probability that no event occurs in the interval t is
1 − ∫₀^t w(t′) dt′.
Thus the probability that the duration of the interval between the two events is between t and t + Δt is given by
w(t)Δt = (probability that no event occurs in the interval [0, t]) × (probability that an event occurs in the interval [t, t + Δt])
= [1 − ∫₀^t w(t′) dt′] λΔt.
If we cancel Δt from each side and differentiate both sides with respect to t, we find
dw/dt = −λw, so that w(t) = A e^(−λt).
The constant of integration A is determined from the normalization condition ∫₀^∞ w(t) dt = 1 = A/λ, so A = λ. Hence w(t) is the exponential function
w(t) = λ e^(−λt).
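The airplane numbers quoted above are easy to reproduce by comparing the exact binomial survival probability (1 − p)^N with the Poisson estimate e^(−pN):

```python
from math import exp

p = 1e-5                               # probability of a crash on one flight
for N in (400, 100_000):
    exact = (1 - p) ** N               # binomial: probability of zero crashes
    poisson = exp(-p * N)              # Poisson P(0) with mean <n> = pN
    print(N, round(exact, 3), round(poisson, 3))
# 400     0.996 0.996
# 100000  0.368 0.368
```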
These results for the exponential distribution lead naturally to the Poisson distribution. Let us divide a long time interval T into n smaller intervals t = T/n. What is the probability that 0, 1, 2, 3, ... events occur in the time interval t, given λ, the mean number of events per unit time? We will show that the probability that n events occur in the time interval t is given by the Poisson distribution
Pn(t) = (λt)^n e^(−λt)/n!.
We first consider the case n = 0. The probability that no event occurs in the interval t is
P0(t) = e^(−λt).
For the case n = 1, there is exactly one event in the time interval t. This event must occur at some time t′, which may occur with equal probability anywhere in the interval [0, t]. Because no event can occur in the interval [t′, t], we have, using P0 with t → (t − t′),
P1(t) = ∫₀^t λ e^(−λt′) e^(−λ(t−t′)) dt′ = λt e^(−λt).
In general, if n events are to occur in the interval [0, t], the first must occur at some time t′ and exactly (n − 1) must occur in the time (t − t′). Hence
Pn(t) = ∫₀^t λ e^(−λt′) Pn−1(t − t′) dt′.
The above equation is a recurrence formula whose solution is the Poisson distribution given above.

Simulation
1. Approach to equilibrium (http://stp.clarku.edu/simulations/approachtoequilibrium/index.html)
2. Sensitivity to initial conditions (http://stp.clarku.edu/simulations/sensitive/index.html)
3. Random walks (http://stp.clarku.edu/simulations/randomwalks/index.html)
4. Multiple coin toss (http://stp.clarku.edu/simulations/cointoss/index.html)
5. The binomial distribution (http://stp.clarku.edu/simulations/binomial/index.html)
6. Monte Carlo estimation (http://stp.clarku.edu/simulations/estimate/index.html)
7. Random multiplicative processes (http://stp.clarku.edu/simulations/multiplicativeprocess/index.html)

Probability Quiz, 31 August 2010, 30 minutes
1. What is the probability of rolling a 3 or a 6 in a single throw of a die?
2. What is the probability of not rolling a 6 in a single throw of a die?
3. What is the probability of rolling an even number in a single throw of a die?
4. What is the probability of rolling at least one 6 in two throws of a die?

Next … the Boltzmann distribution