FREQUENTLY ASKED QUESTIONS
October 5, 2010

Content Questions

What is the relation between binomial, Gaussian, Poisson, and Landau distribution? How can we "jump" from one distribution to another?

The Gaussian and Poisson distributions are both limits of the binomial distribution, and the Gaussian is also a limit of the Poisson distribution. Specifically:
• The Poisson is the limit of the binomial for small p and n >> µ: see FAQ 2 for how this comes about mathematically.
• The Gaussian is the limit of the Poisson for large µ. For a mathematical argument (using Stirling's approximation for the log of a factorial), see Barlow, p. 40.
• The Gaussian is the limit of the binomial for large µ = np (and large n(1 − p)).

I think "jump" is not quite the right word. If appropriate conditions are satisfied, the distributions approach each other, and how close is "close enough" to replace one distribution by another depends on how good your answer needs to be. For example, for large µ the Poisson distribution looks a lot like a Gaussian with the same mean and a sigma equal to √µ. It's not too bad a match for µ = 5 or so (which may or may not be good enough for what you need), and the Poisson and the Gaussian look almost indistinguishable for µ = 30. If you play with the Mathematica notebook from last lecture, you can get a feeling for this.

The Landau distribution (if I read the question correctly) is something else; I did not mention this distribution in class. It is a specific distribution that in fact doesn't look Gaussian (it has a long tail). It describes energy loss in thin layers of matter, and it can be derived from physical principles for this description.

What if our Poisson distribution isn't counting? What if µ has units?

The argument of the exponential, −µ, in the Poisson distribution must be dimensionless: this distribution always describes some kind of counting.
In the distribution P(x; µ), µ is the mean value of the distribution of x, and x is always "the number of something". Note that while the x values corresponding to the measurement are discrete, the mean µ need not be an integer. Another note: in the limit of large µ, the x values become finely spaced relative to the width of the distribution, and can eventually be treated as continuous as the distribution becomes Gaussian; for the Gaussian, the values of x can correspond to some dimension-bearing quantity, like length or energy.

Can you explain the meaning of likelihood as the probability of measuring $x_1, x_2, \ldots, x_N$ at the same time?

Suppose you have a data set of N measurements $x_1, x_2, \ldots, x_N$: this is the result of measuring the specific values $x_1$ and $x_2$ and $x_3$, ..., and $x_N$. This is what I mean by "measuring them at the same time": your data set is the assemblage of the specific measurements. Now imagine that your (for the moment unknown) mean value is $\mu_0$. Supposing $\mu_0$ is the right parameter describing the distribution, then, given that parameter, the probability of measuring some particular $x_i$ is $P(x_i; \mu_0)$. The probability of measuring all of them, i.e. the probability of measuring the actual data set you got, is $\prod_i P(x_i; \mu_0)$ (assuming the measurements are independent). This product of probabilities is the "likelihood": it's the probability of getting your particular data set, given the parameter. (You then maximize it as a function of the parameter to get the best parameter; more on that later.)

To make a simple example: take a coin with heads and tails that might or might not be loaded, so that the probability of getting heads is p. Say you toss the coin N = 3 times, recording x = 1 for heads and x = 0 for tails. Suppose your data set is 0, 0, 1. The probability of the first measurement being tails is 1 − p; the probability of the second measurement being tails is 1 − p; the probability of the third measurement being heads is p.
So the likelihood of getting the specific data set you got (given p) is (1 − p)(1 − p)p = p(1 − p)². We'll see more examples of likelihoods later.

How do Gaussian and Poisson mean and mean-error estimations differ in practice?

In practice, we can usually treat a Poisson as having Gaussian properties if µ is greater than 8 or 10 or so. Not infrequently, for µ as small as 5 or so it's also OK to treat it as Gaussian (for the purpose of using the arithmetic mean x̄ as the estimator of the true mean, that's typically fine). Taking σ = √µ ≈ √x̄ is also fine: the square-root relation between µ and σ is exact for a Poisson. However, there are cases when it's better to take the Poisson nature of the distribution into account: for example, when setting limits based on observations of very small numbers of events, or in ratios of small numbers.

Why is s² the best estimate of σ²?

Although I didn't show it, this can be shown to be the "best" estimator by a maximum likelihood method (although there are some complications as to the meaning of "best": you actually want an "unbiased" estimator, whose expectation value is the true value, as we discussed a bit last class. For this, the 1/(N − 1) factor is desirable when s is estimated using x̄ rather than the true mean.) There are also cases where you may be better off estimating σ from knowledge of your apparatus, or from a known property of the distribution; for example, if you're dealing with a Poisson, it usually works well to assume σ = √x̄.

What's the relation between the estimation and a fit, e.g. a Gaussian fit? Do we do the estimation first, then do the fit?

Well, a fit of a data set to a function is a more general thing than just estimating a mean: for the fit, you compare each point to a predicted value for a given parameterization of the function, and minimize with respect to the parameters to find the best estimates of the parameters (the function could be a Gaussian or something else).
You don't necessarily need to estimate a mean (or other parameter) before launching into a fit algorithm. However, in practice it's often useful to start the fit minimization off with decent estimates of the parameters, to save computation time, so it can be a good idea to estimate the mean or other parameters first and then feed these to the fit. I'll talk about fitting later.

How do the off-diagonal entries in M manifest in the probability distribution?

The off-diagonal entries in the error matrix M appear as cross terms in the argument of the exponential in the joint probability P(u, v); we'll get to this next class.

Can you explain the contours of equal probability? Where did the different amplitudes really come from: just $\sigma_u$ and $\sigma_v$?

I'm not sure what you mean by "amplitudes", but the concentric ellipses I drew were intended to be lines in u-v space along which the probability is constant. If you look at the joint binormal probability expression,

$$P(u, v) = \frac{1}{2\pi\sigma_u\sigma_v}\, e^{-\frac{1}{2}\left(\frac{u^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right)},$$

then if you choose $\frac{u^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2} = \text{constant}$, the probability P(u, v) will have a constant value. This equation describes an ellipse in u-v space. You can draw a different ellipse for a different chosen constant; if you choose $\frac{u^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2} = 1$, that's the contour at which the probability is down from its maximum by a factor of $1/\sqrt{e}$, just as the 1D Gaussian is down by a factor of $1/\sqrt{e}$ at one σ away from the mean. The ellipse $\frac{u^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2} = 1$ is the "1σ contour". Its semi-axis lengths are $\sigma_u$ along u and $\sigma_v$ along v (for the vertical ellipse I drew, $\sigma_v$ is the semi-major axis and $\sigma_u$ the semi-minor axis).

For the ellipse, when moving in u, why would the max of v be the same?

It won't be the same maximum value of v, but the maximum value of probability will always be at the same value of v. For the vertical ellipse I drew (for uncorrelated variables), along any vertical line at constant u, the maximum of the probability as a function of v is always at v = 0.
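For the contour questions above, a minimal Python sketch can make the two claims concrete: every point on the 1σ ellipse has probability $1/\sqrt{e}$ times the peak value, and along any line of constant u the most probable v is v = 0. The function names (`binormal_pdf`, `one_sigma_point`, `argmax_v`) and the values σ_u = 1, σ_v = 2 are hypothetical choices for illustration only; the maximization is a brute-force grid search, not an analytic result.

```python
import math

def binormal_pdf(u, v, sigma_u, sigma_v):
    """Joint density P(u, v) of two uncorrelated Gaussian variables with zero means."""
    norm = 1.0 / (2.0 * math.pi * sigma_u * sigma_v)
    q = u**2 / sigma_u**2 + v**2 / sigma_v**2
    return norm * math.exp(-0.5 * q)

def one_sigma_point(theta, sigma_u, sigma_v):
    """Point on the 1-sigma ellipse u^2/sigma_u^2 + v^2/sigma_v^2 = 1."""
    return sigma_u * math.cos(theta), sigma_v * math.sin(theta)

def argmax_v(u, sigma_u, sigma_v, v_grid):
    """Grid value of v that maximizes P(u, v) along the vertical line at fixed u."""
    return max(v_grid, key=lambda v: binormal_pdf(u, v, sigma_u, sigma_v))

sigma_u, sigma_v = 1.0, 2.0      # illustrative: sigma_v > sigma_u gives a vertical ellipse
p_max = binormal_pdf(0.0, 0.0, sigma_u, sigma_v)

# Ratio P/P_max at several points around the 1-sigma ellipse: each equals exp(-1/2) = 1/sqrt(e).
ratios = [binormal_pdf(*one_sigma_point(t, sigma_u, sigma_v), sigma_u, sigma_v) / p_max
          for t in (0.0, 0.7, 1.5, 3.0)]
print(ratios, math.exp(-0.5))

# Along any line of constant u, the most probable v is v = 0.
v_grid = [i / 100 for i in range(-500, 501)]
print([argmax_v(u, sigma_u, sigma_v, v_grid) for u in (-1.0, 0.0, 2.5)])  # [0.0, 0.0, 0.0]
```

Changing the constant in the ellipse condition to some value c other than 1 changes the ratio to exp(−c/2), which is why the concentric ellipses correspond to different probability levels.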
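Returning to the first question, on how the binomial, Poisson, and Gaussian distributions approach each other: the notes point to a Mathematica notebook, which is not reproduced here, so the following is only a rough Python stand-in. It measures the largest pointwise difference between binomial(n, p) and Poisson(µ = np) as n grows with µ = 3 held fixed, and between Poisson(µ) and a Gaussian with mean µ and σ = √µ as µ grows. The helper names and the particular values of n and µ are hypothetical choices for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k successes in n trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(k; mu), computed through logs so that mu**k and k! never overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def gaussian_pdf(x, mu, sigma):
    """Gaussian density; at integer x it approximates a probability, since the spacing of x is 1."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def max_abs_diff(f, g, ks):
    """Largest pointwise difference between two distributions over the values ks."""
    return max(abs(f(k) - g(k)) for k in ks)

# Binomial -> Poisson: hold mu = n*p = 3 fixed while n grows and p shrinks.
MU_FIXED = 3.0
ns = (10, 100, 1000)
binom_poisson = [
    max_abs_diff(lambda k: binomial_pmf(k, n, MU_FIXED / n),
                 lambda k: poisson_pmf(k, MU_FIXED),
                 range(0, 11))
    for n in ns
]

# Poisson -> Gaussian with the same mean and sigma = sqrt(mu), as mu grows.
mus = (5, 30, 100)
poisson_gauss = [
    max_abs_diff(lambda k: poisson_pmf(k, mu),
                 lambda k: gaussian_pdf(k, mu, math.sqrt(mu)),
                 range(0, int(mu + 6 * math.sqrt(mu)) + 1))
    for mu in mus
]

for n, d in zip(ns, binom_poisson):
    print(f"n = {n:4d}: max |binomial - Poisson| = {d:.5f}")
for mu, d in zip(mus, poisson_gauss):
    print(f"mu = {mu:3d}: max |Poisson - Gaussian| = {d:.5f}")
```

Both columns of differences shrink steadily, which is the numerical version of "the distributions approach each other"; whether a given difference is small enough still depends on how good your answer needs to be.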
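For the likelihood question and the coin example (data set 0, 0, 1), a minimal Python sketch computes the likelihood as the product of $P(x_i; p)$ over the independent tosses and then maximizes it over p. The names `bernoulli_prob` and `likelihood` are hypothetical, and the maximization is a simple grid search standing in for a real optimizer.

```python
import math

def bernoulli_prob(x, p):
    """Probability of one toss: x = 1 (heads) with probability p, x = 0 (tails) with 1 - p."""
    return p if x == 1 else 1.0 - p

def likelihood(data, p):
    """Product over independent measurements of P(x_i; p)."""
    return math.prod(bernoulli_prob(x, p) for x in data)

data = [0, 0, 1]                  # the data set from the coin example
print(likelihood(data, 0.5))      # (1 - 0.5)(1 - 0.5)(0.5) = 0.125

# Maximize over p by brute-force grid search.
grid = [i / 1000 for i in range(1001)]
p_best = max(grid, key=lambda p: likelihood(data, p))
print(p_best)                     # 0.333, close to 1/3
```

The maximum sits at the fraction of heads in the data, 1/3, which is where the derivative of p(1 − p)² vanishes.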
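For the question about why s² uses a 1/(N − 1) factor: a minimal Python sketch can check unbiasedness exactly by enumerating every possible sample of size N = 3 drawn from a fair six-sided die, an illustrative population chosen here rather than taken from the notes. Exact fractions avoid any rounding. The helper `variance` and its `ddof` parameter (the number subtracted from N in the denominator) are hypothetical names for this sketch.

```python
from fractions import Fraction
from itertools import product

def mean(xs):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(xs), len(xs))

def variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by N - ddof."""
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - ddof)

# Illustrative population: a fair six-sided die, true variance 35/12.
values = range(1, 7)
mu = mean(values)
sigma2 = sum((x - mu) ** 2 for x in values) / 6

N = 3
samples = list(product(values, repeat=N))   # all 6**3 equally likely samples
e_s2_unbiased = sum(variance(s, ddof=1) for s in samples) / len(samples)
e_s2_biased = sum(variance(s, ddof=0) for s in samples) / len(samples)

print(sigma2)          # 35/12
print(e_s2_unbiased)   # 35/12: the 1/(N - 1) estimator is unbiased
print(e_s2_biased)     # 35/18 = (N - 1)/N * 35/12: the 1/N version is biased low
```

The 1/N version underestimates σ² because x̄ is computed from the same data and so sits closer to the sample points than the true mean does.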