Probability Distributions (W&W Chapter 4)

Continuous Distributions
Many variables we wish to study in political science are continuous rather than discrete. Two examples are military expenditures and budget data. For these we need a continuous probability distribution rather than a discrete one (such as the binomial).

Frequency Bar Graphs
Recall the frequency bar graph for a continuous measure. We divide the measure into a number of continuous "cells" and draw a bar over each cell, centered at the cell midpoint. The height of each bar represents the frequency (f) of cases in that cell.

Relative Frequency Density
We can also graph relative frequency (f/N) using a bar graph (Figure 4-3). It is convenient to change the vertical scale to relative frequency density, which makes the total area (the sum of the areas of all the bars) equal to 1:

$$\text{relative frequency density} = \frac{\text{relative frequency}}{\text{cell width}} = \frac{f/N}{\text{cell width}}$$

What happens as N increases and the cells are made narrower? The total area remains fixed at 1, so the relative frequency density approaches a smooth curve. We will call this curve the density, or the probability distribution.

Some Calculus
For continuous random variables, sums are replaced by integrals, which are the limiting sums of calculus:

$$\Pr(a \le X \le b) = \int_a^b p(x)\,dx$$

Think of the integral as taking p(x) times dx. Here dx is a very small number, so p(x) dx is the area of a very thin rectangle. The integral adds up these areas from a to b.

More Calculus
Similarly, the mean and variance of a continuous distribution are calculated as integrals:

$$\mu = \int_{-\infty}^{\infty} x\,p(x)\,dx \qquad \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx$$

Unlike in the discrete case, the density p(x) does not give the probability of the specific point x. What is the probability of a specific point on a continuous distribution? It is zero, because a single point has zero width and therefore zero area. Instead, we calculate the probability of the interval between two points (such as a and b). Important: the area under the entire probability density function equals one. A cumulative distribution function (c.d.f.) is defined by Pr(X ≤ x).
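The following minimal Python sketch makes the thin-rectangle picture concrete. It approximates each integral with the midpoint rule. The density p(x) = 2x on [0, 1] is a hypothetical example, chosen only because its answers are easy to check by hand: total area 1, Pr(0.2 ≤ X ≤ 0.6) = 0.32, μ = 2/3, σ² = 1/18, and Pr(X ≤ 0.5) = 0.25. The helper names `integrate` and `cdf` and the choice of 100,000 rectangles are assumptions, not anything from the text.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f from a to b by summing n thin rectangles.

    Each rectangle has width dx and height f evaluated at its midpoint,
    so f(x) * dx is the area of one thin rectangle (the midpoint rule).
    """
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def p(x):
    """Hypothetical density for illustration: p(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def cdf(x):
    """Pr(X <= x); p is zero below 0, so integrating from 0 suffices."""
    return integrate(p, 0.0, x)


total = integrate(p, 0.0, 1.0)                 # area under the whole density
prob = integrate(p, 0.2, 0.6)                  # Pr(0.2 <= X <= 0.6)
point = integrate(p, 0.5, 0.5)                 # a single point has zero width
mu = integrate(lambda x: x * p(x), 0.0, 1.0)   # mean (p is zero outside [0, 1])
var = integrate(lambda x: (x - mu) ** 2 * p(x), 0.0, 1.0)  # variance

print(total, prob, point, mu, var, cdf(0.5))
```

If you use fewer, wider rectangles (a smaller n), the approximation becomes coarser. The zero-width interval [0.5, 0.5] gives zero area, which matches the point that a single value has probability zero.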
The Normal Distribution
For many random variables, the probability distribution is a bell-shaped curve called the normal curve or Gaussian curve. The name honors the German scientist Carl Friedrich Gauss (1777-1855). We will also discover later that, for most populations, the sampling distribution of the sample mean is approximately normal as N gets large. This makes the normal distribution very useful in statistics.

The General Normal Distribution

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}$$

where μ = mean, σ² = variance, e ≈ 2.71828, and π ≈ 3.1416.

Properties of the Normal Distribution
As x moves farther from the mean, the negative exponent grows in magnitude. As a result, p(x) decreases and approaches zero symmetrically in both tails. The mean, μ, is the center point. The mean μ and the variance σ² are the parameters of the normal curve. They are all you need to know to characterize the distribution.

The Standard Normal Distribution

$$p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$

The mean and variance differ widely from one distribution to another, depending on how our variables are measured. So we need a way to standardize a distribution so that we can make comparisons across samples. We do this by computing a z-score.

Z-scores

$$Z = \frac{x - \mu}{\sigma}$$

Example: IQs in the US
Suppose the mean IQ is 100 and the standard deviation is 16. The z-scores for a person with an IQ of 125 and a person with an IQ of 85 are:
Z = (125 - 100)/16 = 1.5625 ≈ 1.56
Z = (85 - 100)/16 = -0.9375

Scores above the mean have positive z-scores, and scores below the mean have negative z-scores (the mean itself has a z-score of zero). Each z-value is the number of standard deviations from the mean, because the standard normal distribution has μ = 0 and σ = 1.

The area under the curve gives us the probability above or below a certain point (recall that the probability of a particular value of x is zero!). What proportion of people have higher IQs than a person with an IQ of 125?
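This minimal sketch implements the two density formulas and the z-score in Python, using only the standard `math` module. The function names are hypothetical. `normal_pdf` takes the variance σ² to match the general formula, while `z_score` takes the standard deviation σ.

```python
import math


def normal_pdf(x, mu, sigma2):
    """General normal density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def std_normal_pdf(z):
    """Standard normal density: mean 0, variance 1."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 100, 16                 # IQ example from the text
z_high = z_score(125, mu, sigma)    # 1.5625, rounded to 1.56 in the text
z_low = z_score(85, mu, sigma)      # -0.9375
```

Note that normal_pdf(x, mu, sigma**2) equals std_normal_pdf(z_score(x, mu, sigma)) / sigma. This is why a single standard normal table serves every normal distribution.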
We know that Z = 1.56. In the table in the back of the book (page 672), the area beyond Z = 1.56 is .059. In other words, 5.9% of people have higher IQs.

To find the probability below a negative Z, we use the symmetry of the curve. The total area under the curve is 1, with half above the mean and half below. For an IQ of 85, what percentage of people have lower IQs? Here Z = -0.9375. Since the curve is symmetric, the area below -0.9375 equals the area above +0.9375. The table area for Z = .94 is .174, so about 17.4% of people have lower IQs.

Interesting property of the Standard Normal Distribution
About 68% of all points fall within ±1 standard deviation of the mean. About 95% fall within ±2 standard deviations, and about 99.7% within ±3 standard deviations. Where do these values come from? They are the areas under the standard normal curve between -1 and 1, -2 and 2, and -3 and 3.

Expected Value
Recall that μ_X, the mean of X, is also called the average or the expected value of X:

$$\mu_X = E(X) = \sum x\,p(x)$$

We can apply the same formula to a function of a random variable.

Suppose the annual cost of clothing (R) is a function of the number of girls (X) in the family, that is, R = g(X), or more specifically:

$$R = -100X^2 + 300X + 500$$

Number of girls x   Probability p(x)   Clothing cost r = g(x)
0                   .14                $500 = -100(0)² + 300(0) + 500
1                   .39                $700 = -100(1)² + 300(1) + 500
2                   .36                $700 = -100(2)² + 300(2) + 500
3                   .11                $500 = -100(3)² + 300(3) + 500

Assume that the cost goes down with the third girl because she can wear hand-me-downs from her older sisters. To calculate the mean of R, μ_R, we can use the following formula:

$$\mu_R = \sum g(x)\,p(x)$$

Multiplying g(x) by p(x) in each row gives:
(.14)($500) = $70
(.39)($700) = $273
(.36)($700) = $252
(.11)($500) = $55
Σ g(x)p(x) = $650

Thus the average clothing cost in a family with three children will be $650. See Table 4-5 for the longer calculation.
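The areas can also be computed from the error function instead of the table on page 672, since Pr(Z ≤ z) = ½[1 + erf(z/√2)]. This sketch uses Python's `math.erf` to reproduce the two IQ tail areas, the 68/95/99.7 percentages, and the $650 clothing expectation. The names `std_normal_cdf`, `g`, and `p` are illustrative choices only.

```python
import math


def std_normal_cdf(z):
    """Pr(Z <= z) for the standard normal, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Area beyond Z = 1.56: share of people with IQ above 125 (mean 100, sd 16).
upper_tail = 1.0 - std_normal_cdf(1.56)

# Area below Z = -0.9375: share with IQ below 85.
# By symmetry this equals the area above +0.9375.
lower_tail = std_normal_cdf(-0.9375)

# Proportion within +/- k standard deviations: Pr(-k < Z < k).
within = {k: std_normal_cdf(k) - std_normal_cdf(-k) for k in (1, 2, 3)}


def g(x):
    """Clothing cost as a function of the number of girls x."""
    return -100 * x ** 2 + 300 * x + 500


# Distribution of the number of girls, from the table above.
p = {0: 0.14, 1: 0.39, 2: 0.36, 3: 0.11}

# E[g(X)] = sum of g(x) p(x)
expected_cost = sum(g(x) * px for x, px in p.items())

print(round(upper_tail, 3), round(lower_tail, 3))
print({k: round(v, 4) for k, v in within.items()})
print(round(expected_cost, 2))
```

The exact area within ±2 standard deviations is about 95.45%, which the slides round to 95%. The 95% figure corresponds more precisely to ±1.96 standard deviations.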
What we have computed with the above formula is an expected value (also called an expectation):

$$E[g(X)] = \sum g(x)\,p(x)$$

One possible form of the function g(X) is g(X) = (X - μ)². This gives:

$$E(X-\mu)^2 = \sum (x-\mu)^2\,p(x)$$

What is this? The population variance! The variance is a kind of expected value: the expected squared deviation from the mean.

Another Example: betting
Suppose that the Boston Red Sox and the Detroit Tigers are fighting for first place in the American League East. The odds are 3 to 1 against Boston defeating Detroit. You are offered the following wager: $120 if Detroit wins and $40 if Boston wins. What is the expected value of this wager? Personally, how much would you be willing to pay to play this gamble?

Betting example
The odds of Boston winning are d = 1/3. Odds and probability are related by p = d/(d + 1), or equivalently d = p/(1 - p), so:
p = (1/3)/(1/3 + 1) = 1/4

Outcome        p(x)   v
Boston wins    .25    $40
Boston loses   .75    $120

$$E = \sum p(x)\,v = (.25)(40) + (.75)(120) = 10 + 90 = \$100$$

If you were risk neutral, you would be willing to pay up to $100. You would pay more if you were risk acceptant and less if you were risk averse.
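This short sketch ties both examples to the single formula E[g(X)] = Σ g(x)p(x). The helper `expectation` is hypothetical. It reuses the distribution of the number of girls from the clothing example, and `fractions.Fraction` keeps the betting arithmetic exact.

```python
from fractions import Fraction


def expectation(g, dist):
    """E[g(X)] = sum over x of g(x) * p(x), for a discrete distribution {x: p(x)}."""
    return sum(g(x) * px for x, px in dist.items())


# Number of girls, from the clothing example.
girls = {0: 0.14, 1: 0.39, 2: 0.36, 3: 0.11}
mu = expectation(lambda x: x, girls)                     # E(X)
variance = expectation(lambda x: (x - mu) ** 2, girls)   # E(X - mu)^2, the variance

# Betting example: odds of a Boston win d = 1/3, so p = d / (d + 1).
d = Fraction(1, 3)
p_boston = d / (d + 1)
# The payoff is the random variable here: $40 if Boston wins, $120 otherwise.
payoff = {40: p_boston, 120: 1 - p_boston}
expected_payoff = expectation(lambda v: v, payoff)

print(mu, variance, p_boston, expected_payoff)
```

For the girls' distribution this gives μ = 1.44 and σ² = 0.7464, up to floating-point rounding. The wager's expected payoff is exactly $100, matching the hand calculation.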