Math 2015 Lesson 21: Mean and Median

We discuss the mean and the median, two important statistics about a distribution.

The Median

The median is the “halfway” point of a distribution. It is the point where half the population has a value less than the median, and half has a value greater. So if p(x) is a density function and M is the median, we must have

$$\int_{-\infty}^{M} p(x)\,dx = 0.5$$

Example: Suppose the life span of an insect has density function $p(x) = \frac{1}{72}x$ for x between 0 and 12 months, and 0 elsewhere. What is the median lifespan of the insects?

We want (since p(x) = 0 for x < 0)

$$\int_{0}^{M} \frac{1}{72}x\,dx = 0.5$$

So we find that we must have M = __________________. This means that half the insects live longer than ______ months, and half live less.

If we happen to have the cumulative distribution function, finding the median is easier. Since we want to know the value M such that $\int_{-\infty}^{M} p(x)\,dx = 0.5$, where p is the density function, all we need to do is calculate the value M for which P(M) = 0.5, where P is the cumulative distribution function.

Example: Suppose we have a distribution with cumulative distribution function $P(t) = t^2$ for 0 ≤ t ≤ 1. (Of course P(t) = 0 for t < 0 and P(t) = 1 for t > 1.) What is the median of this distribution?

We solve P(M) = 0.5, and find ___________________________.

The Mean

The mean is the average. In general, this would imply that we should “add up” all the values and divide by the total number.

Let’s suppose p(x) is a density function for the ages of an insect population. Call the total population N. We will work in slices of width ∆x, as though we are setting up a left-hand sum. Each rectangle in the sum approximates the total number of insects of a particular age. We show an example below with ∆x = 1:

[Figure: rectangles of width ∆x = 1; horizontal axis marked from 1 to 6]

How many of the insects are in the slice at point x with width ∆x?
The proportion should be roughly p(x)∆x, so the total number with that value is p(x)∆x N.

We want to count the age x a total of p(x)∆x N times in the sum, so we need to add up x p(x)∆x N for the rectangle with value x. Once we have added up all the strips, we need to divide by the total number, N:

$$\frac{\sum x\,p(x)\,\Delta x\,N}{N} = \sum x\,p(x)\,\Delta x$$

Now of course as the number of slices n increases (so that ∆x shrinks), we get a more accurate answer and the sum becomes an integral. Since we want to add up over all possible values, we get the mean to be

$$\int_{-\infty}^{\infty} x\,p(x)\,dx$$

(Remember: From our derivation above we have the idea that x is the score, and p(x) dx tells us “how many times,” or more specifically, what proportion of times, that score occurred.)

Example: Suppose we have the same insect population as in the first example: The life span has density function $p(x) = \frac{1}{72}x$ for x between 0 and 12 months, and 0 elsewhere. What is the mean lifespan of the insects?

We must compute the integral

$$\int_{-\infty}^{\infty} x\,p(x)\,dx = \int_{0}^{12} \frac{1}{72}x^2\,dx$$

Thus the mean in this case is ______.

Thus, the mean is slightly below the median in this case. (In other words, more than half the insects live longer than the “average” lifespan, and fewer than half live less.)

The mean has an interesting geometric interpretation: If we imagine the geometric figure in the shape of the density function, the mean is the “balance point” for the shape along the x axis:

[Figure: graph of the density function; horizontal axis from 0 to 12, vertical axis from 0 to 0.15. Caption: The mean is the point where the graph of the distribution would “balance”]

Example: Find the mean and median of the distribution with density function $p(x) = \frac{1}{9}x^2$ for 0 < x < 3, and zero elsewhere.

[Figure: horizontal axis from -1 to 4, vertical axis from 0 to 1. Caption: The distribution $p(x) = \frac{1}{9}x^2$]

The Normal Distribution

A special distribution referred to as the normal distribution is given by the following formula:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

The constants µ and σ specify features of the distribution.
An example graph with µ = 0 and σ = 1 is shown below:

[Figure: graph of the normal distribution with µ = 0 and σ = 1; horizontal axis from -4 to 4, vertical axis from 0 to 0.4]

The normal distribution is symmetric about the point where x = µ, so the graph will balance at µ. Hence µ is the mean of the normal distribution. We also note that due to this symmetry, the curve above must have

$$\int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = 0.5$$

So in fact, µ is also the median of the distribution.

The constant σ is called the standard deviation, and it tells us how flat and spread out the distribution is. The greater the standard deviation is, the more flat and spread out the normal distribution becomes.

The function $e^{-x^2}$ has no elementary antiderivative, but the area under the curve may still be computed using numerical techniques.

The normal distribution function (often called a bell curve) is often used to represent the way randomly occurring values are distributed.

Example: Suppose that

$$p(x) = \frac{1}{2\sqrt{2\pi}}\, e^{-(x-10)^2/8}$$

were a density function for the number of people in millions in the US who are on the internet at a given time. According to this density function,

• What is the median number of people on the internet at a given time?
• What is the mean number of people on the internet at a given time?
• What is the standard deviation of this distribution?
• Write down an integral to find the fraction of the time there are between 8 and 12 million people on the internet.

In the last example, we note a difficulty in finding the proportion of time there are between 8 and 12 million users: the integral is too complicated to be evaluated by hand. We can of course still make estimates using an appropriate numerical technique, such as Simpson’s rule. We could use our Simpson’s calculator from the second Lab, and get the following:

n     ∆x      Result
2     2       0.693237
4     1       0.683058
8     0.5     0.682711
16    0.25    0.682691
32    0.125   0.68269

The results seem to be converging to approximately 0.68.
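The Simpson’s rule estimates for the internet example can be reproduced with a short sketch. The code below is not the Lab’s calculator; it assumes the standard composite Simpson’s rule with n equal subintervals on [8, 12], and the names normal_pdf and simpson are chosen here for illustration only.

```python
import math

def normal_pdf(x, mu=10.0, sigma=2.0):
    """Normal density; the defaults mu = 10, sigma = 2 match the internet example."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    dx = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate weights 4, 2, 4, ..., 4.
        weight = 4 if i % 2 == 1 else 2
        total += weight * f(a + i * dx)
    return total * dx / 3

# One row per value of n: n, the slice width, and the estimate.
for n in (2, 4, 8, 16, 32):
    print(n, (12 - 8) / n, round(simpson(normal_pdf, 8, 12, n), 6))
```

Under these assumptions the printed estimates should settle near 0.68 as n grows, in line with the Simpson’s rule results quoted for this example.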
Thus, we estimate a probability of about 0.68 that between 8 and 12 million people are on the internet at one time. (Alternatively: There are between 8 and 12 million people on the internet about 68% of the time.)

Summary

Today, we have
• Defined, for a density function p(x), the mean ($\int_{-\infty}^{\infty} x\,p(x)\,dx$) and the median (the point M such that $\int_{-\infty}^{M} p(x)\,dx = 0.5$).
• Calculated means and medians from density functions, and medians from cumulative distribution functions.
• Discussed the normal distribution. We learned how to find its mean, median, and standard deviation. We used numerical methods to approximate the integral for a normal distribution.
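The definitions of the mean and median can also be checked numerically. The following is a minimal sketch, assuming a left-hand sum with many thin slices in place of the exact integrals; the function names mean_of_density and median_of_density are hypothetical, and the two densities are the ones used in this lesson’s examples.

```python
def mean_of_density(p, a, b, n=100_000):
    """Left-hand sum of x * p(x) * dx over [a, b], approximating the mean."""
    dx = (b - a) / n
    return sum((a + i * dx) * p(a + i * dx) * dx for i in range(n))

def median_of_density(p, a, b, n=100_000):
    """Accumulate area slice by slice until it reaches 0.5."""
    dx = (b - a) / n
    area = 0.0
    for i in range(n):
        area += p(a + i * dx) * dx
        if area >= 0.5:
            # Right edge of the slice where half the area is reached.
            return a + (i + 1) * dx
    return b

def insect_density(x):
    """p(x) = x/72 on [0, 12] (the insect lifespan example)."""
    return x / 72

def quadratic_density(x):
    """p(x) = x^2/9 on (0, 3)."""
    return x ** 2 / 9

print(mean_of_density(insect_density, 0, 12), median_of_density(insect_density, 0, 12))
print(mean_of_density(quadratic_density, 0, 3), median_of_density(quadratic_density, 0, 3))
```

The printed values are approximations only; they are meant for checking answers worked out by hand from the integrals, not for replacing that work.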