Math 2015
Lesson 21
Mean and Median
We discuss the mean and the median, two important statistics about a distribution.
The Median
The median is the “halfway” point of a distribution. It is the point where half the population
has a value less than the median, and half has greater. So if p(x) is a density function and M is
the median, we must have
$$\int_{-\infty}^{M} p(x)\,dx = 0.5$$
Example:
Suppose the life span of an insect has density function $p(x) = \frac{1}{72}x$ for x
between 0 and 12 months, and 0 elsewhere. What is the median lifespan of the
insects?
We want
So we find that we must have M = __________________.
This means that half the insects live longer than ______ months, and half live
less.
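As a check on the hand calculation, the median can also be located numerically. The following is a minimal sketch, assuming the density p(x) = x/72 on [0, 12] from the example; the helper names `area_up_to` and `find_median` are illustrative and not part of the lesson. It bisects on M until the area to the left of M is 0.5.

```python
import math


def density(x):
    """Insect life-span density from the example: p(x) = x/72 on [0, 12]."""
    return x / 72.0 if 0.0 <= x <= 12.0 else 0.0


def area_up_to(m, steps=2_000):
    """Approximate the integral of the density from 0 to m (midpoint rule)."""
    if m <= 0.0:
        return 0.0
    width = m / steps
    return sum(density((i + 0.5) * width) for i in range(steps)) * width


def find_median(lo=0.0, hi=12.0, tol=1e-9):
    """Bisect on m until the area to the left of m equals 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if area_up_to(mid) < 0.5:
            lo = mid  # too little area so far: the median lies to the right
        else:
            hi = mid  # enough area already: the median lies to the left
    return (lo + hi) / 2.0


median = find_median()
print(f"median life span is about {median:.4f} months")
```

The printed value can be compared with the M found by solving the integral equation by hand.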
If we happen to have the cumulative distribution function, finding the median is easier. Since
we want to know the value M such that $\int_{-\infty}^{M} p(x)\,dx = 0.5$, where p is the density function, all
we need to do is calculate the value M for which P(M) = 0.5, where P is the cumulative
distribution function.
Example:
Suppose we have a distribution with cumulative distribution function $P(t) = t^2$
for 0 ≤ t ≤ 1. (Of course P(t) = 0 for t < 0 and P(t) = 1 for t > 1.) What is the
median of this distribution?
We solve P(M) = 0.5, and find ___________________________.
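When P(M) = 0.5 cannot be solved algebraically, the same equation can be solved numerically. The sketch below assumes the cumulative distribution function from this example and uses bisection, which works because a cumulative distribution function never decreases; `median_from_cdf` is an illustrative name, not something defined in the lesson.

```python
def cdf(t):
    """Cumulative distribution function from the example: t^2 on [0, 1]."""
    if t < 0.0:
        return 0.0
    if t > 1.0:
        return 1.0
    return t * t


def median_from_cdf(P, lo, hi, tol=1e-12):
    """Find M with P(M) = 0.5 by bisection, assuming P is non-decreasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if P(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


m = median_from_cdf(cdf, 0.0, 1.0)
print(f"median is about {m:.6f}")
```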
The Mean
The mean is the average. In general, this would imply that we should “add up” all the values
and divide by the total number.
Let’s suppose p(x) is a density function for the ages of an insect population. Call the total
population N. We will work in slices of width ∆x, as though we are setting up a left-hand sum.
Each rectangle in the sum approximates the total number of insects of a particular age. We
show an example below with ∆x =1:
[Figure: left-hand sum rectangles under p(x) with ∆x = 1; horizontal axis marked 1 through 6]
How many of the insects are in the slice at point x with width ∆x? The proportion should be
roughly p(x)∆x, so the total number with that value is
p(x) Δx N
We want to count the age x a total of p(x)∆x N times in the sum, so we need to add up
x p(x)∆x N for the rectangle with value x.
Once we have added up all the strips, we need to divide by the total number, N:
$$\frac{\sum x\,p(x)\,\Delta x\,N}{N} = \sum x\,p(x)\,\Delta x$$
Now of course as the number of slices increases (so that ∆x shrinks), we get a more accurate
answer and the sum becomes an integral. Since we want to add up over all possible values, we
get the mean to be
$$\int_{-\infty}^{\infty} x\,p(x)\,dx$$
(Remember: From our derivation we have the idea above that x is the score, and p(x) dx tells
us “how many times,” or more specifically, what proportion of times, that score occurred.)
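The passage from the sum to the integral can be watched numerically. The sketch below is an illustration only: it assumes the insect density p(x) = x/72 on [0, 12] from the first example as a test case, and `left_hand_mean` is a hypothetical helper name. It evaluates the left-hand sum of x p(x) ∆x for finer and finer slices.

```python
def density(x):
    """Insect life-span density from the first example: p(x) = x/72 on [0, 12]."""
    return x / 72.0 if 0.0 <= x <= 12.0 else 0.0


def left_hand_mean(p, a, b, n):
    """Left-hand sum of x * p(x) * dx over n slices of [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + i * dx  # left endpoint of slice i
        total += x * p(x) * dx
    return total


# n = 12 corresponds to slices of width 1, as in the figure described above.
approximations = {n: left_hand_mean(density, 0.0, 12.0, n) for n in (12, 120, 1200, 12000)}
for n, value in approximations.items():
    print(n, value)
```

The printed values should settle toward a single number as n grows, which is the value of the integral of x p(x) for this density.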
Example:
Suppose we have the same insect population as in the first example: The life span
has density function $p(x) = \frac{1}{72}x$ for x between 0 and 12 months, and 0 elsewhere.
What is the mean lifespan of the insects?
We must compute the integral
$$\int_{-\infty}^{\infty} x\,p(x)\,dx =$$
Thus the mean in this case is ______. The mean is therefore slightly below the median in this
case. (In other words, more than half the insects live longer than the “average” lifespan, and
fewer than half live a shorter time.)
The mean has an interesting geometric interpretation: If we imagine the geometric figure in
the shape of the density function, the mean is the “balance point” for the shape along the x
axis:
[Figure: graph of the density function, vertical axis from 0.025 to 0.15, horizontal axis from 2 to 12.
Caption: The mean is the point where the graph of the distribution would “balance”]
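The balance-point idea can be tested numerically: a shape balances at a pivot c exactly when the net moment, the integral of (x − c) p(x), is zero. The sketch below is a rough check under stated assumptions. It uses the insect density p(x) = x/72 on [0, 12] as the test shape, takes 8 as the pivot (the value of the integral of x²/72 from 0 to 12), and compares it with a pivot at the median √72. The name `net_moment` is illustrative.

```python
def density(x):
    """Assumed test shape: the insect density p(x) = x/72 on [0, 12]."""
    return x / 72.0 if 0.0 <= x <= 12.0 else 0.0


def net_moment(pivot, a=0.0, b=12.0, n=20_000):
    """Midpoint-rule estimate of the integral of (x - pivot) * p(x) over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += (x - pivot) * density(x) * dx
    return total


moment_at_mean = net_moment(8.0)            # pivot placed at the mean
moment_at_median = net_moment(72.0 ** 0.5)  # pivot placed at the median
print(moment_at_mean, moment_at_median)
```

A net moment near zero at the first pivot and clearly nonzero at the second is what the balance interpretation predicts: the shape balances at the mean, not at the median.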
Example:
Find the mean and median of the distribution with density function $p(x) = \frac{1}{9}x^2$
for 0 < x < 3, and zero elsewhere.
[Figure: The distribution $p(x) = \frac{1}{9}x^2$]
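Answers to this exercise can be checked with a single pass over thin slices of the distribution. The sketch below assumes the density p(x) = x²/9 on (0, 3) from the example and that its total area is 1; `mean_and_median` is a hypothetical helper. It accumulates x p(x) ∆x for the mean and records the first slice at which the running area reaches 0.5 for the median.

```python
def density(x):
    """Density from the example: p(x) = x^2 / 9 on (0, 3), zero elsewhere."""
    return x * x / 9.0 if 0.0 < x < 3.0 else 0.0


def mean_and_median(p, a, b, n=300_000):
    """Estimate the mean and median of a density supported on [a, b].

    Assumes the total area under p is 1, so the accumulated
    sum of x * p(x) * dx is itself the mean.
    """
    dx = (b - a) / n
    area = 0.0
    moment = 0.0
    median = None
    for i in range(n):
        x = a + (i + 0.5) * dx  # midpoint of slice i
        slice_area = p(x) * dx
        area += slice_area
        moment += x * slice_area
        if median is None and area >= 0.5:
            median = x  # first slice where half the area is reached
    return moment, median


mean, median = mean_and_median(density, 0.0, 3.0)
print(f"mean is about {mean:.4f}, median is about {median:.4f}")
```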
The Normal Distribution
A special distribution referred to as the normal distribution is given by the following formula:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
The constants µ and σ specify features of the distribution. An example graph with µ = 0 and
σ = 1 is shown below:
[Figure: graph of the normal distribution with µ = 0 and σ = 1]
The normal distribution is symmetric about the point where x = µ, so the graph will balance at
µ. Hence µ is the mean of the normal distribution. We also note that due to this symmetry, the
curve above must have
$$\int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = 0.5$$
So in fact, µ is also the median of the distribution. The constant σ is called the standard
deviation, and it tells us how flat and spread out the distribution is. The greater the standard
deviation is, the more flat and spread out the normal distribution becomes.
The function $e^{-x^2}$ has no elementary antiderivative, but the area under the curve may still be
computed using numerical techniques.
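As one illustration of such a numerical technique, the sketch below applies the trapezoid rule to the normal density with µ = 0 and σ = 1 to estimate the area to the left of 0. It makes two assumptions that are not in the lesson: the lower limit −∞ is replaced by −8, where the density is negligibly small, and the result is compared with Python's built-in error function `math.erf`, which gives the same area in closed form.

```python
import math


def standard_normal_density(x):
    """Normal density with mu = 0 and sigma = 1."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def trapezoid(f, a, b, n):
    """Trapezoid-rule estimate of the integral of f over [a, b] with n slices."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Assumption: -8 stands in for minus infinity.
left_area = trapezoid(standard_normal_density, -8.0, 0.0, 4000)

# Same area via the error function: 0.5 * (1 + erf(x / sqrt(2))) at x = 0.
closed_form = 0.5 * (1.0 + math.erf(0.0 / math.sqrt(2.0)))

print(left_area, closed_form)
```

Both numbers should agree closely with the symmetry argument that half the area lies to the left of µ.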
The normal distribution function (often called a bell curve) is commonly used to represent the way
randomly occurring values are distributed.
Example:
Suppose that
$$p(x) = \frac{1}{2\sqrt{2\pi}}\, e^{-(x-10)^2/8}$$
were a distribution function for the number of people in millions in the US who are on the
internet at a given time. According to this density function,
• What is the median number of people on the internet at a given time?
• What is the mean number of people on the internet at a given time?
• What is the standard deviation of this distribution?
• Write down an integral to find the fraction of the time there are between 8 and 12 million
people on the internet.
In the last example, we note a difficulty in finding the proportion of time there are between 8
and 12 million users: the integral is too complicated to be evaluated by hand. We can of
course still make estimates using an appropriate numerical technique, such as Simpson’s rule.
We could use our Simpson’s calculator from the second Lab, and get the following:
n     ∆x      Result
2     2       0.693237
4     1       0.683058
8     0.5     0.682711
16    0.25    0.682691
32    0.125   0.68269
The results seem to be converging to approximately 0.68. Thus, we estimate a probability of
about 0.68 that between 8 and 12 million people are on the internet at one time.
(Alternatively: There are between 8 and 12 million people on the internet about 68% of the
time.)
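For readers without the Lab calculator at hand, the estimate can be reproduced with a short program. This is a sketch of the standard composite Simpson's rule, not the Lab's own calculator, and it assumes n counts the number of subintervals of width ∆x on [8, 12]. The function names are illustrative.

```python
import math


def p(x):
    """Density from the example: normal with mu = 10 and sigma = 2."""
    return math.exp(-((x - 10.0) ** 2) / 8.0) / (2.0 * math.sqrt(2.0 * math.pi))


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate weights 4, 2, 4, ..., 4.
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3.0


results = {n: simpson(p, 8.0, 12.0, n) for n in (2, 4, 8, 16, 32)}
for n, value in results.items():
    print(f"n = {n:2d}  dx = {4.0 / n:<6g}  result = {value:.6f}")
```

Under that assumption the printed results should be close to the values tabulated above, settling near 0.68.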
Summary
Today, we have
• Defined for a density function p(x) the mean ($\int_{-\infty}^{\infty} x\,p(x)\,dx$), and median (the point M
such that $\int_{-\infty}^{M} p(x)\,dx = 0.5$).
• Calculated means and medians from density functions, and medians from cumulative
distribution functions.
• Discussed the normal distribution. We learned how to find its mean, median, and
standard deviation. We used numerical methods to approximate the integral for a
normal distribution.