The Normal Distribution
In many natural processes, random variation conforms to a particular pattern. For example, if a pencil was taken out
of each of 100 students' pencil cases and measured, we could graph the results on a histogram as follows. We would
expect most values to be around the mean length (15 cm in this case).
[Figure: "Normal Distribution" histogram of pencil lengths; vertical axis: Probability Density (0 to 0.15); horizontal axis: 3 to 27]
It is the shape of this curve that gives rise to the Normal Distribution with its bell shape. The curve on the right is what we
would get if a very large number of pencils were randomly selected.
Properties of the Normal Distribution
1. The graph is ……………………………… and bell shaped. The normal distribution only ever displays ……………………………….
data. This means it displays data that has been measured. The data can be grouped together in a unimodal
symmetrical histogram as above but not a bar graph. The proper graph is the bell shaped curve.
2. The normal distribution has only 2 parameters, the mean (µ) and the standard deviation (σ) which measures the
spread of that data. The shape of the curve is determined solely by the mean and the standard deviation. The larger
the spread (σ) the flatter the curve. Strictly speaking there are no upper and lower limits that the data can take. eg
The adjacent graph shows our pencil data with mean = 15cm, sd = 3cm
3. Notice that frequency is not measured on the y axis. It simply measures the
height so that the total area underneath the curve is 1 (like a triangle). The
normal distribution can be used to find …………………………. These can only be
found over an interval, not at a point. eg for the pencil data we can find the
probability a pencil lies between 15 & 27cm, as in the diagram.
[Figure: normal curve for the pencil data; horizontal axis from 0 to 30]
4. For any normally distributed set of data the area under the curve follows
the same rules. The diagram below illustrates the 3 basic scenarios for the
pencil example.
[Figure: normal curve for the pencil data centred at 15; horizontal axis: Length of pencil]
ie We see that the area under the curve between these points gives the following probabilities (approximately).
µ ± 1σ ≈ 68%
µ ± 2σ ≈ 95%
µ ± 3σ ≈ 99.7%
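The areas quoted for 1, 2 and 3 standard deviations either side of the mean can be checked numerically. The sketch below is only an illustration using Python's standard-library NormalDist with the pencil parameters from these notes (mean 15, sd 3); the helper name area_within is made up for this example. Note that the 3 standard deviation area comes out nearer 99.7% than 99%.

```python
from statistics import NormalDist

# Pencil lengths from the notes: mean 15 cm, standard deviation 3 cm.
pencils = NormalDist(mu=15, sigma=3)

def area_within(dist, k):
    """Area under the curve between mean - k*sd and mean + k*sd."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {area_within(pencils, k):.4f}")
```

Changing mu or sigma should leave the three printed areas unchanged, which is the point of property 4: the rule holds for any normally distributed data.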
Calculating Probabilities for Normally Distributed Data
There are 2 ways of doing this. The traditional method is to use tables that show all the probabilities for a special Normal
Distribution, called the ……………………………… Normal Distribution. This distribution has a mean of 0 and a standard
deviation of 1 and is denoted by the symbol Z. We also use the equation
z = (X – µ)/σ
The second method involves using our graphics calculator. This is easier and is sufficient to use in the calculation of
probabilities in this standard.
Finding Probabilities for Normally Distributed Data (Achieved)
These will be word problems that will ask you to find the probability of a certain event taking place.
Follow these steps to answer these on your GC (tables can be used if you like):
1. Read the question carefully and write down the mean (µ) and the standard deviation (σ).
2. If necessary draw a picture shading the area of the probability you are finding.
3. Use your Graphics Calculator: Menu 2, Dist, Norm, Ncd to input the lower and upper boundaries and the parameters σ & µ and
then find the probability.
4. The answer alone is sufficient to 4 dp. (If you are using the tables Z values need to be to 3dp.) Whilst answer only is sufficient in
the exam apply the rule “word question requires a word answer”.
eg (Using the GC) X represents the length of pencils that are normally distributed with mean 15cm and sd of 3cm.
Find these probabilities:
a. P(12 < X < 17) =
   Lower =          Upper =          σ =          µ =
b. P(X > 17) =
   Lower =          Upper =          σ =          µ =
c. P(X < 13) =
   Lower =          Upper =          σ =          µ =
d. P(X > 24.2) =
   Lower =          Upper =          σ =          µ =
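If you want to check your calculator answers for the four pencil probabilities, the same areas can be found with software. This is a minimal sketch using Python's standard-library NormalDist in place of the GC's Ncd function; the variable names p_a to p_d are just labels chosen here. Each probability is the area to the left of the upper boundary minus the area to the left of the lower boundary.

```python
from statistics import NormalDist

# X = length of a pencil, normally distributed with mean 15 cm and sd 3 cm.
X = NormalDist(mu=15, sigma=3)

# cdf(x) is the area under the curve to the left of x, ie P(X < x).
p_a = X.cdf(17) - X.cdf(12)   # P(12 < X < 17)
p_b = 1 - X.cdf(17)           # P(X > 17): everything to the right of 17
p_c = X.cdf(13)               # P(X < 13): everything to the left of 13
p_d = 1 - X.cdf(24.2)         # P(X > 24.2)

for label, p in (("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)):
    print(f"{label}. {p:.4f}")   # rounded to 4 dp, as the notes ask
```

On the GC an open-ended side is usually entered as a very large or very small boundary; here the cdf handles the open-ended tails directly.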
Basic Questions
It is important that you read each question carefully and do exactly what they say. It may be they ask you to find an expected
value or express your answer as a percentage.
eg If X represents the length of pencils that are normally distributed with µ= 15cm and σ= 3cm.
Answer these questions
Find the probability that a randomly selected pencil is between 12cm and 16cm. Express your answer as a percentage.
If 200 pencils are randomly chosen how many would you expect to be greater than 20cm long?
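Both questions start from a probability and then convert it: multiply by 100 for a percentage, or multiply by the number of pencils for an expected number. The sketch below shows that conversion with Python's standard-library NormalDist standing in for the GC; the variable names are chosen for this illustration only.

```python
from statistics import NormalDist

X = NormalDist(mu=15, sigma=3)   # pencil lengths in cm

# Percentage of pencils between 12 cm and 16 cm.
p_between = X.cdf(16) - X.cdf(12)
percent_between = 100 * p_between

# Expected number of pencils longer than 20 cm out of 200.
p_over_20 = 1 - X.cdf(20)
expected_over_20 = 200 * p_over_20

print(f"{percent_between:.1f}% of pencils are between 12 cm and 16 cm")
print(f"about {expected_over_20:.1f} of 200 pencils are longer than 20 cm")
```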
A few questions…
1. A company finds that the distribution of cell phone call times for its reps is approximately normal with a mean of 8
minutes and a standard deviation of 2 minutes. What is the probability that any randomly selected call by one of
these reps lasts:
a. between 5 and 8 minutes?
b. more than 9 minutes?
c. What percentage of calls made would be greater than 12 minutes?
d. If there are 28 reps, how many of them would you expect their next cell phone call to be more than 10
minutes?
2. The heights of girls in Year 12 are normally distributed with a mean of 164 cm and a standard deviation of 5 cm. If a
girl is chosen at random:
a. What is the probability their height is between 170 cm and 180 cm?
b. What is the probability their height is less than 150cm?
c. What percentage of girls are over 175cm tall?
d. If there are 187 girls in the form group, how many would you expect to be between 160cm and 170cm?
∑p358 Ex17.01/1-10,13,16
W p.196 ExH (some linear combinations here) NCEA 2013/1a, 2012/1ai, 2011/1a
Modelling the Normal Distribution
1. Draw the graph of a normal distribution that has a mean of 160cm and a standard deviation of 6cm.
2. Over time a normal distribution X stayed in the same position but the peak dropped and the graph grew wider. Which
parameter changed and how?
3. Another normally distributed random variable Y moved left by 2 units but is still exactly the same shape. How have the
parameters changed?
4. The random variable X describes the height of all the girls in Y13 this year. Assume they are normally distributed.
a. What sort of graph would you draw to display this data and what are the features of the distribution?
b. Roughly what would the range of values for X be? ie ____ ≤ x ≤ _____
c. What would the mean be? µ =
d. Roughly, what would the standard deviation be? σ =
e. Draw a picture below which would depict this.
5. A researcher discovered that Form 7 girls of 20 years ago were on average 1cm shorter than the ones we looked at
in Q4 but that the range of heights was still the same. What would the mean and standard deviation have been for
the Form 7 girls of 20 years ago?
Height for Form 7 girls 20 years ago
µ =
σ =
6. [Figure: three normally distributed curves labelled a, b and c]
i. Which of the normally distributed curves has the smallest mean and what is it?
ii. Which curve has the smallest standard deviation and how is this shown?
iii. Around about what value would the standard deviation (σ) for curve a. be?
7. Why could you not describe a normally distributed random variable using a bar graph?
8. Complete NCEA 2013 Q1b (E)
A Possible NCEA Question
A number of new born lambs were weighed and the following graph was produced.
1. Based on these results, state 3 features that link the new born lamb weights to the normal distribution?
2. By fitting a curve to the histogram evaluate the 2 parameters of the normal distribution.
3. New born lambs with a weight less than 1.15kg have a low chance of survival. Using the variables you found in question
2, answer the following questions.
a. What is the probability that a lamb is born less than 1.15kg?
b. If a farmer has 600 new lambs born in a season, how many would you expect to be under 1.15kg?
c. What percentage of lambs will be born greater than 1.72kg
d. What is the expected number of lambs out of 600 that will be between 1.3kg and 1.7kg?
Continuity Correction
The normal distribution is a continuous distribution. A continuity correction factor is used when you use a continuous function to
approximate a discrete one. This happens in 2 cases:
1. When measured values have been rounded it is appropriate to use a continuity correction. This is because it is possible for a range
of measurements to be rounded to the same measure (ie x = 3 could be any of the values 2.5 ≤ x < 3.5).
2. When the discrete numbers involved are very large and a histogram of the grouped discrete data leads us to use a normal
distribution, which is continuous. Eg the number of words spoken by a person in a day.
A continuity correction involves considering all possible values that would end up being rounded to the measured value. To find these
values we look at the midpoint of consecutive values.
For example
Measured Value          Equivalent Continuous Probability
P(x = 6)                P(5.5 < x < …….)
P(x > 2)                P( x > …….)
P(x ≥ 2)                P( x > …….)
P(x ≤ 10)               P( x < …….)
P(x < 10)               P( x < …….)
P(9 ≤ x < 15)           P( ……. < x < ……. )
The heights of Year 13 students in CHCH are measured to the nearest cm, and are found to be normally distributed with a mean of 168cm
and a standard deviation of 6cm. Find the probability that a student measured at random would have a height greater than 175cm.
Because this data was rounded when it was measured we need
to use a continuity correction when calculating the probability.
P(X>175) = P( X>…………..) (cc)
= ………………….
[Figure: normal curve of x; horizontal axis from 145 to 185]
∑p362 Ex17.02/1=>, W p.200 Ex I, 2012/2b
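The effect of the continuity correction on the height example can be seen by computing the probability both ways. This is a sketch only, using Python's standard-library NormalDist instead of the GC. It assumes the midpoint rule described above: a recorded height "greater than 175cm" means 176cm or more, and every true height from 175.5cm upward rounds to one of those values.

```python
from statistics import NormalDist

# Heights measured to the nearest cm: mean 168 cm, sd 6 cm.
heights = NormalDist(mu=168, sigma=6)

# Without the correction the boundary is taken at the recorded value.
p_uncorrected = 1 - heights.cdf(175)

# With the correction the boundary moves to the midpoint between the
# consecutive recorded values 175 and 176.
p_corrected = 1 - heights.cdf(175.5)

print(f"no continuity correction:   {p_uncorrected:.4f}")
print(f"with continuity correction: {p_corrected:.4f}")
```

The corrected answer is slightly smaller because the strip between 175 and 175.5 belongs to students recorded as 175cm, who are not "greater than 175cm".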
Inverse Normal Problems (Part of Merit)
These problems all involve a situation where you are given the probability and two of µ, σ or an X value,
and are then asked to find the one you are not given. It is important to use the correct notation and draw a diagram
of the situation at Merit. There are 2 cases here; one easy & one harder. The methods described below will
focus on using the GC (tables can be used & are supplied in the exam) in the first case and the formula for
transforming any Normal distribution into the Standard Normal Distribution (ie z = (x – µ)/σ) in the second case.
Case 1 - Given the probability, µ and the σ, find an X value. (Easier)
Eg In 1999, 4% of Y13 Bursary Statistics candidates gained a Scholarship. The distribution of grades is normal
with a mean exam mark of 56 and σ of 14.
a. Find the cut off mark to gain a Scholarship.
Where X represents Bursary marks, x is the value we are looking for. We have to find x such that P(X>x) = 0.04
This is quite easy on the GC. 1st get into the correct area.
Menu, 2, Dist (F5), Norm F1, InvN.
Now input the info: Tail = Right, Area=0.04, σ=14, µ=56
This gives us an X value of 80.509, which would mean that the min mark would be 81% in the exam.
Not 80, as this mark would give more than 4%.
[Figure: normal curve of X with mean 56; horizontal axis from 0 to 120; the unknown x in the right tail is labelled "Find this"]
b. Find the boundary grades for the middle 60% of Bursary marks.
Handout, ∑p362 Ex17.03/6=>, W p.203 Ex J/1-3,5,8
NCEA (2012/1b, 2011/1b,)
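The Case 1 working can be cross-checked in software. The sketch below uses inv_cdf from Python's standard-library NormalDist as a stand-in for the GC's InvN; it is an illustration, not the method required in the exam. One assumption to note: inv_cdf always takes the area to the LEFT of the value, so a right tail of 0.04 is entered as 1 – 0.04 = 0.96.

```python
from statistics import NormalDist

# Bursary marks: mean 56, standard deviation 14.
marks = NormalDist(mu=56, sigma=14)

# a. Cut-off for the top 4%: P(X > x) = 0.04 means P(X < x) = 0.96.
cutoff = marks.inv_cdf(1 - 0.04)

# b. Middle 60%: 20% is left in each tail, so the boundaries sit at
#    left-hand areas of 0.20 and 0.80.
lower_bound = marks.inv_cdf(0.20)
upper_bound = marks.inv_cdf(0.80)

print(f"a. cut-off mark x = {cutoff:.3f}")
print(f"b. middle 60% lies between {lower_bound:.2f} and {upper_bound:.2f}")
```

Because the distribution is symmetric, the two boundaries in part b should be the same distance either side of the mean of 56, which is a quick check on the answer.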
Case 2 - Finding µ or σ given the Probability and the X value (Harder)
Eg The weight of ball bearings produced is normally distributed with a mean of 15.7g. It is found that 8% of
these are rejected because they are too small. Ball bearings weighing less than 12g are rejected. What is the
standard deviation of the ball bearing weights.
[Figure: normal curve of X with 12 and 15.7 marked on the horizontal axis, which runs from 0 to 30]
Remember Z is the Standard Normal Distribution with µ = 0 and σ = 1. We transform any normal distribution to a standard
normal distribution (Z) using the equation z = (x – µ)/σ (given in tests).
In our example we have P(X<12) = 0.08, µ = 15.7, x = 12. If we find
the Z value corresponding to x = 12 with the probability = 0.08,
then we can sub these values into the equation above to find σ.
[Figure: standard normal curve Z; horizontal axis from -4 to 4 with 0 and the unknown z marked ("Find this")]
ie P(X<12) = 0.08. From our GC using InvN, Tail = left, Area = 0.08,
µ = 0, σ = 1 and P(Z<z) = 0.08 => z = –1.405
So the z value corresponding to x = 12 is z = –1.405
From here we simply substitute and rearrange the formula.
ie z = (x – µ)/σ
–1.405 = (12 – 15.7)/σ
so σ = (12 – 15.7)/(–1.405) = 2.633
The standard deviation of the ball bearing weights is 2.633g
Handout, ∑p369 Ex17.04/1=>, W p.203 Ex J/4, 7 , 2013/1c, 2009/1bi (Remember always draw a diagram 1st)
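The two steps of Case 2 (find z on the standard normal, then rearrange z = (x – µ)/σ) can be sketched in a few lines. This uses Python's standard-library NormalDist in place of the GC's InvN and is only a check on the hand working; the variable names are chosen for this example.

```python
from statistics import NormalDist

mu = 15.7          # mean ball bearing weight in grams
x = 12             # rejection boundary in grams
p_below = 0.08     # P(X < 12)

# Step 1: z value with 8% of the standard normal curve to its left.
Z = NormalDist(mu=0, sigma=1)
z = Z.inv_cdf(p_below)        # negative, because 12 is below the mean

# Step 2: rearrange z = (x - mu) / sigma to make sigma the subject.
sigma = (x - mu) / z

print(f"z = {z:.3f}")
print(f"sigma = {sigma:.3f} g")

# Check: with this sigma, about 8% of bearings should weigh under 12 g.
check = NormalDist(mu, sigma).cdf(x)
print(f"P(X < 12) with this sigma = {check:.4f}")
```

If µ were the unknown instead, the same z would be substituted and the formula rearranged to µ = x – zσ.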
Harder Normal Distribution Problems (Merit)
This involves a combination of skills involving the normal distribution and:
1. combining several events (and probabilities) like in a probability tree and then multiplying them to
evaluate the final overall probability.
2. using the formula for conditional probability to answer questions of this type
3. working backwards to find a common value for 2 distinct Normally distributed random variables.
4. Finding the expected value of a new rv which is derived from an initial normal situation.
See the examples on ∑p370 – 371 and Q2 on pg 372 (go over on the board)
Complete ∑p372 Ex17.05/1,3,6,11 easier combined problems ∑p372 Ex17.05/2,4,5,7=>
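As an illustration of skill 2, here is a made-up conditional probability question (it is not one of the textbook examples): using the pencil data from earlier (mean 15cm, sd 3cm), find the probability that a pencil is longer than 18cm given that it is longer than 15cm. The sketch applies P(A|B) = P(A and B)/P(B) with Python's standard-library NormalDist; here "longer than 18 and longer than 15" is the same event as "longer than 18".

```python
from statistics import NormalDist

X = NormalDist(mu=15, sigma=3)   # pencil lengths in cm

# Event A: X > 18.  Event B: X > 15.
p_b = 1 - X.cdf(15)              # P(B)
p_a_and_b = 1 - X.cdf(18)        # P(A and B) = P(X > 18), since A lies inside B

# Conditional probability formula: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(f"P(X > 18 | X > 15) = {p_a_given_b:.4f}")
```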
Combinations of Random Variables
When we combine two normally distributed random variables (rv’s) we start with 2 separate rv’s that each
have their own mean and standard deviation. In order to continue with the problem firstly the rv’s must be
independent (ie the occurrence of one does not affect the other) and secondly we must combine the 2
parameters to find a mean and a std deviation for the new rv. Once this is done we can then solve any
probability problems for the new rv with its new parameters.
We use the following rules to combine the parameters:
E[aX + bY] = aE[X] + bE[Y]
Var[aX + bY] = a²Var[X] + b²Var[Y]
(given in the exam)
Remember that the mean is E(X) or µx and that σx = sd(X) = √Var(X)
Method to solve these problems:
1. Define each random variable and write an equation for the new random variable.
2. Find the parameters for the new random variable (to find sd always work thru the variance formula).
3. Answer the probability question you have been asked.
Eg # 1 A tin of baked beans consists of an empty can, which has a mean weight of 37g and a standard
deviation of 2g. A machine then dispenses the contents with an average weight of 450g and a standard
deviation of 12g. Calculate the probability that a full can will weigh more than 500g.
Let the weight of a full can be F, the weight of an empty can be E and the weight of the contents C
So the equation that links them is F = ………………..
Before finding the probability we need to calculate µ and σ for a …………………………
where
µE = 37g σE = 2g
µC = 450g σC = 12g
E(F) =
Var (F) =
always work thru the variance equation.
SD(F) =
Using our calculator with µ= …………… and σ = ……………. We get P(F>500) =………………………
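Once you have attempted the example, the three-step method can be checked with a short calculation. The sketch below assumes the can and its contents are independent (as the rules require) and uses Python's standard-library NormalDist instead of the GC; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: define the variables. Full can = empty can + contents.
mean_E, sd_E = 37, 2      # empty can (g)
mean_C, sd_C = 450, 12    # contents (g)

# Step 2: combine the parameters. Means add; variances add (never the sds).
mean_F = mean_E + mean_C
var_F = sd_E**2 + sd_C**2
sd_F = sqrt(var_F)

# Step 3: answer the probability question for the new random variable.
F = NormalDist(mu=mean_F, sigma=sd_F)
p_over_500 = 1 - F.cdf(500)

print(f"E(F) = {mean_F} g, Var(F) = {var_F}, SD(F) = {sd_F:.3f} g")
print(f"P(F > 500) = {p_over_500:.4f}")
```

Note that SD(F) is not 2 + 12 = 14; adding the standard deviations directly is the most common mistake in these problems.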
Difference Problems ie where the new variable = X - Y
A similar result also applies for the differences of 2 rv’s ie D = A – B
Except finding the variance is slightly different….
Note that Var(X – Y) = Var(X) + (–1)²Var(Y)
= Var(X) + Var(Y)
= Var(X + Y)
So when we subtract 2 random variables we still have to add the variances to get the new variance.
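The rule can be seen with a small example. The numbers below are hypothetical and chosen only so the arithmetic is tidy (X with mean 10 and sd 3, Y with mean 4 and sd 4). Python's standard-library NormalDist lets two independent normal variables be added or subtracted directly, so the sketch compares its result with the hand rule.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical independent normal random variables (illustration only).
X = NormalDist(mu=10, sigma=3)
Y = NormalDist(mu=4, sigma=4)

D = X - Y    # difference of two independent normal rv's
S = X + Y    # sum of the same two rv's

# Hand rule: the means subtract, but the variances still ADD.
mean_by_hand = X.mean - Y.mean
sd_by_hand = sqrt(X.variance + Y.variance)

print(f"D = X - Y: mean {D.mean}, sd {D.stdev}")
print(f"S = X + Y: mean {S.mean}, sd {S.stdev}")
print(f"by hand:   mean {mean_by_hand}, sd {sd_by_hand}")
```

The sum and the difference have different means but the same standard deviation, which matches Var(X – Y) = Var(X + Y).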
∑p377 Ex18.01/1,3,2, 4=> AinS p130, (2008/4, 2007/3d, 2006/4)
Linear Combinations of Independent Random Variables (Excellence)
The next step up from sums and difference problems done above, is where we have a linear combination of a
number of independent rv’s or a linear function of some rv’s. A subtle but important difference in these is
highlighted in the two examples discussed below. The formulae given in NCEA exams are still applicable here.
ie.
E[aX + bY] = aE[X] + bE[Y]
Var[aX + bY] = a²Var[X] + b²Var[Y]
Eg 1. A graphics calculator with no batteries has a mean weight of 145g with σ = 1.7g. Each of the 4
batteries has a mean weight of 14g with σ = 0.3g. What is the probability that a calculator with its
batteries weighs more than 200g?
Let T = the total weight of a calculator, E = the empty weight, B = the weight of a battery.
The equation linking these is T = ……………… (be careful here) not ………………
Now working out the mean and the standard deviation for T
E(T) = ………………………………………………… = …………………………….
Var(T) = Var (E + B + B + B + B) =……………………………………………….. (work thru the variance formula)
= ………………………………………………..
= ………………………………………………..
So the standard deviation σ(T) = …………………………
Now answering the question P(T > 200) =…………………………….. (from the calculator)
Eg 2 See the question in Sigma p379 or the example in your Pink book p167 for a good example of a linear
function of random variables. (This is beyond the scope of the new course)
Note: be aware of the difference between a number of independent rv’s eg X + X and a linear function of a
single rv eg 2X. ie X + X ≠ 2X when it comes to calculating the variance.
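The difference between four independent batteries (B + B + B + B) and four times one battery (4B) shows up only in the variance. The sketch below works the calculator example both ways so the two variances can be compared; it assumes the four batteries are independent, and uses Python's standard-library NormalDist for the final probability. The variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mean_E, sd_E = 145, 1.7   # calculator without batteries (g)
mean_B, sd_B = 14, 0.3    # one battery (g)

# The mean is the same whichever way the batteries are treated.
mean_T = mean_E + 4 * mean_B

# Four separate, independent batteries: add one battery variance four times.
var_sum = sd_E**2 + 4 * sd_B**2

# Treating them as one battery multiplied by 4 (4B) squares the 4 instead,
# which overstates the spread for this situation.
var_scaled = sd_E**2 + 4**2 * sd_B**2

sd_T = sqrt(var_sum)
p_over_200 = 1 - NormalDist(mean_T, sd_T).cdf(200)

print(f"E(T) = {mean_T} g")
print(f"Var(E + B + B + B + B) = {var_sum:.2f}")
print(f"Var(E + 4B)            = {var_scaled:.2f}")
print(f"P(T > 200) = {p_over_200:.4f}")
```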
Try these questions.
W p.196 Ex H/Q3=>
∑p380 Ex18.02/1,2,3,5 2007/3e (linear function of rv’s)
∑p380 Ex18.02/4,7,8,10 2012/2d, 2009/1bii, 2008/8, (sum of a number of independent rv’s)
Note:
What if you are asked to find the mean and sd of the r.v. W = 3X+7?
Using E(aX+b) = aE(X) + b
so E(W) = E(3X+7) = 3E(X) + 7
and Var(aX+b) = a²Var(X) (note b is gone!)
so Var(W) = Var(3X + 7) = 3²Var(X) = 9Var(X)
These formulae are given in the exam
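A quick numerical check of these two formulae, using a hypothetical X with mean 10 and sd 2 (these values are not from the notes). Python's standard-library NormalDist can be multiplied by a constant and have a constant added, so the sketch compares that result with the hand formulae.

```python
from statistics import NormalDist

# Hypothetical X for illustration: mean 10, sd 2, so Var(X) = 4.
X = NormalDist(mu=10, sigma=2)

# W = 3X + 7: scaling and shifting a single random variable.
W = X * 3 + 7

# Hand formulae from the note above.
mean_by_hand = 3 * X.mean + 7        # E(aX + b) = aE(X) + b
var_by_hand = 3**2 * X.variance      # Var(aX + b) = a^2 Var(X); the 7 has no effect

print(f"E(W) = {W.mean}, Var(W) = {W.variance}, sd(W) = {W.stdev}")
print(f"by hand: E(W) = {mean_by_hand}, Var(W) = {var_by_hand}")
```

Adding 7 slides the whole curve along the axis without changing its width, which is why b does not appear in the variance.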