LESSON FIVE: INTRODUCTION TO PROBABILITY DISTRIBUTIONS
Introduction to probability distributions and random variables
The concept of a probability distribution was introduced briefly in Lesson Four, where it was described as a list of every possible outcome with its corresponding probability. The corresponding probabilities were calculated as simple relative frequencies. In Lesson Four you may also have noted that probability calculations, even for simple real-life situations, became complex very quickly. However, under certain well-defined conditions, it is possible to derive formulae to calculate the probabilities.
Random variables
In statistics the outcome of a random experiment is variable and determined by chance. The numeric value of the outcome is called a random variable. For example, when a die is thrown the face value that turns up may be any one of the numbers 1, 2, 3, 4, 5 or 6. The face value that turns up is a random variable. In other situations the outcome of a random experiment is not numeric, so a formula or rule is used to assign a numeric value to each outcome. For example, if three coins are tossed, the number of Heads assigned to each outcome is:
3 (when the outcome is three Heads)
2 (when the outcome is two Heads and one Tail)
1 (when the outcome is one Head and two Tails)
0 (when the outcome is three Tails)
Hence the more general definition of a random variable is the rule that assigns a numeric value to each outcome of a random experiment.
The numeric values of a random variable may be discrete or continuous. A discrete random variable can assume a finite number of distinct values. A continuous random variable can assume any value within a continuous interval (for example, the time to process an online customer order).
The probability that the random variable X will assume the value x on a trial of a random experiment is written as P(X=x), or simply P(x).
A trial of a random experiment is a single execution of the experiment.
Probability distributions
Definitions
An empirical probability distribution is a list of every outcome of a random experiment with the corresponding probability.
A discrete probability distribution is the probability distribution of a discrete random variable. The Uniform, Bernoulli and Poisson distributions are discrete distributions.
A continuous probability distribution is the probability distribution of a continuous random variable: the outcomes can be any value from a continuous interval.
The discrete uniform probability distribution
In the discrete uniform distribution each of the n possible outcomes is equally likely, so P(x) = 1/n for every outcome x and βˆ‘P(x) = 1: the sum of the probabilities must be unity. This is an essential property of all probability distributions.
Graphs of a probability distribution
The outcomes (x) of the experiment are marked on the horizontal axis and the probabilities on the vertical axis.
A probability histogram depicts probability as area for discrete distributions. The total area of a probability histogram must be unity.
Probability histograms are useful as a graphical tool to aid visualization of certain ideas, such as reading probabilities from tables and approximating discrete probabilities with continuous ones. However, for continuous random variables it will be necessary to calculate probabilities in terms of the area under the probability curve.
The Binomial probability distribution
The Binomial formula may be used to calculate probabilities when the following conditions are true:
1. There are only two mutually exclusive outcomes;
2. The probability of success is the same for each trial;
3. The trials are independent;
4. The number of trials is finite.
The Binomial probability distribution function (Binomial formula) is
P(x) = nCx p^x q^(n-x) = [n! / ((n-x)! x!)] p^x q^(n-x)
nCx gives the number of outcomes (or arrangements) that satisfy the condition of x successes out of n trials.
Example 1
An exam consists of four multiple-choice questions. If each question has only one correct answer out of a choice of four possible answers, calculate the probability that a student randomly selects the correct answer to (a) all four questions; (b) any three questions; (c) any two questions; (d) any one question; (e) none of the questions.
When there is a choice of four answers to a question, p = 0,25 and q = 0,75.
P(x=0) = [4!/((4-0)!0!)] (0,25)^0 (0,75)^4 = 1 × 0,3164 = 0,3164
P(x=1) = [4!/((4-1)!1!)] (0,25)^1 (0,75)^3 = 4 × 0,25 × 0,4219 = 0,4219
P(x=2) = [4!/((4-2)!2!)] (0,25)^2 (0,75)^2 = 6 × 0,0625 × 0,5625 = 0,2109
P(x=3) = [4!/((4-3)!3!)] (0,25)^3 (0,75)^1 = 4 × 0,015625 × 0,75 = 0,0469
P(x=4) = [4!/((4-4)!4!)] (0,25)^4 (0,75)^0 = 1 × 0,0039 = 0,0039
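The calculations in Example 1 can be checked with a short script; this is a sketch using Python's standard library (`math.comb` for the binomial coefficient), not part of the original lesson. Note that Python uses decimal points rather than the decimal commas of the text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n-x) for a Binomial(n, p) variable."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

# Example 1: n = 4 questions, p = 0.25 chance of guessing one correctly
for x in range(5):
    print(f"P(x={x}) = {binomial_pmf(x, 4, 0.25):.4f}")
```

The printed values reproduce the hand calculations: 0,3164; 0,4219; 0,2109; 0,0469; 0,0039.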
Example 2
It is known that one out of every five tax returns contains errors and is classified as faulty. An inspector randomly selects a sample of 20 tax returns. Calculate the probability that in the sample of 20 (i) seven are faulty; (ii) at most two are faulty.
p = 0,20 q = 0,80.
P(x=7) = [20!/((20-7)!7!)] (0,20)^7 (0,80)^13 = [20!/(13!7!)] (0,20)^7 (0,80)^13 = 77520 × 0,0000128 × 0,0550 = 0,0545
"At most two are faulty" means P(x=0) + P(x=1) + P(x=2).
P(x=0) = (0,80)^20 = 0,0115
P(x=1) = [20!/((20-1)!1!)] (0,20)^1 (0,80)^19 = 20 × 0,20 × 0,0144 = 0,0576
P(x=2) = [20!/((20-2)!2!)] (0,20)^2 (0,80)^18 = 190 × 0,04 × 0,0180 = 0,1369
Hence P(at most two faulty) = 0,0115 + 0,0576 + 0,1369 = 0,2060.
Discrete cumulative probability distributions and applications
In Lesson 1 a cumulative frequency was the sum of all frequencies up to and including the frequency for a given interval, say interval r: βˆ‘_{i=1}^{r} f_i.
Similarly, a cumulative probability is the sum of all the probabilities up to and including the probability P(x=r) of r successes. The cumulative probability is written as
P(x ≀ r) = βˆ‘_{x=0}^{r} P(x).
A cumulative probability distribution is a list of every outcome r with the corresponding cumulative probability P(x ≀ r), r = 0, 1, 2, …, n.
Example
p = 0,20, n = 5

r    P(x=r)    P(x≀r)
0    0,3277    0,3277
1    0,4096    0,7373
2    0,2048    0,9421
3    0,0512    0,9933
4    0,0064    0,9997
5    0,0003    1,0000
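A table like the one above can be generated programmatically by accumulating the Binomial probabilities; a minimal sketch (the column layout is illustrative):

```python
from math import comb

def binomial_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.20
cumulative = 0.0
print("r    P(x=r)    P(x<=r)")
for r in range(n + 1):
    prob = binomial_pmf(r, n, p)
    cumulative += prob  # running sum gives the cumulative probability
    print(f"{r}    {prob:.4f}    {cumulative:.4f}")
```

The final cumulative value is 1,0000, as the sum of all probabilities must be unity.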
The Poisson probability distribution
Binomial probability calculations require a finite sample size. There are many situations where a sample size does not feature. For example, sample size does not feature in the calculation of the probability of x emergency calls per hour, or x faults per km of cable, etc. In situations such as these (subject to the assumptions outlined below), a formula called the "Poisson probability formula" is used to calculate the probability of x occurrences of an event over a given interval of time or length (area, volume, etc.).
The Poisson formula is P(x) = λ^x e^(-λ) / x!.
This "Poisson probability formula" is used to calculate probabilities in situations where a rare, random event occurs at a uniform rate.
Example
An ambulance service receives an average of four calls in 1 hour during "off-peak" hours. Calculate the probability that during off-peak hours there are (a) three calls in 1 hour; (b) at most two calls in 1 hour; (c) one call in 30 minutes.
Assumptions
1. The average rate at which the event occurs per interval is uniform. For the ambulance service the average rate was given as λ = four calls in 1 hour. Since the rate is uniform, the rate may be halved to give an average of two calls in 30 minutes, doubled to give an average of eight calls in 2 hours, etc.
2. The number of calls is not influenced by what happened in the previous interval. So if there were no calls in 1 hour, this has no effect on the chance of x calls in the next hour. This property is often quoted as "the Poisson process has no memory".
3. There is practically no chance of more than one call arriving in a very short time. For example, the probability of at least one call in a minute is 0,0645 and in a second is only 0,0011; as the interval becomes smaller, the probability of more than one call approaches zero.
The Poisson probability distribution function is given by the formula
P(x) = λ^x e^(-λ) / x!
We must know λ.
Example
(a) Three calls in 1 hour;
(b) at most two calls in 1 hour;
(c) one call in 30 minutes;
(d) at least one call in 1 minute; in 1 second.
(a) x = 3, λ = 4 in 1 hour
P(x=3) = 4^3 e^(-4) / 3! = 64 × 0,018316 / 6 = 0,1954
(b) "At most two" means "0 OR 1 OR 2".
P(at most two) = 4^0 e^(-4)/0! + 4^1 e^(-4)/1! + 4^2 e^(-4)/2! = e^(-4) (1 + 4 + 8) = 0,2381
(c) Since the rate is uniform, for 30 minutes λ = 2. Hence the probability of one call in 30 minutes is
P(x=1) = 2^1 e^(-2) / 1! = 0,2707
(d) The probability of at least one call is P(xβ‰₯1) = 1 - P(x=0).
For 1 minute λ = 4/60, hence P(xβ‰₯1) = 1 - (4/60)^0 e^(-4/60) / 0! = 1 - 0,9355 = 0,0645.
For 1 second λ = 4/3600, hence P(xβ‰₯1) = 1 - e^(-4/3600) = 1 - 0,9989 = 0,0011.
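The four parts of the ambulance example can be verified with a few lines of Python; `poisson_pmf` is a helper name chosen for this sketch, not from the lesson:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x! for a Poisson rate lam."""
    return lam**x * exp(-lam) / factorial(x)

print(round(poisson_pmf(3, 4), 4))                         # (a) 0.1954
print(round(sum(poisson_pmf(x, 4) for x in range(3)), 4))  # (b) 0.2381
print(round(poisson_pmf(1, 2), 4))                         # (c) 0.2707
print(round(1 - poisson_pmf(0, 4 / 60), 4))                # (d) 0.0645 in 1 minute
print(round(1 - poisson_pmf(0, 4 / 3600), 4))              #     0.0011 in 1 second
```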
Cumulative Poisson probabilities
Example
λ = 4

r    P(x=r)    P(x≀r)
0    0,0183    0,0183
1    0,0733    0,0916
2    0,1465    0,2381
3    0,1954    0,4335
4    0,1954    0,6288
5    0,1563    0,7851
6    0,1042    0,8893
7    0,0595    0,9489
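A sketch that reproduces both columns of this table, assuming the same λ = 4 and the first eight values of r:

```python
from math import exp, factorial

lam = 4
cumulative = 0.0
print("r    P(x=r)    P(x<=r)")
for r in range(8):
    prob = lam**r * exp(-lam) / factorial(r)  # Poisson P(x = r)
    cumulative += prob                        # running sum P(x <= r)
    print(f"{r}    {prob:.4f}    {cumulative:.4f}")
```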
The Normal probability distribution
The Normal curve was introduced in 1733 by De Moivre as an approximation to certain Binomial distributions. However, Gauss gave a rigorous account of its properties in 1809.
The Normal distribution is a continuous probability distribution and is the most important probability distribution in statistics. It will feature in the remainder of the text, not just in probability but also in statistical inference.
It has long been recognized that large numbers of measurements, when sorted and plotted in a probability (relative frequency) histogram, tend to assume a bell-shaped form.
The equation of the Normal curve is given by:
f(x) = [1 / (σ√(2π))] e^(-(x-μ)² / (2σ²))
f(x) is called a probability density function.
μ and σ are the mean and the standard deviation of the distribution. x can take any value between minus infinity and plus infinity.
The value of μ determines the location of the curve; the value of σ determines the width of the curve.
The area under the curve is always unity, a property of any probability distribution.
The probability that a random variable x has a value between x=a and x=b is given by the area under the curve between x=a and x=b. Areas under curves are usually calculated by integration:
P(a ≀ x ≀ b) = ∫ from x=a to x=b of f(x) dx
Fortunately, for the Normal probability curve it will not be necessary to use integration to calculate areas. The Normal curve has special characteristics which allow us to find areas from a single set of tables.
Special properties of the Normal distribution
1. Total area under the curve is one
2. The curve is symmetrical about the mean. The area to the left of
the mean is 0,5 and the area to the right of the mean is 0,5.
3. The area under the curve between the mean and any point x depends on the number of standard deviations between x and μ. For example, the area between the mean and a point which is one standard deviation (1 × σ) greater (or less) than the mean is 0,3413. The area between the mean and a point which is two standard deviations (2 × σ) greater (or less) than the mean is 0,4772.
Use the Normal probability tables to determine areas under the Normal
curve.
Example
The time taken to complete a transaction at an ATM is normally distributed with a mean of 11 minutes and a standard deviation of 3 minutes. Use Normal probability tables to calculate the probability that a transaction will take
a) (i) between 11 and 14 minutes; (ii) between 8 and 14 minutes;
b) (i) between 11 and 17 minutes; (ii) between 5 and 17 minutes;
c) (i) between 11 and 20 minutes; (ii) between 2 and 20 minutes.
Since Z is the number of standard deviations between a point x and the mean, in this example where μ = 11 and σ = 3 the values of Z for parts (a), (b) and (c) are respectively +1 and -1; +2 and -2; +3 and -3.
The probability between Z=-1 and Z=+1 is 0,6826.
The probability between Z=-2 and Z=+2 is 0,9544.
The probability between Z=-3 and Z=+3 is 0,9974.
a) (i) Since μ = 11 and σ = 3, x=14 is one standard deviation from the mean, so Z = +1. From the Normal probability tables the tail area beyond Z=1 is 0,1587, so the area between 11 and 14 is 0,5 - 0,1587 = 0,3413.
(ii) The point x=8 is three minutes, i.e. one standard deviation, below the mean; Z = -1. Because of symmetry, areas equidistant on each side of the mean are equal. Since the area on the right is 0,3413, the area to the left is also 0,3413, giving a total area of 0,6826. The probability that a transaction will take between 8 and 14 minutes is 0,6826.
b) (i) Since μ = 11 and σ = 3, x=17 is two standard deviations (six minutes) from the mean, hence Z = 2,00. From the Normal probability tables the tail area beyond Z=2,00 is 0,0228. Hence the area between the mean and Z=2,00 is 0,5 - 0,0228 = 0,4772. The probability that a transaction will take between 11 and 17 minutes is 0,4772.
(ii) By symmetry, the total area between Z=-2 and Z=2 is 0,9544. The probability that a transaction will take between 5 and 17 minutes is 0,9544.
c) (i) Since ΞΌ=11 and Οƒ=3, then x=20 is three standard deviations (nine
minutes) from the mean, then Z=3,00. From the Normal probability
tables the tail area from Z = 3 is 0,0013. Hence the area between the
mean and Z=3 is calculated as 0,5 – 0,0013 = 0,4987.
The probability that a transaction will take between 11 and 20
minutes is 0,4987.
(ii) By symmetry, the total area between Z=-3 and Z=3 is 0,9974. The probability that a transaction will take between 2 and 20 minutes is 0,9974.
In most problems, the Z-values must be calculated by the formula:
Z = (x - μ) / σ
Example
The time taken to process an email enquiry is normally distributed with a mean time of 500 sec. and a standard deviation of 10 sec. What is the probability that a randomly selected email enquiry will be processed in
(a) more than 505 seconds;
(b) less than 485 seconds;
(c) between 485 and 505 seconds?
a) For x=505, Z = (x - μ)/σ = (505 - 500)/10 = 0,50.
When Z = 0,50 the area in the tail is 0,3085; that is the required probability.
b) For x=485, Z = (x - μ)/σ = (485 - 500)/10 = -1,50.
When Z is negative, look up the tables for Z = +1,50: the tail area is 0,0668. By symmetry, the area below Z=-1,50 is the same as the area above Z=+1,50, namely 0,0668.
c) P(485 ≀ x ≀ 505) = 1 - 0,3085 - 0,0668 = 0,6247.
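Instead of tables, Normal areas can be computed from the error function in Python's `math` module; this sketch reproduces the three answers of the email example:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a Normal(mu, sigma) variable, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 500, 10  # email processing times from the example

print(round(1 - normal_cdf(505, mu, sigma), 4))  # (a) 0.3085
print(round(normal_cdf(485, mu, sigma), 4))      # (b) 0.0668
print(round(normal_cdf(505, mu, sigma)
            - normal_cdf(485, mu, sigma), 4))    # (c) 0.6247
```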
Method for calculating the limits that contain a given percentage of values under the Normal curve
In some situations it might be useful to know that it takes between P and Q seconds to process, say, 95% of enquiries. Calculating the values of P and Q is possible when the times are Normally distributed with known mean and standard deviation.
The method for calculating the two limits, symmetrical about the mean, that contain a given percentage of all the data is set out as follows:
Step 1. Sketch a Normal curve, marking and labelling the given area and the tail areas.
Step 2. From the tail areas, look up the tables to find Z.
Step 3. From the value of Z, calculate the number of units (d) between x and the mean: d = Zσ.
Step 4. Calculate the values of P and Q: Q = μ + Zσ, P = μ - Zσ.
Example
Within what time should 94% of the mail be processed?
Since 94% of the area lies below x, this leaves 6% in the upper tail. To find the value of Z corresponding to a tail area of 6%, use the Normal probability tables: Z = 1,56. Hence Q = 500 + 1,56(10) = 515,6.
In general, if α represents the total area in both tails of a Normal curve (that is, α/2 is the area in each tail), then (1-α) represents the area between the two tails, and (1-α)100% of the area under the Normal curve will fall between
μ - Z_(α/2) σ and μ + Z_(α/2) σ
The standard Normal distribution
Step 2 in the method for calculating Normal probabilities required the calculation of the number of standard deviations (Z) between x and μ by the formula
Z = (x - μ) / σ
This formula transforms (or rescales) any Normal probability distribution that has a mean μ and a standard deviation σ to the standard Normal distribution. The standard Normal distribution has mean μ = 0 and standard deviation σ = 1. Hence the standard Normal (Z) tables are tables for the standard Normal distribution.
Sums or differences of Normal independent random variables
The distribution of the sums and differences of two normally distributed, independent variables (NIRV) features in many applications. The distribution of the difference between NIRVs is fundamental to making inferences about the difference between population means and proportions.
RULE
Suppose the random variables
X1 is Normally distributed with mean μ1 and standard deviation σ1
X2 is Normally distributed with mean μ2 and standard deviation σ2
Then the sum X1 + X2 is a random variable that is Normally distributed with mean μ = (μ1 + μ2) and variance σ² = σ1² + σ2². Likewise, the difference X1 - X2 is Normally distributed with mean μ = (μ1 - μ2) and the same variance σ² = σ1² + σ2².
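The rule can be illustrated by simulation; the means and standard deviations below are hypothetical values chosen for this sketch, and `random.gauss` draws the Normal samples:

```python
import random
import statistics

random.seed(42)  # reproducible sketch

mu1, sigma1 = 10.0, 3.0  # hypothetical X1
mu2, sigma2 = 20.0, 4.0  # hypothetical X2

# Sample X1 + X2 many times and compare with the rule
sums = [random.gauss(mu1, sigma1) + random.gauss(mu2, sigma2)
        for _ in range(200_000)]

print(round(statistics.mean(sums), 1))   # close to mu1 + mu2 = 30.0
print(round(statistics.stdev(sums), 1))  # close to sqrt(3**2 + 4**2) = 5.0
```

Note that the standard deviations do not add; the variances do, so the standard deviation of the sum is √(9 + 16) = 5, not 3 + 4 = 7.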
Expected values (mathematical expectations)
The expected value of a random variable is defined as its mean value. It could also be described as "the value that you can expect on average". The formula for calculating expected values follows directly from the formula for calculating the mean value of grouped data. The expected value is a very important mathematical tool in the theory of statistics. It is used in many applications, such as the calculation of expected profits and losses, and in risk analysis.
The mean value of a random variable
The expected value of the random variable X is
μ = E(X) = βˆ‘ x P(x)
The expected value of any function (or formula) of the random variable is defined as
E(g(X)) = βˆ‘ g(x) P(x)
For example, E(X²) = βˆ‘ x² P(x).
The variance of a random variable is
V(X) = σ² = E[(X - μ)²] = βˆ‘ (x - μ)² P(x)
For continuous random variables
E(X) = ∫ from -∞ to +∞ of x f(x) dx
V(X) = ∫ from -∞ to +∞ of (x - μ)² f(x) dx
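As a check on the discrete formulas, the sketch below computes E(X) and V(X) for the Binomial distribution with n = 5 and p = 0,20 used earlier; the results agree with the known shortcuts E(X) = np and V(X) = npq:

```python
from math import comb

# Binomial(n=5, p=0.20) distribution as a mapping x -> P(x)
n, p = 5, 0.20
dist = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

mean = sum(x * px for x, px in dist.items())               # E(X) = sum of x P(x)
var = sum((x - mean) ** 2 * px for x, px in dist.items())  # V(X) = sum of (x - mu)^2 P(x)

print(round(mean, 4))  # 1.0, equals n*p
print(round(var, 4))   # 0.8, equals n*p*q
```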