2.4 Applications of Standard Deviation CH6 p83
Z-Score: Relative Standing
- the mean can be considered as the middle of a data set that is symmetrically distributed
- since there are many values in our data set it becomes important to determine how unusual a
data value may be.
- a data value is considered unusual if it is a great distance away from the mean
Commonly a symmetric data distribution is expected when considering task completion times. It
is observed that clerks complete a particular task with a mean of 10 minutes and a standard
deviation of 2 minutes.
[Figure: Distribution of Task Completion Times. Normal, Mean = 10, StDev = 2. x-axis: Time to complete a task ( minutes ), with 10, 12 and 15 marked; y-axis: Probability]
Some clerks will take more than 10 minutes while others will require less time. But commonly it
will require about 10 minutes to complete this task. For example Clerk A takes 12 minutes while
clerk B takes 15 minutes to complete this task. The 15 minute time is more unusual since it is
further away from the mean. The 12 minutes would be 1 standard deviation to the right of the
mean while 15 is 2.5 standard deviations to the right of the mean.
Converting the 15 minutes into a z-score will provide a measure of the degree of extremeness.
\( z = \frac{y - \mu}{\sigma} = \frac{15 - 10}{2} = 2.5 \)
So the z-score is simply the count of standard deviations away from the mean. Notice that 2.5 is
very close to the right tail of the normal curve. So to observe a clerk completing the task at 3 or
4 standard deviations away from the mean is very extreme or unlikely ( but not impossible ). In
the future we will calculate the probability of clerks requiring 17 or 20 or more minutes to
complete the task.
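The z-score arithmetic for the two clerks can be checked with a short sketch. Python is used only for illustration, and the helper name `z_score` is hypothetical rather than something from these notes.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / std_dev

# Clerk task times from the example: mean 10 minutes, std. dev. 2 minutes
clerk_a = z_score(12, 10, 2)   # 1 standard deviation above the mean
clerk_b = z_score(15, 10, 2)   # 2.5 standard deviations above the mean
print(clerk_a, clerk_b)
```

The larger the absolute z-score, the more unusual the observation, which is why clerk B's time stands out more than clerk A's.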
24Normal 35 W07
1
2/19/2007
- given a symmetric data distribution where typically a clerk completes a task with a mean of 10
minutes and a standard deviation of 2 minutes.
eg.) A student scores 75 in economics and 80 in statistics. In which subject did the student do
better than the rest of the class?
              μ      σ
Economics    68.1   5.2
Statistics   75     8.3
[Figure: Student Standing in Economics and Statistics Classes. Two normal curves ( Mean 68.1, StDev 5.2 and Mean 75, StDev 8.3 ). x-axis: Test Scores; y-axis: Density]
\( z = \frac{y - \mu}{\sigma} \)
\( z_e = \frac{75 - 68.1}{5.2} = 1.3 \)    1.3 std. dev. above the mean
\( z_s = \frac{80 - 75}{8.3} = 0.6 \)    0.6 std. dev. above the mean
- the student had a higher standing in economics
- a greater proportion of students in economics scored less than the student than in
statistics
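The comparison can also be sketched with Python's standard-library `statistics.NormalDist`, assuming both classes really do follow a normal model. The variable names are illustrative only.

```python
from statistics import NormalDist

# Class parameters from the table (mean, std. dev.)
economics = NormalDist(mu=68.1, sigma=5.2)
stats_class = NormalDist(mu=75, sigma=8.3)

# z-scores for the student's marks
z_e = (75 - economics.mean) / economics.stdev
z_s = (80 - stats_class.mean) / stats_class.stdev

# Proportion of each class expected to score below the student,
# under the assumed normal model
below_e = economics.cdf(75)
below_s = stats_class.cdf(80)

print(round(z_e, 1), round(z_s, 1))
print(below_e > below_s)
```

The higher z-score in economics corresponds to a larger share of classmates scoring below the student.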
Eg.) Person 1 can complete job A in 10.5 minutes while person 2 can complete job B in 22.3
minutes. Which person is quicker doing his job when compared to the average?
          μ      σ
Job A    12.6   5.2
Job B    27.3   3.1
[Figure: Distribution Plot. Two normal curves ( Mean 12.6, StDev 5.2 and Mean 27.3, StDev 3.1 ). x-axis: Task completion times ( minutes ); y-axis: Density]
\( z = \frac{x - \bar{x}}{s} \)
\( z_1 = \frac{10.5 - 12.6}{5.2} = -0.4 \)
\( z_2 = \frac{22.3 - 27.3}{3.1} = -1.6 \)
- the faster person would have a more negative z value
Person 2 completing Job B in 22.3 minutes is faster within his group than person 1 completing
Job A in 10.5 minutes
eg. ) The average income in community A is $76,000 ( std. dev. $3,100 ) and the average income
in community B is $69,000 ( std. dev. $4,500 ). Jones living in A has an income of
$74,500 and Grant in B has an income of $68,000. Which person is better off relative to
his community?
               μ        σ
Community A   76,000   3,100
Community B   69,000   4,500
[Figure: Distribution Plot. Normal density curves for the two communities. x-axis: X; y-axis: Density]
\( z = \frac{x - \bar{x}}{s} \)
\( z_J = \frac{74500 - 76000}{3100} = -0.484 \)
\( z_G = \frac{68000 - 69000}{4500} = -0.222 \)
Grant has a higher relative standing for income in his community
Homework
read pg 83 – 97
pg 99: 5, 7, 15
The 68-95-99.7 Rule (Empirical Rule )
- for an approximately bell shaped data distribution
68% of the data will lie within +/- 1 std. dev. of the mean
95% of the data will lie within +/- 2 std. dev. of the mean
99.7% of the data will lie within +/- 3 std. dev. of the mean
eg.
An employee reports that he required 26 minutes to complete a task. Historically the
average time to complete this task is 19.5 minutes ( 1.6 min. std. dev. ). Is the reported
time excessive?
[Figure: normal curve with the 68%, 95% and 99.7% intervals marked on the horizontal axis]
19.5 - 3 X 1.6 = 14.7
19.5 - 2 X 1.6 = 16.3
19.5 - 1.6 = 17.9
mean = 19.5
19.5 + 1.6 = 21.1
19.5 + 2 X 1.6 = 22.7
19.5 + 3 X 1.6 = 24.3
- expected variability is normally defined by an interval of 3 std. dev. on either side of the mean
- if there are 262 employees then you should expect that about 0.997 x 262 = 261 will
complete the task in between 14.7 and 24.3 minutes
- thus a 26 minute time period appears to be excessive
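A minimal sketch of the same check follows. The function name `empirical_intervals` is hypothetical, and the coverage figures are just the 68-95-99.7 values quoted above.

```python
def empirical_intervals(mean, std_dev):
    """Return the mean +/- 1, 2 and 3 std. dev. intervals keyed by coverage."""
    coverage = {1: 0.68, 2: 0.95, 3: 0.997}
    return {
        coverage[k]: (mean - k * std_dev, mean + k * std_dev)
        for k in (1, 2, 3)
    }

# Historical task time: mean 19.5 minutes, std. dev. 1.6 minutes
intervals = empirical_intervals(19.5, 1.6)
low, high = intervals[0.997]
reported = 26

print(round(low, 1), round(high, 1))   # limits of the 3 std. dev. interval
print(low <= reported <= high)         # is the reported time within it?
```

A reported time that falls outside the 3 standard deviation interval is one the rule treats as excessive.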
Eg)
The distribution of heights of adult men is approximately normal with mean 69 inches and
standard deviation 2.5 inches. Draw a normal curve on which this mean and standard
deviation are correctly located. ( Hint: Draw the curve first, locate the points where the
curvature changes, then mark the horizontal axis.)
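mean = 69 in
std. dev. = 2.5 in
[Figure: normal curve with the horizontal axis marked at the mean and at 1, 2 and 3 standard deviations on either side]
69 - 3 X 2.5 = 61.5
69 - 2 X 2.5 = 64.0
69 - 2.5 = 66.5
69
69 + 2.5 = 71.5
69 + 2 X 2.5 = 74.0
69 + 3 X 2.5 = 76.5
Eg)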
Scores on the Wechsler Adult Intelligence Scale ( a standard “IQ test” ) for the 20 to 34 age
group are approximately normally distributed with μ = 110 and σ = 25.
A) About what percent of people in this age group have scores above 110?
the percent greater than 110 is 50% since 110 is the mean and median (middle ).
B) About what percent have scores above 160?
the percent greater than 160 is 2.5% since 160 is 2 standard deviations to the right of the
mean and 5% / 2 = 2.5%
C) In what range do the middle 95% of all the scores lie?
the middle 95% would lie within mean +/- 2 X std. dev. = 110 +/- 2 X 25, that is, 60 to 160
Homework pg 100 11, 15
Normal Probability Distribution
A data distribution which is unimodal and approximately symmetric is known as Normal
- note that the shape is smooth and flowing and this is due to the continuous number nature of
the data
GENERAL OBSERVATIONS
1. the bell shaped form is the result of the symmetrical distribution of continuous data
about the mean
2. the x axis represents a measured variable
3. the y axis represents proportion or probability
4. the tails of the curve approach the x axis ( asymptotes )
Note that not all bell shaped curves are normal distributions. One test of normality is to see
whether the data meets the Empirical Rule.
- mathematics can provide a precise definition of a Normal Probability Distribution Model
- an equation can be used to plot the normal curve
\[ P(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} \]
- complex in appearance, but most of the symbols represent constants
( e = 2.71828, π = 3.14159 )
- it is μ and σ that control the model
- you can construct the NPD model by knowing only μ and σ
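To illustrate that the curve is fully determined by the mean and the standard deviation, here is a small sketch of the density equation. The function name `normal_density` is hypothetical, and the standard-library `NormalDist.pdf` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Height of the normal curve at x for mean mu and std. dev. sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coeff * math.exp(exponent)

# Peak height of a curve with mean 10 and std. dev. 1
peak_narrow = normal_density(10, 10, 1)
# Peak height of a curve with mean 20 and std. dev. 3 (wider, so lower)
peak_wide = normal_density(20, 20, 3)

# Cross-check one point against the standard library
reference = NormalDist(mu=20, sigma=3).pdf(22)
print(round(peak_narrow, 4), round(peak_wide, 4))
print(abs(normal_density(22, 20, 3) - reference) < 1e-12)
```

Changing the mean slides the curve along the x axis, while a larger standard deviation spreads the curve out and lowers its peak.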
[Figure: Distribution Plot. Two normal curves ( Mean 10, StDev 1 and Mean 20, StDev 3 ). x-axis: X; y-axis: Density]
μ2 > μ1: the mean controls the position of the curve along the x axis
σ2 > σ1: the standard deviation controls the shape
- the NPD curve is a density curve
- since all possible observations can be identified under the curve
- then let the area under the curve equal one
- the proportion of observations can be determined by calculating the area under the NPD curve
eg) The management of an online company wishes to determine the proportion of customers who
spend more than 15 minutes at their WEB site. Historically it has been shown that the NPD
serves as a good model for the situation with μ = 13.1 minutes and σ = 2.2 minutes.
What can management conclude?
- find the proportion of the shaded area to the whole area
[Figure: normal curve with std. dev. = 2.2, the mean 13.1 marked, and the area to the right of 15 shaded]
P( x ) = shaded area / whole area = 32 / 152 = 0.211 = 21.1%
- thus 21.1% of the total area is found to be greater than 15 minutes and this is also
the proportion of customers expected to spend more than 15 minutes at the WEB page
-the method is not practical
-we can use a table to solve the problem, but we cannot use the problem data directly,
otherwise, we would need a table for each new problem
-we must transform the x values into z values or z scores
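\( z = \frac{x - \mu}{\sigma} \)
z score of 15 mins. = \( \frac{15 - 13.1}{2.2} = 0.864 \)
- the z score of the mean is 0
[Figure: normal curve with std. dev. = 2.2; x = 13.1 corresponds to z = 0 and x = 15 to z = +0.86, with negative z to the left of the mean and positive z to the right]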
- z scores to the right of the mean are positive and to
the left are negative
-go to table A56 in your text book
-the standard normal probabilities table relates a z value to the area under the curve
A( 0.86 ) = 0.8051
thus the proportion of customers spending more than 15 minutes is 1 - 0.8051 = 0.1949
or 19.5%
-compare to the initial calculated value
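As a rough check on the table lookup, the same proportion can be computed with `statistics.NormalDist` from the Python standard library. This is a sketch under the assumption that the normal model with mean 13.1 and std. dev. 2.2 holds; the variable names are illustrative.

```python
from statistics import NormalDist

visit_time = NormalDist(mu=13.1, sigma=2.2)

# z-score of a 15 minute visit
z = (15 - visit_time.mean) / visit_time.stdev

# Table-style answer: round z to two decimals first, as the printed table requires
p_table = 1 - NormalDist().cdf(round(z, 2))

# Answer without rounding z
p_exact = 1 - visit_time.cdf(15)

print(round(z, 3), round(p_table, 4))
print(0.19 < p_exact < 0.20)
```

The two answers differ only slightly because the table forces z to be rounded to two decimals; both sit a little below the value estimated from the shaded area.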
Finding the Area Under the Normal Curve
eg) Find the area under the NPD curve which lies
a) Area { z < +0.69 }
A( 0.69 ) = 0.7549
b) Area { z > -1.31 }
A( -1.31 ) = 0.0951
A = 1.0 - 0.0951 = 0.9049
c) Area { z > 2.0 }
A( 2.0 ) = 0.9772
A = 1.0 - 0.9772 = 0.0228
d) Area { -1.0 < z < +1.0 }
A( -1.0 ) = 0.1587
A( +1.0 ) = 0.8413
A = 0.8413 - 0.1587 = 0.6826
e) Area { 1.11 < z < 1.71 }
A( 1.71 ) = 0.9564
A( 1.11 ) = 0.8665
A = 0.9564 - 0.8665 = 0.0899
f) Area { z < -1.4 OR z > 2.1 }
A( -1.4 ) = 0.0808
A( 2.1 ) = 0.9821
A = 0.0808 + ( 1 - 0.9821 ) = 0.0987
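The table lookups in these area examples can be reproduced with the error function from Python's `math` module. This is only a sketch; `area_below` is a hypothetical helper that plays the role of A( z ) in the notes.

```python
import math

def area_below(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

a = area_below(0.69)                        # area to the left of 0.69
b = 1 - area_below(-1.31)                   # area to the right of -1.31
c = 1 - area_below(2.0)                     # area to the right of 2.0
d = area_below(1.0) - area_below(-1.0)      # area between -1 and +1
e = area_below(1.71) - area_below(1.11)     # area between 1.11 and 1.71
f = area_below(-1.4) + (1 - area_below(2.1))  # area in the two tails

for name, value in zip("abcdef", (a, b, c, d, e, f)):
    print(name, round(value, 3))
```

Small differences in the fourth decimal place against the printed table are expected, since the table entries are themselves rounded before being added or subtracted.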
eg) Find the z value that
a) leaves 0.1251 in the lower tail
A( z ) = 0.1251
z = -1.15
b) leaves 45.62% in the upper tail
A( z ) = 1 - 0.4562 = 0.5438
z = +0.11
c) the area to the right of z is 0.0044
A( z ) = 1.0 - 0.0044 = 0.9956
z = 2.62
d) has 60% of the observations fall above it
A( z ) = 1 - 0.600 = 0.4000
z = -0.25: 0.4013 - 0.4000 = 0.0013
z = -0.26: 0.4000 - 0.3974 = 0.0026
the table entry for z = -0.25 is closer to 0.4000, so z = -0.25
e) the z values that symmetrically distribute 10% in the two tails
A( z ) = 0.05
z = -1.65
Due to symmetry the upper tail z = 1.65
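Reading the table "backwards" corresponds to an inverse cumulative lookup. A minimal sketch with the standard library's `NormalDist.inv_cdf` is shown here; the variable names are illustrative, and the results are unrounded, so they can differ from two-decimal table values.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, std. dev. 1

z_a = std_normal.inv_cdf(0.1251)       # 0.1251 in the lower tail
z_b = std_normal.inv_cdf(1 - 0.4562)   # 45.62% in the upper tail
z_c = std_normal.inv_cdf(1 - 0.0044)   # 0.0044 to the right
z_d = std_normal.inv_cdf(1 - 0.60)     # 60% of observations above
z_e = std_normal.inv_cdf(0.05)         # 5% in the lower tail

for name, z in zip("abcde", (z_a, z_b, z_c, z_d, z_e)):
    print(name, round(z, 3))
```

For the 5% tail the unrounded value is close to -1.645, which falls midway between the table entries for -1.64 and -1.65; the notes take -1.65.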
Homework pg 101 19, 21
Applied Normal Distribution Problems
eg) An IQ test is given to 750 students and the results show μ = 100 and σ = 15.
a) How many students scored between 90 and 105?
\( z = \frac{x - \mu}{\sigma} \)
\( z = \frac{90 - 100}{15} = -0.67 \)    A( -0.67 ) = 0.2514
\( z = \frac{105 - 100}{15} = +0.33 \)    A( +0.33 ) = 0.6293
A = 0.6293 - 0.2514 = 0.3779
number of students 0.3779 * 750 = 283
b) Find the number of students who scored above genius ( IQ > 140 )?
\( z = \frac{140 - 100}{15} = 2.67 \)
A( 2.67 ) = 0.9962
P = 1.0 - 0.9962 = 0.0038
number of students 0.0038 * 750 = 3
c) Find the first quartile score
A( z ) = 0.25
z = -0.67
\( -0.67 = \frac{x - 100}{15} \)
-10.05 = x - 100
x = 89.95
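The three parts of the IQ example can be sketched in a few lines, assuming the normal model with mean 100 and std. dev. 15. Variable names are illustrative, and z is not rounded here, so the figures may differ from the table-based answers in the last digit.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
n_students = 750

# a) proportion scoring between 90 and 105
between = iq.cdf(105) - iq.cdf(90)

# b) proportion scoring above 140
above_140 = 1 - iq.cdf(140)

# c) first quartile: the score with 25% of the area below it
q1 = iq.inv_cdf(0.25)

print(round(between, 3), round(n_students * between, 1))
print(round(above_140, 4), round(n_students * above_140))
print(round(q1, 1))
```

Because the table rounds z to two decimals, the expected count in part a) from this sketch can land a fraction of a student away from the table answer.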
eg) The length of trout caught in Muskoka Lake is normally distributed with μ = 21 cm and
σ = 6 cm. By law trout less than 17 cm must be returned.
a) What proportion of your catch must be returned?
\( z = \frac{17 - 21}{6} = -0.667 \)
A( -0.67 ) = 0.2514
P( < 17 cm ) = 0.2514
b) What is the chance of catching a trout between 25 and 35 cm?
\( z = \frac{25 - 21}{6} = +0.667 \)    A( 0.67 ) = 0.7486
\( z = \frac{35 - 21}{6} = 2.333 \)    A( 2.33 ) = 0.9901
P = 0.9901 - 0.7486 = 0.2415
eg) The design tolerances of a sheet of glass are 36.50 mm and 36.60 mm. If the mean is 36.56
mm and the std. dev. is 0.02 mm, then how many from a shipment of 1000 sheets will be
rejected?
\( z = \frac{x - \mu}{\sigma} \)
\( z = \frac{36.50 - 36.56}{0.02} = -3.00 \)    A( -3.00 ) = 0.0013
\( z = \frac{36.60 - 36.56}{0.02} = 2.00 \)    A( 2.00 ) = 0.9772
A = 1 - ( 0.9772 - 0.0013 ) = 0.0241
number of sheets = 1000 * 0.0241 = 24
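The rejection calculation is "one minus the area inside the tolerances". A minimal sketch follows, assuming the normal model stated in the example; the variable names are illustrative.

```python
from statistics import NormalDist

sheet = NormalDist(mu=36.56, sigma=0.02)   # measurement in mm
lower, upper = 36.50, 36.60                # design tolerances in mm

# Proportion of sheets inside the tolerances
p_accept = sheet.cdf(upper) - sheet.cdf(lower)

# Everything outside the tolerances is rejected
p_reject = 1 - p_accept
expected_rejects = 1000 * p_reject

print(round(p_reject, 4), round(expected_rejects))
```

Note that the two tails are not equal here: the mean sits closer to the upper tolerance, so most rejected sheets are expected to be over the upper limit.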
eg)
A large tire dealer promised to refund the total purchase price of snow tires to November
customers if the total snowfall for the winter was less than 40 cm. Climatological records
for the city reveal that the total snowfall averages 120 cm with a standard deviation of 35
cm. Calculate the probability that the dealer will have to
repay his customers.
\( z = \frac{40 - 120}{35} = -2.29 \)
A( -2.29 ) = 0.011
P = 0.011
there is about a 1% chance
eg)
The average weight of soap in 1 kg boxes is 1.02 kg ( σ = 0.018 kg ).
a) What percent of the boxes weigh less than 1.00 kg?
\( z = \frac{1.00 - 1.02}{0.018} = -1.11 \)
A( -1.11 ) = 0.1335
P = 0.1335 = 13.4%
b) Regulation requires that no more than 5% of the boxes are to weigh less than 1 kg.
What should be the new mean setting?
A( z ) = 0.05
z = -1.65
\( -1.65 = \frac{1.0 - \mu}{0.018} \)
-0.0297 = 1.0 - μ
μ = 1.03 kg
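Part b) solves the z-score equation for the mean instead of for x. A short sketch of that rearrangement is given below, assuming the standard deviation stays at 0.018 kg when the mean setting is changed; the variable names are illustrative.

```python
from statistics import NormalDist

sigma = 0.018   # kg
limit = 1.00    # kg, the labelled weight

# a) proportion of boxes under 1.00 kg at the current mean of 1.02 kg
p_under = NormalDist(mu=1.02, sigma=sigma).cdf(limit)

# b) z value that leaves 5% in the lower tail, then solve
#    z = (limit - mean) / sigma  for the mean
z_5 = NormalDist().inv_cdf(0.05)
new_mean = limit - z_5 * sigma

# Check: with the new mean, about 5% of boxes fall under the limit
p_check = NormalDist(mu=new_mean, sigma=sigma).cdf(limit)

print(round(p_under, 3), round(new_mean, 2), round(p_check, 3))
```

Since z is negative for a lower tail, subtracting z times σ moves the mean above the 1 kg limit.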
eg)
The manager of a pizza business observes that the average delivery time is 24.4 min ( σ =
3.7 min ). What is the shortest time period in which 99%
of the deliveries can be made?
A( z ) = 0.99    z = 2.33
\( 2.33 = \frac{x - 24.4}{3.7} \)
8.62 = x - 24.4
x = 33 min
What effect does the variability in delivery times have on the guaranteed delivery time?
eg)
Thirty percent of the time, a task will require at most 9.6 minutes to complete. Find
the average time to complete the task, if σ = 2.8 minutes.
A( z ) = 0.3
z = -0.52
\( -0.52 = \frac{9.6 - \mu}{2.8} \)
-1.456 = 9.6 - μ
μ = 11.1 min
Homework pg 102: 25, 27, 29, 31
Bonus Question
It has been observed that people spend on average 15.3 minutes ( σ = 2.6 minutes ) visiting a
bank WEB page.
A) What proportion of the visitors will spend at most 12 minutes?
\( z = \frac{x - \mu}{\sigma} = \frac{12 - 15.3}{2.6} = -1.27 \)
A(−1.27) = 0.1020
B) What is the shortest time period in which 30% ( third decile ) of the visits are completed?
A(z) = 0.300
z = −0.52
\( -0.52 = \frac{x - 15.3}{2.6} \)
−1.352 = x − 15.3
x = 13.9 minutes
Air Canada claims that using the Check In Kiosk takes less than 30 seconds.
Extra Problems
Eg)
An important measure of the performance of a locomotive is its “adhesion”, which is the
locomotive’s pulling force as a multiple of its weight. The adhesion of one 4400
horsepower diesel locomotive model varies in actual use according to a normal distribution
with μ = 0.37 and standard deviation σ = 0.04.
A)
What proportion of adhesions measured in use are higher than 0.40?
\( z = \frac{x - \mu}{\sigma} = \frac{0.4 - 0.37}{0.04} = 0.75 \)
A(0.75) = 0.7734
A = 1 − 0.7734 = 0.2266
22.7% of adhesions measured in use are greater than 0.40.
B)
What proportion of adhesions are between 0.40 and 0.50?
\( z_1 = \frac{0.4 - 0.37}{0.04} = 0.75 \)    A(0.75) = 0.7734
\( z_2 = \frac{0.5 - 0.37}{0.04} = 3.25 \)    A(3.25) = 0.9994
A = 0.9994 − 0.7734 = 0.2260
22.6% of the adhesions fall between 0.4 and 0.5
C)
Improvements in the locomotive’s computer controls change the distribution of adhesion
to a normal distribution with μ = 0.41 and σ = 0.02. Find the proportions in (a) and (b)
after this improvement.
\( z = \frac{x - \mu}{\sigma} = \frac{0.4 - 0.41}{0.02} = -0.5 \)
A(−0.5) = 0.3085
A = 1 − 0.3085 = 0.6915
69.2% of adhesions measured in use are greater than
0.40.
\( z_1 = \frac{0.4 - 0.41}{0.02} = -0.5 \)    A(−0.5) = 0.3085
\( z_2 = \frac{0.5 - 0.41}{0.02} = 4.5 \)
- note that z = 4.5 is beyond the table which stops at z = 3.09
- it is possible to have z values beyond the table and since
the accumulated area at the end of the table is very very
close to 1 the accumulated area beyond the table can be
taken as 1 or at least 0.9998
A(4.5) = 1
A = 1 − 0.3085 = 0.6915
69.2% of adhesions measured in use are between 0.4 and 0.5
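The claim that the area beyond the end of the table can be taken as 1 is easy to check numerically. The sketch below uses the mean of 0.41 and std. dev. of 0.02 that the worked calculation for part C uses; the variable names are illustrative.

```python
from statistics import NormalDist

adhesion = NormalDist(mu=0.41, sigma=0.02)

# Cumulative area below 0.50, which is z = 4.5 and off the printed table
area_below_050 = adhesion.cdf(0.50)

# Cumulative area below 0.40, which is z = -0.5
area_below_040 = adhesion.cdf(0.40)

between = area_below_050 - area_below_040

print(area_below_050 > 0.9999)
print(round(between, 4))
```

Because the area below z = 4.5 is indistinguishable from 1 at four decimals, the proportion between 0.40 and 0.50 is effectively the same as the proportion above 0.40.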
Eg)
The annual rate of return on stock indexes ( which combine many individual stocks ) is
approximately normal. Since 1945, the Standard & Poor’s 500 index has had a mean
yearly return of 12%, with a standard deviation of 16.5%. Take this normal distribution to
be the distribution of yearly returns over a long period.
A)
In what range do the middle 95% of all yearly returns lie?
the 68-95-99.7 Rule specifies that the interval which will capture the middle 95% is found
at the mean ± 2( std. dev. ) = 12% ± 2( 16.5% ), that is, from −21% to 45%
B)
The market is down for the year if the return on the index is less than zero. In what
proportion of years is the market down?
\( z = \frac{0 - 12}{16.5} = -0.727 \)
A(−0.73) = 0.2327
Since 1945 the market has been down for 23.3% of the
years
C)
In what proportion of years does the index gain 25% or more?
\( z = \frac{25 - 12}{16.5} = 0.788 \)
A(0.79) = 0.7852
A = 1 − 0.7852 = 0.2148
Since 1945 the index has gained 25% or more for 21.5% of the years
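The three parts of the stock index example can be sketched as follows, assuming yearly returns follow the stated normal model ( mean 12%, std. dev. 16.5% ). Variable names are illustrative, and z is not rounded to two decimals here, so the areas differ slightly from table lookups.

```python
from statistics import NormalDist

returns = NormalDist(mu=12, sigma=16.5)   # yearly return in percent

# A) middle 95% by the 68-95-99.7 rule: mean +/- 2 std. dev.
low = returns.mean - 2 * returns.stdev
high = returns.mean + 2 * returns.stdev

# B) proportion of years with a return below zero
p_down = returns.cdf(0)

# C) proportion of years with a gain of 25% or more
p_gain_25 = 1 - returns.cdf(25)

print(low, high)
print(round(p_down, 3), round(p_gain_25, 3))
```

The rule-based interval in part A uses exactly 2 standard deviations, which is the approximation the 68-95-99.7 rule makes for the middle 95%.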
Eg)
The length of human pregnancies from conception to birth varies according to a
distribution that is approximately normal with mean 266 days and standard deviation 16
days.
A)
What percent of pregnancies last less than 240 days ( that’s about 8 months )?
\( z = \frac{x - \mu}{\sigma} = \frac{240 - 266}{16} = -1.625 \)
A(−1.63) = 0.0516
5.2% of the pregnancies last less than 240 days
B)
What percent of the pregnancies last between 240 and 270 days ( roughly between 8 and
9 months )?
\( z_1 = \frac{240 - 266}{16} = -1.63 \)    A(−1.63) = 0.0516
\( z_2 = \frac{270 - 266}{16} = 0.25 \)    A(0.25) = 0.5987
A = 0.5987 − 0.0516 = 0.5471
54.7% of the pregnancies last between 240 and 270 days
C) How long do the longest 20% of the pregnancies last?
A(z) = 1 − 0.2 = 0.8    z = 0.84
\( 0.84 = \frac{x - 266}{16} \)
13.44 = x − 266
x = 279.44
The longest 20% of the pregnancies would require more than 279 days