12.3 – Measures of Dispersion
Dispersion is another analytical method to study data.
A main use of dispersion is to compare the amounts of spread
in two (or more) data sets.
A common technique in inferential statistics is to draw
comparisons between populations by analyzing samples that
come from those populations.
Two of the most common measures of dispersion are the range
and the standard deviation.
Range
For any set of data, the range of the set is given by the
following formula:
Range = (greatest value in set) – (least value in set).
Example:
The two sets below have the same mean and median (7). Find
the range of each set.
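Set A: 1, 2, 7, 12, 13
Set B: 5, 6, 7, 8, 9
Range of Set A: 13 – 1 = 12
Range of Set B: 9 – 5 = 4
Set A is much more spread out than Set B, even though both sets have the same mean and median.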
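A minimal Python sketch of the range formula, checked against the two sets above. The function name `data_range` is an illustrative choice, not something from the text.

```python
def data_range(values):
    """Range = (greatest value in set) - (least value in set)."""
    return max(values) - min(values)

set_a = [1, 2, 7, 12, 13]
set_b = [5, 6, 7, 8, 9]

print(data_range(set_a))  # 12
print(data_range(set_b))  # 4
```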
Standard Deviation
One of the most useful measures of dispersion is the standard
deviation.
It is based on deviations from the mean of the data.
Find the deviations from the mean for all data values of the
sample 1, 2, 8, 11, 13.
The mean is 7.
To find each deviation, subtract the mean from each data
value.
Data Value:   1    2    8   11   13
Deviation:   –6   –5    1    4    6
The sum of the deviations from the mean is always zero; here, –6 – 5 + 1 + 4 + 6 = 0.
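The deviations can be checked in a few lines of Python. This sketch uses the standard-library `statistics.mean`; the variable names are illustrative.

```python
from statistics import mean

sample = [1, 2, 8, 11, 13]
x_bar = mean(sample)  # the mean, 7

# Each deviation is a data value minus the mean.
deviations = [x - x_bar for x in sample]

print(deviations)       # [-6, -5, 1, 4, 6]
print(sum(deviations))  # 0: deviations from the mean always cancel out
```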
Calculating the Sample Standard Deviation
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
The sample standard deviation is found by calculating the
square root of the variance.
The variance is found by summing the squares of the
deviations and dividing that sum by n – 1 (since it is a sample
instead of a population).
The sample standard deviation is denoted by the letter s.
The standard deviation of a population is denoted by σ (the lowercase Greek letter sigma).
1. Calculate the mean of the numbers.
2. Find the deviations from the mean.
3. Square each deviation.
4. Sum the squared deviations.
5. Divide the sum in Step 4 by n – 1.
6. Take the square root of the quotient in Step 5.
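The six steps translate directly into a short Python function. This is an illustrative sketch (the name `sample_std_dev` is hypothetical); it divides by n – 1, so it gives the sample standard deviation rather than the population one.

```python
import math

def sample_std_dev(values):
    """Sample standard deviation via the six steps listed above."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two values")
    mean = sum(values) / n                   # Step 1: mean
    deviations = [x - mean for x in values]  # Step 2: deviations from the mean
    squares = [d * d for d in deviations]    # Step 3: square each deviation
    total = sum(squares)                     # Step 4: sum the squares
    variance = total / (n - 1)               # Step 5: divide by n - 1
    return math.sqrt(variance)               # Step 6: square root

print(round(sample_std_dev([1, 2, 8, 11, 13]), 2))  # 5.34
```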
Example:
Find the standard deviation of the sample set {1, 2, 8, 11, 13}.
x̄ = 7
Data Value   Deviation   (Deviation)²
1            –6          36
2            –5          25
8            1           1
11           4           16
13           6           36
Sum of the (Deviations)² = 36 + 25 + 1 + 16 + 36 = 114
Divide 114 by n – 1 with n = 5:
$$\frac{114}{5 - 1} = 28.5$$
Take the square root of 28.5: √28.5 ≈ 5.34
The sample standard deviation of the data is approximately 5.34.
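Python's standard library can confirm the result. In this sketch, `statistics.variance` and `statistics.stdev` divide by n – 1 (sample), while `statistics.pvariance` and `statistics.pstdev` divide by n (population); the comparison shows how much the divisor matters for a sample this small.

```python
import statistics

sample = [1, 2, 8, 11, 13]

# Sample statistics: divide the sum of squared deviations (114) by n - 1 = 4
print(statistics.variance(sample))          # 28.5
print(round(statistics.stdev(sample), 2))   # 5.34

# Population statistics: divide by n = 5 instead
print(statistics.pvariance(sample))         # 22.8
print(round(statistics.pstdev(sample), 2))  # 4.77
```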
Example: Interpreting Measures
Two companies, A and B, sell small packs of sugar for coffee.
The mean and standard deviation for samples from each
company are given below. Which company consistently
provides more sugar in its packs? Which company fills its
packs more consistently?
Company A: x̄ = 1.013 tsp, s = 0.0021 tsp
Company B: x̄ = 1.007 tsp, s = 0.0018 tsp
Which company consistently provides more sugar in its packs?
The sample mean for Company A (1.013 tsp) is greater than the sample mean for Company B (1.007 tsp).
The inference can be made that Company A provides more sugar in its packs.
Which company fills its packs more consistently?
The standard deviation for Company B is less than the
standard deviation for Company A.
The inference can be made that Company B fills its packs closer to its mean than Company A does.
Chebyshev’s Theorem
For any set of numbers, regardless of how they are distributed,
the fraction of them that lie within k standard deviations of
their mean (where k > 1) is at least
$$1 - \frac{1}{k^2}.$$
What is the minimum percentage of the items in a data set that lie within 2 standard deviations of the mean? Within 3 standard deviations?
k = 2: 1 – 1/2² = 3/4 = 75%
k = 3: 1 – 1/3² = 8/9 ≈ 88.9%
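A small Python sketch of the Chebyshev bound reproduces the two answers. The helper name `chebyshev_min_fraction` is our own.

```python
def chebyshev_min_fraction(k):
    """Chebyshev lower bound on the fraction of data within k standard
    deviations of the mean (valid for k > 1, for any distribution)."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem applies for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, f"{chebyshev_min_fraction(k):.1%}")
# 2 75.0%
# 3 88.9%
```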
Coefficient of Variation
The coefficient of variation expresses the standard deviation as
a percentage of the mean.
It is not strictly a measure of dispersion, as it combines central
tendency and dispersion.
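For any set of data, the coefficient of variation is given by
$$V = \frac{s}{\bar{x}} \cdot 100 \quad \text{for a sample, or} \quad V = \frac{\sigma}{\mu} \cdot 100 \quad \text{for a population,}$$
where μ is the population mean.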
Example: Comparing Samples
Compare the dispersions in the two samples A and B.
A: 12, 13, 16, 18, 18, 20
B: 125, 131, 144, 158, 168, 193
Sample A: x̄ = 16.167, s = 3.125, V = 19.3
Sample B: x̄ = 153.167, s = 25.294, V = 16.5
Sample B has a larger dispersion than sample A, but sample A
has the larger relative dispersion (coefficient of variation).
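The values above can be reproduced with a short Python sketch. It assumes the sample standard deviation (`statistics.stdev`, which divides by n – 1), as in the formula for V; the helper name is illustrative.

```python
import statistics

def coefficient_of_variation(sample):
    """V = (s / x-bar) * 100 for a sample."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

a = [12, 13, 16, 18, 18, 20]
b = [125, 131, 144, 158, 168, 193]

# Absolute spread (s) versus relative spread (V)
print(round(statistics.stdev(a), 3), round(coefficient_of_variation(a), 1))  # 3.125 19.3
print(round(statistics.stdev(b), 3), round(coefficient_of_variation(b), 1))  # 25.294 16.5
```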
12.4 – Measures of Position
In some cases, individual items in a data set are of more interest
than the set as a whole.
At times it is necessary to measure how an item fits into the data,
how it compares to other items in the data, or even how it compares
to an item in another data set.
Measures of position are common ways of making such comparisons.
The z-Score
The z-score measures how many standard deviations a single
data item is from the mean.
xx
z
.
s
Example: Comparing with z-Scores
Two students in different history classes took exams on the
same day. Jen’s score was 83, while Joy’s score was 78.
Which student did relatively better, given the class data shown
below?
                            Jen   Joy
Class mean                   78    70
Class standard deviation      4     5
Jen’s z-score: (83 – 78)/4 = 1.25
Joy’s z-score: (78 – 70)/5 = 1.6
Joy’s z-score is higher, so Joy did relatively better: she was positioned
relatively higher within her class than Jen was within her class.
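A minimal Python version of the z-score comparison. The `z_score` helper is an illustrative name, not from the text.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

jen = z_score(83, 78, 4)  # 1.25
joy = z_score(78, 70, 5)  # 1.6

print("Joy" if joy > jen else "Jen", "did relatively better")  # Joy did relatively better
```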
Percentiles
A percentile measures the position of a single data item based
on the percentage of data items below that item.
Standardized tests taken by large numbers of students often
convert raw scores to percentile scores.
If approximately n percent of the items in a distribution are
less than the number x, then x is the nth percentile of the
distribution, denoted Pn.
Example:
The following are test scores (out of 100) for a particular math
class.
44, 56, 58, 62, 64, 64, 70, 72, 72, 72,
74, 74, 75, 78, 78, 79, 80, 82, 82, 84,
86, 87, 88, 90, 92, 95, 96, 96, 98, 100
Find the fortieth percentile.
40% = 0.4, and 0.4(30) = 12.
Because 12 is a whole number, the average of the 12th and 13th items,
(74 + 75)/2 = 74.5, represents the 40th percentile (P40).
40% of the scores were below 74.5.
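The rule used in these examples can be written as a small Python function. This sketch assumes the convention the worked examples appear to follow: compute L = (p/100)·n; if L is a whole number, average the L-th and (L+1)-th ranked items, otherwise round L up and take that item. The name `textbook_percentile` is hypothetical. Other tools use different conventions (NumPy's `numpy.percentile`, for example, interpolates linearly by default), so their answers can differ slightly.

```python
def textbook_percentile(data, p):
    """P_p for an integer percent 0 < p < 100 (assumed textbook convention):
    L = (p/100) * n. If L is whole, average the L-th and (L+1)-th ranked
    items; otherwise round L up and take that item."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(data)
    n = len(xs)
    scaled = p * n                  # 100 * L, kept as an integer to avoid rounding error
    if scaled % 100 == 0:
        k = scaled // 100           # L is whole: 1-based position
        return (xs[k - 1] + xs[k]) / 2
    k = -(-scaled // 100)           # ceiling of L
    return xs[k - 1]

scores = [44, 56, 58, 62, 64, 64, 70, 72, 72, 72, 74, 74, 75, 78, 78,
          79, 80, 82, 82, 84, 86, 87, 88, 90, 92, 95, 96, 96, 98, 100]

print(textbook_percentile(scores, 40))  # 74.5
```

Under the same assumption, `textbook_percentile(scores, 60)` gives 82.0 (the sixth decile), and p = 25, 50, 75 give 72, 78.5 and 88, the quartiles found below.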
Other Percentiles: Deciles and Quartiles
Deciles are the nine values (denoted D1, D2,…, D9) along the
scale that divide a data set into ten (approximately) equal
parts.
10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%
Quartiles are the three values (Q1, Q2, Q3) that divide the data
set into four (approximately) equal parts.
25%, 50%, and 75%
Example: Deciles
Use the same test scores (out of 100) as in the percentile example.
Find the sixth decile.
Sixth decile = 60% = 0.6, and 0.6(30) = 18.
Because 18 is a whole number, the average of the 18th and 19th items,
(82 + 82)/2 = 82, represents the 6th decile (D6).
60% of the scores were at or below 82.
Quartiles
For any set of data (ranked in order from least to greatest):
The second quartile, Q2 (50%) is the median.
The first quartile, Q1 (25%) is the median of items below Q2.
The third quartile, Q3 (75%) is the median of items above Q2.
Example: Quartiles
Use the same test scores (out of 100) as in the percentile example.
Find the three quartiles.
Q1 = 25% = 0.25, and 0.25(30) = 7.5.
Rounding 7.5 up, the 8th item, 72, represents the 1st quartile (Q1).
25% of the scores were below 72.
Q2 = 50% = 0.5 (the median), and 0.5(30) = 15.
The average of the 15th and 16th items, (78 + 79)/2 = 78.5,
represents the 2nd quartile (Q2), or the median.
50% of the scores were below 78.5.
Q3 = 75% = 0.75, and 0.75(30) = 22.5.
Rounding 22.5 up, the 23rd item, 88, represents the 3rd quartile (Q3).
75% of the scores were below 88.
Box Plots
A box plot, or box-and-whisker plot, is a visual display of five
statistical measures:
the lowest value,
the first quartile,
the median,
the third quartile, and
the largest value.
[Figure: a box plot, with whiskers reaching out to the lowest value and the largest value]
Example:
Use the same test scores (out of 100) as in the percentile example.
Lowest = 44
Q1 = 25% = 72
Q2 = 50% = median = 78.5
Q3 = 75% = 88
Largest = 100
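The five numbers for a box plot can be computed with a short Python sketch. It follows the Quartiles definition given earlier (Q1 and Q3 are the medians of the items below and above Q2, leaving out the median itself when the number of items is odd); the function name is hypothetical. For these 30 scores it agrees with the percentile-based values, but for other data sets different quartile conventions, including `statistics.quantiles`, can give slightly different results.

```python
from statistics import median

def five_number_summary(data):
    """Lowest value, Q1, median (Q2), Q3, largest value.

    Q1 and Q3 are the medians of the items below and above Q2; when the
    number of items is odd, the middle item is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

scores = [44, 56, 58, 62, 64, 64, 70, 72, 72, 72, 74, 74, 75, 78, 78,
          79, 80, 82, 82, 84, 86, 87, 88, 90, 92, 95, 96, 96, 98, 100]

print(five_number_summary(scores))  # (44, 72, 78.5, 88, 100)
```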
12.5 – The Normal Distribution
Discrete and Continuous Random Variables
Discrete random variable: A random variable that can take
on only certain fixed values.
The number of even values of a single die.
The number of heads in three tosses of a fair coin.
Continuous random variable: A random variable that can take on any
value within an interval, rather than only certain fixed values.
The diameter of a growing tree.
The height of third graders.
Definition and Properties of a Normal Curve
A normal curve is a symmetric, bell-shaped curve.
Any continuous random variable whose graph has this
characteristic shape is said to have a normal distribution.
On a normal curve, the horizontal axis is labeled with the
mean and with the specific data values that lie whole numbers
of standard deviations from the mean.
If the horizontal axis is instead labeled with the number of standard
deviations from the mean, rather than with the specific data values,
then the curve is the standard normal curve.
[Figure: a normal curve whose horizontal axis is labeled with data values (5.5 at the center, with marks 1.4 and 2.8 on either side), and the corresponding standard normal curve labeled –2, –1, 0, 1, 2]
Normal Curves
[Figure: four normal curves, labeled S, A, B, and C, drawn on one horizontal axis marked 0]
S is standard, with mean = 0, standard deviation = 1
A has mean < 0, standard deviation = 1
B has mean = 0, standard deviation < 1
C has mean > 0, standard deviation > 1
Properties of Normal Curves
The graph of a normal curve is bell-shaped and symmetric
about a vertical line through its center.
The mean, median, and mode of a normal curve are all equal
and occur at the center of the distribution.
Empirical Rule: the approximate percentage of all data lying
within 1, 2, and 3 standard deviations of the mean is
within 1 standard deviation: 68%
within 2 standard deviations: 95%
within 3 standard deviations: 99.7%
[Figure: the Empirical Rule on a normal curve, with 68%, 95%, and 99.7% of the area within 1, 2, and 3 standard deviations of the mean]
Example: Applying the Empirical Rule
A sociology class of 280 students takes an exam. The
distribution of their scores can be treated as normal. Find the
number of scores falling within 2 standard deviations of the
mean.
A total of 95% of all scores lie within 2 standard deviations of
the mean.
(0.95)(280) = 266 scores
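As a sketch, Python's `statistics.NormalDist` (standard library, Python 3.8+) gives the exact areas that the Empirical Rule rounds to 68%, 95%, and 99.7%, and repeats the class-size arithmetic.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
for k in (1, 2, 3):
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Empirical Rule estimate: 95% of the 280 scores
print(round(0.95 * 280))  # 266
```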
Normal Curve Areas
In a normal curve and a standard normal curve, the total area
under the curve is equal to 1.
The area under the curve is presented as one of the following:
Percentage (of total items that lie in an interval),
Probability (of a randomly chosen item lying in an
interval),
Area (under the normal curve along an interval).
A Table of Standard Normal Curve Areas
To answer questions that involve regions other than 1, 2, or 3
standard deviations, a Table of Standard Normal Curve Areas
is necessary.
The table shows the area under the curve for all values in a
normal distribution that lie between the mean and z standard
deviations from the mean.
The table is most commonly used to find the percentage of values
within a certain range of z-scores, or the probability that a value
falls within that range.
Because of the symmetry of the normal curve, the table can be
used for values above the mean or below the mean.
Example: Applying the Normal Curve Table
Use the table to find the percent of all scores that lie between the mean
and 1.5 standard deviations above the mean.
Find z = 1.50 in the z column. The table entry is 0.4332.
Therefore, 43.32% of all values lie between the mean and 1.5 standard
deviations above the mean.
or
There is a 0.4332 probability that a randomly selected value will lie
between the mean and 1.5 standard deviations above the mean.
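Table lookups can be approximated with `statistics.NormalDist` in Python 3.8+. The helper `table_area` below is a hypothetical name; it returns the area between the mean and z, which is what the table lists, using symmetry for negative z.

```python
from statistics import NormalDist

def table_area(z):
    """Area under the standard normal curve between the mean and z.

    By symmetry, a negative z gives the same area as |z|.
    """
    return NormalDist().cdf(abs(z)) - 0.5

print(round(table_area(1.5), 4))    # 0.4332
print(round(table_area(-2.62), 4))  # 0.4956
```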
Example: Applying the Normal Curve Table
Use the table to find the percent of all scores that lie between the mean
and 2.62 standard deviations below the mean.
z = –2.62
Because the normal curve is symmetric, find 2.62 in the z column.
The table entry is 0.4956.
Therefore, 49.56% of all values lie between the mean and 2.62 standard
deviations below the mean.
or
There is a 0.4956 probability that a randomly selected value will lie
between the mean and 2.62 standard deviations below the mean.
Example: Applying the Normal Curve Table
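Find the percent of all scores that lie between z = –1.7 and z = 2.55.
For z = –1.7, use 1.7 in the z column (by symmetry): the table entry is 0.4554.
For z = 2.55, the table entry is 0.4946.
The two z-scores lie on opposite sides of the mean, so add the areas:
0.4554 + 0.4946 = 0.9500
Therefore, 95% of all values lie between 1.7 standard deviations below
the mean and 2.55 standard deviations above the mean.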
Example: Applying the Normal Curve Table
Find the probability that a randomly selected value will lie between
z = 0.61 and z = 2.63.
For z = 0.61, the table entry is 0.2291.
For z = 2.63, the table entry is 0.4957.
Both z-scores lie on the same side of the mean, so subtract the smaller area from the larger:
0.4957 – 0.2291 = 0.2666
There is a 0.2666 probability that a randomly selected value will lie
between 0.61 and 2.63 standard deviations above the mean.
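The same areas can be computed directly from the standard normal cumulative distribution function, as in this Python sketch (`prob_between` is an illustrative name). Because it avoids the rounded four-place table entries, the second result comes out as 0.2667 rather than 0.2666.

```python
from statistics import NormalDist

def prob_between(z1, z2):
    """Probability that a standard normal value lies between z1 and z2."""
    std_normal = NormalDist()
    low, high = sorted((z1, z2))
    return std_normal.cdf(high) - std_normal.cdf(low)

print(round(prob_between(-1.7, 2.55), 4))  # 0.95
print(round(prob_between(0.61, 2.63), 4))  # 0.2667
```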
Example: Applying the Normal Curve Table
Find the probability that a randomly selected value will lie above z = 2.14.
For z = 2.14, the table entry is 0.4838.
Half of the area under the curve (everything above the mean) is 0.5000.
0.5000 – 0.4838 = 0.0162
There is a 0.0162 probability that a randomly selected value will lie more
than 2.14 standard deviations above the mean.
Example: Applying the Normal Curve Table
The volumes of soda in bottles from a small company are distributed
normally with a mean of 12 ounces and a standard deviation of 0.15 ounces.
If 1 bottle is randomly selected, what is the probability that it will have
more than 12.33 ounces?
z = (12.33 – 12)/0.15 = 2.2
For z = 2.2, the table entry is 0.4861.
Half of the area under the curve is 0.5000
0.5000 – 0.4861 = 0.0139
There is a 0.0139 probability that a
randomly selected bottle will contain more
than 12.33 ounces.
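In Python 3.8+, `statistics.NormalDist` can describe the soda volumes directly. This sketch computes the z-score and the upper-tail probability, and repeats the z = 2.14 upper-tail example for comparison.

```python
from statistics import NormalDist

volumes = NormalDist(mu=12, sigma=0.15)  # soda volume in ounces

z = (12.33 - volumes.mean) / volumes.stdev
p_more = 1 - volumes.cdf(12.33)  # upper-tail area above 12.33 ounces

print(round(z, 2))       # 2.2
print(round(p_more, 4))  # 0.0139

# The earlier upper-tail example: area above z = 2.14
print(round(1 - NormalDist().cdf(2.14), 4))  # 0.0162
```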
Example: Finding z-scores for Given Areas
Assuming a normal distribution, find the z-score meeting the condition
that 39% of the area is to the right of z.
50% of the area lies to the right of the mean. Because only 39% of the
area is to the right of z, z lies above the mean.
The areas from the Normal Curve Table are based on the area between
the mean and the z-score:
area between the mean and the z-score = 0.50 – 0.39 = 0.11
From the table, find the area 0.1100 (or the closest value) and read the
z-score.
z-score = 0.28
Example: Finding z-scores for Given Areas
Assuming a normal distribution, find the z-score meeting the condition
that 76% of the area is to the left of z.
50% of the area lies to the left of the mean. Because 76% of the area
is to the left of z, z lies above the mean.
The areas from the Normal Curve Table are based on the area between
the mean and the z-score:
area between the mean and the z-score = 0.76 – 0.50 = 0.26
From the table, find the area 0.2600 (or the closest value) and read the
z-score.
z-score = 0.71
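Going the other way, from an area to a z-score, corresponds to the inverse cumulative distribution function. This sketch uses `NormalDist.inv_cdf` from Python's standard library (3.8+) to reproduce both answers; note that `inv_cdf` expects the area to the left of z, so an area to the right must first be subtracted from 1.

```python
from statistics import NormalDist

std_normal = NormalDist()

# 39% of the area is to the right of z, so 61% is to the left.
z_right_39 = std_normal.inv_cdf(1 - 0.39)

# 76% of the area is to the left of z.
z_left_76 = std_normal.inv_cdf(0.76)

print(round(z_right_39, 2), round(z_left_76, 2))  # 0.28 0.71
```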