Transcript
```
9.1
Measures of Central Tendency
Slide 13 - 1
Definitions
• An average is a number that is representative
of a group of data.
• The arithmetic mean, or simply the mean, is
symbolized by $\bar{x}$ when the data are a sample of a
population, or by the Greek letter mu, $\mu$, when
the data are the entire population.
Slide 13 - 2
Mean
• The mean is the sum of the data divided by
the number of pieces of data. The formula for
calculating the mean is
$\bar{x} = \frac{\sum x}{n}$
where $\sum x$ represents the sum of all the data
and n represents the number of pieces of
data.
Slide 13 - 3
Example-find the mean
• Find the mean amount of money parents
spent on new school supplies and clothes if 5
parents randomly surveyed replied as follows:
\$327 \$465 \$672 \$150 \$230
Slide 13 - 4
Solution
$\bar{x} = \frac{327 + 465 + 672 + 150 + 230}{5} = \frac{1844}{5} = 368.8$
Slide 13 - 5
Median
• The median is the value in the middle of a set
of ranked data.
• Example: Determine the median of
\$327 \$465 \$672 \$150 \$230.
Slide 13 - 6
Solution
Rank the data from smallest to largest.
\$150 \$230 \$327 \$465 \$672
The middle value, \$327, is the median.
Slide 13 - 7
Example: Median (even data)
• Determine the median of the following set of
data: 8, 15, 9, 3, 4, 7, 11, 12, 6, 4.
Slide 13 - 8
Solution
Rank the data:
3 4 4 6 7 8 9 11 12 15
There are 10 pieces of data, so the median will
lie halfway between the two middle pieces, the 7
and the 8.
The median is (7 + 8)/2 = 7.5.
Slide 13 - 9
Mode
• The mode is the piece of data that occurs
most frequently.
• Example: Determine the mode of the data set:
3, 4, 4, 6, 7, 8, 9, 11, 12, 15.
Slide 13 - 10
Solution
• The mode is 4 since it occurs twice and the
other values only occur once.
3, 4, 4, 6, 7, 8, 9, 11, 12, 15.
Slide 13 - 11
Midrange
• The midrange is the value halfway between
the lowest (L) and highest (H) values in a set of
data.
$\text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2}$
Slide 13 - 12
Example
• Find the midrange of the data set \$327, \$465,
\$672, \$150, \$230.
$\text{Midrange} = \frac{150 + 672}{2} = \frac{822}{2} = 411$
Slide 13 - 13
Example
• The weights of eight Labrador retrievers
rounded to the nearest pound are 85, 92, 88,
75, 94, 88, 84, and 101. Determine the
a) mean
b) median
c) mode
d) midrange
e) rank the measures of central tendency
from lowest to highest.
Slide 13 - 14
Example--dog weights 85, 92, 88, 75,
94, 88, 84, 101
a. Mean
$\bar{x} = \frac{85 + 92 + 88 + 75 + 94 + 88 + 84 + 101}{8} = \frac{707}{8} = 88.375$
b. Median: rank the data.
75, 84, 85, 88, 88, 92, 94, 101
There are 8 pieces of data, so the median is halfway
between the two middle values, 88 and 88. The median is 88.
Slide 13 - 15
Example--dog weights 85, 92, 88, 75,
94, 88, 84, 101
c. Mode: the number that occurs most
frequently. The mode is 88.
d. Midrange = (L + H)/2
= (75 + 101)/2 = 88
e. Rank the measures, lowest to highest:
the median, mode, and midrange (each 88), then the mean (88.375).
Slide 13 - 16
Measures of Position
• Measures of position are often used to make
comparisons.
• Two measures of position are percentiles and
quartiles.
Slide 13 - 17
To Find the Quartiles of a Set of Data
1. Order the data from smallest to largest.
2. Find the median, or 2nd quartile, of the set of
data. If there are an odd number of pieces of
data, the median is the middle value. If there
are an even number of pieces of data, the
median will be halfway between the two
middle pieces of data.
Slide 13 - 18
To Find the Quartiles of a Set of Data
continued
3. The first quartile, Q1, is the median of the
lower half of the data; that is, Q1 is the
median of the data less than Q2.
4. The third quartile, Q3, is the median of the
upper half of the data; that is, Q3 is the
median of the data greater than Q2.
Slide 13 - 19
Example: Quartiles
• The weekly grocery bills for 23 families are as
follows. Determine Q1, Q2, and Q3.
170 330 225 75 95 210 80 225
160 172 270 170 215 130 190 270
240 310 74 280 270 50 81
Slide 13 - 20
Example: Quartiles continued
• Order the data:
50 74 75 80 81 95 130
160 170 170 172 190 210 215
225 225 240 270 270 270 280
310 330
Slide 13 - 21
Example: Quartiles continued
Q2 is the median of the entire data set, the
12th of the 23 ranked values, which is 190.
Q1 is the median of the 11 values from 50 to
172, which is 95.
Q3 is the median of the 11 values from 210 to
330, which is 270.
Slide 13 - 22
9.2
Measures of Dispersion
Slide 13 - 23
Measures of Dispersion
• Measures of dispersion are used to indicate
the spread of the data.
• The range is the difference between the
highest and lowest values; it indicates the
total spread of the data.
Range = highest value – lowest value
Slide 13 - 24
Example: Range
• Nine different employees were selected and
the amount of their salary was recorded. Find
the range of the salaries.
\$24,000 \$32,000 \$26,500
\$56,000 \$48,000 \$27,000
\$28,500 \$34,500 \$56,750
Slide 13 - 25
Solution
• Range = \$56,750 – \$24,000 = \$32,750
Slide 13 - 26
Standard Deviation
• The standard deviation measures how much
the data differ from the mean. It is symbolized
with s when it is calculated for a sample, and
with $\sigma$ (Greek letter sigma) when it is
calculated for a population.
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Slide 13 - 27
To Find the Standard Deviation of a Set
of Data
1. Find the mean of the set of data.
2. Make a chart having three columns:
Data
Data - Mean
(Data - Mean)²
3. List the data vertically under the column
marked Data.
4. Subtract the mean from each piece of data
and place the difference in the Data - Mean
column.
Slide 13 - 28
To Find the Standard Deviation of a Set
of Data continued
5. Square the values obtained in the Data - Mean
column and record these values in the
(Data - Mean)² column.
6. Determine the sum of the values in the
(Data - Mean)² column.
7. Divide the sum obtained in step 6 by n - 1,
where n is the number of pieces of data.
8. Determine the square root of the number
obtained in step 7. This number is the
standard deviation of the set of data.
Slide 13 - 29
Example
• Find the standard deviation of the following
prices of selected washing machines:
\$280, \$217, \$665, \$684, \$939, \$299
Find the mean.
$\bar{x} = \frac{280 + 217 + 665 + 684 + 939 + 299}{6} = \frac{3084}{6} = 514$
Slide 13 - 30
Example continued, mean = 514
Data    Data - Mean    (Data - Mean)²
217        -297        (-297)² = 88,209
280        -234                  54,756
299        -215                  46,225
665         151                  22,801
684         170                  28,900
939         425                 180,625
Sum           0                 421,516
Slide 13 - 31
Example continued, mean = 514
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{421516}{5}} = \sqrt{84303.2} \approx 290.35$
• The standard deviation is \$290.35.
Slide 13 - 32
9.3
The Normal Curve
Slide 13 - 33
Types of Distributions
• Rectangular Distribution
• J-shaped distribution
Slide 13 - 34
Types of Distributions continued
• Bimodal
• Skewed to right
Slide 13 - 35
Types of Distributions continued
• Skewed to left
• Normal
Slide 13 - 36
Properties of a Normal Distribution
• The graph of a normal distribution is called
the normal curve.
• The normal curve is bell shaped and
symmetric about the mean.
• In a normal distribution, the mean, median,
and mode all have the same value and all
occur at the center of the distribution.
Slide 13 - 37
Empirical Rule
• Approximately 68% of all the data lie within
one standard deviation of the mean (in both
directions).
• Approximately 95% of all the data lie within
two standard deviations of the mean (in both
directions).
• Approximately 99.7% of all the data lie within
three standard deviations of the mean (in
both directions).
Slide 13 - 38
z-Scores
• z-scores determine how far, in terms of
standard deviations, a given score is from the
mean of the distribution.
$z = \frac{\text{value of the piece of data} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$
Slide 13 - 39
Example: z-scores
• A normal distribution has a mean of 50 and a
standard deviation of 5. Find z-scores for the
following values.
• a) 55   b) 60   c) 43
• a) $z = \frac{55 - 50}{5} = \frac{5}{5} = 1$
A score of 55 is one standard deviation
above the mean.
Slide 13 - 40
Example: z-scores continued
• b) $z = \frac{60 - 50}{5} = \frac{10}{5} = 2$
A score of 60 is 2 standard deviations above
the mean.
• c) $z = \frac{43 - 50}{5} = \frac{-7}{5} = -1.4$
A score of 43 is 1.4 standard deviations below
the mean.
Slide 13 - 41
To Find the Percent of Data Between
any Two Values
1. Draw a diagram of the normal curve,
indicating the area or percent to be
determined.
2. Use the formula to convert the given
values to z-scores. Indicate these z-scores
on the diagram.
3. Look up the percent that corresponds to
each z-score on pages 387-388.
Slide 13 - 42
To Find the Percent of Data Between
any Two Values continued
4. a) When finding the percent of data between two
z-scores on opposite sides of the mean (when one
z-score is positive and the other is negative), you
find the sum of the individual percents.
b) When finding the percent of data between two
z-scores on the same side of the mean (when both
z-scores are positive or both are negative),
subtract the smaller percent from the larger
percent.
Slide 13 - 43
To Find the Percent of Data Between
any Two Values continued
c) When finding the percent of data to the right of a
positive z-score or to the left of a negative z-score,
subtract the percent of data between 0 and z from
50%.
d) When finding the percent of data to the left of a
positive z-score or to the right of a negative
z-score, add the percent of data between 0 and z to
50%.
Slide 13 - 44
Example
Assume that the waiting times for customers at
a popular restaurant before being seated for
lunch are normally distributed with a mean of
12 minutes and a standard deviation of 3 min.
a) Find the percent of customers who wait for at
least 12 minutes before being seated.
b) Find the percent of customers who wait between
9 and 18 minutes before being seated.
c) Find the percent of customers who wait at least
17 minutes before being seated.
d) Find the percent of customers who wait less than
8 minutes before being seated.
Slide 13 - 45
Solution
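a. At least 12 minutes
Since 12 minutes is the mean, half, or 50%, of
customers wait at least 12 min before being seated.
b. Between 9 and 18 minutes
$z = \frac{9 - 12}{3} = \frac{-3}{3} = -1$
$z = \frac{18 - 12}{3} = \frac{6}{3} = 2$
The area to the left of z = 2 is .9772 and the area
to the left of z = -1 is .1587, so
.9772 - .1587 = .8185 = 81.85%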
Slide 13 - 46
Solution continued
c. at least 17 min
d. less than 8 min
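The worked steps for parts c and d are missing from the transcript. A sketch of the calculation, assuming a standard normal table that gives .9525 as the area to the left of z = 1.67 and .0918 as the area to the left of z = -1.33 (z-scores rounded to two decimal places):
c. $z = \frac{17 - 12}{3} = \frac{5}{3} \approx 1.67$
$1 - .9525 = .0475 = 4.75\%$
About 4.75% of customers wait at least 17 minutes.
d. $z = \frac{8 - 12}{3} = \frac{-4}{3} \approx -1.33$
$.0918 = 9.18\%$
About 9.18% of customers wait less than 8 minutes.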
Slide 13 - 47
9.4
Linear Correlation and Regression
Slide 13 - 48
Linear Correlation
• Linear correlation is used to determine
whether there is a relationship between two
quantities and, if so, how strong the
relationship is.
Slide 13 - 49
Linear Correlation
– The linear correlation coefficient, r, is a unitless
measure that describes the strength of the
linear relationship between two variables.
• If the value is positive, as one variable
increases, the other increases.
• If the value is negative, as one variable
increases, the other decreases.
• The variable, r, will always be a value between
–1 and 1 inclusive.
Slide 13 - 50
Scatter Diagrams
• A visual aid used with correlation is the
scatter diagram, a plot of points (bivariate
data).
– The independent variable, x, generally is a
quantity that can be controlled.
– The dependent variable, y, is the other variable.
• The value of r is a measure of how far a set of
points varies from a straight line.
– The greater the spread, the weaker the
correlation and the closer the r value is to 0.
– The smaller the spread, the stronger the
correlation and the closer the r value is to 1
or –1.
Slide 13 - 51
Correlation
Slide 13 - 52
Correlation
Slide 13 - 53
Linear Correlation Coefficient
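• The formula to calculate the correlation
coefficient (r) is as follows:
$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$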
Slide 13 - 54
Example: Words Per Minute versus
Mistakes
There are five applicants applying for a job as a medical
transcriptionist. The following shows the results of the
applicants when asked to type a chart. Determine the
correlation coefficient between the words per minute
typed and the number of mistakes.
Applicant   Words per Minute   Mistakes
Ellen              24              8
George             67             11
Phillip            53             12
Kendra             41             10
Nancy              34              9
Slide 13 - 55
Solution
• We will call the words typed per minute, x,
and the mistakes, y.
• List the values of x and y and calculate the
necessary sums.
WPM (x)   Mistakes (y)    x²      y²     xy
  24           8          576     64    192
  67          11         4489    121    737
  53          12         2809    144    636
  41          10         1681    100    410
  34           9         1156     81    306
$\sum x = 219$, $\sum y = 50$, $\sum x^2 = 10,711$, $\sum y^2 = 510$, $\sum xy = 2,281$
Slide 13 - 56
Solution continued
• The n in the formula represents the number of
pieces of data. Here n = 5.
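The substitution shown on the following slides did not survive in the transcript. A reconstruction from the sums in the table, assuming the standard formula for r in terms of those sums:
$r = \frac{5(2281) - (219)(50)}{\sqrt{5(10711) - 219^2}\,\sqrt{5(510) - 50^2}}$
$= \frac{11405 - 10950}{\sqrt{53555 - 47961}\,\sqrt{2550 - 2500}} = \frac{455}{\sqrt{5594}\,\sqrt{50}} \approx 0.86$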
Slide 13 - 57
Solution continued
Slide 13 - 58
Solution continued
• Since 0.86 is fairly close to 1, there is a fairly
strong positive correlation.
• This result implies that the more words typed
per minute, the more mistakes made.
Slide 13 - 59
Linear Regression
• Linear regression is the process of determining
the linear relationship between two variables.
• The line of best fit (regression line or the least
squares line) is the line such that the sum of
the squares of the vertical distances from the
line to the data points (on a scatter diagram) is
a minimum.
Slide 13 - 60
The Line of Best Fit
• Equation:
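The equation on this slide did not survive in the transcript. Assuming the usual slope-intercept form, with m (a label chosen here) for the slope and b for the y-intercept, the least squares line can be written as
$y = mx + b$
$m = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}, \quad b = \frac{\sum y - m(\sum x)}{n}$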
Slide 13 - 61
Example
• Use the data in the previous example to find
the equation of the line that relates the
number of words per minute and the
number of mistakes made while typing a
chart.
• Graph the equation of the line of best fit on
a scatter diagram that illustrates the set of
bivariate points.
Slide 13 - 62
Solution
• From the previous results, we know that
Slide 13 - 63
Solution
• Now we find the y-intercept, b.
Therefore the line of best fit is y = 0.081x + 6.452
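The arithmetic behind these values is missing from the transcript. A reconstruction from the sums found in the correlation example, assuming the least squares formulas for the slope m and the y-intercept b, and rounding m to 0.081 before computing b (which reproduces the 6.452 above):
$m = \frac{5(2281) - (219)(50)}{5(10711) - 219^2} = \frac{455}{5594} \approx 0.081$
$b = \frac{50 - 0.081(219)}{5} = \frac{32.261}{5} \approx 6.452$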
Slide 13 - 64
Solution continued
• To graph y = 0.081x + 6.452, plot at least two
points and draw the graph.
x
10
20
30