Chapter 13
STATISTICS
13.1 Visual Displays of Data
13.2 Measures of Central Tendency
13.3 Measures of Dispersion
13.6 Regression and Correlation (if time permits)
13.1 Visual Displays of Data
DEFINITIONS:
The distribution of a variable provides the possible values that a variable can
take on and how often these possible values occur. The distribution of a
variable shows the pattern of variation of the variable.
Frequency Distributions
When a data set includes many repeated items, it can be organized into a
frequency distribution, which lists the distinct data values (x) along with their
frequencies ( f ). The frequency designates the number of times the
corresponding item occurred in the data set.
Relative frequency of each distinct item is the fraction, or percentage, of the data
set represented by the item. If n denotes the total number of items, and a given
item, x, occurred f times, then the relative frequency of x is f/n.
Example 1
The 25 members of a psychology class were polled as to the number of siblings
in their individual families. Construct a frequency distribution and a relative
frequency distribution for their responses, which are shown here.
2, 3, 1, 3, 3, 5, 2, 3, 3, 1, 1, 4, 2, 4, 2, 5, 4, 3, 6, 5, 1, 6, 2, 2, 2
Solution
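The tallying step can be sketched in a few lines of Python. This is only an illustration of the definitions above (frequency f and relative frequency f/n) applied to the Example 1 responses; the helper name frequency_table is an assumption made for this sketch, not part of the course materials.

```python
from collections import Counter

# Responses from Example 1: number of siblings reported by the 25 students.
siblings = [2, 3, 1, 3, 3, 5, 2, 3, 3, 1, 1, 4, 2, 4, 2,
            5, 4, 3, 6, 5, 1, 6, 2, 2, 2]


def frequency_table(data):
    """Return rows (x, f, f/n), one per distinct value, sorted by x."""
    n = len(data)
    counts = Counter(data)
    return [(x, f, f / n) for x, f in sorted(counts.items())]


table = frequency_table(siblings)
for x, f, rel in table:
    # Relative frequency is shown as a percentage of the whole data set.
    print(f"x = {x}   f = {f}   f/n = {rel:.0%}")
```

The frequencies should add up to n = 25 and the relative frequencies to 1, which is a quick check on any hand-built table.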
Let's Do It! Frequency Distributions
Quiz Scores in an Economics Class
The following data are quiz scores for the members of an economics class.
Use the given data to do the following:
(a) Construct a frequency and relative frequency distribution table.
(b) Construct a histogram.
(c) Construct a frequency polygon.
Grouped Frequency Distributions
Data sets containing large numbers of items are often arranged into groups, or
classes. All data items are assigned to their appropriate classes, and then a
grouped frequency distribution can be set up and a graph displayed.
Example 2
Forty students, selected randomly in the school cafeteria on a Monday morning,
were asked to estimate the number of hours they had spent studying in the past
week (including both in-class and out-of-class time). Their responses are
recorded here.
Tabulate a grouped frequency distribution and a grouped relative frequency
distribution and construct a histogram and a frequency polygon. Use 7 classes
with a uniform width of 10 and a lower limit of 10 hours for the first class.
Solution
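The students' responses are not reproduced in this transcript, so the sketch below only illustrates the tabulation step: assigning each item to a class of uniform width and counting. The data values are HYPOTHETICAL and assumed to be whole numbers; the function name grouped_frequencies is likewise an assumption made for illustration. The class settings (7 classes, width 10, lower limit 10) are the ones stated in Example 2.

```python
def grouped_frequencies(data, lower, width, classes):
    """Return rows (class_low, class_high, f, f/n) for whole-number data."""
    counts = [0] * classes
    for value in data:
        k = (value - lower) // width  # index of the class containing value
        if not 0 <= k < classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[k] += 1
    n = len(data)
    rows = []
    for k, f in enumerate(counts):
        low = lower + k * width
        rows.append((low, low + width - 1, f, f / n))
    return rows


# HYPOTHETICAL study times in hours (not the actual Example 2 responses).
hours = [18, 24, 36, 36, 41, 45, 52, 58, 63, 79]

rows = grouped_frequencies(hours, lower=10, width=10, classes=7)
for low, high, f, rel in rows:
    print(f"{low}-{high}: f = {f}, f/n = {rel:.0%}")
```

The class frequencies are the bar heights of the histogram, and plotting them at the class midpoints gives the frequency polygon.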
Let’s Do It! Heights of Baseball Players
The heights (in inches) of the 54 starting players in a baseball tournament were
as follows.
Use five classes with a uniform class width of 5 inches, and use a lower limit of
45 inches for the first class.
(a) Construct grouped frequency and relative frequency distributions, in a table
similar to the example above. (In each case, follow the suggested guidelines for
class limits and class width.)
(b) Construct a histogram.
(c) Construct a frequency polygon.
Stem-and-Leaf Displays
The stem-and-leaf display conveys at a glance the same pictorial impressions
that a histogram would convey without the need for constructing the drawing. It
also preserves the exact data values.
Example 3
Present the study times data of Example 2 in a stem-and-leaf display.
Solution
The tens digits, to the left of the vertical line, are the “stems,” while the
corresponding ones digits are the “leaves.” We have entered all items from the
first row of the original data, from left to right, then the items from the second
row, and so on through the fourth row.
Notice that the stem-and-leaf display of Example 3 conveys at a glance the
same pictorial impressions that a histogram would convey without the need for
constructing the drawing. It also preserves the exact data values.
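As a rough sketch of how such a display is built, the code below splits each two-digit value into a tens digit (stem) and a ones digit (leaf) and records the leaves in the order the data are read. The values are HYPOTHETICAL, since the study-time data are not reproduced in this transcript, and the function name stem_and_leaf is an assumption for illustration.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each tens digit (stem) to its ones digits (leaves), in data order."""
    stems = defaultdict(list)
    for value in data:
        stems[value // 10].append(value % 10)
    return dict(stems)


# HYPOTHETICAL two-digit values (not the actual study-time data).
values = [36, 18, 45, 24, 41, 36, 58, 63, 52, 79]

display = stem_and_leaf(values)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Each printed row plays the role of a histogram bar turned on its side, and the original values can still be read back from the stems and leaves.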
Let’s Do It!
Construct the stem and leaf representation of the data.
Pie Chart
A graphical alternative to the bar graph is the circle graph, or pie chart, which
uses a circle to represent the total of all the categories and divides the circle into
sectors, or wedges (like pieces of pie), whose sizes show the relative
magnitudes of the categories.
The angle around the entire circle measures 360°. For example, a category
representing 20% of the whole should correspond to a sector whose central
angle is
20% of 360° = 0.20 × 360° = 72°.
Example 4
Nola Akala found that, during her first semester of college, her expenses fell into
categories as shown in the table. Present this information in a circle graph.
Solution
The central angle of the sector for food is
0.30 × 360° = 108°; for rent it is 0.25 × 360° = 90°.
Calculate the other four angles similarly.
A circle graph shows, at a glance, the relative
magnitudes of various categories.
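The angle calculation is the same for every category, so it can be sketched as a one-line helper. Only the food (30%) and rent (25%) percentages come from the solution above; the function name central_angle is an assumption for illustration.

```python
def central_angle(percent):
    """Central angle in degrees for a category that is `percent` % of the whole."""
    return percent * 360 / 100


food_angle = central_angle(30)  # food is 30% of the expenses
rent_angle = central_angle(25)  # rent is 25% of the expenses
print(food_angle, rent_angle)
```

Because the percentages of all categories total 100, the central angles should total 360°.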
End of section 13.1. Start your online homework on MyMathLab.
13.2 Measures of Central Tendency
It would be desirable to have a single number to
serve as a kind of representative value for the
whole set of numbers—that is, some value
around which all the numbers in the set tend to
cluster, a kind of “middle” number or a measure of
central tendency.
Three such measures are discussed in this section.
1) Mean
2) Median
3) Mode
Mean
The mean is the most common measure of central tendency. The mean of a set
of data items is found by adding up all the items and then dividing the sum
by the number of items.
Example 
Mean Number of Children per Household
Suppose that the number of children in a simple random sample of 10
households is as follows: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4
(a) Calculate the sample mean number of children per household.
Solution
(a) The sample mean number of children per household is given by:
2  3  0  2  1  0  3  0  1  4 16
x 

 1.6  2
10
10
.
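The same calculation can be checked with a short Python sketch that follows the definition directly: add the items, then divide by how many there are. The function name mean is an assumption for illustration.

```python
def mean(data):
    """Sum of the items divided by the number of items."""
    return sum(data) / len(data)


# Number of children in the sample of 10 households.
children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]

sample_mean = mean(children)
print(sample_mean)
```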
Let's Do It!
Last year’s annual sales for eight different flower shops are given below. Find
the mean annual sales for the eight shops.
Median
DEFINITION:
The median of a set of n observations, ordered from smallest to largest, is a
value such that half of the observations are less than or equal to that value and
half the observations are greater than or equal to that value.
If the number of observations is odd, the median is the middle observation.
If the number of observations is even, the median is any number between the
two middle observations, including either of the two middle observations.
To be consistent, we will define the median as the mean or average of the two
middle observations.
Location of the median: (n+1)/2, where n is the number of observations.
Example
The ages of twenty subjects are given as follows
32, 37, 39, 40, 41, 41, 41, 42, 42, 43, 44, 45, 45, 45, 46, 47, 47, 49, 50, 51
Solution
Calculating (n+1)/2 we get (20+1)/2 = 10.5. So the two middle observations are
the 10th and 11th observations, namely 43 and 44. The median is the mean of
these two middle observations, (43+44)/2=43.5 years.
32 37 39 40 41 41 41 42 42 43 (10th obs) | 44 (11th obs) 45 45 45 46 47 47 49 50 51
median = 43.5
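A minimal sketch of the median rule follows, assuming the convention adopted above (the average of the two middle observations when n is even). The function name median is chosen for illustration; the data are the twenty ages from the example.

```python
def median(data):
    """Middle observation of the ordered data, or the mean of the two middle ones."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two


ages = [32, 37, 39, 40, 41, 41, 41, 42, 42, 43,
        44, 45, 45, 45, 46, 47, 47, 49, 50, 51]

location = (len(ages) + 1) / 2  # 10.5, so the 10th and 11th observations
print(location, median(ages))
```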
Let's Do It! 1
Median Number of Children per Household
Find the median number of children in a household from this sample of 10
households, that is, find the median of
Number of Children: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4
(a) Order the observations from smallest to largest: ______________
(b) Median = ______________
Another Measure—The Mode
DEFINITION:
The mode of a set of observations is the most frequently occurring value; it is
the value having the highest frequency among the observations.
The mode of the values { 0, 0, 0, 0, 1, 1, 2, 2, 3, 4 } is 0.
For { 0, 0, 0, 1, 1, 2, 2, 2, 3, 4 } there are two modes, 0 and 2 (bimodal).
What would be the mode for { 0, 1, 2, 4, 5, 8 } ?
For {0, 0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4, 5 } ?
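One way to find modes mechanically is to count every value and keep the ones tied for the highest frequency. The sketch below does that for the two worked sets above; the function name modes is an assumption for illustration, and the helper simply reports ties without interpreting them.

```python
from collections import Counter


def modes(data):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)


one_mode = modes([0, 0, 0, 0, 1, 1, 2, 2, 3, 4])
two_modes = modes([0, 0, 0, 1, 1, 2, 2, 2, 3, 4])
print(one_mode, two_modes)
```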
Example
Race codes: 1=White, 2=Asian, 3=African-American, 4=Hispanic, 5=American Indian, 6=No category listed.
[Bar chart: Percent (vertical axis, 0 to 80) by Race (White, Asian, African-American, Hispanic, American Indian, No Category)]
Then the mode would be the value 1.
Central Tendency in Frequency Table
Weighted Mean (mean of frequency table)
The weighted mean of a group of (weighted) items is the sum of all products of
items times weighting factors, divided by the sum of all weighting factors. The
weighted mean formula is commonly used to find the mean for a frequency
distribution. In this case, the weighting factors are the frequencies.
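As an illustration of the weighted mean formula, the sketch below applies it to a frequency table tallied from the sibling responses of Example 1 in Section 13.1 (values 1 through 6 with frequencies 4, 7, 6, 3, 3, 2). The function name weighted_mean is an assumption made for this sketch.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total = sum(x * w for x, w in zip(values, weights))
    return total / sum(weights)


# Frequency table tallied from the Example 1 sibling data (n = 25).
x_values = [1, 2, 3, 4, 5, 6]
frequencies = [4, 7, 6, 3, 3, 2]

mean_siblings = weighted_mean(x_values, frequencies)
print(mean_siblings)
```

With frequencies as the weights, the denominator is just n, so the result matches the ordinary mean of the 25 individual responses.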
Median of frequency table
Example
Find the mean, median and modal salary for a small
company that pays annual salaries to its employees as
shown in the frequency distribution in the margin.
Solution
Create a third column to weight the x’s by multiplying each
x by its frequency. Then divide the total of that column by the total number of
observations, which is the sum of the frequencies.
Mean Salary = 838,000/45 ≈ $18,622
To find the median salary, we must first find its position: (45 + 1)/2 = 23. This
indicates that the median is the 23rd observation, so Median = $18,500.
To find the mode, we locate the x value with the highest frequency. That is
x = $18,500.
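The salary table itself is not reproduced in this transcript, so the sketch below shows the same position-based idea on a different frequency table, the one tallied from the Example 1 sibling responses in Section 13.1. It accumulates frequencies until the running total reaches the median position. The function names are assumptions made for illustration.

```python
def frequency_median(values, freqs):
    """Median of a frequency table whose values are listed in increasing order."""
    n = sum(freqs)

    def value_at(position):
        # Walk the cumulative frequencies until they reach the position.
        running = 0
        for x, f in zip(values, freqs):
            running += f
            if position <= running:
                return x
        raise ValueError("position exceeds the number of observations")

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


# Frequency table tallied from the Example 1 sibling data (n = 25).
x_values = [1, 2, 3, 4, 5, 6]
frequencies = [4, 7, 6, 3, 3, 2]

median_siblings = frequency_median(x_values, frequencies)
print(median_siblings)
```

Here the position is (25 + 1)/2 = 13, and the cumulative frequencies 4, 11, 17 first reach 13 at x = 3.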
Let’s Do It!
Find the medians for the following distributions.
End of section 13.2. Start your online homework on MyMathLab.
13.3 Measures of Dispersion
Both sets of data have the same mean, median and mode but the values
obviously differ in another respect -- the variation or spread of the values.
The values in List 1 are much more tightly clustered around the center value of
60. The values in List 2 are much more dispersed or spread out.
List 1: 55, 56, 57, 58, 59, 60, 60, 60, 61, 62, 63, 64, 65
mean = median = mode = 60
[Dot plot of List 1 on an axis from 35 to 85: the points are tightly clustered between 55 and 65, with a stack of three at 60]
List 2: 35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85
mean = median = mode = 60
[Dot plot of List 2 on an axis from 35 to 85: the points are spread out from 35 to 85, with a stack of three at 60]
What is needed here is some measure of the dispersion, or spread, of the data.
Two of the most common measures of dispersion, the range and the standard
deviation, are discussed in this section.
Range
The range is the simplest measure of variability or spread.
Range is just the difference between the largest value and the smallest value.
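As a small illustration, the sketch below computes the range for the sample of household sizes used in Section 13.2. The function name data_range is an assumption for illustration.

```python
def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


# Number of children in the sample of 10 households from Section 13.2.
children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]

children_range = data_range(children)
print(children_range)
```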
Let’s Do It!
Find the range of list 1 and list 2 above. Determine which list has a larger
spread.
Standard Deviation
The standard deviation is a measure of the spread of the observations from the
mean.
Think of the standard deviation as an “average distance of the
observations from the mean.”
Example 5.9 Standard Deviation—What Is It?
Mean: (0+5+7)/3 = 4
Deviations: -4, 1, 3
Squared Deviations: 16, 1, 9
Standard deviation:
$s = \sqrt{\frac{(-4)^2 + 1^2 + 3^2}{3 - 1}} = \sqrt{\frac{16 + 1 + 9}{2}} = \sqrt{\frac{26}{2}} = \sqrt{13} \approx 3.6$
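The step-by-step process (mean, deviations, squared deviations, divide by n - 1, square root) can be sketched in Python as follows, using the three observations 0, 5, 7 from the example. The function name sample_std is an assumption for illustration.

```python
import math


def sample_std(data):
    """Sample standard deviation computed step by step."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]   # distance of each item from the mean
    squared = [d ** 2 for d in deviations]  # squaring removes the signs
    return math.sqrt(sum(squared) / (n - 1))


s = sample_std([0, 5, 7])
print(round(s, 1))
```

Dividing by n - 1 rather than n is what makes this the sample standard deviation, which matches the 3 - 1 in the denominator of the worked example.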
Let’s Do It!
Find the standard deviation of the sample by using the step-by-step process
7, 9, 18, 22, 27, 29, 32, 40
What is the mean of the data?
Complete the table of deviations and their squares below
Compute the standard deviation s of the data.
Central tendency and dispersion (or “spread tendency”) are
different and independent aspects of a set of data. Which one
is more critical can depend on the specific situation.
Consider a situation that involves target shooting (also illustrated at
the side). The five hits on the top target are, on average, very
close to the bull's-eye, but the large dispersion (spread) implies
that improvement will require much effort. On the other hand,
the bottom target exhibits a poorer average, but the smaller
dispersion means that improvement will require only a minor
adjustment of the gun sights. (In general, consistent errors can
be dealt with and corrected more easily than more dispersed
errors.)
Coefficient of Variation
Look again at the top target pictured above.
The dispersion, or spread, among the five bullet holes may not be especially
impressive if the shots were fired from 100 yards, but would be much more so
at, say, 300 yards. There is another measure, the coefficient of variation, which
takes this distinction into account. It is not strictly a measure of dispersion, as it
combines central tendency and dispersion. It expresses the standard deviation
as a percentage of the mean.
Often this is a more meaningful measure than a straight measure of dispersion,
especially when comparing distributions whose means are appreciably different.
Example
Compare the dispersions in the two samples A and B.
A: 12, 13, 16, 18, 18, 20
B: 125, 131, 144, 158, 168, 193
Use your calculator to find the mean and standard deviation for the two samples
A and B.
From the calculated values, we see that sample B has a larger
standard deviation than sample A. But sample A actually has the larger relative
dispersion (coefficient of variation). The dispersion within sample A is larger as
a percentage of the sample mean.
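The comparison can be reproduced with a short sketch that computes the coefficient of variation as the sample standard deviation divided by the mean, times 100. It relies on the standard library's statistics.stdev (sample standard deviation) and statistics.mean; the function name coefficient_of_variation is an assumption for illustration.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100


sample_a = [12, 13, 16, 18, 18, 20]
sample_b = [125, 131, 144, 158, 168, 193]

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(name,
          round(statistics.mean(sample), 2),
          round(statistics.stdev(sample), 2),
          round(coefficient_of_variation(sample), 1))

cv_a = coefficient_of_variation(sample_a)
cv_b = coefficient_of_variation(sample_b)
```

Sample B's standard deviation comes out larger, while sample A's coefficient of variation is larger (roughly 19% against roughly 17%), in line with the conclusion above.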
Let’s Do It!
Two brands of car batteries, both carrying 6-year warranties, were sampled and
tested under controlled conditions. Five of each brand failed after the numbers
of months shown here.
Brand A: 75, 65, 70, 64, 71
Brand B: 69, 70, 62, 72, 60
(a) Calculate both brands' means.
Brand A mean=
Brand B mean=
(b) Calculate both brands' standard deviations.
Brand A standard deviation s=
Brand B standard deviation s=
(c) Which brand battery apparently lasts longer?
(d) Calculate the coefficient of variation for both brands.
(e) Which brand battery has the more consistent lifetime? Explain why.
End of section 13.3. Start your online homework on MyMathLab.
TI Quick Steps
Obtaining Summary Measures
Step 1
Clear data.
Step 2
Enter data to be summarized.
Step 3
Obtain the summary measures for the data in L1.
Summary measures are obtained by requesting the 1-Var Stats from
under the STAT CALC menu list. The sequence of buttons is as
follows:
The 1-Var Stats are now displayed in the window. Notice that both the sample
standard deviation Sx and the mean x̄ are provided.