MAT112 Sections 11.1 – 11.3 Grouped Data
11.1
Bar Graphs – use vertical and horizontal axes and either vertical or horizontal bars to represent
data. (Used to give a quick visual comparison of data.)
Broken Line Graphs – constructed by connecting the midpoints of the tops of the bars of a bar graph.
(Useful to emphasize change over time.) The trend of the line can also be used to estimate future values.
Pie Chart – used to show what percentage of a whole each of several categories makes up. The angle
of each circle sector is proportional to the category's percentage of the whole circle (360°).
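The sector-angle rule in the Pie Chart definition can be checked with a small Python sketch. The function name `sector_angles` is hypothetical; the two percentages are the ones shown on the "Sydnie's Day" chart.

```python
def sector_angles(percentages):
    """Return the sector angle (in degrees) for each category percentage.

    A full circle is 360 degrees, so a category that makes up p percent
    of the whole gets a sector of (p / 100) * 360 degrees.
    """
    return [p / 100 * 360 for p in percentages]


# The two slice percentages shown on the "Sydnie's Day" pie chart.
angles = sector_angles([33, 67])
print([round(a, 1) for a in angles])  # [118.8, 241.2]
```

Because the percentages add up to 100, the angles add up to the full 360°.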
[Figures: bar graph and broken line graph titled "Hours of Sleep in One Night" (horizontal axis: Sydnie, Jack, Taylor, Brad; vertical axis: Hours, 0 to 10); pie chart titled "Sydnie's Day" with slices labeled Asleep and Awake (33% and 67%).]
Frequency Table – a table used to organize a large group of data and show its frequency distribution.
Data Range – obtained by subtracting the smallest data value from the largest data value.
Class Interval – one of several equal-size intervals that break up the data range into groups. To decide
what class size to use, divide the data range by the number of intervals desired. Then pick a
slightly larger number that is easy to work with and keeps data values from falling on the border
between two intervals. Extra decimal places may also be used to help avoid this.
Class Frequency – the number of data values that fall in the class interval.
Frequency Distribution – the set of all classes listed with their frequencies (usually shown in
a frequency table).
Relative Frequency – the fraction of the data that falls in the class interval (class frequency divided
by the total number of data values), written as a decimal.
Probability Distribution – the set of relative frequencies, used when considering the
probability associated with each class. The probability that a data value lands in a certain class
is the relative frequency of that class.
Example: The number of calls from motorists per day for roadside service was recorded for the
month of December 2003. The results were as follows:
28, 86, 122, 80, 217, 90, 130, 120, 120, 70, 104, 82, 97, 109, 75, 81,
40, 68, 145, 174, 187, 194, 113, 111, 90, 75, 123, 140, 100, 170, 120
We can use this data to record how often values occur in each class interval. The data range is
217 - 28 = 189. If we want 6 classes, we divide 189 by 6: 189/6 = 31.5. Next, find a slightly larger
number to use as the class size so that no data value falls on the border between two intervals.
A class size of 32 works, starting at 25, since the smallest value is 28. (The largest value, 217, lands
exactly on the upper boundary of the last class, which belongs to only one interval.)
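The arithmetic in this paragraph can be reproduced with a short Python sketch. Picking an "easy" class size is a judgment call; rounding up to the next whole number with `math.ceil` is an assumption that happens to match the handout's choice of 32 here. The starting value 25 is the handout's own.

```python
import math

# Daily roadside-service calls for December 2003 (from the example above).
calls = [28, 86, 122, 80, 217, 90, 130, 120, 120, 70, 104, 82, 97, 109, 75, 81,
         40, 68, 145, 174, 187, 194, 113, 111, 90, 75, 123, 140, 100, 170, 120]

data_range = max(calls) - min(calls)   # 217 - 28 = 189
num_classes = 6
raw_size = data_range / num_classes    # 31.5

# Assumed rule: round the raw size up to the next whole number (31.5 -> 32).
class_size = math.ceil(raw_size)
start = 25                             # the handout's start, just below the smallest value 28

boundaries = [start + k * class_size for k in range(num_classes + 1)]
print(boundaries)  # [25, 57, 89, 121, 153, 185, 217]

# Interior boundaries are shared by two classes; check that no value sits on one.
shared = [b for b in boundaries[1:-1] if b in calls]
print("values on a shared border:", shared)  # values on a shared border: []
```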
Class Interval   Tally          Frequency   Relative Frequency (rounded)
25 - 57          II             2           0.065
57 - 89          IIII III       8           0.26
89 - 121         IIII IIII I    11          0.35
121 - 153        IIII           5           0.16
153 - 185        II             2           0.065
185 - 217        III            3           0.1
Total                           31          1.00
The tallies record how many data values fall within each class. Each "IIII" group stands for a
bundle of five tally marks (four marks crossed by a fifth), so "IIII III" means 8. The relative
frequency shows the fraction of the data that falls in each class interval, written as a decimal.
Notice that the relative frequencies sum to 1.
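A minimal Python sketch of building the frequency distribution above. The function name `frequency_table` is hypothetical, and the counting convention (a value belongs to a class when lower <= value < upper, with the last class also including its upper boundary) is an assumption; since no value here sits on a shared boundary, it does not change any count.

```python
# Daily roadside-service calls for December 2003 (from the example above).
calls = [28, 86, 122, 80, 217, 90, 130, 120, 120, 70, 104, 82, 97, 109, 75, 81,
         40, 68, 145, 174, 187, 194, 113, 111, 90, 75, 123, 140, 100, 170, 120]


def frequency_table(data, start, width, num_classes):
    """Return one (lower, upper, frequency, relative_frequency) row per class.

    A value x is counted in a class when lower <= x < upper; the last class
    also counts x == upper so the largest value (217 here) is included.
    """
    n = len(data)
    rows = []
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width
        is_last = k == num_classes - 1
        freq = sum(1 for x in data if lower <= x < upper or (is_last and x == upper))
        rows.append((lower, upper, freq, freq / n))
    return rows


table = frequency_table(calls, start=25, width=32, num_classes=6)
for lower, upper, freq, rel in table:
    print(f"{lower} - {upper}: frequency {freq}, relative frequency {rel:.3f}")

print("total frequency:", sum(row[2] for row in table))  # total frequency: 31
```

Printed to three decimals, the relative frequencies are 0.065, 0.258, 0.355, 0.161, 0.065 and 0.097, so a few differ in the last digit from the more heavily rounded values in the table. Read as a probability distribution, the third row says a day picked at random from December 2003 had an 11/31 ≈ 0.355 probability of falling in the 89 - 121 class.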
Histogram – similar to a bar graph, except that there is no space between the bars. Class
boundaries are along the x-axis and frequencies are on the y-axis. Histograms give a visual
representation of large groups of data.
Frequency Polygon – the broken line graph that is
associated with a histogram. Remember that it
connects the midpoints of the tops of the histogram's bars.
[Figures: "Histogram" (x-axis: Class Boundaries, 25 to 217; y-axis: Frequency) and "Frequency Polygon" (x-axis: Class Boundaries, marked at 9, 41, 73, 105, 137, 169, 201 and 233; y-axis: Frequency).]
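The Frequency Polygon figure's axis is marked at 9 and 233 as well as at the six class midpoints, which suggests that an empty class was added at each end so the polygon begins and ends on the axis. The Python sketch below assumes that convention; the function name `frequency_polygon_points` is hypothetical.

```python
def frequency_polygon_points(start, width, freqs):
    """Return (midpoint, frequency) vertices for a frequency polygon.

    An empty class (frequency 0) is added before the first class and after
    the last one, so the polygon starts and ends on the horizontal axis.
    """
    padded = [0] + list(freqs) + [0]
    first_lower = start - width  # lower boundary of the added empty class
    return [(first_lower + k * width + width / 2, f) for k, f in enumerate(padded)]


points = frequency_polygon_points(start=25, width=32, freqs=[2, 8, 11, 5, 2, 3])
for midpoint, freq in points:
    print(midpoint, freq)
```

Connecting these points in order, from (9, 0) to (233, 0), traces the polygon over the histogram bars.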
11.2 (*Note: For Mean, Median and Mode of ungrouped data, see the handout titled Measures of
Central Tendency & Dispersion.)
Grouped Data – a data set or frequency table that does not show each individual data entry. The
data are shown only as grouped into class intervals.
Mean for Grouped Data – the Greek letter $\mu$ (“mu”) is used as the symbol for a population mean, and the
symbol $\bar{x}$ (“x-bar”) is used to represent the mean of a sample of data values. To determine the mean of
grouped data, take the midpoint $x_i$ of each class and multiply it by its class frequency $f_i$. Then,
add all of these products together and divide by the total number of data values $n$:
$$\bar{x} = \frac{\sum x_i f_i}{n}$$
Example: For the data presented earlier:
Class Interval   Midpoint ($x_i$)   Frequency ($f_i$)   Product ($x_i f_i$)
25 - 57          41                 2                   82
57 - 89          73                 8                   584
89 - 121         105                11                  1155
121 - 153        137                5                   685
153 - 185        169                2                   338
185 - 217        201                3                   603
Totals                              n = 31              $\sum x_i f_i$ = 3447
Mean: $\bar{x} = \frac{3447}{31} \approx 111.19$
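The grouped-mean calculation in this example can be sketched in Python as follows. The function name `grouped_mean` is hypothetical; the midpoints and frequencies are taken from the table.

```python
midpoints = [41, 73, 105, 137, 169, 201]   # class midpoints x_i
freqs = [2, 8, 11, 5, 2, 3]                # class frequencies f_i


def grouped_mean(midpoints, freqs):
    """Mean of grouped data: the sum of midpoint * frequency, divided by n."""
    n = sum(freqs)
    total = sum(x * f for x, f in zip(midpoints, freqs))
    return total / n


print(sum(x * f for x, f in zip(midpoints, freqs)))  # 3447
print(round(grouped_mean(midpoints, freqs), 2))      # 111.19
```

Because every value in a class is treated as if it sat at the class midpoint, this is an estimate of the mean of the original 31 values, not their exact mean.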
Median for Grouped Data – the number in the middle of the data. Half of the area of the histogram
lies on each side of this value. To determine the median:
1. Determine the area of the entire histogram. To do this, calculate the area of each
rectangle (A = l · w) and add all the areas together to get the total area.
2. Divide the total area by two. Use this number to locate the class interval that contains
the median by adding up the areas of the rectangles from left to right until the running total
exceeds this value.
3. Next, set up an equation stating that the area to the left of the median (the full rectangles
before that class plus a part of the median's class with unknown width x) equals the value from step 2.
4. Finally, solve for x and add it to the lower boundary of the median's class to get the median.
Example: From the previously given data:
Note: The width of each rectangle is 32, since that is the width of each class interval. The length
(height) of each rectangle is the frequency of its interval.
1. Total area = 32(2) + 32(8) + 32(11) + 32(5) + 32(2) + 32(3) = 32(31) = 992
2. Half the area = 992 / 2 = 496, so the area to the left of the median needs to equal 496.
   The first three areas total 64 + 256 + 352 = 672, which is greater than 496, so the median
   is in the third interval: 89 – 121.
3. Let x be the distance from 89 (the lower boundary of that interval) to the median:
   2(32) + 8(32) + 11x = 496
4. 64 + 256 + 11x = 496, so 11x = 176 and x = 16.
   Median = 89 + 16 = 105
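The four-step area method can be sketched in Python as below, assuming equal-width classes as in this example. The function name `grouped_median` and its parameters are hypothetical.

```python
def grouped_median(start, width, freqs):
    """Median of grouped data, found by splitting the histogram's area in half.

    Each bar has width `width` and height equal to its class frequency,
    so its area is width * frequency.
    """
    total = sum(freqs)
    if total == 0:
        raise ValueError("need at least one data value")
    half_area = width * total / 2          # steps 1 and 2
    area_so_far = 0
    for k, f in enumerate(freqs):
        bar_area = width * f
        if area_so_far + bar_area >= half_area:
            lower = start + k * width
            # Step 3: area_so_far + f * x = half_area; step 4: solve for x.
            x = (half_area - area_so_far) / f
            return lower + x
        area_so_far += bar_area


print(grouped_median(start=25, width=32, freqs=[2, 8, 11, 5, 2, 3]))  # 105.0
```

Dividing every area by the common width 32 turns the same equation into the familiar interpolation form: median = 89 + ((31/2 - 10) / 11) · 32 = 105, where 10 = 2 + 8 is the number of values below the median's class.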
11.3 (*Note: For Range, Variance and Standard Deviation of ungrouped data, see the handout titled
Measures of Central Tendency & Dispersion.)
Range for Frequency Distribution – the difference between the upper boundary of the highest class
and the lower boundary of the lowest class.
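Applied to the roadside-service frequency table, whose lowest class is 25 - 57 and highest class is 185 - 217, this definition gives the following. It is slightly larger than the data range of 189 found from the raw values because the lowest class boundary (25) sits below the smallest data value (28).

$$\text{Range} = 217 - 25 = 192$$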
Standard Deviation for Grouped Data – the standard deviation for grouped data uses the
midpoint ($x_i$) of each class and the class’s frequency ($f_i$)
to determine the standard deviation.
Formulas:
Sample Standard Deviation, $s$:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{n - 1}}$$
Population Standard Deviation, $\sigma$:
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{n}}$$
1. Find the mean ($\bar{x}$) of the grouped data. (Described previously in this handout.)
2. Take the midpoint ($x_i$) of each class and subtract the mean ($\bar{x}$) from each.
3. Square each value from Step 2.
4. Multiply each number from Step 3 by its class interval’s frequency ($f_i$).
5. Add all of the results from Step 4 together.
6. Divide this total by the total number of data points ($n$) for a population standard
deviation, or by the total number minus one ($n - 1$) for a sample standard deviation.
7. The standard deviation is the square root of this number.
Example: Using the data given previously, find the population standard deviation.
1. The mean ($\mu$) was approximately 111.
Steps 2 through 5:
Midpoint ($x_i$)   Deviation ($x_i - \mu$)   Values Squared    Times Frequency ($f_i$)
41                 41 - 111 = -70            (-70)² = 4900     4900 · 2 = 9800
73                 73 - 111 = -38            (-38)² = 1444     1444 · 8 = 11552
105                105 - 111 = -6            (-6)² = 36        36 · 11 = 396
137                137 - 111 = 26            (26)² = 676       676 · 5 = 3380
169                169 - 111 = 58            (58)² = 3364      3364 · 2 = 6728
201                201 - 111 = 90            (90)² = 8100      8100 · 3 = 24300
Total                                                          56156
Steps 6 and 7:
$$\sigma = \sqrt{\frac{56156}{31}} \approx \sqrt{1811.48} \approx 42.56$$
Therefore, 42.56 is the population standard deviation for this grouped data.
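The seven steps can be sketched in Python as below. The function name `grouped_std` and its `sample` flag are hypothetical. One difference from the worked example: the sketch uses the exact grouped mean 3447/31 instead of the rounded value 111; this only changes digits beyond the second decimal place, so the population result still rounds to 42.56.

```python
import math


def grouped_std(midpoints, freqs, sample=False):
    """Standard deviation of grouped data.

    Subtract the mean from each midpoint, square, weight by class frequency,
    sum, divide by n (population) or n - 1 (sample), then take the square root.
    """
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    squared_dev_sum = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    divisor = n - 1 if sample else n
    return math.sqrt(squared_dev_sum / divisor)


midpoints = [41, 73, 105, 137, 169, 201]
freqs = [2, 8, 11, 5, 2, 3]
print(round(grouped_std(midpoints, freqs), 2))               # 42.56
print(round(grouped_std(midpoints, freqs, sample=True), 2))  # 43.26
```

If the same 31 days were treated as a sample instead of a population, dividing by n - 1 = 30 would give the slightly larger value of about 43.26.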