Chapter 2
Descriptive Statistics
I. Section 2-1
A. Steps to Constructing Frequency Distributions
1. Determine number of classes (may be given to you)
a. Should be between 5 and 15 classes.
2. Find the Range
a. The Maximum minus the Minimum.
1) Use the TI-84 to sort the data.
a) STAT – Edit – enter numbers into L1
b) STAT – SortA(L1) will put the numbers into ascending order.
c) STAT – SortD(L1) will put the numbers into descending order.
3. Find the Class Width
a. Range divided by the number of classes.
1) Always round UP!!
a) Even if class width comes out to a whole number, go up one.
4. Find the Lower Limits
a. Begin with the minimum value in your data set, and then add the
class width to that to get the next Lower Limit.
1) Repeat as many times as needed to get the required number of
classes.
5. Find the Upper Limits
a. The Upper Limit of the first class is one less than the Lower Limit of
the second class.
1) Add the class width to each Upper Limit until you have the
necessary number of classes.
6. Find the Lower Boundaries
a. Subtract one-half unit from each Lower Limit (Do NOT round these!)
7. Find the Upper Boundaries
a. Add one-half unit to each Upper Limit.
8. Find the Midpoints of each class.
a. The means of the Lower and Upper Limits (Do NOT round).
1) Could also use the means of the boundaries for this.
9. Frequency Distribution
a. Place a tally mark in each class for every piece of data that fits
there.
b. Add up the tally marks – these are your frequencies for each class.
10. Relative Frequencies
a. Divide each class frequency by the total number of data points to
find the proportion of the total represented by each class.
11. Cumulative Frequencies
a. The total number of tallies for each class, plus all those that came
before.
1) The cumulative frequency of the last class must equal the
number of data points used.
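Steps 2 through 8 can be sketched in Python as a check on hand work. This is a minimal sketch, not part of the original notes: it assumes whole-number data (so "one-half unit" is 0.5), and the function name `class_limits` is made up for illustration.

```python
def class_limits(data, num_classes):
    """Return the class width and a list of rows with limits, boundaries, and midpoints."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest
    # Always round UP; if the quotient is already a whole number, go up one.
    width = data_range // num_classes + 1
    rows = []
    for i in range(num_classes):
        lower = lowest + i * width      # first lower limit is the minimum value
        upper = lower + width - 1       # one less than the next lower limit
        rows.append({
            "LL": lower,
            "UL": upper,
            "LB": lower - 0.5,          # subtract one-half unit
            "UB": upper + 0.5,          # add one-half unit
            "MdPt": (lower + upper) / 2,
        })
    return width, rows

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

width, rows = class_limits(scores, 6)
print("Class width:", width)
for row in rows:
    print(row)
```

If the maximum value does not land inside the last row, the width or the starting point is wrong.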
Frequency Distribution Example
(Separate Hand-Out from Notes Outline)
EXAMPLE: Use the table of 30 numbers below to fill in a frequency distribution of
6 classes.
72    84    61    76    104   76    86    92    80    88
98    76    97    82    84    67    70    81    82    89
74    73    86    81    85    78    82    80    91    83
STAT – Edit – Enter these 30 numbers into L1 on your calculator.
STAT – SortA(L1) – 2nd 1 enters L1 into the parentheses.
STAT – Edit – to see the new list, in order:
61    67    70    72    73    74    76    76    76    78
80    80    81    81    82    82    82    83    84    84
85    86    86    88    89    91    92    97    98    104
Max Value: 104
Min Value: 61
Range: 104 - 61 = 43
Class Width: 43/6 ≈ 7.2 – Round UP to 8!
Minimum Value is the First Lower Limit. Add Class Width Down.
First Upper Limit is one less than 2nd Lower Limit. Add Class Width Down.
LL    UL
61    68
69    76
77    84
85    92
93    100
101   108
Check to be sure that the Maximum Value fits in the last class.
Since 104 fits between 101 and 108, we are good.
If the Maximum value does NOT fit into the last class, you did
something wrong. DO IT AGAIN!!
Subtract one-half unit from lower limits to get lower boundaries.
Add one-half unit to upper limits to get upper boundaries.
Find the mean of the limits (or boundaries) to find the midpoint of each class.
LL    UL    LB      UB      MdPt
61    68    60.5    68.5    64.5
69    76    68.5    76.5    72.5
77    84    76.5    84.5    80.5
85    92    84.5    92.5    88.5
93    100   92.5    100.5   96.5
101   108   100.5   108.5   104.5
Count how many data points fit in each class and enter that into
the Frequency column.
LL    UL    LB      UB      MdPt    Freq.
61    68    60.5    68.5    64.5    2
69    76    68.5    76.5    72.5    7
77    84    76.5    84.5    80.5    11
85    92    84.5    92.5    88.5    7
93    100   92.5    100.5   96.5    2
101   108   100.5   108.5   104.5   1
Relative Frequency is the class frequency divided by the total frequency.
In this case, we have 30 pieces of data, so we divide by 30.
LL    UL    LB      UB      MdPt    Freq.   Rel. Freq.
61    68    60.5    68.5    64.5    2       2/30 = 0.07
69    76    68.5    76.5    72.5    7       7/30 = 0.23
77    84    76.5    84.5    80.5    11      11/30 = 0.37
85    92    84.5    92.5    88.5    7       7/30 = 0.23
93    100   92.5    100.5   96.5    2       2/30 = 0.07
101   108   100.5   108.5   104.5   1       1/30 = 0.03
Sum of the Rel. Freq. column: 1.00
Relative Frequency column MUST add up to 1!!
If it does NOT, you are wrong. DO IT AGAIN!!
Cumulative Frequency is the Frequency of each class, plus the classes that came
before it. The last class must have a cumulative frequency that matches the
number of data points.
LL    UL    LB      UB      MdPt    Freq.   Rel. Freq.   Cum. Freq.
61    68    60.5    68.5    64.5    2       0.07         2
69    76    68.5    76.5    72.5    7       0.23         9
77    84    76.5    84.5    80.5    11      0.37         20
85    92    84.5    92.5    88.5    7       0.23         27
93    100   92.5    100.5   96.5    2       0.07         29
101   108   100.5   108.5   104.5   1       0.03         30
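The counting in steps 9 through 11 can be double-checked with a short Python sketch. It is an illustration only: the lower limits and class width are copied from the example, and the variable names are made up here.

```python
from itertools import accumulate

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

lower_limits = [61, 69, 77, 85, 93, 101]
width = 8

# Frequency: how many data points fall between each lower and upper limit.
freqs = [sum(1 for x in scores if ll <= x <= ll + width - 1) for ll in lower_limits]

# Relative frequency: class frequency divided by the total number of data points.
rel_freqs = [f / len(scores) for f in freqs]

# Cumulative frequency: running total of the class frequencies.
cum_freqs = list(accumulate(freqs))

for ll, f, rf, cf in zip(lower_limits, freqs, rel_freqs, cum_freqs):
    print(f"{ll:>3}-{ll + width - 1:<3}  {f:>2}  {rf:.2f}  {cf:>2}")
```

The two built-in checks from the notes still apply: the relative frequencies should total 1, and the last cumulative frequency should equal the number of data points.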
B. Steps to Constructing a Frequency Histogram
1. Label the horizontal axis with the class boundaries.
2. Label the vertical axis with the frequencies.
3. Draw a bar graph with bars that touch, using the frequencies from your
frequency distribution.
C. Steps to Constructing a Relative Frequency Histogram
1. Label the horizontal axis with the class boundaries.
2. Label the vertical axis with the relative frequencies (as proportions or percentages).
3. Draw a bar graph with bars that touch, using the relative frequencies
from your frequency distribution.
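Before drawing either histogram by hand, a rough text version can help confirm the shape. The sketch below is an assumption-laden illustration: it turns the bars sideways (one row per class), uses the boundaries and frequencies from the Section 2-1 example, and the function name `text_histogram` is hypothetical.

```python
boundaries = [60.5, 68.5, 76.5, 84.5, 92.5, 100.5, 108.5]
freqs = [2, 7, 11, 7, 2, 1]

def text_histogram(boundaries, freqs, relative=False):
    """Return one text row per class: boundaries, a bar of # marks, and a label."""
    total = sum(freqs)
    lines = []
    for lb, ub, f in zip(boundaries, boundaries[1:], freqs):
        # Label each bar with the frequency, or the relative frequency if requested.
        label = f"{f / total:.2f}" if relative else str(f)
        lines.append(f"{lb:>6}-{ub:<6} | {'#' * f} {label}")
    return lines

print("\n".join(text_histogram(boundaries, freqs)))
print()
print("\n".join(text_histogram(boundaries, freqs, relative=True)))
```

The bar lengths are the same in both versions; only the scale on the labels changes, which is also true of the hand-drawn frequency and relative frequency histograms.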
D. Steps to Constructing an Ogive
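1. Label the horizontal axis with the upper class boundaries.
2. Label the vertical axis with the cumulative frequencies.
3. Place a dot above each upper boundary at the height of that class’s
cumulative frequency, and connect the dots from left to right.
a. Start the graph at the lower boundary of the first class, where the
cumulative frequency is zero.
b. This chart will always end at the total number of data points.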
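The points to plot for an ogive can be listed with a few lines of Python. This sketch assumes the common textbook convention of plotting cumulative frequency against the upper class boundaries, starting from zero at the first lower boundary; the data come from the Section 2-1 example and the variable names are made up.

```python
from itertools import accumulate

boundaries = [60.5, 68.5, 76.5, 84.5, 92.5, 100.5, 108.5]
freqs = [2, 7, 11, 7, 2, 1]

cum_freqs = list(accumulate(freqs))

# Start at (first lower boundary, 0), then pair each upper boundary
# with the cumulative frequency of its class.
ogive_points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cum_freqs))

for x, y in ogive_points:
    print(x, y)
```

Because cumulative frequencies never decrease, the line through these points should only go up or stay level.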
Assignments:
Classwork: Pages 49-51 #1-25 Odds
Homework: Pages 51-54 #28-42 Evens
II. Section 2-2
A. Stem and Leaf Plot
1. Use the extreme values (minimum and maximum) to decide which stems you need.
2. Go through the data points, placing the leaves beside the appropriate
stems.
3. If you have too many data points, you can use two lines per stem, with
0-4 consisting of the first line, and 5-9 on the second line.
EXAMPLE: Use the table of 30 numbers below to fill in a stem and leaf plot.
72    84    61    76    104   76    86    92    80    88
98    76    97    82    84    67    70    81    82    89
74    73    86    81    85    78    82    80    91    83
The plot below shows the first four data points (72, 84, 61, 76) entered.
Stem | Leaves
10   |
9    |
8    | 4
7    | 2 6
6    | 1
Continue in this manner until all 30 data points are
represented in the stem and leaf plot.
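A finished plot can be checked against a short Python sketch. This is an illustration rather than the method from the notes: it assumes whole-number data with the last digit as the leaf, it sorts the leaves (an "ordered" plot, whereas a hand-built plot lists leaves in data order first), and the function name `stem_and_leaf` is made up.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its sorted list of leaves."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # e.g. 104 -> stem 10, leaf 4
        plot[stem].append(leaf)
    lowest, highest = min(plot), max(plot)
    # List stems from highest to lowest, keeping any stem that has no leaves.
    return {s: plot.get(s, []) for s in range(highest, lowest - 1, -1)}

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Counting the leaves across all stems should give back the number of data points.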
B. Dot Plot
1. Use a horizontal line, numbered from lowest data value to highest.
a. Place a dot above the line for each data point, stacking dots for repeated values.
1) This allows you to see visually whether you have a tight grouping
of data points, and where it is, if it exists.
C. Pie Chart
1. Used to describe parts of a whole.
a. Multiply the relative frequency you calculated earlier by 360 (the
number of degrees in a circle) to find the number of degrees in
each class’s sector.
1) The calculated number of degrees is the central angle of that
sector (the angle at the center of the circle).
a) Use a protractor to draw your angles.
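The angle arithmetic can be sketched in Python. As an assumption for illustration, the sketch reuses the class frequencies from the Section 2-1 example as the "parts of a whole"; the dictionary name `class_freqs` is made up.

```python
class_freqs = {
    "61-68": 2,
    "69-76": 7,
    "77-84": 11,
    "85-92": 7,
    "93-100": 2,
    "101-108": 1,
}

total = sum(class_freqs.values())

# Relative frequency times 360 gives each sector's central angle in degrees.
degrees = {label: f * 360 / total for label, f in class_freqs.items()}

for label, angle in degrees.items():
    print(f"{label:>7}: {angle:.0f} degrees")
```

The angles should add up to 360, which is a quick check before picking up the protractor.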
D. Scatter Plot
1. Used to visually examine the possible relationship between two
different elements.
a. Place one element on the vertical axis, and the other on the
horizontal.
1) Graph them as if one was the x value of an ordered pair and the
other was the y-value.
2. The closer the dots are to forming a straight line, the stronger the linear relationship.
a. If the dots rise from left to right, the relationship has a positive correlation.
b. If the dots fall from left to right, the relationship has a negative
correlation.
3. To do a Scatter Plot on the TI-84, follow these steps.
a. Turn STAT Plots on.
1) Press 2nd Y=, then Enter.
2) Highlight On, then press Enter.
b. Go to STAT and Edit.
1) Enter x-values into L1, and y-values into L2.
c. Press the Window button, and set your Xmin and Xmax values to
match the data in L1.
1) Repeat for Ymin and Ymax values to match L2.
d. Press Graph to see the scatterplot.
4. To get the equation of the line of best fit, go to STAT and Calc, then
select LinReg (4).
a. The slope and y-intercept will be given to you.
5. To graph the line with the scatterplot, manually enter the equation into
the Y= window and press Graph.
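For readers without the calculator, the slope and y-intercept can be sketched in Python using the standard least-squares formulas. The five (x, y) pairs below are hypothetical, made up only for illustration, and the function name `line_of_best_fit` is not from the notes.

```python
import math

def line_of_best_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)   # correlation: sign matches the slope
    return slope, intercept, r

# Hypothetical data: x-values would go in L1, y-values in L2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept, r = line_of_best_fit(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
```

A positive r goes with an upward trend in the dots and a negative r with a downward one; the closer r is to 1 or -1, the closer the dots are to a straight line.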
Assignments:
Classwork: Page 62 #1-12
Homework: Pages 63-66 #18, 22, 27, 28, 33, 36
Quiz on Lessons 2-1 and 2-2 on Friday!!
III. Section 2-3
A. Measures of Central Tendency
1. Mean – The sum of all data points divided by the number of values.
a. This one is the one that we most often think of when we say
“average”.
1) It’s also the one most affected by an extreme value (either high
or low).
2. Median – the middle number (or mean of two middle numbers) when
the data points are put into order.
a. The point which has as many data values above it as there are
below it.
3. Mode – The value that happens the most often (highest frequency).
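The three measures can be tried on the 30 numbers from the Section 2-1 example with Python's standard `statistics` module. This is a sketch for checking hand work; the second list, with 104 replaced by 1040, is a made-up change to show how an extreme value pulls the mean but not the median.

```python
from statistics import mean, median, multimode

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

print("Mean:  ", mean(scores))
print("Median:", median(scores))
print("Mode:  ", multimode(scores))   # lists every value tied for highest frequency

# Hypothetical extreme value: replace the highest score, 104, with 1040.
with_outlier = [1040 if x == 104 else x for x in scores]
print("Mean with outlier:  ", mean(with_outlier))
print("Median with outlier:", median(with_outlier))
```

This data set has two modes, since 76 and 82 each appear three times.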
B. Shapes of Distributions
1. Symmetric – Data bunched in the middle, with equal distribution on
either side.
2. Uniform – Data is spread evenly across the whole spectrum.
3. Skewed Data – Named by the “tail”.
a. Skewed right means most of the data values are to the left (low)
end of the range.
b. Skewed left means that most of the data values are to the right
(high) end of the range.
IV. Section 2-4
A. Measures of Variation
1. Range – the difference between the highest value and the lowest
value. (Maximum minus Minimum)
a. Easy to compute but only uses two numbers from a data set.
2. Deviation – The difference between the value of a data point and the
mean of the data set.
a. In a population, the deviation of x is 𝑥 − 𝜇. (Greek letter “mu”,
pronounced “moo”)
b. In a sample, the deviation of x is 𝑥 − x̄ (x̄ is pronounced “x bar”).
c. The sum of the deviations of a set of data will always be zero.
3. Population Measures of Variance
a. Population Variance – The sum of the squares of the deviations,
divided by N (the number of data points in the population).
1) Find the deviations, and then square them (this makes them all
positive, so they don’t cancel each other out)
a) Add up the squared deviations, and then divide by the
number of data points.
b. Population Standard Deviation – The square root of the population
variance.
4. Sample Measures of Variance
a. Sample Variance – The sum of the squares of the deviations,
divided by n - 1 (one less than the number of data points in the
sample).
b. Sample Standard Deviation – The square root of the sample
variance.
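The difference between dividing by N and dividing by n - 1 can be seen in a short Python sketch. The function names are made up for illustration, and treating the 30 example scores as either a population or a sample is an assumption made only to compare the two formulas.

```python
import math

def deviations(data):
    """Each value minus the mean of the data set."""
    center = sum(data) / len(data)
    return [x - center for x in data]

def population_variance(data):
    """Sum of squared deviations divided by N."""
    return sum(d * d for d in deviations(data)) / len(data)

def sample_variance(data):
    """Sum of squared deviations divided by n - 1."""
    return sum(d * d for d in deviations(data)) / (len(data) - 1)

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

print("Sum of deviations:", round(sum(deviations(scores)), 9))
print("Population variance:", population_variance(scores))
print("Population std dev: ", math.sqrt(population_variance(scores)))
print("Sample variance:    ", sample_variance(scores))
print("Sample std dev:     ", math.sqrt(sample_variance(scores)))
```

Dividing by n - 1 always gives a slightly larger result than dividing by N, and the gap shrinks as the number of data points grows.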
B. Empirical Rule
1. All symmetric bell-shaped distributions have the following
characteristics:
a. About 68% of data points will occur within one standard deviation
of the mean.
b. About 95% of data points will occur within two standard deviations
of the mean.
c. About 99.7% of data points will occur within three standard
deviations of the mean.
C. Chebychev’s Theorem
1. This applies to ANY distribution, regardless of its shape.
a. The portion of data lying within k standard deviations (k > 1) of the
mean is at least $1 - \frac{1}{k^2}$.
1) For k = 2, at least 1 – ¼ = ¾ or 75% of the data will be within 2
standard deviations of the mean.
2) For k = 3, at least 1 – 1/9 = 8/9 or 88.9% of the data will be
within 3 standard deviations of the mean.
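Chebychev's guarantee can be compared with what actually happens in a data set. The sketch below is an illustration under assumptions: it treats the 30 scores from the Section 2-1 example as a population (dividing by N), and the function names are made up.

```python
import math

def chebychev_bound(k):
    """Minimum portion of data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed portion of data within k population standard deviations of the mean."""
    mu = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

for k in (2, 3):
    print(f"k = {k}: at least {chebychev_bound(k):.3f}, observed {fraction_within(scores, k):.3f}")
```

The observed portion can be well above the bound; the theorem only promises a minimum that holds for any shape of distribution.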
V. Section 2-5 – Measures of Position
A. Quartiles
1. Q1, Q2 and Q3 divide the data into 4 equal parts.
a. Q2 is the same as the median, or the middle value.
b. Q1 is the median of the data below Q2.
c. Q3 is the median of the data above Q2.
2. Box and Whisker Plot
a. Left whisker runs from lowest data value to Q1.
b. Box runs from Q1 to Q3, with a line through it at Q2.
1) The distance from Q1 to Q3 is called the interquartile range.
c. Right whisker runs from Q3 to highest data value.
d. To draw a box-and-whisker plot on the TI-84, follow these steps.
1) Enter the data values into L1 in STAT Edit
2) Turn on your Stat Plots (2nd Y=), and select the plot with the box-and-whisker shown
3) Set your window to match the data
a) Xmin should be less than your lowest data point.
b) Xmax should be more than your highest data point.
4) Press graph. The box-and-whisker plot should appear.
a) Press the Trace button and you can see exactly which values
make up the Min, Q1, Median, Q3, and the Max.
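The five values that Trace shows can also be found with a short Python sketch. It follows the rule in these notes (Q1 and Q3 are the medians of the lower and upper halves); other software may use a different quartile convention and give slightly different answers. The function name `five_number_summary` is made up.

```python
from statistics import median

def five_number_summary(data):
    """Return (Min, Q1, Median, Q3, Max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, leave the middle value out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

low, q1, q2, q3, high = five_number_summary(scores)
print("Min, Q1, Median, Q3, Max:", low, q1, q2, q3, high)
print("Interquartile range:", q3 - q1)
```

For these 30 scores the box runs from 76 to 86, so the interquartile range is 10.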
B. Percentiles
1. Divide the data into 100 parts. There are 99 percentiles (P1, P2, P3, …P99)
a. P50 = Q2 = the median.
b. P25 = Q1
c. P75 = Q3
2. A 63rd percentile score means that this person did as well as or better
than 63% of the people who took that test.
3. The cumulative frequency that we did back in Section 2-1 can help
us find the percentile.
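The link between cumulative frequency and percentiles can be sketched in Python. This assumes the "as well as or better than" reading above, so a value's percentile is the percentage of data points at or below it; other conventions exist, and the function name `percentile_rank` is made up.

```python
def percentile_rank(data, value):
    """Percentage of data points that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

# 92 is the upper limit of the 85-92 class, whose cumulative frequency is 27.
print(percentile_rank(scores, 92))   # 27 of 30 values are at or below 92
```

Here the count of values at or below 92 is exactly the cumulative frequency of the 85-92 class from the Section 2-1 example.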
C. Z-Scores
1. Also called the “standard score”, it represents the number of standard
deviations that a data value is away from the mean.
a. $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$
2. A z-score of less than -2 or greater than 2 is considered to be unusual.
a. Remember that about 95% of data points should be within 2 standard
deviations of the mean (if the distribution is symmetric and bell-shaped).
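The z-score formula and the "unusual" cutoff can be tried out in Python. As an assumption for illustration, the sketch treats the 30 scores from the Section 2-1 example as a population, so it uses the mean 𝜇 and the population standard deviation 𝜎; the function name `z_scores` is made up.

```python
import math

def z_scores(data):
    """z = (x - mean) / (population standard deviation) for every value."""
    mu = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    return [(x - mu) / sigma for x in data]

scores = [72, 84, 61, 76, 104, 76, 86, 92, 80, 88,
          98, 76, 97, 82, 84, 67, 70, 81, 82, 89,
          74, 73, 86, 81, 85, 78, 82, 80, 91, 83]

# Flag any value more than 2 standard deviations from the mean.
unusual = [x for x, z in zip(scores, z_scores(scores)) if abs(z) > 2]
print("Unusual values:", unusual)
```

Only the lowest and highest scores are flagged here, which fits the idea that most data points sit within 2 standard deviations of the mean.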