Central Tendency & Dispersion

Objective
To understand measures of central tendency and use
them to analyze data.
Measures of Central Tendency
Mean, Median and Mode
Mean: the average
Mean = (sum of the data items) / (total number of data items)
Used to describe the middle of a set of data
that does not have outliers (data values that
are much higher or lower than the other values in
the set).
Find the Mean
Q: 4, 5, 8, 7
A: 6
Median: 6
Q: 4, 5, 8, 1000
A: 254.25
Median: 6.5
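The two answers above can be checked with a short Python sketch. It uses only the standard-library `statistics` module; the variable names are illustrative and are not part of the slides. It shows how a single outlier pulls the mean far more than the median.

```python
from statistics import mean, median

# The two data sets from the "Find the Mean" slide.
without_outlier = [4, 5, 8, 7]
with_outlier = [4, 5, 8, 1000]

for data in (without_outlier, with_outlier):
    # The mean uses every value, so one extreme value shifts it a lot;
    # the median only depends on the middle of the sorted data.
    print(data, "mean =", mean(data), "median =", median(data))
```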
Median

The middle value in a set of data where
the numbers are arranged in order.

Used to describe the middle of a set of
data that does have outliers.
If the data has an even number of items,
the median is the average of the middle
two numbers.

Median



Find the Median
4 5 6 6 7 8 9 10 12
Find the Median
5 6 6 7 8 9 10 12
Find the Median
5 6 6 7 8 9 10 100,000
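As a sketch of the rule for odd and even counts, the three exercises above could be worked in Python as follows. The function name `median_of` is hypothetical; the standard library's `statistics.median` does the same job.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle two

print(median_of([4, 5, 6, 6, 7, 8, 9, 10, 12]))    # 9 items
print(median_of([5, 6, 6, 7, 8, 9, 10, 12]))       # 8 items
print(median_of([5, 6, 6, 7, 8, 9, 10, 100_000]))  # 8 items, one outlier
```

The last two calls return the same value: replacing 12 with 100,000 does not move the middle of the data.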
Mode

The data item that occurs the most times

Can be used when data is not numeric

Can have one, two or more modes.

Used to choose the most popular
outcome.
Mode

Most Common Outcome
Male
Female
Measures of Central Tendency
Mode
The most common observation in a group of scores.

[Figure: frequency table and bar chart of flavors (Vanilla, Fudge Ripple, Rocky Road, Butter Pecan, Neapolitan, Strawberry, Chocolate) against frequency f, on a vertical axis from 0 to 30. Vanilla has the highest frequency.]

If the data is categorical (measured on the nominal scale)
then only the mode can be calculated.
The most frequently occurring score (mode) is Vanilla.
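Because the mode only needs counts, it works on labels as well as numbers. The sketch below uses a small hypothetical list of answers (not the frequencies from the slide's chart) together with `collections.Counter` and `statistics.multimode` (Python 3.8 or later), which returns every value tied for most frequent.

```python
from collections import Counter
from statistics import multimode

# Hypothetical survey answers, for illustration only.
answers = ["Vanilla", "Chocolate", "Vanilla", "Strawberry", "Chocolate", "Vanilla"]

counts = Counter(answers)
print(counts.most_common())   # each category with its frequency, largest first
print(multimode(answers))     # the mode(s) of the categorical data

# Two values tied for most frequent: a bimodal data set.
print(multimode([1, 2, 2, 3, 3, 4]))
```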

Distributions can be unimodal, bimodal, or multimodal.

Range

The difference between the least and
greatest data values.

Find the range of:
2, 34, 55, 22, 4, 7, 84, 55, 77
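A minimal Python sketch of this calculation, using the data from the exercise (the variable names are illustrative):

```python
data = [2, 34, 55, 22, 4, 7, 84, 55, 77]

# Range = greatest value minus least value.
data_range = max(data) - min(data)
print(data_range)
```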

Summarizing Distributions
Two key characteristics of a frequency distribution
are especially important when summarizing data
or when making a prediction from one set of
results to another:
Central Tendency
    What is in the “Middle”?
    What is most common?
    What would we use to predict?
Dispersion
    How spread out is the distribution?
    What shape is it?
Measures of Variability
Central tendency doesn’t tell us everything.
Dispersion (also called deviation or spread) tells us a lot
about how a variable is distributed.
We are most interested in the standard
deviation (σ).
Standard Deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}}$$
A measure of dispersion that describes
the typical difference or deviation between
the mean and a data value.
Subtract the mean from each data value
and square this difference. Do this for
each data value and add the answers
together. Divide the sum by the number of
data items, then take the square root.
Find the standard deviation for
the following test scores

98, 72, 55, 88, 69, 92, 77, 89, 94, 70
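The steps described above can be sketched in Python for these test scores. This follows the slide's recipe of dividing by the number of data items (the population standard deviation); the variable names are illustrative, and `statistics.pstdev` is used only as a cross-check.

```python
import math
from statistics import pstdev

scores = [98, 72, 55, 88, 69, 92, 77, 89, 94, 70]

mean = sum(scores) / len(scores)                        # the mean of the data
squared_deviations = [(x - mean) ** 2 for x in scores]  # (X_i - mean)^2 for each value
variance = sum(squared_deviations) / len(scores)        # divide the sum by N
std_dev = math.sqrt(variance)                           # take the square root

print(round(mean, 2), round(std_dev, 2))

# Cross-check against the standard library's population standard deviation.
assert math.isclose(std_dev, pstdev(scores))
```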
Line Plot

Used to show frequency.
Tally Chart

Used to show frequency.
Stem and Leaf Plot

Used to organize data.

Easy to see the mode!
Practice!

Use the following data to make a stem and
leaf plot. Find the mean, median, mode
and range of the data.

18, 35, 28, 15, 36, 72, 14, 55, 62, 45, 80,
9, 72, 66, 28, 20, 51, 44, 28
Mean 40.95
Median 36
Mode 28
Range 71
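One way to check the practice answers is the Python sketch below. It assumes the usual convention that the stem is the tens digit and the leaf is the ones digit; the names `stems` and `data` are illustrative.

```python
from collections import defaultdict
from statistics import mean, median, mode

data = [18, 35, 28, 15, 36, 72, 14, 55, 62, 45, 80,
        9, 72, 66, 28, 20, 51, 44, 28]

# Stem = tens digit, leaf = ones digit; sorting first keeps the leaves in order.
stems = defaultdict(list)
for value in sorted(data):
    stems[value // 10].append(value % 10)

for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")

print("mean", round(mean(data), 2))
print("median", median(data))
print("mode", mode(data))
print("range", max(data) - min(data))
```

In the printed plot the row for stem 2 has three repeated 8 leaves, which is why the mode is easy to spot.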



1-4 Bar Graphs and Histograms
A histogram is a bar graph that shows the frequency
of data within equal intervals. There is no space
between the bars in a histogram.
Course 2
Histograms
The histogram is a tool for presenting the
distribution of a numerical variable in graphical
form.
For example, suppose the following data are the
number of hours worked in a week by a group of
nurses:
42, 38, 43, 47, 35, 37, 43, 37, 42, 26,
48, 48, 30, 39, 40, 42, 36, 35, 28, 45,
39, 42, 41, 30, 50, 72, 47, 39, 53, 38
Histograms
These data are displayed in the following histogram:
[Figure: histogram of the nurses' hours. Horizontal axis: "Hours worked in the week", 25 to 75 in steps of 5. Vertical axis: frequency, 0 to 12.]
The data values are grouped
in intervals of width five hours.
The first interval includes the
values from 25 to less than 30
hours. The second interval
includes values from 30 to
less than 35, and so on. The
intervals are shown on the
horizontal axis.
The vertical axis is frequency. So,
for example, there are two
nurses who worked from
25 to less than 30 hours that
week.
Histograms
The choice of interval width
will affect the appearance
of the histogram.
[Figure: the histogram of interval width 5. Horizontal axis: "Hours worked in the week".]
To the right is the same data presented in a
histogram of interval width 10.
[Figure: histogram of interval width 10. Horizontal axis: "Hours worked in the week".]
And here it is again, to the right, presented in a
histogram of interval width 2.
[Figure: histogram of interval width 2. Horizontal axis: "Hours worked in the week".]
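The effect of interval width can be reproduced with a short Python sketch over the nurses' data. It assumes each interval starts at a multiple of the width and runs from the start up to, but not including, start + width, as in the "25 to less than 30" description; the function name `bin_counts` is hypothetical.

```python
from collections import Counter

hours = [42, 38, 43, 47, 35, 37, 43, 37, 42, 26,
         48, 48, 30, 39, 40, 42, 36, 35, 28, 45,
         39, 42, 41, 30, 50, 72, 47, 39, 53, 38]

def bin_counts(values, width):
    """Count values per interval [start, start + width), keyed by interval start."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

for width in (5, 10, 2):
    print(f"interval width {width}:")
    for start, count in bin_counts(hours, width).items():
        # One '#' per nurse in the interval gives a rough text histogram.
        print(f"  {start:>2} to <{start + width:<2} {'#' * count}")
```

With width 5 the first interval holds two nurses, matching the slide; wider intervals hide detail and narrower ones make the picture ragged.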
Additional Example 3: Making a Histogram
The table below shows the number of hours students
watch TV in one week. Make a histogram of the data.

Number of Hours of TV    Tally
1                        //
2                        ////
3                        //// ////
4                        //// /
5                        //// ///
6                        ///
7                        //// ////
8                        ///
9                        ////

Step 1: Make a frequency
table of the data. Be sure
to use equal intervals.

Number of Hours of TV    Frequency
1–3                      15
4–6                      17
7–9                      17
Additional Example 3 Continued
Step 2: Choose an appropriate
scale and interval for the
vertical axis. The greatest value
on the scale should be at least
as great as the greatest
frequency.
[Figure: empty axes; the vertical axis runs from 0 to 20 in intervals of 4.]
Additional Example 3 Continued
Step 3: Draw a bar for
each interval. The height of the
bar is the frequency for that
interval. Bars must touch but
not overlap.
[Figure: the bars added to the axes; the vertical axis runs from 0 to 20 in intervals of 4.]
Additional Example 3 Continued
Step 4: Label the axes and give
the graph a title.
[Figure: completed histogram titled "Hours of Television Watched". Horizontal axis: "Hours", with intervals 1–3, 4–6, 7–9. Vertical axis: 0 to 20 in intervals of 4.]
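As a rough sketch of Steps 2 to 4 for the TV example, the Python below picks a top value for the frequency axis and prints a labeled text version of the histogram. The helper `axis_top` and the choice to round up to the next multiple of the axis interval are assumptions made for illustration; the slides only require that the top of the scale be at least as great as the greatest frequency.

```python
import math

# Frequency table from the example (hours of TV watched in one week).
frequencies = {"1–3": 15, "4–6": 17, "7–9": 17}

def axis_top(max_frequency, step):
    """Smallest multiple of `step` that is at least `max_frequency`."""
    return math.ceil(max_frequency / step) * step

step = 4  # axis interval shown on the slide
top = axis_top(max(frequencies.values()), step)

print("Hours of Television Watched")          # graph title
for interval, frequency in frequencies.items():
    # One row per interval; rows are adjacent, so the bars touch but never overlap.
    print(f"{interval:>5} | {'#' * frequency} {frequency}")
print(f"Frequency axis: 0 to {top} in intervals of {step}")
```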
Try This: Example 3
The table below shows the number of hats a group of
students own. Make a histogram of the data.
[Tally chart: number of hats owned (1 to 9) against frequency.]
Step 1: Make a frequency
table of the data. Be sure
to use equal intervals.

Number of Hats Owned    Frequency
1–3                     12
4–6                     18
7–9                     24
Try This: Example 3
Step 2: Choose an appropriate
scale and interval for the
vertical axis. The greatest
value on the scale should be at
least as great as the greatest
frequency.
[Figure: empty axes; the vertical axis runs from 0 to 30 in intervals of 5.]
Try This: Example 3
Step 3: Draw a bar for
each interval. The height of the
bar is the frequency for that
interval. Bars must touch but
not overlap.
[Figure: the bars added to the axes; the vertical axis runs from 0 to 30 in intervals of 5.]
Try This: Example 3
Step 4: Label the axes and
give the graph a title.
[Figure: completed histogram titled "Number of Hats Owned". Horizontal axis: "Number of Hats", with intervals 1–3, 4–6, 7–9. Vertical axis: 0 to 30 in intervals of 5.]
Lesson Quiz: Part 1
1. The list shows the number of laps students
ran one day. Make a histogram of the data.
4, 7, 9, 12, 3, 6, 10, 15, 12, 5, 18, 2, 5, 10, 7,
12, 11, 15
[Figure: answer histogram. Horizontal axis: "Number of Laps", with first interval 0–4. Vertical axis: 0 to 8 in intervals of 2.]
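A possible first step for this quiz item is the frequency table, sketched below in Python. The transcript only preserves the first interval label (0–4), so the remaining intervals (5–9, 10–14, 15–19) are an assumption based on equal widths.

```python
laps = [4, 7, 9, 12, 3, 6, 10, 15, 12, 5, 18, 2, 5, 10, 7, 12, 11, 15]

width = 5  # assumed equal intervals: 0–4, 5–9, 10–14, 15–19
table = {}
for start in range(0, max(laps) + 1, width):
    label = f"{start}–{start + width - 1}"
    # Count the students whose lap total falls inside this interval.
    table[label] = sum(start <= n < start + width for n in laps)

for label, frequency in table.items():
    print(f"{label:>5}  {frequency}")
```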
Normally Distributed Curve
Characteristics of the Normal
Distribution
It is symmetrical: half the cases are to one side of
the center; the other half are on the other side.
The distribution is single-peaked.
Most of the cases fall in the center portion of the
curve. As values of the variable become more
extreme they become less frequent, so the “outliers”
at each of the “tails” of the distribution are few in
number.
The mean, median, and mode are the same.
The percentage of cases in any range of the curve can be
calculated.
68-95-99.7 Rule
[Figure: normal curve with regions marked "68% of the data", "95% of the data", and "99.7% of the data".]
Skewed Distributions
The total area under the normal curve is 1.
About 68% of the area lies within 1 standard deviation of the mean.
About 95% of the area lies within 2 standard deviations of the mean.
About 99.7% of the area lies within 3 standard deviations of the mean.
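These three percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and takes the area within k standard deviations as the difference of two cumulative probabilities on a standard normal curve.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Area between -k and +k standard deviations from the mean.
    area = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {area:.4f}")
```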
Why can’t the mean tell us everything?
The mean describes central tendency: what the
average outcome is.
We also want to know how
accurate the mean is when making predictions.
The question becomes: how good a
representation of the distribution is the mean?
How good is the mean as a description of
central tendency, or as a predictor?
Answer: it depends on the shape of the
distribution. Is the distribution normal or
skewed?
Dispersion
Once you determine that the variable of interest
is normally distributed, the next question to
be asked is about its dispersion: how spread
out are the scores around the mean.
Dispersion is a key concept in statistical
thinking.
How much do the scores deviate around the
mean? The more “bunched up” the scores are around the
mean, the better your ability to make accurate
predictions.
How well does the mean represent the scores
in a distribution? The logic here is to
determine how much spread is in the
scores. How much do the scores "deviate"
from the mean? Think of the mean as the
true score or as your best guess. If every X
were very close to the Mean, the mean would
be a very good predictor.
If the distribution is very sharply peaked, then
the mean is a good measure of central
tendency, and if you were to use the mean to
make predictions you would be right or close
much of the time.
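To illustrate the point, the sketch below compares two hypothetical score sets (invented for this example, not taken from the slides) that share the same mean but differ in spread. The standard deviation, computed here with `statistics.pstdev`, shows how far a prediction of "the mean" would typically miss.

```python
from statistics import mean, pstdev

# Two hypothetical score sets with the same mean but different spread.
bunched = [48, 49, 50, 50, 51, 52]
spread_out = [10, 30, 50, 50, 70, 90]

for name, scores in (("bunched", bunched), ("spread out", spread_out)):
    guess = mean(scores)           # the mean as our single best guess
    typical_miss = pstdev(scores)  # typical distance of a score from that guess
    print(f"{name}: mean = {guess}, standard deviation = {typical_miss:.2f}")
```

Both sets have a mean of 50, but predicting 50 is far more reliable for the bunched scores than for the spread-out ones.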