Measures of Centrality and Variability

Methods to take large amounts of data
and present them in a concise form
› Want to present the heights of females and males
in STA 220
› Could measure everyone and graph the results
› More interested in a single number that
describes the most likely representation of
the height of the students in the class
  - This is called a measure of centrality

Once you have your measure of
centrality, you may want or need to know:

Is the data repeatable?
› This would be a measure of variability

3 common measures of centrality
› Mean
› Median
› Mode

Mean
› Mathematical average of all the data
\[ \text{Mean} = \frac{\text{sum of all data}}{\text{sample size}} \]
\[ \text{Mean} = \frac{\sum \text{data}}{\text{sample size}} \]

Example
› Suppose Suzy is taking Chemistry. There is a
lab quiz every other week. Near the end of
the semester, Suzy wants to determine her
quiz average. Her quiz scores are: 78, 92, 83,
95, 98, 87, and 93.
\[ \text{Mean} = \frac{78 + 92 + 83 + 95 + 98 + 87 + 93}{7} \]
\[ \text{Mean} = \frac{626}{7} \]
\[ \text{Mean} \approx 89.43 \]

Mathematical shorthand:
› Data points are often referred to as \( x_i \), where i
is 1…n, n being the sample size
› For Suzy’s quiz scores, n = 7 and \( x_1 = 78 \), \( x_2 = 92 \),
\( x_3 = 83 \), \( x_4 = 95 \), \( x_5 = 98 \), \( x_6 = 87 \), and \( x_7 = 93 \).
› The mean is denoted by \( \bar{x} \), called
x-bar.
  - For Suzy’s quizzes, \( \bar{x} \approx 89.43 \)
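
As a minimal sketch of the mean formula, the calculation for Suzy's quiz scores could be written in Python as follows. The function name `mean` and the variable names are illustrative choices, not part of the slides.

```python
def mean(data):
    """Return the sum of all data divided by the sample size."""
    n = len(data)          # sample size
    return sum(data) / n   # sum of all data / sample size


quiz_scores = [78, 92, 83, 95, 98, 87, 93]
x_bar = mean(quiz_scores)
print(sum(quiz_scores))    # total of the seven scores
print(round(x_bar, 2))     # mean rounded to two decimal places
```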

The median is the middle value of the
dataset, such that half of all data points
are less than or equal to that value
AND half of all data points are
greater than or equal to that value.

To find the median:
1. Rearrange the data from smallest to largest
2. If n is odd, calculate (n+1)/2
3. If n is even, calculate n/2
4. Count through the sorted data set until you get to the
data point in the position you calculated in
step 2 or 3
5. If the number of data points, n, was odd, then
you are done. If n is even, then compute the
mean of the data points in the n/2
position and the (n/2)+1 position.

Example
› Given the following salary information from a
group of engineers, determine the median
salary: $75,400; $83,600; $45,700; $43,900;
$62,100; $90,500; $55,800.
› First reorder the data in increasing order:
  - 43,900; 45,700; 55,800; 62,100; 75,400; 83,600; 90,500
› Since n = 7 is odd, compute
  - (n+1)/2 = (7+1)/2 = 4
  - The fourth data point is 62,100, so the median salary is $62,100

Example
› A group of students are taking the following number
of credit hours: 12, 17, 15, 14, 9, 16, 18, 16, 14, 12.
Find the median number of credit hours being taken
by this group of students.
› Put the data in increasing order:
  - 9, 12, 12, 14, 14, 15, 16, 16, 17, 18
› Since n = 10 is even, compute
  - n/2 = 10/2 = 5
› Next, identify the data points in the fifth and sixth
positions
  - The fifth data point is 14 and the sixth is 15
› Compute the mean of the fifth and sixth data points
  - (14 + 15)/2 = 14.5

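The median procedure can be sketched in Python as shown here. This is only one way to write it; the function name `median` is an illustrative choice, and the position arithmetic converts the 1-based positions used in the slides to Python's 0-based list indices.

```python
def median(data):
    """Find the median by sorting and locating the middle position."""
    ordered = sorted(data)                 # smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2            # n odd: position (n+1)/2
        return ordered[position - 1]       # 1-based position to 0-based index
    position = n // 2                      # n even: positions n/2 and n/2 + 1
    return (ordered[position - 1] + ordered[position]) / 2


salaries = [75400, 83600, 45700, 43900, 62100, 90500, 55800]
credit_hours = [12, 17, 15, 14, 9, 16, 18, 16, 14, 12]
print(median(salaries))       # odd n: the fourth sorted value
print(median(credit_hours))   # even n: mean of the fifth and sixth sorted values
```
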
The mode is the number that appears
the most often in the data set.
› Example: Here are the numbers of
cavities found in a class of 1st graders:
0, 1, 0, 1, 0, 5, 5, 3, 4, 0, 0, 2, 0, 1, 0, 3, 2, 4, 7, 1. Find the
mode.
› 0 occurs 7 times, while 1 occurs 4 times;
2, 3, 4, and 5 each occur twice; and 7 occurs once.
As 0 occurs the most often, it is the mode.
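
A short Python sketch of the counting for the cavities example follows, using `collections.Counter` from the standard library. The variable names are illustrative. The result is kept as a list because more than one value can tie for the highest count.

```python
from collections import Counter

cavities = [0, 1, 0, 1, 0, 5, 5, 3, 4, 0, 0, 2, 0, 1, 0, 3, 2, 4, 7, 1]

counts = Counter(cavities)            # value -> number of occurrences
highest = max(counts.values())        # the largest frequency
modes = [value for value, count in counts.items() if count == highest]

for value in sorted(counts):
    print(value, "occurs", counts[value], "time(s)")
print("mode(s):", modes)
```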

Comparing Mean, Median, Mode
› Mean
  - Strong Points
    - Uses all of the data
  - Weak Points
    - Sensitive to extremes. Test scores: 34, 92, 95, 94, 89 have
a mean of 80.8. If the professor dropped the lowest test
score, 34, then the mean would be 92.5.
    - May not be an actual, observable value. For example,
the average family has 1.6 children. What does it mean
to have 0.6 of a child?

Comparing Mean, Median and Mode
› Median and Mode
  - Strong Points
    - Not sensitive to extremes. In the test
score example from before, the sorted scores are 34, 89,
92, 94, 95, so the median would be 92.
    - The mode is always an observable value; the
median is not always an observable value
  - Weak Points
    - The value may not be unique. In the case of the mode,
it is possible to have several values that appear the most.
    - Neither uses all of the data values. The mode keys
in on frequency, while the median just looks at the
middle of the data set.
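
To illustrate the sensitivity to extremes, here is a minimal Python sketch using the standard `statistics` module on the test scores from the comparison. Dropping the lowest score moves the mean far more than it moves the median. The variable names are illustrative.

```python
from statistics import mean, median

scores = [34, 92, 95, 94, 89]
without_lowest = sorted(scores)[1:]    # drop the lowest score, 34

mean_shift = mean(without_lowest) - mean(scores)
median_shift = median(without_lowest) - median(scores)

print("mean:  ", mean(scores), "->", mean(without_lowest))
print("median:", median(scores), "->", median(without_lowest))
print("shift in mean:", round(mean_shift, 1), " shift in median:", median_shift)
```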

In 1995, the mean salary of an MLB player
was $1,080,000, while the median salary
of an MLB player was $275,000.
› Recall the median is the point where half of
the data points are above and half are
below. Thus, at least half of the players in
the MLB earned $275,000 or less
› A mean of $1,080,000 tells you that there are
players earning millions of dollars, but this
may not be a representative
number for all players in the MLB

The Corps of Engineers wants to dredge a
harbor in Hackensack, NJ. The EPA has
these guidelines for harbor dredging:
› The sediment is tested for the presence of PCBs.
› If PCBs < 25 parts per billion, then it's OK to
dredge and dump.
› If 25 ppb ≤ PCBs ≤ 50 ppb, then it's OK to dredge
and dump, but then a cap must be placed on
the dump pile.
› If PCBs ≥ 50 ppb, then the harbor cannot be
dredged and dumped.

6 samples are taken, and the average
PCB level was 46.5 ppb. The Corps of
Engineers should be allowed to dredge
and dump the harbor, then cap the
dump site… or should they?
› The actual samples were: 66, 74, 81, 55,
1, 2.
› The average is 46.5 ppb
› The median is 60.5 ppb

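The harbor example can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and the 50 ppb cutoff is the one stated in the dredging guidelines above.

```python
from statistics import mean, median

pcb_samples = [66, 74, 81, 55, 1, 2]    # parts per billion

average = mean(pcb_samples)
middle = median(pcb_samples)
at_or_above_50 = [s for s in pcb_samples if s >= 50]

print("average:", average, "ppb")
print("median: ", middle, "ppb")
print(len(at_or_above_50), "of", len(pcb_samples), "samples are at or above 50 ppb")
```

The two very low samples pull the average under 50 ppb even though most of the individual samples are above it, which is why the median tells a different story here.
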
Measures of variability describe the
spread of the data
› All measures of variability are greater
than or equal to 0
› Measures close to 0 indicate that the
data is highly consistent and repeatable

4 measures of variability:
Range, Average Deviation, Variance, Standard Deviation

Range
› Difference between the largest data point in
the dataset and the smallest data point in
the dataset
› or Range = largest data point − smallest data point

Example
› Suppose the daily low temperatures for the
past week have been -3, -7, -2, 0, 2, 4. What
is the range?
› Range = 4 − (−7) = 11
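
In Python, the range calculation is a one-line difference between the built-in `max` and `min`. The sketch below uses the temperatures from the example; the name `data_range` is illustrative and avoids shadowing Python's built-in `range`.

```python
temperatures = [-3, -7, -2, 0, 2, 4]

largest = max(temperatures)
smallest = min(temperatures)
data_range = largest - smallest    # largest data point minus smallest data point

print(largest, "-", f"({smallest})", "=", data_range)
```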

Average Deviation
› The average deviation of the data from its mean
value.
› There are 4 steps:
1. Compute the mean of the data set, x-bar
2. Calculate the absolute value of the difference
between each data point, \( x_i \), and the mean
value, x-bar
3. Add up all of the values calculated in step 2
4. Divide the sum from step 3 by n, the number of data points

Average Deviation, Example
› Suppose you have the following four data
points in your dataset: 1, 2, 4, 5. Find the
average deviation.
1. \( \bar{x} = \frac{1 + 2 + 4 + 5}{4} = 3 \)
2. \( |1 - 3| = 2;\ |2 - 3| = 1;\ |4 - 3| = 1;\ |5 - 3| = 2 \)
3. \( 2 + 1 + 1 + 2 = 6 \)
4. \( \frac{6}{4} = 1.5 \)

Average Deviation
› In mathematical shorthand, the average
deviation can be expressed as:
\[ \text{Average Deviation} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \]
› A good method is to make a table:
  x_i          |x_i - x-bar|     Result
  1            |1 - 3|           2
  2            |2 - 3|           1
  4            |4 - 3|           1
  5            |5 - 3|           2
  12/4 = 3                       6/4 = 1.5
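
The four steps for the average deviation can be sketched in Python as below. The function name `average_deviation` is an illustrative choice, not a standard-library function.

```python
def average_deviation(data):
    """Mean of the absolute differences between each data point and x-bar."""
    n = len(data)
    x_bar = sum(data) / n                               # step 1: the mean
    abs_differences = [abs(x - x_bar) for x in data]    # step 2: |x_i - x-bar|
    total = sum(abs_differences)                        # step 3: add them up
    return total / n                                    # step 4: divide by n


dataset = [1, 2, 4, 5]
print(average_deviation(dataset))
```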

Variance
› Similar to average deviation
1. Compute the mean of the dataset, x-bar
2. Calculate the difference between each
data point, \( x_i \), and the mean value, x-bar
3. Square all of the values in step 2
4. Add up all the values in step 3
5. Divide the sum in step 4 by the total
number of data points

Variance, Example
› Good idea to make a table similar to the
one we used for average deviation
  x_i          x_i - x-bar       (x_i - x-bar)^2
  1            1 - 3 = -2        4
  2            2 - 3 = -1        1
  4            4 - 3 = 1         1
  5            5 - 3 = 2         4
  12/4 = 3                       10/4 = 2.5

Variance
› Mathematical shorthand:
\[ \text{Variance} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
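
A minimal Python sketch of the variance steps follows, dividing by the total number of data points as the slides do. The function name `variance` is illustrative. For comparison, `statistics.pvariance` in the standard library uses the same divisor, while `statistics.variance` divides by n − 1 and would give a different result.

```python
from statistics import pvariance


def variance(data):
    """Mean of the squared differences between each data point and x-bar."""
    n = len(data)
    x_bar = sum(data) / n                         # step 1: the mean
    differences = [x - x_bar for x in data]       # step 2: x_i - x-bar
    squares = [d ** 2 for d in differences]       # step 3: square each difference
    return sum(squares) / n                       # steps 4 and 5: add up, divide by n


dataset = [1, 2, 4, 5]
print(variance(dataset))     # hand-written steps
print(pvariance(dataset))    # standard-library cross-check (divides by n)
```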

Standard Deviation
› The standard deviation is just the square root of the variance
› By taking the square root, the units of the
standard deviation are the same as the
original units of the data
› In the previous example:
\[ \text{Standard Deviation} = \sqrt{\text{Variance}} \]
\[ \text{Standard Deviation} = \sqrt{2.5} \]
\[ \text{Standard Deviation} \approx 1.58 \text{ inches} \]
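
As a quick check of the last step, the square root can be taken in Python with `math.sqrt`. The sketch assumes the variance of 2.5 for the dataset 1, 2, 4, 5 and compares the result with `statistics.pstdev`, which also divides by n before taking the root. The variable names are illustrative.

```python
import math
from statistics import pstdev

dataset = [1, 2, 4, 5]
variance_value = 2.5                              # variance of the dataset, dividing by n

standard_deviation = math.sqrt(variance_value)    # square root of the variance
print(round(standard_deviation, 2))
print(round(pstdev(dataset), 2))                  # standard-library cross-check
```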