Measures of Centrality and Variability


Methods to take large amounts of data
and present them in a concise form
› We want to present the heights of females and males
in STA 220
› We could measure everyone and graph the results
› We are more interested in a single value that
describes the most likely representation of
the height of the students in the class
  › This is called a measure of centrality

Once you have your measure of
centrality, you may want or need to know:
› Is the data repeatable?
  › This would be a measure of variability

3 common measures of centrality
› Mean
› Median
› Mode

Mean
› Mathematical average of all the data
\[ \text{Mean} = \frac{\text{sum of all data}}{\text{sample size}} \]
\[ \text{Mean} = \frac{\sum \text{data}}{\text{sample size}} \]

Example
› Suppose Suzy is taking Chemistry. There is a
lab quiz every other week. Near the end of
the semester, Suzy wants to determine her
quiz average. Her quiz scores are: 78, 92, 83,
95, 98, 87 and 93.
\[ \text{Mean} = \frac{78 + 92 + 83 + 95 + 98 + 87 + 93}{7} \]
\[ \text{Mean} = \frac{626}{7} \]
\[ \text{Mean} \approx 89.43 \]
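The arithmetic in Suzy's example can be checked with a few lines of Python. This is only a minimal sketch: the names `quiz_scores`, `mean`, and `quiz_mean` are illustrative and do not come from the slides.

```python
# Suzy's seven chemistry lab quiz scores, taken from the example above.
quiz_scores = [78, 92, 83, 95, 98, 87, 93]


def mean(data):
    """Return the sum of the data divided by the sample size."""
    return sum(data) / len(data)


quiz_total = sum(quiz_scores)   # numerator of the mean
quiz_mean = mean(quiz_scores)   # numerator divided by n = 7

print(quiz_total)
print(round(quiz_mean, 2))
```

Rounding to two decimal places mirrors how the slide reports the result.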

Mathematical shorthand:
› Data points are often referred to as \(x_i\), where i
is 1…n, n being the sample size
› For Suzy's quiz scores, n = 7 and \(x_1 = 78\), \(x_2 = 92\),
\(x_3 = 83\), \(x_4 = 95\), \(x_5 = 98\), \(x_6 = 87\), and
\(x_7 = 93\).
› The mean is denoted by \(\bar{x}\), called
x-bar.
  › For Suzy's quizzes, \(\bar{x} \approx 89.43\)

The median is the middle value of the
dataset, such that half of all data points
are less than or equal to that value
AND half of all data points are
greater than or equal to that value.

To find the median:
1. Rearrange the data from smallest to largest
2. If n is odd, calculate \((n+1)/2\)
3. If n is even, calculate \(n/2\)
4. Count through the sorted data set until you get to the
data point in the position you calculated in
step 2 or 3
5. If the number of data points, n, was odd, then
you are done. If n is even, then compute the
mean of the data points in the \(n/2\)
position and the \((n/2)+1\)
position.

Example
› Given the following salary information from a
group of engineers, determine the median
salary: $75,400; $83,600; $45,700; $43,900;
$62,100; $90,500; $55,800.
› First reorder the data in increasing order:
  › 43,900; 45,700; 55,800; 62,100; 75,400; 83,600; 90,500
› Since n = 7 is odd, compute
  › (n+1)/2 = (7+1)/2 = 4
  › The fourth data point in the sorted list is 62,100
› The median salary is $62,100

Example
› A group of students is taking the following numbers
of credit hours: 12, 17, 15, 14, 9, 16, 18, 16, 14, 12.
Find the median number of credit hours being taken
by this group of students.
› Put the data in increasing order:
  › 9, 12, 12, 14, 14, 15, 16, 16, 17, 18
› Since n = 10 is even, compute
  › n/2 = 10/2 = 5
› Next, identify the data points in the fifth and sixth
positions
  › The fifth data point is 14 and the sixth is 15
› Compute the mean of the fifth and sixth data points
  › (14 + 15)/2 = 14.5
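The five-step median procedure and the two worked examples above can be sketched in Python as follows. The function name `median` and the variable names are illustrative choices, not something the slides define. The positions from steps 2 and 3 are 1-based, so the code subtracts 1 when indexing into the sorted list.

```python
def median(data):
    """Find the median by sorting and locating the middle position(s)."""
    ordered = sorted(data)                 # step 1: smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2            # step 2: odd n, 1-based position
        return ordered[position - 1]       # step 4: count to that position
    position = n // 2                      # step 3: even n, 1-based position
    lower = ordered[position - 1]          # data point in position n/2
    upper = ordered[position]              # data point in position n/2 + 1
    return (lower + upper) / 2             # step 5: mean of the two


# Data from the two examples above.
salaries = [75400, 83600, 45700, 43900, 62100, 90500, 55800]
credit_hours = [12, 17, 15, 14, 9, 16, 18, 16, 14, 12]

salary_median = median(salaries)
credit_median = median(credit_hours)
print(salary_median, credit_median)
```
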
The mode is the number that appears
the most often in the data set.
› Example: Here are the numbers of
cavities found in a class of 1st graders:
  › 0,1,0,1,0,5,5,3,4,0,0,2,0,1,0,3,2,4,7,1. Find the
mode.
  › 0 occurs 7 times, while 1 occurs 4 times; 2,
3, 4, and 5 each occur twice, and 7 occurs once.
As 0 occurs the most often, it is the mode.
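The tally in the cavities example can be reproduced with Python's `collections.Counter`. This is a minimal sketch with illustrative variable names. It collects every value that ties for the highest count, since a data set can have more than one mode.

```python
from collections import Counter

# Number of cavities per student, from the example above.
cavities = [0, 1, 0, 1, 0, 5, 5, 3, 4, 0, 0, 2, 0, 1, 0, 3, 2, 4, 7, 1]

counts = Counter(cavities)            # maps each value to how often it occurs
highest_count = max(counts.values())

# Keep every value whose count equals the highest count.
modes = sorted(value for value, count in counts.items()
               if count == highest_count)

print(dict(sorted(counts.items())))
print(modes)
```
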

Comparing Mean, Median, Mode
› Mean
  › Strong Points
    › Uses all of the data
  › Weak Points
    › Sensitive to extremes. The test scores 34, 92, 95, 94, 89 have
a mean of 80.8. If the professor dropped the lowest test
score, 34, then the mean would be 92.5.
    › May not be an actual, observable value. For example,
the average family has 1.6 children. What does it mean
to have 0.6 of a child?

Comparing Mean, Median and Mode
› Median and Mode
  › Strong Points
    › Not sensitive to extremes. In the test
score example from before, the sorted scores are 34, 89,
92, 94, 95, so the median is 92.
    › The mode is always an observable value; the
median is an observable value when n is odd, but may
not be when n is even
  › Weak Points
    › The value may not be unique. In the case of the mode,
it is possible to have several values that appear the most.
    › Neither uses all of the data values. The mode keys
in on frequency, while the median just looks at the
middle of the data set.
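The difference in sensitivity to extremes can be seen by dropping the lowest test score and recomputing both measures. The sketch below uses Python's standard `statistics` module, and the variable names are illustrative. The slides give the mean with and without the low score and the median of all five scores. The median after dropping the score is an extra comparison computed here.

```python
from statistics import mean, median

# Test scores from the comparison above; 34 is the extreme value.
test_scores = [34, 92, 95, 94, 89]
without_lowest = sorted(test_scores)[1:]   # drop the lowest score

mean_all = mean(test_scores)
mean_dropped = mean(without_lowest)
median_all = median(test_scores)
median_dropped = median(without_lowest)

# How far each measure moves when the extreme score is removed.
mean_shift = mean_dropped - mean_all
median_shift = median_dropped - median_all

print(mean_all, mean_dropped)
print(median_all, median_dropped)
```

The mean moves by more than ten points while the median moves by one, which is the sense in which the median is "not sensitive to extremes."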

In 1995, the mean salary of an MLB player
was $1,080,000, while the median salary
of an MLB player was $275,000.
› Recall the median is the point where half of
the data points are above and half are
below. Thus, at least half of the players in
MLB earned $275,000 or less
› A mean of $1,080,000 tells you that there are
players earning millions of dollars, but this
may not be the most representative
number for all players in MLB

The Corps of Engineers wants to dredge a
harbor in Hackensack, NJ. The EPA has
these guidelines for harbor dredging:
› The sediment is tested for the presence of PCBs.
› If PCBs < 25 parts per billion, then it's OK to
dredge and dump.
› If 25 ppb ≤ PCBs ≤ 50 ppb, then it's OK to dredge
and dump, but then a cap must be placed on
the dump pile.
› If PCBs ≥ 50 ppb, then the harbor cannot be
dredged and dumped.
6 samples are taken, and the average
PCB level was 46.5 ppb. The Corps of
Engineers should be allowed to dredge
and dump the harbor, then cap the
dump site…or should they?
› The actual samples were: 66, 74, 81, 55,
1, 2.
  › The average is 279/6 = 46.5 ppb
  › The median is 60.5 ppb (sorted: 1, 2, 55, 66, 74, 81,
so the median is (55 + 66)/2), which is above the
50 ppb limit
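The two summary values for the PCB samples can be checked with Python's standard `statistics` module. This is a minimal sketch with illustrative variable names. The 50 ppb figure is the upper threshold quoted in the dredging guidelines.

```python
from statistics import mean, median

# The six sediment samples, in parts per billion.
pcb_samples = [66, 74, 81, 55, 1, 2]
upper_limit_ppb = 50   # threshold quoted in the dredging guidelines

pcb_mean = mean(pcb_samples)
pcb_median = median(pcb_samples)

# Count how many individual samples are at or above the limit.
samples_at_or_above_limit = sum(1 for s in pcb_samples if s >= upper_limit_ppb)

print(pcb_mean, pcb_median, samples_at_or_above_limit)
```

Two very low readings pull the mean under the limit even though four of the six samples exceed it.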
Measures of variability describe the
spread of the data
› All measures of variability are greater
than or equal to zero
  › Measures close to zero indicate that the
data is highly consistent and repeatable

4 measures of variability:
› Range
› Average Deviation
› Variance
› Standard Deviation

Range
› Difference between the largest data point in
the dataset and the smallest data point in
the dataset
› or Range = maximum - minimum

Example
› Suppose the daily low temperatures for the
past week have been -3, -7, -2, 0, 2, 4. What
is the range?
› Range = 4 - (-7)
= 11
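The range calculation for the temperature example can be sketched in Python. The name `data_range` is an illustrative choice, picked so it does not shadow Python's built-in `range`.

```python
def data_range(data):
    """Return the largest data point minus the smallest data point."""
    return max(data) - min(data)


# Daily low temperatures from the example above.
low_temperatures = [-3, -7, -2, 0, 2, 4]

temperature_range = data_range(low_temperatures)
print(temperature_range)
```
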

Average Deviation
› The average deviation of the data from its mean
value.
› There are 4 steps:
1. Compute the mean of the data set, x-bar
2. Calculate the absolute value of the difference
between each data point, \(x_i\), and the mean
value, x-bar
3. Add up all of the values calculated in step 2
4. Divide the sum from step 3 by n, the number of data points

Average Deviation, Example
› Suppose you have the following four data
points in your dataset: 1, 2, 4, 5. Find the
average deviation.
1. \(\bar{x} = \frac{1 + 2 + 4 + 5}{4} = \frac{12}{4} = 3\)
2. \(|1 - 3| = 2\); \(|2 - 3| = 1\); \(|4 - 3| = 1\); \(|5 - 3| = 2\)
3. \(2 + 1 + 1 + 2 = 6\)
4. \(\frac{6}{4} = 1.5\)

Average Deviation
› In mathematical shorthand, the average
deviation can be expressed as:
\[ \text{Average Deviation} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \]
› A good method is to make a table:
Xi      |Xi - (x-bar)|      Result
1       |1-3|               2
2       |2-3|               1
4       |4-3|               1
5       |5-3|               2
Mean: 12/4 = 3              Average deviation: 6/4 = 1.5
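The four steps for the average deviation map directly onto a short Python function. This is a minimal sketch using the four-point dataset from the example. The function and variable names are illustrative.

```python
def average_deviation(data):
    """Average absolute distance of the data points from their mean."""
    n = len(data)
    x_bar = sum(data) / n                                  # step 1: mean
    absolute_deviations = [abs(x - x_bar) for x in data]   # step 2: |x_i - x_bar|
    total = sum(absolute_deviations)                       # step 3: add them up
    return total / n                                       # step 4: divide by n


data = [1, 2, 4, 5]
deviations = [abs(x - sum(data) / len(data)) for x in data]
result = average_deviation(data)

print(deviations)
print(result)
```
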

Variance
› Similar to average deviation
1. Compute the mean of the dataset, x-bar
2. Calculate the difference between each
data point, \(x_i\), and the mean value, x-bar
3. Square all of the values in step 2
4. Add up all the values in step 3
5. Divide the sum in step 4 by the total
number of data points

Variance, Example
› Good idea to make a table similar to the
one we used for average deviation
Xi      Xi - (x-bar)      Result      (Xi - (x-bar))^2
1       1-3               -2          4
2       2-3               -1          1
4       4-3               1           1
5       5-3               2           4
Mean: 12/4 = 3                        Variance: 10/4 = 2.5

Variance
› Mathematical shorthand:
\[ \text{Variance} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
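The five variance steps can be sketched in Python using the same four data points as the table. The function name `variance` is illustrative. Because the steps divide by the total number of data points, the result should agree with `statistics.pvariance` from the standard library, which also divides by n. By contrast, `statistics.variance` divides by n - 1 and would give a different value.

```python
from statistics import pvariance


def variance(data):
    """Mean of the squared differences between each data point and the mean."""
    n = len(data)
    x_bar = sum(data) / n                            # step 1: mean
    differences = [x - x_bar for x in data]          # step 2: x_i - x_bar
    squared = [d ** 2 for d in differences]          # step 3: square each one
    total = sum(squared)                             # step 4: add them up
    return total / n                                 # step 5: divide by n


data = [1, 2, 4, 5]
data_variance = variance(data)
library_variance = pvariance(data)   # cross-check against the standard library

print(data_variance, library_variance)
```
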

Standard Deviation
› The standard deviation is just the square root of the variance
› By taking the square root, the units of the
standard deviation are the same as the
original units of the data
› In the previous example:
\[ \text{Standard Deviation} = \sqrt{\text{Variance}} \]
\[ \text{Standard Deviation} = \sqrt{2.5} \]
\[ \text{Standard Deviation} \approx 1.58 \text{ inches} \]
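The last step, taking the square root of the variance, can be sketched as follows. The function name `standard_deviation` is illustrative. It divides by n, matching the variance steps used in these slides rather than the n - 1 convention.

```python
import math


def standard_deviation(data):
    """Square root of the variance, with the variance computed by dividing by n."""
    n = len(data)
    x_bar = sum(data) / n
    variance = sum((x - x_bar) ** 2 for x in data) / n
    return math.sqrt(variance)


data = [1, 2, 4, 5]
std_dev = standard_deviation(data)
print(round(std_dev, 2))
```
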