Describing the Center of a Data Set with the Arithmetic Mean

Chapter 4: CENTER: Mean, Median
VARIABILITY: Standard Deviation, Interquartile Range
Content Objective:
• SWBAT determine the mean, median, deviations, standard deviation, and interquartile range
for a data set.
Language Objective:
• SWBAT explain why the median and IQR are better measures of center and spread than the
mean and standard deviation when the data are skewed or outliers are present.
The sample mean of a numerical sample \( x_1, x_2, x_3, \ldots, x_n \), denoted \( \bar{x} \), is

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n} \]
The population mean, denoted by µ, is the average of all x values in the entire
population.
House Price in Lowtown

    x
    97,000
    93,000
    110,000
    121,000
    113,000
    95,000
    100,000
    122,000
    99,000
    2,000,000   (outlier)
    Σx = 2,950,000

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{2{,}950{,}000}{10} = 295{,}000 \]

The “average” or mean price for this sample of 10 houses in Lowtown is $295,000.
In the sample of 10 houses from Lowtown, the mean was affected very strongly by the one
house with the extremely high price.
The other 9 houses had selling prices around $100,000. This illustrates that the mean can
be very sensitive to a few extreme values.
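A short Python sketch can make this sensitivity concrete. It uses the ten Lowtown prices from the table; the variable names are illustrative, and leaving out the $2,000,000 house is only a what-if comparison, not a recommended way to treat outliers.

```python
# Selling prices (in dollars) of the 10 sampled Lowtown houses.
prices = [97_000, 93_000, 110_000, 121_000, 113_000,
          95_000, 100_000, 122_000, 99_000, 2_000_000]

# Sample mean: sum of the observations divided by n.
mean_all = sum(prices) / len(prices)

# What-if comparison: mean of the other nine houses, without the outlier.
others = [p for p in prices if p != 2_000_000]
mean_others = sum(others) / len(others)

print(f"mean of all 10 houses:      ${mean_all:,.0f}")
print(f"mean of the other 9 houses: ${mean_others:,.0f}")
```

One extreme value moves the mean from roughly $106,000 to $295,000, even though nine of the ten prices are near $100,000.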
The sample median is obtained by first ordering the n observations from smallest to
largest (with any repeated values included, so that every sample observation appears in the
ordered list). Then
\[
\text{sample median} =
\begin{cases}
\text{the single middle value} & \text{if } n \text{ is odd} \\
\text{the mean of the middle two values} & \text{if } n \text{ is even}
\end{cases}
\]

Data sets and graphs from Peck, Olsen, Devore
We put the data in increasing numerical order to get

    93,000   95,000   97,000   99,000   100,000
    110,000  113,000  121,000  122,000  2,000,000

Since there are 10 data values (an even number), the median is the mean of the two values in the
middle.

\[ \text{median} = \frac{100{,}000 + 110{,}000}{2} = \$105{,}000 \]
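The odd/even rule for the median can be sketched in Python as follows. The function name `sample_median` is a hypothetical helper chosen for illustration, not part of the handout.

```python
def sample_median(values):
    """Middle value if n is odd; mean of the two middle values if n is even."""
    ordered = sorted(values)          # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the middle two

# The 10 Lowtown house prices (in dollars), in their original order.
prices = [97_000, 93_000, 110_000, 121_000, 113_000,
          95_000, 100_000, 122_000, 99_000, 2_000_000]

median_price = sample_median(prices)
print(median_price)  # 105000.0
```

Unlike the mean, the result is unchanged if the $2,000,000 price is replaced by any other value above $110,000.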
Comparing the Sample Mean & Sample Median:
The median splits the area in the distribution in half and the mean is the point of balance.
Typically,
• when a distribution is skewed positively, the mean is larger than the median,
• when a distribution is skewed negatively, the mean is smaller than the median, and
• when a distribution is symmetric, the mean and the median are equal.
RULES for finding median:
Find the count by using the formula: _____________________
Notice this works whether you have an odd or even number.
The simplest numerical measure of the variability of a numerical data set is the range,
which is defined to be the difference between the largest and smallest data values.
range = maximum – minimum
The n deviations from the sample mean are the differences:
\[ x_1 - \bar{x},\; x_2 - \bar{x},\; x_3 - \bar{x},\; \ldots,\; x_n - \bar{x} \]
The sum of all of the deviations from the sample mean will be equal to 0 (zero), except
possibly for the effects of rounding the numbers. This means that the average deviation
from the mean is always 0 (zero) and cannot be used as a measure of variability.
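As a quick check of this property, the sketch below computes the deviations for the Lowtown house prices and adds them up. The variable names are illustrative, and the small tolerance in the last line is an assumption that allows for floating-point rounding.

```python
# The 10 Lowtown house prices (in dollars).
prices = [97_000, 93_000, 110_000, 121_000, 113_000,
          95_000, 100_000, 122_000, 99_000, 2_000_000]

mean = sum(prices) / len(prices)

# Deviation of each observation from the sample mean.
deviations = [x - mean for x in prices]

# Negative and positive deviations cancel, so the total is zero.
total = sum(deviations)
print(deviations)
print(abs(total) < 1e-6)
```

Nine of the deviations are negative and one is very large and positive, yet they still cancel exactly, which is why squared deviations are used to measure variability instead.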
Ex. (Show this using post-it notes.) Time it took 9 student nurses to complete paperwork (in
minutes). (Manipulate the times from all being 3's to different configurations with 1
deviation away, 2 deviations, etc.)
The sample variance, denoted \( s^2 \), is the sum of the squared deviations from the mean
divided by n - 1.

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
The sample standard deviation, denoted \( s \), is the positive square root of the sample
variance.

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
The population standard deviation is denoted by \( \sigma \) (sigma) and the population
variance is denoted by \( \sigma^2 \).
Ex. Time it took 9 student nurses to complete paperwork (in minutes). Find the variance
and standard deviation of these times.

    x       x - x̄       (x - x̄)²
    1
    2
    2
    3
    3
    3
    4
    4
    5
    Σ =     Σ =         Σ =
Another measure of Variability is INTERQUARTILE RANGE
10 Macintosh apples were randomly selected and weighed (in ounces). Determine the
range, mean, variance, and standard deviation using the formulas.

    x        x - x̄      (x - x̄)²
    7.52     0.096      0.0092
    8.48     1.056      1.1151
    7.36    -0.064      0.0041
    6.24    -1.184      1.4019
    7.68     0.256      0.0655
    6.56    -0.864      0.7465
    6.40    -1.024      1.0486
    8.16     0.736      0.5417
    7.68     0.256      0.0655
    8.16     0.736      0.5417
    74.24    0.000      5.5398    (column totals)
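The column totals above feed directly into the formulas for the range, mean, variance, and standard deviation. The following Python sketch reproduces that arithmetic for the apple weights; the variable names are illustrative only.

```python
import math

# Weights (in ounces) of the 10 Macintosh apples.
weights = [7.52, 8.48, 7.36, 6.24, 7.68, 6.56, 6.40, 8.16, 7.68, 8.16]
n = len(weights)

data_range = max(weights) - min(weights)       # maximum - minimum
mean = sum(weights) / n                        # 74.24 / 10

# Squared deviations from the mean (third column of the table).
squared_devs = [(x - mean) ** 2 for x in weights]

variance = sum(squared_devs) / (n - 1)         # divide by n - 1, not n
std_dev = math.sqrt(variance)                  # positive square root

print(f"range    = {data_range:.2f}")
print(f"mean     = {mean:.3f}")
print(f"variance = {variance:.4f}")
print(f"std dev  = {std_dev:.4f}")
```

Python's standard `statistics.variance` and `statistics.stdev` functions also divide by n - 1, so they can serve as a cross-check on hand calculations.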
IQR (Inter-Quartile Range) is a ____________________measure of variability—it is
generally NOT affected by ______________________________________ in a data set.
Quartiles—divide data into 4 quarters
To find quartiles:
1. arrange data in ascending order
2. find median value________________________________
3. divide data into lower and upper halves, excluding the median
4. find median of the lower half__________________________________
5. find median of the upper half__________________________________

    1 4 5 7 9            1 4 5 7 9 10
    median =             median =
    Q1 =                 Q1 =
    Q3 =                 Q3 =

Can use the calculator to find the median and quartiles using
1-Var Stats (scroll down to second page of results)

6. subtract to find the interquartile range: IQR = Q3 – Q1
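The steps above can be sketched in Python as follows. The helper names `median_of_sorted` and `quartiles` are hypothetical, and the sketch assumes the median-of-halves method described in the steps, with the median left out of both halves when n is odd.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)                  # step 1: ascending order
    n = len(ordered)
    half = n // 2
    med = median_of_sorted(ordered)           # step 2: median
    lower = ordered[:half]                    # step 3: lower half
    # For odd n the middle value is skipped; for even n nothing is skipped.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    q1 = median_of_sorted(lower)              # step 4
    q3 = median_of_sorted(upper)              # step 5
    return q1, med, q3

for data in ([1, 4, 5, 7, 9], [1, 4, 5, 7, 9, 10]):
    q1, med, q3 = quartiles(data)
    iqr = q3 - q1                             # step 6: IQR = Q3 - Q1
    print(data, "Q1 =", q1, "median =", med, "Q3 =", q3, "IQR =", iqr)
```

Other quartile conventions exist, and many software packages interpolate between data values, so a library result may differ slightly from this method.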
The IQR is the width of the ____________
____________of the data—it is not likely to
be overly dependent on extreme values or
outliers. Must always be _______________
or ______________. A ________________
(relative to the data values) represents a small
amount of variability; a large IQR represents
________________________________.
The IQR can be zero even if the data set has
some variability if _____________________
________________________________.
5-number summary:
N ( min – Q1 – med – Q3 – max)
Ex. 15 students with part-time jobs were randomly selected and the number of hours
worked last week was recorded. Determine the median and IQR.
19, 12, 14, 10, 12, 10, 25, 9, 8, 4, 2, 10, 7, 11, 15