AP Statistics: 5E Section 1.3
In Section 1.2 we discussed the shape of a distribution
and briefly touched on outliers. In Section 1.3 we
discuss outliers in more detail as well as measures of
center and spread.
The two most common measures of center are the
mean and the median.
To find the mean of a set of data values,
find the sum of the values and divide by the
number of observations.
n will always be used to represent
the number of data values.
$$\text{mean} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
NOTATION: Mean of a sample: $\bar{x}$

Mean of a population: $\mu$
The median of a set of data values,
is a number such that roughly half the values are
smaller and roughly half are larger
To find the median:
1. Arrange the values from smallest to largest
2. If n is odd, the median is the center value
3. If n is even, the median is the mean of the
two center values
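The two procedures above can be sketched in Python. This is a minimal illustration rather than part of the original notes: the function names `mean` and `median` and the sample values are assumptions chosen only for the example.

```python
def mean(values):
    """Sum of the values divided by the number of observations n."""
    n = len(values)
    return sum(values) / n


def median(values):
    """Center value of the ordered list, or the mean of the two center values."""
    ordered = sorted(values)              # step 1: arrange smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # step 2: n is odd
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 3: n is even


# Hypothetical data, invented only to exercise the odd and even cases.
odd_data = [3, 1, 4, 1, 5]
even_data = [3, 1, 4, 1, 5, 9]

print(mean(odd_data), median(odd_data))
print(mean(even_data), median(even_data))
```

Sorting a copy with `sorted` leaves the original list in its recorded order, which matters when the order of collection carries meaning (as with readings taken over time).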
EXAMPLE: It is commonly believed that “normal” human body
temperature is 98.6°F (or 37°C). In fact, “normal” temperature can
vary from person to person, and for a given person it can vary over
the course of a day. The table below gives a set of temperature
readings of a healthy woman taken over a two-day period.
Find the mean and the median.
mean: 97.911
median: 97.7
The lowest recorded temperature was 97.2. Assume this
temperature was changed to 96.8, and recompute the
mean: 97.867
and the median: 97.7
A statistic is resistant if
it is relatively unaffected by outliers
In a roughly symmetric distribution,
the mean and median are approximately equal
In a right-skewed distribution,
the mean is larger than the median
In a left-skewed distribution,
the mean is smaller than the median
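A quick numerical check of these three statements, as a sketch only. The three data sets below are hypothetical and were invented to have the stated shapes; they do not come from the notes.

```python
from statistics import mean, median

# Hypothetical data sets, invented for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 4, 20]       # long tail toward large values
left_skewed = [1, 17, 18, 18, 19, 19, 20]   # long tail toward small values

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data))
```

In each skewed set the single value in the tail pulls the mean toward it, while the median stays with the bulk of the data.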
The most common measures of spread, or
variability, are the range, IQR and standard
deviation.
The range of a set of data is simply the
difference between the largest and smallest
values.
Before we can define the IQR we need to discuss the 1st
and 3rd quartiles (Q1 and Q3).
Q1 is the median of the values that are to the left of
the median in the ordered list
Q3 is the median of the values that are to the right of
the median in the ordered list
Note: The quartiles (including Q2, the median) divide a
set of data into roughly 4 equal parts with 25% in each
part.
The IQR, or interquartile range, is $Q_3 - Q_1$.
Like the median, Q1 and Q3 are resistant to
outliers. Thus the IQR is also resistant.
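The quartile definitions above can be sketched as a small Python function. Two things here are assumptions rather than statements from the notes: the helper name `quartiles`, and the choice to leave the overall median out of both halves when n is odd. The sample data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves definition.

    Assumption: when n is odd, the overall median belongs to neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values to the left of the median
    upper = ordered[n - half:]      # values to the right of the median
    return median(lower), median(ordered), median(upper)


# Hypothetical data, invented for illustration.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)
```

Software packages use several slightly different quartile conventions, so results from a library routine may not match this hand method exactly on small data sets.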
The IQR Rule for Outliers: An observation in a set of data
is an outlier if it is
less than $Q_1 - 1.5 \times \text{IQR}$
OR
greater than $Q_3 + 1.5 \times \text{IQR}$
Example: The data at the right shows the amount of fat (in grams) for
McDonald’s beef sandwiches. Determine whether there are any
outliers in this data.
$\text{IQR} = 29 - 21 = 8$
$21 - 1.5(8) = 9$
$29 + 1.5(8) = 41$
Since $43 > 41$, the value 43 is an outlier.
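The same check can be written as a short Python sketch. The function names `iqr_fences` and `is_outlier` are hypothetical helpers introduced here; the quartiles 21 and 29 and the value 43 are the ones used in the example.

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x is below the lower cutoff or above the upper cutoff."""
    lower, upper = iqr_fences(q1, q3)
    return x < lower or x > upper


# Quartiles from the worked example: Q1 = 21, Q3 = 29.
lower, upper = iqr_fences(21, 29)
print(lower, upper)
print(is_outlier(43, 21, 29))
```

The comparisons are strict, so a value exactly equal to a cutoff is not flagged, which matches the "less than" and "greater than" wording of the rule.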
The five-number summary consists of
Minimum
Q1
Median
Q3
Maximum
A boxplot, or box-and-whisker plot, is a graph for displaying the five-number summary.
Procedure for constructing a boxplot:
1. Construct a horizontal scale that includes the minimum and
maximum values.
2. Construct a rectangle that extends from Q1 to Q3 and draw a line in
the rectangle at the median.
3. Draw lines extending out from the rectangle to the most extreme
values that are not outliers.
4. Identify each outlier individually with a symbol such as an asterisk.
Example: Barry Bonds set the major league record for home runs in a
season when he hit 73 HRs in 2001. Here are the data on the number
of HRs Bonds hit in each of his 21 complete seasons.
16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40,
37, 34, 49, 73, 46, 45, 45, 26, 28
Construct a boxplot for these values.
$\text{Min} = 16,\ Q_1 = 25.5,\ \text{Med} = 34,\ Q_3 = 45,\ \text{Max} = 73$
$\text{IQR} = 45 - 25.5 = 19.5$
$25.5 - 1.5(19.5) = -3.75$
$45 + 1.5(19.5) = 74.25$
No outliers: every value lies between $-3.75$ and $74.25$.
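The numbers needed for this boxplot can be reproduced with a short Python sketch. The variable names are illustrative, and the quartiles are computed by taking the median of each half with the overall median left out of both halves, which is an assumption about the method (it does reproduce the values above).

```python
from statistics import median

# Home runs in each of Bonds's 21 complete seasons (from the example).
hrs = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40,
       37, 34, 49, 73, 46, 45, 45, 26, 28]

ordered = sorted(hrs)
n = len(ordered)
half = n // 2
q1 = median(ordered[:half])         # median of the values left of the median
q3 = median(ordered[n - half:])     # median of the values right of the median
summary = (ordered[0], q1, median(ordered), q3, ordered[-1])

iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in ordered if x < lower or x > upper]

# Whiskers reach the most extreme values that are not outliers.
inside = [x for x in ordered if lower <= x <= upper]
whiskers = (inside[0], inside[-1])

print(summary, iqr, (lower, upper), outliers, whiskers)
```

Because no value falls outside the cutoffs, the whiskers run all the way to the minimum and maximum, and no points are marked individually.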
TI-83/84: Put data in L1. Press STATPLOT (2nd function of
Y=) and press ENTER. Select ON and arrow down.
Select the boxplot type that is first in the 2nd row by
using the right arrow, press ENTER, and arrow down. Be
sure the Xlist is where you have your data and the freq.
is 1. Press GRAPH. Press ZOOM, select option 9 for
ZoomStat and press ENTER. The boxplot should be
displayed. If you press the TRACE key and use the
arrows, the calculator will give you the values in the five-number summary and any outliers.
The final measure of spread we need to consider
is standard deviation.
A deviation is
the difference between a data value and the mean
Note: The sum of the deviations should be zero.
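The note above follows directly from the definition of the mean. A short derivation, writing $\bar{x}$ for the mean and $n$ for the number of data values:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

Because positive and negative deviations always cancel in this way, the raw deviations cannot measure spread on their own.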
The standard deviation of a set of data values is
the typical distance between a data value and
the mean
$$\text{Standard deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
NOTATION:
Standard deviation of a sample: $s$

Standard deviation of a population: $\sigma$
• The standard deviation will have the
same unit of measure as the data
• The standard deviation $\geq 0$, and equals
zero when all the data values are equal
(i.e. no variability)
The variance of a set of data is equal to
the square of the standard deviation
NOTATION:
Variance of a sample: $s^2$
Variance of a population: $\sigma^2$
TI-83/84: Standard deviation is found in the
same way as the mean or five-number
summary.
Always use Sx!
Example: Find the standard deviation for the
data on Bonds's HRs.
$s \approx 12.687$
Example: Rank the measures of spread
(range, IQR or standard deviation) as to their
resistance to outliers from least to most
resistant.
Range
Standard deviation
IQR
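One way to see this ordering is to exaggerate a single extreme value and watch how much each measure grows. The sketch below is an illustration under stated assumptions, not a proof: it reuses the Bonds home run data, replaces the maximum of 73 with a hypothetical 173, and uses a hypothetical helper `spreads` with the median-of-halves quartile method.

```python
from statistics import median, stdev


def spreads(values):
    """Return (range, sample standard deviation, IQR) for a data set."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[n - half:])
    return ordered[-1] - ordered[0], stdev(ordered), q3 - q1


hrs = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40,
       37, 34, 49, 73, 46, 45, 45, 26, 28]

# Hypothetical change: replace the maximum (73) with an exaggerated 173.
altered = [173 if x == 73 else x for x in hrs]

before = spreads(hrs)
after = spreads(altered)
ratios = [a / b for a, b in zip(after, before)]   # growth factor of each measure

print("range, SD, IQR before:", before)
print("range, SD, IQR after: ", after)
print("growth factors:       ", ratios)
```

In this example the range grows the most, the standard deviation grows somewhat less, and the IQR does not change at all, since the altered value never enters the quartile calculation.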
Since the standard deviation measures spread
about the mean it should always be used as
the measure of spread when the mean is
used as the measure of center. In the same
way, the IQR should be used as the measure
of spread when the median is used as the
measure of center.
Because of their resistance to outliers the
median and the IQR are usually better when
describing a skewed distribution or a
distribution with outliers. The mean and the
standard deviation are best used when the
distribution is roughly symmetric.