1-2 Day 4

LESSON 1 – 2 (DAY 1)
Describing Distributions with Numbers
Essential Question:
What are the measures of center, and when and how are they used?
To find the mean of a data set
To find the median of a data set
To compare the mean and the median of a data set
∑ (capital letter sigma) in the formula for the mean is short for “add them all.”
The Mean x̄
To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$, their mean is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or, in more compact notation,
$$\bar{x} = \frac{1}{n}\sum x_i$$
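The formula translates almost word for word into code. A minimal Python sketch, assuming the observations are stored in a plain list; the function name `mean` is our own illustration, and the data are the second data set listed in this lesson:

```python
def mean(xs):
    """Return x-bar = (x1 + x2 + ... + xn) / n."""
    if len(xs) == 0:
        raise ValueError("the mean needs at least one observation")
    return sum(xs) / len(xs)  # "add them all", then divide by n


# Second data set from this lesson (n = 16)
data = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
print(mean(data))  # 567 / 16 = 35.4375
```

Python's standard library also provides `statistics.mean`, which gives the same result for this list.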
Input the following data sets into
your calculator:
13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, and 20.
16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, and 73.
54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, and
Warm-up
Describe or write a
formula for the mean and
median of a data set.
The Median M
The median M is the midpoint of a distribution, the
number such that half the observations are smaller and
the other half are larger.
To find the median of a distribution:
Arrange all observations in order of size, from smallest to largest.
If the number of observations n is odd, the median M is
the center observation in the ordered list.
If the number of observations n is even, the median M is
the mean of the two center observations in the ordered list.
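The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming a non-empty list of numbers; `median` here is our own helper, not a library function (the standard library's `statistics.median` follows the same odd/even rule):

```python
def median(xs):
    """Return the median M of a non-empty list of numbers."""
    if len(xs) == 0:
        raise ValueError("the median needs at least one observation")
    ordered = sorted(xs)  # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # n odd: the center observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: mean of the two centers


data1 = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]  # n = 21 (odd)
data2 = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]  # n = 16 (even)
print(median(data1))  # 38
print(median(data2))  # 34.0
```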
Comparing Mean and Median
The median is a resistant measure and the mean is not.
(Resistant means that the measure is relatively unaffected by a few extreme observations.)
The mean and median of a symmetric distribution are close together.
In a skewed distribution, the mean lies farther out in the long tail than the median does.
The situation and the intentions of the reporter may influence which measure of center is used.
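Resistance is easy to see numerically. A small sketch using Python's standard `statistics` module: drop the single high value 73 from the second data set and compare how far each measure moves.

```python
from statistics import mean, median

# Second data set from this lesson; 73 sits far above the rest.
data = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
trimmed = [x for x in data if x != 73]  # same data with the extreme value removed

print(mean(data), median(data))        # 35.4375 and 34.0
print(mean(trimmed), median(trimmed))  # mean drops to about 32.93; median stays 34
```

The one extreme value pulls the mean up by about 2.5, while the median does not change.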
Measures of Spread
Spread can be thought of as variability.
Range – The difference between the largest
and smallest observations.
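Because the range uses only the two most extreme observations, a single outlier can change it a lot. A minimal sketch; the name `data_range` is our own choice, picked to avoid shadowing Python's built-in `range`:

```python
def data_range(xs):
    """Range = largest observation - smallest observation."""
    if len(xs) == 0:
        raise ValueError("the range needs at least one observation")
    return max(xs) - min(xs)


data = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
print(data_range(data))                          # 73 - 16 = 57
print(data_range([x for x in data if x != 73]))  # 49 - 16 = 33
```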
The Quartiles Q1 and Q3
To calculate the quartiles:
Arrange the observations in increasing order and locate
the median M in the ordered list of observations.
The first quartile Q1 is the median of the observations
whose position in the ordered list is to the left of the
location of the overall median.
The third quartile Q3 is the median of the observations
whose position in the ordered list is to the right of the
location of the overall median.
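The quartile rule above (take the median of each half, leaving out the overall median when n is odd) can be sketched as follows. This is an illustration under that assumption; `quartiles` is a hypothetical helper name, and the halves' medians come from the standard library's `statistics.median`:

```python
from statistics import median


def quartiles(xs):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(xs)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles need at least two observations")
    lower = ordered[: n // 2]        # positions left of the overall median
    upper = ordered[(n + 1) // 2 :]  # positions right of the overall median
    return median(lower), median(upper)


data1 = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]  # n = 21
data2 = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]  # n = 16
print(quartiles(data1))  # (28.0, 44.0)
print(quartiles(data2))  # (25.0, 41.0)
```

Calculators and software do not all use this hand rule; Python's `statistics.quantiles` and NumPy's `percentile`, for example, interpolate between observations, so their quartiles can differ slightly.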
The Interquartile Range
The interquartile range (IQR) is the
distance between the first and third quartiles:
IQR = Q3 – Q1
Outliers: The 1.5 x IQR Criterion
Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
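Putting the IQR and the 1.5 x IQR criterion together, a minimal sketch (helper names are our own; quartiles use the median-of-halves rule from this lesson):

```python
from statistics import median


def quartiles(xs):
    """(Q1, Q3) as medians of the lower and upper halves of the ordered data."""
    ordered = sorted(xs)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles need at least two observations")
    return median(ordered[: n // 2]), median(ordered[(n + 1) // 2 :])


def outliers(xs):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = quartiles(xs)
    iqr = q3 - q1  # IQR = Q3 - Q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in xs if x < low_fence or x > high_fence]


data2 = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
# Q1 = 25, Q3 = 41, IQR = 16, fences at 25 - 24 = 1 and 41 + 24 = 65
print(outliers(data2))  # [73]
```

By the same rule, the first data set in this lesson (Q1 = 28, Q3 = 44) has no outliers.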
The Five-Number Summary
The five-number summary of a data set consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
In symbols, the five-number summary is:
Minimum  Q1  M  Q3  Maximum
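A sketch that returns the five numbers in the order given above, assuming the median-of-halves quartile rule; `five_number_summary` is a hypothetical helper name:

```python
from statistics import median


def five_number_summary(xs):
    """Return (Minimum, Q1, M, Q3, Maximum)."""
    ordered = sorted(xs)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = median(ordered[: n // 2])        # median of the lower half
    q3 = median(ordered[(n + 1) // 2 :])  # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


data2 = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
print(five_number_summary(data2))  # (16, 25.0, 34.0, 41.0, 73)
```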
Boxplot (Modified)
A modified boxplot is a graph of the five-number summary, with outliers plotted individually.
Box Plot
A central box spans the quartiles.
A line in the box marks the median.
Observations more than 1.5 x IQR outside the
central box are plotted individually.
Lines extend from the box out to the smallest and
largest observations that are not outliers.
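The rules above can be turned into numbers before anything is drawn. This sketch only computes the pieces (box, median line, whisker ends, individually plotted points); the drawing step is left out, and `modified_boxplot` is our own illustrative name. Plotting libraries such as matplotlib (`pyplot.boxplot`) draw this modified style with 1.5 x IQR whiskers by default, though their quartile rule may differ from the hand method.

```python
from statistics import median


def modified_boxplot(xs):
    """Compute the parts of a modified boxplot (no drawing)."""
    ordered = sorted(xs)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "box": (q1, q3),                      # central box spans the quartiles
        "median": median(ordered),            # line inside the box
        "whiskers": (inside[0], inside[-1]),  # smallest/largest non-outliers
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }


data2 = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
print(modified_boxplot(data2))
```

For the second data set, the whiskers run from 16 to 49 and 73 is plotted on its own.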
Measure of Spread: Standard Deviation
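The variance s² of a set of observations is roughly the average of the squares of the deviations of the observations from their mean (the sum of squared deviations is divided by n − 1 rather than n). In symbols, the variance of n observations $x_1, x_2, \ldots, x_n$ is
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
Or, more compactly,
$$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$$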
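A minimal sketch of the variance formula, checked against the standard library's `statistics.variance` (which also divides by n − 1). The small list is a made-up example chosen so the arithmetic can be done by hand:

```python
import statistics


def variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("the variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


small = [1, 3, 5, 7]  # mean 4; deviations -3, -1, 1, 3; squares sum to 20
print(variance(small))             # 20 / 3, about 6.667
print(statistics.variance(small))  # standard-library check, same value
```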
The Standard Deviation
The standard deviation s is the square root of the variance s²:
$$s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$$
The quantity n − 1 is called the degrees of freedom. It is found by subtracting one from the number of observations in the data set.
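The choice of n − 1 versus n shows up directly in Python's `statistics` module: `stdev` divides by n − 1 (as in the formula above), while `pstdev` divides by n. A small sketch using the same made-up list as before; `std_dev` is our own helper:

```python
import math
import statistics


def std_dev(xs):
    """Sample standard deviation s = sqrt(sum of squared deviations / (n - 1))."""
    n = len(xs)
    if n < 2:
        raise ValueError("s needs at least two observations")
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))


small = [1, 3, 5, 7]
print(std_dev(small))            # sqrt(20 / 3), about 2.582
print(statistics.stdev(small))   # divides by n - 1: same value
print(statistics.pstdev(small))  # divides by n instead: sqrt(5), about 2.236
```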
Properties of the Standard Deviation
s measures spread about the mean and should be
used only when the mean is chosen as the
measure of center.
s = 0 only when there is no spread. This happens
only when all observations have the same value.
Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger.
s, like the mean x̄, is not resistant. Strong skewness or a few outliers can make s very large.
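Two of these properties can be checked quickly with `statistics.stdev`: identical values give s = 0, and a single outlier inflates s. The list of fives is a made-up example; the second list is this lesson's second data set.

```python
import statistics

same = [5, 5, 5, 5]
print(statistics.stdev(same))  # 0.0: no spread at all

data = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
trimmed = [x for x in data if x != 73]
print(statistics.stdev(data))     # about 13.6 with the outlier 73
print(statistics.stdev(trimmed))  # about 9.6 without it
```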