Section 1.3
Describing Quantitative Data
with Numbers
Measuring Center
The Mean
• x̄ (pronounced "x bar") is used for the mean of
a sample.
• μ (Greek letter mu, pronounced "mew") is used
for the population mean.
• To find the mean of a set of observations, add
up their values and divide by the number of
observations.
Measuring Center
The Mean
• A more compact notation for the mean is:
x̄ = (x₁ + x₂ + … + xₙ) / n = (Σ xᵢ) / n
• The Σ (capital Greek letter sigma) in the
formula for the mean is short for "sum." The
subscripts on the observations xᵢ are just a
way of keeping the n observations distinct.
They do not necessarily indicate order or any
other special facts about the data.
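The rule above (add up the values, then divide by the number of observations) can be sketched in a few lines of Python. The function name `mean` and the five sample values are illustrative assumptions, not data from these notes.

```python
def mean(observations):
    """Return the arithmetic mean: the sum of the values divided by n."""
    n = len(observations)
    if n == 0:
        raise ValueError("mean requires at least one observation")
    return sum(observations) / n


# Hypothetical sample of five observations (not from the notes).
sample = [4, 8, 15, 16, 23]
x_bar = mean(sample)  # (4 + 8 + 15 + 16 + 23) / 5 = 66 / 5
print(x_bar)
```

The order of the values in the list does not matter, which mirrors the point that the subscripts only keep the observations distinct.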
Example of Calculating the Mean
EX: Refer to the data on travel times to work for the sample of 15
North Carolinians.
• (a) Find the mean travel time for all 15 workers.
• (b) Calculate the mean again, this time excluding the person who
reported a 60-minute travel time to work. What do you notice?
Calculating the Mean Solution
Things to think about…
The previous example illustrates an important
weakness of the mean as a measure of center:
the mean is sensitive to the influence of extreme
observations.
These may be outliers, but a skewed distribution that
has no outliers will also pull the mean toward its long
tail. Because the mean cannot resist the influence of
extreme observations, we say that it is not a
resistant measure of center.
Measuring the Center
The Median
• The median M is the midpoint of a distribution:
the number such that half of the observations are
smaller and the other half are larger.
• To find the median:
1. Arrange the data in ascending order (from smallest
to largest)
2. If the number of observations is odd, then the
median is the center observation in the ordered list.
3. If the number of observations is even, the Median is
the average of the two center observations in the
ordered list.
Example of Finding the Median
EX: People say that it takes a long time to get to
work in New York State due to the heavy traffic near
big cities. What do the data say? Here are the travel
times in minutes of 20 randomly chosen New York
workers:
10 30 5 25 40 20 10 15 30 20 15 20 85
15 65 15 60 60 40 45
a) Find the median of the travel times. Show your
work.
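One way to check the hand calculation is to follow the three steps for finding the median in Python. This is a minimal sketch; the function name `median` and the variable names are assumptions made for illustration, while the data are the 20 New York travel times listed above.

```python
def median(observations):
    """Median by the three-step rule: sort, then take the center value(s)."""
    ordered = sorted(observations)          # step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:                          # step 2: odd n, single center value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # step 3: even n, average the two


# Travel times (minutes) of the 20 New York workers from the example.
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

m = median(ny_times)
print(m)  # 22.5
```

With 20 observations the two center values are the 10th and 11th in the ordered list (20 and 25), so the median is 22.5 minutes.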
Comparing Mean and Median
• The mean and median of a roughly symmetric
distribution are close together. If the distribution is
exactly symmetric, the mean and median are
exactly the same. In a skewed distribution, the
mean is usually farther out in the long tail than
the median.
• The mean is not resistant to outliers or
skewness, while the median is.
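The comparison can be seen numerically with Python's standard `statistics` module, using the New York travel times from the earlier example. The second half of the sketch is a hypothetical change to the data, made only to show resistance: it pretends the 85-minute commute had been recorded as 850 minutes.

```python
from statistics import mean, median

# Travel times (minutes) of the 20 New York workers from the earlier example.
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

# The distribution has a long right tail, so the mean sits above the median.
print(mean(ny_times), median(ny_times))  # 31.25 22.5

# Hypothetical data-entry error (an assumption for illustration only):
# the 85-minute value is replaced by 850.
altered = [850 if t == 85 else t for t in ny_times]

# The mean is dragged upward; the median does not move.
print(mean(altered), median(altered))  # 69.5 22.5
```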
Measuring the Spread
The Range
• The range is one of the most basic ways of measuring spread. It is
simply the difference between the highest and lowest
observations: range = maximum − minimum.
Quartiles and IQR (Interquartile Range)
To calculate the quartiles:
1. Arrange the observations in increasing order and locate the
median M in the ordered list of observations.
2. The first quartile Q1 is the median of the observations whose
position in the ordered list is to the left of the median.
3. The third quartile Q3 is the median of the observations whose
position in the ordered list is to the right of the median.
• The interquartile range (IQR), which measures the
range of the middle 50% of the data, is defined as:
IQR = Q3 − Q1
Example of finding IQR
EX: Find and interpret the IQR of the following data on reported travel
times to work for 20 randomly selected New Yorkers.
Notes: If the number of observations is odd, do not include the median when
calculating the quartiles. If the number of observations is even, the median
falls between the two middle values, so every observation belongs to either
the lower half or the upper half.
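The quartile rule in these notes (the median of each half, leaving out the overall median when the count is odd) can be sketched as follows. The function name `quartiles` is an assumption for illustration. Built-in quantile routines in software often use a different interpolation rule, so their answers may differ slightly from this hand method.

```python
from statistics import median


def quartiles(observations):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles require at least two observations")
    half = n // 2
    lower = ordered[:half]             # observations left of the median
    upper = ordered[half + n % 2:]     # right of the median; skips it when n is odd
    return median(lower), median(upper)


# Travel times (minutes) of the 20 New York workers.
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

q1, q3 = quartiles(ny_times)
iqr = q3 - q1
print(q1, q3, iqr)  # 15.0 42.5 27.5
```

Interpretation: the middle 50% of these travel times span a range of 27.5 minutes.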
Identifying Outliers
An observation is an outlier if it falls more than
1.5 × IQR
above the third quartile or below the first quartile:
ABOVE: Q3 + (1.5 × IQR)
OR BELOW: Q1 − (1.5 × IQR)
EX: Does the 1.5 × IQR rule identify any outliers for the New
York travel time data? In the previous example, we found that
Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes.
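A short sketch of the 1.5 × IQR check for this example, taking Q1 and Q3 from the values stated above. The variable names (`lower_fence`, `upper_fence`) are assumptions chosen for illustration.

```python
# Travel times (minutes) of the 20 New York workers.
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

# Quartiles as stated in the example.
q1, q3 = 15, 42.5
iqr = q3 - q1                      # 27.5

lower_fence = q1 - 1.5 * iqr       # anything below this is an outlier
upper_fence = q3 + 1.5 * iqr       # anything above this is an outlier

outliers = [t for t in ny_times if t < lower_fence or t > upper_fence]
print(lower_fence, upper_fence, outliers)  # -26.25 83.75 [85]
```

No travel time can fall below −26.25 minutes, so only the upper cutoff matters here: the 85-minute commute exceeds 83.75 and is flagged as an outlier.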
Five-Number Summary
The five-number summary of a set of observations consists of the
minimum, Q1, median, Q3, and maximum.
The five-number summary of a distribution leads us to a new type of
graph: the boxplot (sometimes called a box-and-whisker plot).
• A central box is drawn from the first quartile (Q1) to the third
quartile (Q3).
• A line in the box marks the median.
• Lines (called whiskers) extend from the box out to the smallest
and largest observations that are not outliers.
EX: The 2009 roster of the Dallas Cowboys professional football team
included 10 offensive linemen. Their weights (in pounds) were
338 318 353 313 318 326 307 317 311 311
1. Find the five-number summary for these data by hand. Show your work.
2. Calculate the IQR.
3. Determine if there are any outliers.
4. Draw a boxplot of the data.
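The numbers needed for the first three tasks, plus the whisker ends for the boxplot, can be checked with a sketch like the one below. The function name `five_number_summary` and the variable names are assumptions for illustration; the quartiles follow the median-of-each-half rule used in these notes.

```python
from statistics import median


def five_number_summary(observations):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(ordered[:half])            # median of the lower half
    q3 = median(ordered[half + n % 2:])    # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


# Weights (pounds) of the 10 offensive linemen.
weights = [338, 318, 353, 313, 318, 326, 307, 317, 311, 311]

minimum, q1, m, q3, maximum = five_number_summary(weights)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [w for w in weights if w < low or w > high]
inside = [w for w in weights if low <= w <= high]
whiskers = (min(inside), max(inside))      # whisker ends exclude outliers

print((minimum, q1, m, q3, maximum))  # (307, 311, 317.5, 326, 353)
print(iqr, outliers, whiskers)        # 15 [353] (307, 338)
```

The box therefore runs from 311 to 326 with a line at 317.5, the whiskers reach 307 and 338, and the 353-pound lineman is plotted separately as an outlier.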
Now let's use our calculators to construct the boxplot for these data!
The standard deviation (sₓ or σₓ) and variance (sₓ² or σₓ²)
• The standard deviation measures the typical
distance of the observations from their mean. It is
calculated by averaging the squared distances
and then taking the square root. This
average squared distance is called the variance. For a
sample, the sum of squared distances is divided by
n − 1 rather than n. In
symbols, the sample standard deviation is given by
sₓ = √[ Σ(xᵢ − x̄)² / (n − 1) ]
Standard Deviation
More important than the details of calculating the
standard deviation are the properties that determine
the usefulness of the standard deviation:
– sₓ measures spread about the mean and should be used
only when the mean is chosen as the measure of
center.
– sₓ is always greater than or equal to 0. sₓ = 0 only when
there is no variability. This happens only when all
observations have the same value.
– sₓ has the same units of measurement as the original
observations.
– Like the mean, sₓ is not resistant. A few outliers can
make sₓ very large.
EX: The heights (in inches) of the five starters on a
basketball team are
67, 72, 76, 76, and 84
a) Find and interpret the Mean.
b) Use your calculator to calculate the standard
deviation.
c) Interpret the meaning of the standard deviation in
this setting.
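As a check on the calculator result, the sample standard deviation of the five heights can be computed step by step. This is a minimal sketch that assumes the usual sample formula with an n − 1 divisor; the function name `sample_std_dev` is an assumption for illustration.

```python
from math import sqrt


def sample_std_dev(observations):
    """Sample standard deviation: sqrt of summed squared deviations over n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(observations) / n
    squared_deviations = [(x - x_bar) ** 2 for x in observations]
    variance = sum(squared_deviations) / (n - 1)
    return sqrt(variance)


# Heights (inches) of the five starters.
heights = [67, 72, 76, 76, 84]

x_bar = sum(heights) / len(heights)   # 375 / 5
s_x = sample_std_dev(heights)         # sqrt(156 / 4) = sqrt(39)
print(x_bar, round(s_x, 2))           # 75.0 6.24
```

Interpretation: the starters' heights typically differ from the mean height of 75 inches by about 6.24 inches.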
Numerical Summaries using Calculator
Enter the following data for travel times to work in List1 and List2 on your calculator:
N.C. (L1)
NY (L2)
Record the descriptive statistics for each set of data:
Choosing Measure of Center and Spread
We now have a choice between two descriptions of
the center and spread of a distribution: the median
and IQR, or x̄ and sₓ.
Because x̄ and sₓ are sensitive to extreme
observations, they can be misleading when a
distribution is strongly skewed or has outliers. In
these cases, the median and IQR, which are both
resistant to extreme values, provide a better
summary.