Download Chapter 3 Numerically Summarizing Data

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Data mining wikipedia , lookup

Time series wikipedia , lookup

Transcript
Chapter 3
Numerically Summarizing
Data
3.3
Measures of Central Tendency
and Dispersion from Grouped
Data
EXAMPLE Approximating the Mean from a
Frequency Distribution
The following frequency distribution represents the
time between eruptions (in seconds) for a random
sample of 45 eruptions at the Old Faithful Geyser in
California. Approximate the mean time between
eruptions.
EXAMPLE Computed a Weighted Mean
Bob goes the “Buy the Weigh” Nut store and
creates his own bridge mix. He combines 1
pound of raisins, 2 pounds of chocolate covered
peanuts, and 1.5 pounds of cashews. The raisins
cost $1.25 per pound, the chocolate covered
peanuts cost $3.25 per pound, and the cashews
cost $5.40 per pound. What is the cost per pound
of this mix.
EXAMPLE Approximating the Mean from a
Frequency Distribution
The following frequency distribution represents the
time between eruptions (in seconds) for a random
sample of 45 eruptions at the Old Faithful Geyser in
California. Approximate the standard deviation time
between eruptions.
Chapter 3
Numerically
Summarizing Data
3.4
Measures of Location
The z-score represents the number of standard
deviations that a data value is from the mean.
It is obtained by subtracting the mean from the
data value and dividing this result by the standard
deviation.
The z-score is unitless with a mean of 0 and a
standard deviation of 1.
Population Z - score
Sample Z - score
EXAMPLE Using Z-Scores
The mean height of males 20 years or older is
69.1 inches with a standard deviation of 2.8
inches. The mean height of females 20 years or
older is 63.7 inches with a standard deviation of
2.7 inches. Data based on information obtained
from National Health and Examination Survey.
Who is relatively taller:
Shaquille O’Neal whose height is 85 inches
or
Lisa Leslie whose height is 77 inches.
Answer:

Shaquille O’Neal Z-Score:
(85-69.1)/2.8 =5.67857143
Lisa Leslie
(77-63.7)/2.7 =4.92592593
Because O’Neal Z-Score > Lisa ‘s Z-Score,
We say O’Neal is in a higher position than
Lisa in their Goups.

The median divides the lower 50% of a set of
data from the upper 50% of a set of data. In
general, the kth percentile, denoted Pk , of a
set of data divides the lower k% of a data set
from the upper (100 – k) % of a data set.
Computing the kth Percentile, Pk
Step 1: Arrange the data in ascending order.
Computing the kth Percentile, Pk
Step 1: Arrange the data in ascending order.
Step 2: Compute an index i using the following formula:
where k is the percentile of the data value and n
is the number of individuals in the data set.
Computing the kth Percentile, Pk
Step 1: Arrange the data in ascending order.
Step 2: Compute an index i using the following formula:
where k is the percentile of the data value and n
is the number of individuals in the data set.
Step 3: (a) If i is not an integer, round up to the next
highest integer. Locate the ith value of the data set
written in ascending order. This number represents the
kth percentile.
(b) If i is an integer, the kth percentile is the arithmetic
mean of the ith and (i + 1)st data value.
EXAMPLE
Finding a Percentile
For the employment ratio data on the next slide,
find the
(a) 60th percentile
(b) 33rd percentile
Answer:
A) 60th Percentile
i) the index: I = (60/100)*51 =30.6
30.6 in not an integer, we round it up to 31.
so the data value is 66.1
B) 33rd
i) the index: I =(33/100)*51=16.83
Round it up to 17. So the data value at 17th is
63.6.

Finding the Percentile that Corresponds to a Data
Value
Step 1: Arrange the data in ascending order.
Finding the Percentile that Corresponds to a Data
Value
Step 1: Arrange the data in ascending order.
Step 2: Use the following formula to determine
the percentile of the score, x:
Percentile of x =
Round this number to the nearest
integer.
EXAMPLE Finding the Percentile Rank of a Data
Value
Find the percentile rank of the employment ratio of
Michigan.
The most common percentiles are quartiles.
Quartiles divide data sets into fourths or four equal
parts.
• The 1st quartile, denoted Q1, divides the bottom 25%
the data from the top 75%. Therefore, the 1st quartile is
equivalent to the 25th percentile.
The most common percentiles are quartiles.
Quartiles divide data sets into fourths or four equal
parts.
• The 1st quartile, denoted Q1, divides the bottom 25%
the data from the top 75%. Therefore, the 1st quartile is
equivalent to the 25th percentile.
• The 2nd quartile divides the bottom 50% of the data
from the top 50% of the data, so that the 2nd quartile is
equivalent to the 50th percentile, which is equivalent to
the median.
The most common percentiles are quartiles.
Quartiles divide data sets into fourths or four equal
parts.
• The 1st quartile, denoted Q1, divides the bottom 25%
the data from the top 75%. Therefore, the 1st quartile is
equivalent to the 25th percentile.
• The 2nd quartile divides the bottom 50% of the data
from the top 50% of the data, so that the 2nd quartile is
equivalent to the 50th percentile, which is equivalent to
the median.
• The 3rd quartile divides the bottom 75% of the data
from the top 25% of the data, so that the 3rd quartile is
equivalent to the 75th percentile.
EXAMPLE Finding the Quartiles
Find the quartiles corresponding to the
employment ratio data.
Checking for Outliers Using Quartiles
Step 1: Determine the first and third quartiles of the
data.
Checking for Outliers Using Quartiles
Step 1: Determine the first and third quartiles of the
data.
Step 2: Compute the interquartile range. The
interquartile range or IQR is the difference between
the third and first quartile. That is, IQR = Q3 - Q1
Checking for Outliers Using Quartiles
Step 1: Determine the first and third quartiles of the
data.
Step 2: Compute the interquartile range. The
interquartile range or IQR is the difference between
the third and first quartile. That is, IQR = Q3 - Q1
Step 3: Compute the fences that serve as cut-off
points for outliers.
Lower Fence = Q1 - 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)
Checking for Outliers Using Quartiles
Step 1: Determine the first and third quartiles of the
data.
Step 2: Compute the interquartile range. The
interquartile range or IQR is the difference between
the third and first quartile. That is, IQR = Q3 - Q1
Step 3: Compute the fences that serve as cut-off
points for outliers.
Lower Fence = Q1 - 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)
Step 4: If a data value is less than the lower fence or
greater than the upper fence, then it is considered an
outlier.
EXAMPLE Check the employment ratio data
for outliers.
Q1:13 th—62.9 ;
Q3: 38th—67.2
Q3-Q1=4.3
So (62.9-1.5*4.3, 67.2+1.5*4.3)=(56.45,73.65)
The OUTLIER is 52.7
West
Virginia