Download sec 2.1 - Glen Ridge Public Schools

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Receiver operating characteristic wikipedia , lookup

History of statistics wikipedia , lookup

Regression toward the mean wikipedia , lookup

Transcript
+
Chapter 2: Modeling Distributions of Data
Section 2.1
Describing Location in a Distribution
The Practice of Statistics, 4th edition - For AP*
STARNES, YATES, MOORE
+
Chapter 2
Modeling Distributions of Data
 2.1
Describing Location in a Distribution
 2.2
Normal Distributions
+ Section 2.1
Describing Location in a Distribution
Learning Objectives
After this section, you should be able to…

MEASURE position using percentiles

INTERPRET cumulative relative frequency graphs

MEASURE position using z-scores

TRANSFORM data

DEFINE and DESCRIBE density curves
One way to describe the location of a value in a distribution
is to tell what percent of observations are less than it.
Definition:
The pth percentile of a distribution is the value
with p percent of the observations less than it.
Example, p. 85
Jenny earned a score of 86 on her test. How did she perform
relative to the rest of the class?
6 7
7 2334
7 5777899
8 00123334
8 569
9 03
Her score was greater than 21 of the 25
observations. Since 21 of the 25, or 84%, of the
scores are below hers, Jenny is at the 84th
percentile in the class’s test score distribution.
Describing Location in a Distribution

Position: Percentiles
+
 Measuring
A cumulative relative frequency graph (or ogive)
displays the cumulative relative frequency of each
class of a frequency distribution.
Age of First 44 Presidents When They Were
Inaugurated
Age
Frequency
Relative
frequency
Cumulative
frequency
Cumulative
relative
frequency
4044
2
2/44 =
4.5%
2
2/44 =
4.5%
4549
7
7/44 =
15.9%
9
9/44 =
20.5%
5054
13
13/44 =
29.5%
22
22/44 =
50.0%
5559
12
12/44 =
34%
34
34/44 =
77.3%
6064
7
7/44 =
15.9%
41
41/44 =
93.2%
6569
3
3/44 =
6.8%
44
44/44 =
100%
+
Relative Frequency Graphs
Describing Location in a Distribution
 Cumulative
Interpreting Cumulative Relative Frequency Graphs

Was Barack Obama, who was inaugurated at age 47,
unusually young?

Estimate and interpret the 65th percentile of the distribution
65
11
47
58
Describing Location in a Distribution
Use the graph from page 88 to answer the following questions.
+

Constructing a
Histogram
+
Comparing data sets
• How do we compare results when they are
measured on two completely different
scales?
• One solution might be to look at
percentiles
• What might you say about a woman that is
in the 50th percentile and a man in the 15th
percentile?
AP Statistics, Section
2.2, Part 1
8
Another way of comparing
• Another way of comparing: Look at
whether the data point is above or below
the mean, and by how much.
• Example: Compare an SAT score of 1080
to an ACT score of 28 given the mean
SAT is 896 with standard deviation of 174
& the mean ACT is 20.6 with a standard
deviation of 5.2
AP Statistics, Section
2.2, Part 1
9
A z-score tells us how many standard deviations from the
mean an observation falls, and in what direction.
Definition:
If x is an observation from a distribution that has known mean
and standard deviation, the standardized value of x is:
x  mean
z
standard deviation
A standardized value is often called a z-score.
Jenny earned a score of 86 on her test. The class mean is 80
and the standard deviation is 6.07. What is her standardized

score?
x  mean
86  80
z

 0.99
standard deviation
6.07
Describing Location in a Distribution

Position: z-Scores
+
 Measuring
+
z-scores for Comparison
We can use z-scores to compare the position of individuals in
different distributions.
Example, p. 91
Jenny earned a score of 86 on her statistics test. The class mean was
80 and the standard deviation was 6.07. She earned a score of 82
on her chemistry test. The chemistry scores had a fairly symmetric
distribution with a mean 76 and standard deviation of 4. On which
test did Jenny perform better relative to the rest of her class?
82  76

4
86  80
zstats 
6.07
zchem
zstats  0.99
zchem  1.5
Describing Location in a Distribution
 Using
Standardizing with z-scores (cont.)
Standardized values have no units.
 z-scores measure the distance of each data value
from the mean in standard deviations.
 A negative z-score tells us that the data value is
below the mean, while a positive z-score tells us
that the data value is above the mean.
 Depending on the situation, it may be desirable to
either have a positive or negative z-score:
Grades on a test or Finishing times in a race

Copyright © 2010, 2007, 2004 Pearson Education, Inc.
Benefits of Standardizing


Standardized values have been converted from
their original units to the standard statistical unit of
standard deviations from the mean.
Thus, we can compare values that are measured
on different scales, with different units, or from
different populations.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
When Is a z-score BIG?




A z-score gives us an indication of how unusual a
value is because it tells us how far it is from the
mean.
A data value that sits right at the mean, has a zscore equal to 0.
A z-score of 1 means the data value is 1 standard
deviation above the mean.
A z-score of –1 means the data value is 1
standard deviation below the mean.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
When Is a z-score BIG?



How far from 0 does a z-score have to be to be
interesting or unusual?
There is no universal standard, but the larger a zscore is (negative or positive), the more unusual it
is.
Remember that a negative z-score tells us that
the data value is below the mean, while a positive
z-score tells us that the data value is above the
mean.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
CHEBYSHEV’S INEQUALITY

There is also an interesting result that describes
the percent of observations in any distribution that
fall within a specified number of standard
deviations of the mean. It is known as
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
CHEBYSHEV'S THEOREM for k = 2
According to Chebyshev’s Theorem, at
least what fraction of the data falls
within “k” (k = 2) standard deviations of
the mean?
1
3
At least 1  2 2  4  75 %
of the data falls within 2 standard deviations of
the mean.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
CHEBYSHEV'S THEOREM for k = 3
According to Chebyshev’s Theorem, at
least what fraction of the data falls
within “k” (k = 3) standard deviations of
the mean?
1
8
At least 1  3 2  9  88 . 9 %
of the data falls within 3 standard deviations of
the mean.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
CHEBYSHEV'S THEOREM for k =4
According to Chebyshev’s Theorem, at
least what fraction of the data falls
within “k” (k = 4) standard deviations of
the mean?
1
15
At least 1  4 2  16  93 . 8 %
of the data falls within 4 standard deviations of
the mean.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
Using Chebyshev’s Theorem
A mathematics class completes an examination
and it is found that the class mean is 77 and the
standard deviation is 6.
According to Chebyshev's Theorem, between
what two values would at least 75% of the
grades be?
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
In Summary


For specific values of k Chebyshev’s Rule reads
At least 75% of the observations are within 2 standard
deviations of the mean.





At least 89% of the observations are within 3 standard
deviations of the mean.
At least 90% of the observations are within 3.16 standard
deviations of the mean.
At least 94% of the observations are within 4 standard
deviations of the mean.
At least 96% of the observations are within 5 standard
deviations of the mean.
At least 99% of the observations are with 10 standard
deviations of the mean.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.

Transforming Data
Effect of Adding (or Subracting) a Constant
Adding the same number a (either positive, zero, or negative) to each
observation:
•adds a to measures of center and location (mean, median,
quartiles, percentiles), but
•Does not change the shape of the distribution or measures of
spread (range, IQR, standard deviation).
n
Example, p. 93
Mean
sx
Min
Q1
M
Q3
Max
IQR
Range
Guess(m)
44
16.02
7.14
8
11
15
17
40
6
32
Error (m)
44
3.02
7.14
-5
-2
2
4
27
6
32
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
Describing Location in a Distribution
Transforming converts the original observations from the original
units of measurements to another scale. Transformations can affect
the shape, center, and spread of a distribution.

Transforming Data
•multiplies (divides) measures of center and location by b
•multiplies (divides) measures of spread by |b|, but
•does not change the shape of the distribution
n
Example, p. 95
Mean
sx
Min
Q1
M
Q3
Max
IQR
Range
Error(ft)
44
9.91
23.43
-16.4
-6.56
6.56
13.12
88.56
19.68
104.96
Error (m)
44
3.02
7.14
-5
-2
2
4
27
6
32
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
Describing Location in a Distribution
Effect of Multiplying (or Dividing) by a Constant
Multiplying (or dividing) each observation by the same number b
(positive, negative, or zero):
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
Density Curve
So far our strategy for exploring
data is
 Graph the data to get an idea of
the overall pattern
 Calculate an appropriate
numerical summary to describe
the center and spread of the
distribution
Sometimes the overall pattern of a
large number of observations is
so regular, that we can describe
it by a smooth curve, called a
density curve
AP Statistics, Section 2.1,
Copyright © 2010, 2007, 2004 Pearson Education,
Inc.1
Part
25
A Density Curve is an idealized
description of a distribution of data

Characteristics
 Always
above the x-
axis
 Area always equal to 1

The area under the
curve and above any
range of values is the
proportion of all
observations that fall
in that range.
AP Statistics, Section 2.1, Part 1
26
Our measures of center and spread apply to density curves as
well as to actual sets of observations.
Distinguishing the Median and Mean of a Density Curve
The median of a density curve is the equal-areas point, the
point that divides the area under the curve in half.
The mean of a density curve is the balance point, at which the
curve would balance if made of solid material.
The median and the mean are the same for a symmetric density
curve. They both lie at the center of the curve. The mean of
a skewed curve is pulled away from the median in the
direction of the long tail.
Describing Location in a Distribution

Density Curves
+
 Describing
TYPES OF DENSITY CURVES
Since a Density Curve is an idealized
description of the data, we classify the
types of density curves by the shapes of
the distributions they represent.
Finding the area under the density curve is
like finding the proportion of data values
on a particular interval
The time it takes for students to drive to
school is evenly distributed with a
minimum of 5 minutes and a range of 35
minutes. What is the height of the rectangle?
a)Draw the distribution
Where should the rectangle end?
1/35
5
40
b) What is the probability that it takes
less than 20 minutes to drive to
school?
P(X < 20) =
(15)(1/35) = .4286
1/35
5
40
Uniform Distribution
Is a continuous distribution that is
evenly (or uniformly) distributed
 Has a density curve in the shape of a
rectangle
EX: The Citrus Sugar Company packs sugar

in bags labeled 5 pounds. However, the
packaging isn’t perfect and the actual weights
are uniformly distributed with a mean of 4.98
pounds and a range of .12 pounds.
a)Constructing the the uniform distribution we draw a
rectangle centered at the mean of 4.98 extending
.06 in either direction.
What is the height of this rectangle?
What shape does a uniform
distribution have?
How long is this rectangle?
1/.12
4.92
4.98
5.04

What is the probability that a
randomly selected bag will weigh
more than 4.97 pounds?
P(X > 4.97) =
.07(1/.12)
.5833
What is =the
length of the shaded region?
1/.12
4.92
4.98
5.04

Find the probability that a
randomly selected bag weighs
between 4.93 and 5.03 pounds.
P(4.93<X<5.03) =
What is =
the
length of the shaded region?
.1(1/.12)
.8333
1/.12
4.92
4.98
5.04
+
Section 2.1
Describing Location in a Distribution
Summary
In this section, we learned that…

There are two ways of describing an individual’s location within a
distribution – the percentile and z-score.

A cumulative relative frequency graph allows us to examine
location within a distribution.

It is common to transform data, especially when changing units of
measurement. Transforming data can affect the shape, center, and
spread of a distribution.

We can sometimes describe the overall pattern of a distribution by a
density curve (an idealized description of a distribution that smooths
out the irregularities in the actual data).
Our measures of center and spread apply to density curves as
well as to actual sets of observations.
Distinguishing the Median and Mean of a Density Curve
The median of a density curve is the equal-areas point, the
point that divides the area under the curve in half.
The mean of a density curve is the balance point, at which the
curve would balance if made of solid material.
The median and the mean are the same for a symmetric density
curve. They both lie at the center of the curve. The mean of
a skewed curve is pulled away from the median in the
direction of the long tail.
Describing Location in a Distribution

Density Curves
+
 Describing