Transcript
```Chapter 2
Turning Data Into Information
Some key words:
• sample vs. population
• categorical (ordinal?) vs. quantitative
• explanatory vs. response
• outlier
• mean vs. median
• standard deviation vs. range vs. IQR
• histogram vs. boxplot
• shape?
Symmetric: mean = median
Skewed Left: mean < median
Skewed Right: mean > median
2.7 Bell-Shaped Distributions of Numbers
Many measurements follow a predictable pattern:
• Most individuals are clumped around the center
• The greater the distance a value is from the
center, the fewer individuals have that value.
Variables that follow such a pattern are said
to be “bell-shaped”. A special case is called
a normal distribution or normal curve.
Example 2.11 Bell-Shaped British Women’s Heights
Data: representative sample of 199 married British couples.
The accompanying figure shows a histogram of the wives’ heights with a
normal curve superimposed. The mean height = 1602 millimeters.
with Standard Deviation
Standard deviation measures variability
by summarizing how far individual
data values are from the mean.
Think of the standard deviation as
roughly the average distance
values fall from the mean.
Calculating the Standard Deviation
Formula for the (sample) standard deviation:
$$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} $$
The value of $s^2$ is called the (sample) variance.
An equivalent formula, easier to compute, is:
$$ s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n - 1}} $$
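Quick check that the two formulas agree, using a made-up data set
(the values 2, 4, 4, 6 are an illustration, not from the text):
Here $n = 4$ and $\bar{x} = 16/4 = 4$.
Definition: $\sum (x_i - \bar{x})^2 = 4 + 0 + 0 + 4 = 8$, so $s = \sqrt{8/3} \approx 1.63$.
Shortcut: $\sum x_i^2 - n\bar{x}^2 = 72 - 4(16) = 8$, so again $s = \sqrt{8/3} \approx 1.63$.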
Population Standard Deviation
Data sets usually represent a sample from a larger
population. If the data set includes measurements for
an entire population, the notations for the mean and
standard deviation are different, and the formula for
the standard deviation is also slightly different.
A population mean is represented by the symbol μ
(“mu”), and the population standard deviation is
$$ \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}} $$
Interpreting the Standard Deviation for Bell-Shaped Curves: The Empirical Rule
For any bell-shaped curve, approximately
• 68% of the values fall within 1 standard
deviation of the mean in either direction
• 95% of the values fall within 2 standard
deviations of the mean in either direction
• 99.7% of the values fall within 3 standard
deviations of the mean in either direction
The Empirical Rule, the Standard Deviation, and the Range
• The Empirical Rule implies that the range from the
minimum to the maximum data values equals
about 4 to 6 standard deviations for data with
an approximate bell shape.
• You can get a rough idea of the value of the
standard deviation by dividing the range by 6.
$$ s \approx \frac{\text{Range}}{6} $$
Example 2.11 Women’s Heights (cont)
Mean height for the 199 British women is 1602 mm
and standard deviation is 62.4 mm.
• About 68% of the 199 heights would fall in the range
1602 ± 62.4, or 1539.6 to 1664.4 mm
• About 95% of the heights would fall in the interval
1602 ± 2(62.4), or 1477.2 to 1726.8 mm
• About 99.7% of the heights would fall in the interval
1602 ± 3(62.4), or 1414.8 to 1789.2 mm
Example 2.11 Women’s Heights (cont)
Summary of the actual results:
Note: The minimum height = 1410 mm and the maximum
height = 1760 mm, for a range of 1760 – 1410 = 350 mm.
So an estimate of the standard deviation is:
$$ s \approx \frac{\text{Range}}{6} = \frac{350}{6} \approx 58.3 \text{ mm} $$
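A rough cross-check (arithmetic added here, not part of the original slide):
since the range is about 4 to 6 standard deviations, s should lie between
roughly $350/6 \approx 58.3$ mm and $350/4 = 87.5$ mm.
The standard deviation computed from the data, 62.4 mm, falls inside that
band, near the low end.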
Standardized z-Scores
Standardized score or z-score:
$$ z = \frac{\text{Observed value} - \text{Mean}}{\text{Standard deviation}} $$
Example: Mean resting pulse rate for adult men is 70
beats per minute (bpm), standard deviation is 8 bpm.
The standardized score for a resting pulse rate of 80:
$$ z = \frac{80 - 70}{8} = 1.25 $$
A pulse rate of 80 is 1.25 standard deviations
above the mean pulse rate for adult men.
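A further illustration with a hypothetical reading (62 bpm is not from the
original example): a resting pulse rate of 62 gives
$$ z = \frac{62 - 70}{8} = -1.00 $$
A negative z-score means the value is below the mean, here by 1 standard
deviation. If pulse rates are roughly bell-shaped (an assumption; the slide
does not say so), the Empirical Rule suggests about 68% of adult men would
have z-scores between -1 and +1.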