IQL Chapter 5 – What is Normal?
Statistical Reasoning for everyday life, Bennett, Briggs, Triola, 3rd Edition
5.1 The Normal Shape
THE NORMAL SHAPE
All normal distributions have the same characteristic bell shape, but they can differ in their mean and in their
variation. A normal distribution can be fully described with just two numbers: its mean and its standard deviation.
Definition
The normal distribution is a symmetric, bell-shaped distribution with a single peak. Its peak
corresponds to the mean, median, and mode of the distribution. Its variation can be characterized by the
standard deviation of the distribution.
THE NORMAL DISTRIBUTION AND RELATIVE FREQUENCIES
The relative frequency for any range of data values is the area under the curve covering that range of values.
Relative Frequencies and the Normal Distribution
The area that lies under the normal distribution curve corresponding to a range of values on the
horizontal axis is the relative frequency of those values.
Because the total relative frequency must be 1, the total area under the normal distribution curve must
equal 1, or 100%.
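As a rough illustration of "area equals relative frequency", the sketch below computes the area under a normal curve between two values using the standard error function. The helper names and the example numbers (mean 100, standard deviation 15) are assumptions chosen for illustration; they do not come from the textbook.

```python
import math

def normal_cdf(x, mean, sd):
    """Fraction of a normal distribution lying at or below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def relative_frequency(low, high, mean, sd):
    """Area under the normal curve between low and high."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)

# Hypothetical data: mean 100, standard deviation 15.
area = relative_frequency(85, 115, mean=100, sd=15)
print(round(area, 3))  # 0.683, i.e. about 68% of values fall in this range
```

Widening the range to cover essentially the whole horizontal axis drives the result toward 1, which matches the statement that the total area is 100%.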
WHEN CAN WE EXPECT A NORMAL DISTRIBUTION?
Conditions for a Normal Distribution
A data set that satisfies the following four criteria is likely to have a nearly normal distribution:
1. Most data values are clustered near the mean, giving the distribution a well-defined single peak.
2. Data values are spread evenly around the mean, making the distribution symmetric.
3. Larger deviations from the mean become increasingly rare, producing the tapering tails of the
distribution.
4. Individual data values result from a combination of many different factors, such as genetic and
environmental factors.
5.2 – Properties of the Normal Distribution
Our friends the Greeks:
µ = Mean
σ = Standard Deviation
The 68-95-99.7 Rule for a Normal Distribution
• About 68% (more precisely, 68.3%), or just over two-thirds, of the data points fall within 1 standard deviation of the mean.
• About 95% (more precisely, 95.4%) of the data points fall within 2 standard deviations of the mean.
• About 99.7% of the data points fall within 3 standard deviations of the mean.
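The three percentages can be checked numerically. The sketch below is not from the textbook; it assumes the usual relationship between the normal curve and the error function, and the function name is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```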
APPLYING THE 68 – 95 – 99.7 RULE
We can apply the 68-95-99.7 rule to determine when data values lie 1, 2, or 3 standard deviations from the
mean.
For example, suppose that 1,000 students take an exam and the scores are normally distributed with a mean of
µ = 75 and a standard deviation of σ = 7.
Figure 5.19 A normal distribution of test scores with a mean of 75 and a standard deviation of 7. (a) 68% of the
scores lie within 1 standard deviation of the mean. (b) 95% of the scores lie within 2 standard deviations of the
mean.
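The arithmetic behind this exam example can be sketched as follows. The function name is hypothetical, and the student counts simply apply the rounded 68% and 95% figures to the 1,000 students.

```python
def score_range(mean, sd, k):
    """Scores lying within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)

mean, sd, n_students = 75, 7, 1000

for k, share in ((1, 0.68), (2, 0.95)):
    low, high = score_range(mean, sd, k)
    count = round(share * n_students)
    print(f"within {k} sd: scores {low} to {high}, about {count} students")
# within 1 sd: scores 68 to 82, about 680 students
# within 2 sd: scores 61 to 89, about 950 students
```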
Identifying Unusual Results
In statistics, we often need to distinguish values that are typical, or “usual,” from values that are “unusual.” By
applying the 68-95-99.7 rule, we find that about 95% of all values from a normal distribution lie within 2 standard
deviations of the mean.
This implies that, among all values, 5% lie more than 2 standard deviations away from the mean. We can use this
property to identify values that are relatively “unusual”:
Unusual values are values that are more than 2 standard deviations away from the mean.
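A minimal sketch of this test, reusing the exam scores from earlier in the section (mean 75, standard deviation 7). The function name and the two sample scores are made up for illustration.

```python
def is_unusual(value, mean, sd):
    """True when value lies more than 2 standard deviations from the mean."""
    return abs(value - mean) > 2 * sd

# With mean 75 and standard deviation 7, the "usual" range is 61 to 89.
print(is_unusual(92, mean=75, sd=7))  # True: 92 is 17 points above the mean
print(is_unusual(80, mean=75, sd=7))  # False: 80 is only 5 points above the mean
```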
STANDARD SCORES
The 68-95-99.7 rule applies only to data values that are 1, 2, or 3 standard deviations from the mean. We can
generalize this rule if we know precisely how many standard deviations from the mean (µ) a particular value
lies. The number of standard deviations a data value lies above or below the mean (µ) is called its standard
score (or z-score), often abbreviated by the letter z. For example:
• The standard score of the mean is z = 0, because it is 0 standard deviations from the mean.
• The standard score of a data value 1.5 standard deviations above the mean is z = 1.5.
• The standard score of a data value 2.4 standard deviations below the mean is z = -2.4.
Computing Standard Scores
The number of standard deviations a data value lies above or below the mean is called its standard score (or z-score), defined by
z = standard score = (data value − mean) / standard deviation
The standard score is positive for data values above the mean and negative for data values below the mean.
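As a small sketch, a standard score is the distance of a value from the mean divided by the standard deviation. The example scores below are hypothetical, chosen so that they reproduce z = 1.5 and z = -2.4 for the exam distribution with mean 75 and standard deviation 7.

```python
def standard_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(round(standard_score(85.5, mean=75, sd=7), 2))  # 1.5
print(round(standard_score(58.2, mean=75, sd=7), 2))  # -2.4
print(round(standard_score(75, mean=75, sd=7), 2))    # 0.0
```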
STANDARD SCORES AND PERCENTILES
Once we know the standard score of a data value, the properties of the normal distribution allow us to find its
percentile in the distribution. This is usually done with a standard score table, such as Table 5.1.
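Table 5.1 is not reproduced in these notes. As a stand-in, the sketch below computes an approximate percentile directly from a standard score using the error function; this is an assumed substitute for the table lookup, and the function name is hypothetical.

```python
import math

def percentile_from_z(z):
    """Approximate percentage of a normal distribution lying below standard score z."""
    return 100 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-1.0, 0.0, 1.0):
    print(z, round(percentile_from_z(z), 1))
# -1.0 15.9
# 0.0 50.0
# 1.0 84.1
```

Values from a printed table may differ slightly because tables are usually rounded.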
TOWARD PROBABILITY
Suppose you pick a baby at random and ask whether the baby was born more than 15 days prior to his or her
due date. Because births are normally distributed around the due date with a standard deviation of 15 days, we
know that 16% of all births occur more than 15 days prior to the due date (see Example 3).
For an individual baby chosen at random, we can therefore say that there’s a 0.16 chance (about 1 in 6) that the
baby was born more than 15 days early.
In other words, the properties of the normal distribution allow us to make a probability statement about an
individual. In this case, our statement is that the probability of a birth occurring more than 15 days early is 0.16.
This example shows that the properties of the normal distribution can be restated in terms of ideas of
probability.
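One way to see the link between relative frequency and probability is to simulate picking many babies at random. The sketch below assumes, as the text does, that birth timing is normally distributed around the due date with a standard deviation of 15 days; the seed, the number of trials, and the function name are arbitrary choices for illustration.

```python
import random

def fraction_born_early(days_early, sd, trials, seed=1):
    """Simulated share of births occurring more than days_early before the due date."""
    rng = random.Random(seed)
    # Each draw is a birth date relative to the due date (negative means early).
    early = sum(1 for _ in range(trials) if rng.gauss(0, sd) < -days_early)
    return early / trials

estimate = fraction_born_early(days_early=15, sd=15, trials=100_000)
print(round(estimate, 2))
```

With this many trials the estimate should land close to the 0.16 quoted above, since 15 days early corresponds to 1 standard deviation below the mean.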
5.3 – The Central Limit Theorem
The Central Limit Theorem
Suppose we take many random samples of size n for a variable with any distribution (not necessarily a
normal distribution) and record the distribution of the means of each sample. Then,
1. The distribution of means will be approximately a normal distribution for large sample sizes.
2. The mean of the distribution of means approaches the population mean, µ, for large sample sizes.
3. The standard deviation of the distribution of means approaches σ/√n for large sample sizes, where σ is
the standard deviation of the population.
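The three statements can be explored with a small simulation. This sketch assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1); the sample size, number of samples, seed, and function name are arbitrary illustrative choices.

```python
import math
import random
import statistics

def sample_means(n, num_samples, seed=0):
    """Means of num_samples random samples, each of size n, from a skewed population."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

n = 50
means = sample_means(n, num_samples=2000)

# Population mean and standard deviation are both 1 for expovariate(1.0).
print("mean of sample means:", round(statistics.mean(means), 3))
print("sd of sample means:  ", round(statistics.pstdev(means), 3))
print("sigma / sqrt(n):     ", round(1.0 / math.sqrt(n), 3))
```

The mean of the sample means should sit near 1, and their standard deviation near 1/√50 ≈ 0.141, even though the population itself is far from normal. A histogram of `means` would show the bell shape described in statement 1.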