Download Normal Distribution - People Server at UNCW

Normal Distribution
• Recall how we describe a distribution of
quantitative (continuous) data:
– plot the data (stemplot or histogram)
– look for the overall pattern (shape, peaks, gaps) and
departures from it (possible outliers)
– calculate appropriate numerical measures of center and
spread (5-number summary and/or mean & s.d.)
– then we may ask "can the distribution be described by a
specific model?" (one of the more common models for
symmetric, single-peaked distributions is the normal
distribution having a certain mean and standard
deviation)
– can we imagine a density curve fitting fairly closely over
the histogram of the data?
• A density curve is a curve that is always on or above the
horizontal axis (>= 0) and whose total area under the curve is 1
• An important property of a density curve is that areas under the
curve correspond to relative frequencies - see Figures 1.25a and
1.25b below.
Figure 1.25
Introduction to the Practice of Statistics, Sixth Edition
© 2009 W.H. Freeman and Company
(histogram) rel. freq. of scores <= 6 = 287/947 = .303; (density curve) area <= 6 = .293
• Note the relative frequency of vocabulary scores <= 6 is roughly
equal to the area under the density curve <= 6.
• We can describe the shape, center and spread of a density curve
in the same way we describe data… e.g., the median of a density
curve is the “equal-areas” point - the point on the horizontal axis
that divides the area under the density curve into two equal (.5
each) parts. The mean of the density curve is the balance point - the point on the horizontal axis where the curve would balance if
it were made of a solid material. (See figures 1.26b and 1.27
below)
• For a normal density curve we see the characteristic “bell-shaped”, symmetric curve with a single peak (at the mean
value μ) and spread out according to the standard deviation
(σ). See Figure 1.28 for a picture of μ and σ…
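• The 68-95-99.7 Rule describes the relationship between μ
and σ: about 68% of the observations fall within σ of μ, about
95% within 2σ, and about 99.7% within 3σ. See Figure 1.29…
Go over examples 1.25-1.26 on pages 59-61 (1.3, 5/10)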
Figure 1.29
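The percentages in the 68-95-99.7 Rule can be checked numerically. The following is a minimal sketch, assuming Python's standard-library statistics.NormalDist is an acceptable stand-in for a table lookup; the helper name area_within is made up for illustration and is not from the course materials.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The rule is the same for every normal curve, so the defaults (0, 1) suffice.
for k in (1, 2, 3):
    print(f"within {k} s.d. of the mean: {area_within(k):.4f}")
```

The three areas come out close to .68, .95 and .997, and they do not change if a different mean and standard deviation are passed in, which is why one rule covers every normal curve.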
• How many different normal curves are there? Ans: One for
every combination of values of μ and σ…but they all are
alike except for their μ and σ. So we take advantage of this
and consider a process called standardization to reduce all
normals to one we call the Standard Normal Distribution.
• Denote a normal distribution with mean μ and standard
deviation σ by N(μ,σ). Let X correspond to the variable
whose distribution is N(μ,σ). We may standardize any value
of X by subtracting μ and dividing by σ - this re-writes any
normal into a variable called Z whose values represent the
number of standard deviations X is away from its mean. The
standardized value is sometimes called a z-score.
• If X is N(μ,σ), then Z is N(0,1), where Z=(X-μ)/σ.
• We can find areas under Z from Table A, and these areas
equal the corresponding areas under X.
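As a rough illustration of standardization, the sketch below computes a z-score and confirms that the area to the left of z under the standard normal equals the area to the left of x under the original normal. It assumes Python's statistics.NormalDist in place of Table A; the helper z_score is a hypothetical name, and the mean 64.5 and standard deviation 2.5 are borrowed from the height example in these notes.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean mu."""
    return (x - mu) / sigma


mu, sigma = 64.5, 2.5   # values taken from the height example
x = 67.0

z = z_score(x, mu, sigma)                 # (67 - 64.5) / 2.5 = 1.0
area_under_x = NormalDist(mu, sigma).cdf(x)   # area to the left of x under N(mu, sigma)
area_under_z = NormalDist(0, 1).cdf(z)        # area to the left of z under N(0, 1)

print(z, round(area_under_x, 4), round(area_under_z, 4))
```

Both areas agree, which is the reason a single table for Z is enough for every normal distribution.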
• Consider Example 1.25 (1.3, 5/10). Let X=height (inches)
of a young woman aged 18-24 years. Then X is ~N(64.5",
2.5").
– What proportion of these women's heights are between 62" and
67"?
– What proportion are above 67"? Below 72"?
– What proportions of these women's heights are between 61" and
66"? NOTE: This cannot be solved by the 68-95-99.7 rule…
– What proportion are below 64.5"? Below 68"?
– What proportion are between 58" and 60"?
– Etc., etc., etc. ….
– What height is the 90th percentile for women in this age group?
• All problems of this type are solvable by sketching the
picture, standardizing, and doing appropriate arithmetic to
get the final answer…the last question above is what I call
a "backwards problem", since you're solving for an X
value while knowing an area…
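The height questions above can be worked in software as a check on the sketch-standardize-arithmetic approach. This is only a sketch, assuming Python's statistics.NormalDist replaces Table A (cdf gives the area to the left of a value, inv_cdf goes from an area back to a value); proportion_between is a hypothetical helper name.

```python
from statistics import NormalDist

# X = height in inches, N(64.5, 2.5) as given in the example.
heights = NormalDist(mu=64.5, sigma=2.5)


def proportion_between(dist, low, high):
    """Area under the curve between low and high (left area at high minus left area at low)."""
    return dist.cdf(high) - dist.cdf(low)


between_62_67 = proportion_between(heights, 62, 67)
above_67 = 1 - heights.cdf(67)            # area to the right = 1 - area to the left
below_72 = heights.cdf(72)
between_61_66 = proportion_between(heights, 61, 66)
below_64_5 = heights.cdf(64.5)
below_68 = heights.cdf(68)
between_58_60 = proportion_between(heights, 58, 60)

# "Backwards problem": the area (.90) is known and we solve for the X value.
percentile_90 = heights.inv_cdf(0.90)

for label, value in [
    ("between 62 and 67", between_62_67),
    ("above 67", above_67),
    ("below 72", below_72),
    ("between 61 and 66", between_61_66),
    ("below 64.5", below_64_5),
    ("below 68", below_68),
    ("between 58 and 60", between_58_60),
]:
    print(f"{label}: {value:.4f}")
print(f"90th percentile: {percentile_90:.2f} inches")
```

Since 62" and 67" are each one standard deviation from the mean, the first answer matches the 68 part of the 68-95-99.7 rule, while 61" to 66" needs the full calculation.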
• We’ve seen examples of data that seem to fit the normal
model, and examples of data that don’t seem to fit … Because
normality is an important property of data for specific types of
analyses we’ll do later, it is important to be able to decide
whether a dataset is normal or not. A histogram is one way
but a better graphical method is through the normal quantile
plot …
• A simple description of how to draw a normal quantile plot is
given on page 68 (1.3, 10/10) … but for us, a normal quantile
plot is always going to be drawn by software and it will allow
us to assess the normality of our data in the following sense:
– if the data points fall along the straight line (and within the bands on
the plot) then the data can be treated as normal. Systematic deviations
from the line indicate non-normal distributions - outliers often appear
as points far away from the pattern of the points...
– the y-intercept of the line corresponds to the mean of the normal
distribution and the slope of the line corresponds to the standard
deviation of the normal distribution
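To see why the intercept and slope of the line relate to the mean and standard deviation, here is a minimal sketch of the numbers behind a normal quantile plot, with z-scores on the horizontal axis and the sorted data on the vertical axis. Several things are assumptions: the plotting positions (i - 0.5)/n are one common convention and software may use another, the line is an ordinary least-squares fit, the function names are made up for illustration, and the sample is synthetic (it is not any dataset from the textbook).

```python
import random
from statistics import NormalDist, mean, stdev


def normal_quantile_points(data):
    """Pair each sorted observation with the z-score expected at its rank."""
    n = len(data)
    std_normal = NormalDist()
    # Assumed convention: the i-th smallest value sits at cumulative area (i - 0.5) / n.
    z_values = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z_values, sorted(data)))


def fit_line(points):
    """Least-squares line y = intercept + slope * z through the (z, y) points."""
    zs = [z for z, _ in points]
    ys = [y for _, y in points]
    z_bar, y_bar = mean(zs), mean(ys)
    sxx = sum((z - z_bar) ** 2 for z in zs)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    return intercept, slope


random.seed(0)
sample = [random.gauss(110, 10) for _ in range(200)]  # synthetic data, for illustration only

intercept, slope = fit_line(normal_quantile_points(sample))
print(f"intercept = {intercept:.2f}, sample mean = {mean(sample):.2f}")
print(f"slope     = {slope:.2f}, sample s.d. = {stdev(sample):.2f}")
```

For data that really are normal, the intercept lands on the sample mean and the slope lands close to the sample standard deviation; strongly skewed data would bend away from the fitted line instead.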
Normal quantile plot of CO2 – Table 1.6 on page 26
Notice the systematic failure of the points to fall on the line,
especially at the low end where the data is “piled up”. Also,
note the outliers at the high end… Conclusion: Not normal
Normal quantile plot of the IQ scores of 78 7th-grade
students - Data in Table 1.9 on page 29
Notice that the data points follow the line fairly well, though
there is a slight curve at the low-middle, indicating more data
than would be expected for a normal. The y-intercept is
around 110 (mean= approx. 110) and the slope is around 10
(s.d. is approx. 10). Conclusion: Normal
• Read section 1.3, paying careful attention to the
examples (especially 1.25-1.32). Work through the
examples yourself to make sure you understand how
they are done!
• Work problems #1.108-1.110, 1.113 (applet), 1.114-1.117,
1.119, 1.120-1.139, 1.140-1.142, 1.143, 1.144,
1.148
• Try some of the Chapter 1 exercises (p. 78ff). Be sure
you've worked as many of the exercises in this
chapter as you need to feel comfortable with the
material.
• Find a reasonably sized set of data of interest to you
with at least one quantitative and at least one
categorical variable.