Download CH 2 student notes - Princeton High School

AP Statistics
Chapter 2
The Normal Distribution
Section 2.1: Describing location in a Distribution
Percentiles and Ogives
• An Ogive can determine the relative standing in a quantitative
distribution.
• Ogive = Relative Cumulative Frequency distribution
• The pth percentile of a distribution is the value where “p” percent of the
observations fall at or below the given percentile value.
• An Ogive graph is a line graph that continuously rises from left to right.
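As a rough illustration of how an ogive gives relative standing, the sketch below builds the relative cumulative frequencies for a small grouped data set and reads a percentile off the resulting line graph. The class boundaries, the frequencies, and the helper name `percent_at_or_below` are hypothetical and are not taken from the notes.

```python
from itertools import accumulate

# Hypothetical grouped data (not from the notes): class boundaries and counts.
boundaries = [40, 50, 60, 70, 80, 90, 100]   # 6 classes: 40-50, 50-60, ...
frequencies = [2, 5, 9, 14, 7, 3]

total = sum(frequencies)
# Relative cumulative frequency (in percent) at each class boundary.
ogive_percents = [0.0] + [100 * c / total for c in accumulate(frequencies)]

def percent_at_or_below(value):
    """Read the ogive: percent of observations at or below `value`,
    using straight-line interpolation between plotted points."""
    if value <= boundaries[0]:
        return 0.0
    if value >= boundaries[-1]:
        return 100.0
    for i in range(len(boundaries) - 1):
        x0, x1 = boundaries[i], boundaries[i + 1]
        if x0 <= value <= x1:
            y0, y1 = ogive_percents[i], ogive_percents[i + 1]
            return y0 + (value - x0) / (x1 - x0) * (y1 - y0)

print(ogive_percents)
print(percent_at_or_below(75))
```

Because the cumulative percents never decrease, the plotted line rises from left to right, which matches the last bullet above.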
Standard Normal Calculations
• All normal distributions can be measured in units of size σ about the mean µ as center.
• Changing to these units is called standardizing.
• Standardized values have no units.
• Standardizing makes all normal distributions into a single distribution, a distribution that is still normal.
• Standardizing a variable that has any normal distribution produces a new variable that has the standard normal distribution.
Z – scores
• Tell how many standard deviations the observation falls away from the mean.
• Z – scores measure the distance of each data point from the mean in standard deviations.
o Ex: A z – score of 1.5 is one and a half standard deviations above the mean.
• Gives the direction from the mean as well.
o Positive z – score value = lies above the mean
o Negative z – score value = lies below the mean
• Each z – score has a corresponding proportion of area under the curve assigned to it. (found on Z – table)
Example: The height of young women is approximately normal with µ = 64.5 inches and σ = 2.5 inches.
z = (height − 64.5) / 2.5
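The standardizing step for the height example can be sketched in a few lines. The general formula z = (x − µ) / σ is applied with µ = 64.5 and σ = 2.5 from the example; the three heights tried here and the function name `z_score` are made up for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

MU, SIGMA = 64.5, 2.5   # from the young women's height example

# Hypothetical heights in inches, chosen only to show the sign of z.
z_values = {height: z_score(height, MU, SIGMA) for height in (62.0, 64.5, 68.0)}

for height, z in z_values.items():
    side = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"height {height} in. -> z = {z:+.2f} ({side} the mean)")
```

Note that the result has no units: inches divided by inches cancel.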
Linear Transformations
• Changes the original variable x into a new variable a + bx
• a – shifts all observations upward or downward
• b – changes the size of the unit of measurement
• Adding “a” to each observation adds “a” to the mean and median but does not change the spread.
• Multiplying each observation by “b” multiplies the mean, median, IQR, and standard deviation by b (the measures of spread are multiplied by |b| when b is negative).
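A quick numerical check of these two rules, under assumed values: the observations, the shift a = 10, and the scale b = 3 below are hypothetical, and the quartiles come from Python's `statistics.quantiles` default method, which may differ slightly from the textbook's or the calculator's quartile rule.

```python
import statistics

def iqr(data):
    """Interquartile range using the statistics module's default quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

def summary(data):
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "iqr": iqr(data),
        "sd": statistics.stdev(data),
    }

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]   # hypothetical observations
a, b = 10.0, 3.0                            # hypothetical shift and scale

original = summary(x)
shifted = summary([a + xi for xi in x])     # adds a to every observation
scaled = summary([b * xi for xi in x])      # multiplies every observation by b

for name, stats in (("original", original), ("shifted", shifted), ("scaled", scaled)):
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In the printed summaries the shifted data keep the original IQR and standard deviation, while every entry of the scaled data is b times the original.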
Section 2.2: Density Curves and the Normal Distributions
• Determine definition and properties of density curves.
• Apply knowledge of Mean, Median, Standard Deviation, and Quartiles to density curves
• Determine definition and properties normal distributions
• Use the Empirical Rule with normal distributions
Mathematical Model
• Gives the overall pattern of the data but ignores minor irregularities as
well as any outliers.
• Models do not match reality, they only model it.
• Models are something we can look at and manipulate in order to learn
more about the real world.
• Models of data give us summaries we can learn from and use, even
though they don’t fit each data point exactly.
• It is an idealized description
Density Curve
• Describes the overall pattern of the distribution.
Important property:
• Area exactly 1 underneath the curve
• Area under the curve and above any range of values is the proportion of
all observations that fall in that range.
• Always on or above the horizontal axis
• Deviations from the overall pattern are not described by the curve
• Approximation easy to use and accurate enough for practical use.
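To make the area properties concrete, here is a minimal sketch using a hypothetical density curve, f(x) = 2(1 − x) on the interval from 0 to 1 (a right-skewed triangle chosen only because its areas are easy to check by hand). The `area` helper is an assumed name and uses a simple midpoint-rule approximation.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2(1 - x) on [0, 1], zero elsewhere."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

def area(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

total_area = area(density, 0.0, 1.0)    # should be 1 for any density curve
proportion = area(density, 0.0, 0.5)    # proportion of observations in [0, 0.5]

print(round(total_area, 6), round(proportion, 6))
```

Geometry confirms the result: the whole triangle has area ½ · 1 · 2 = 1, and the part to the left of 0.5 has area 0.75, so 75% of observations fall in that range.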
Skewness of Density Curves
• Can be skewed to the left, skewed to the right, or appear normal (but actually not be normal)
• Normal curve = approximately symmetric distribution.
Mean and Median of Density Curves
Median
• Of a density curve is the equal – areas point.
• The point with half the area under the curve to its left and the remaining
half of the area to its right.
Quartiles – split the area of the curve into quarters.
Finding the Median
• Easy to spot when it’s an approximately symmetric density curve.
• Difficult to locate on a skewed curve. (Can be found mathematically)
Mean
• Arithmetic average of observations.
• It is the point at which the curve would balance if made of solid material.
Symmetric curve
• Mean and Median are equal
• Lie at the center.
Skewed curve
• The mean is pulled away from the
center in the direction of the long tail.
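The pull of the mean toward the long tail can be checked numerically. This sketch assumes a hypothetical right-skewed density, f(x) = 2(1 − x) on [0, 1], finds the median as the equal-areas point by bisection, and finds the mean as the balance point (the area under x · f(x)). The helper names are illustrative only.

```python
def density(x):
    """Hypothetical right-skewed density: f(x) = 2(1 - x) on [0, 1]."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

def area(f, lo, hi, steps=2_000):
    """Midpoint-rule approximation of the area under f from lo to hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

def median_of(f, lo, hi, iterations=40):
    """Equal-areas point: bisect until half the area lies to the left."""
    left, right = lo, hi
    for _ in range(iterations):
        mid = (left + right) / 2
        if area(f, lo, mid) < 0.5:
            left = mid
        else:
            right = mid
    return (left + right) / 2

median = median_of(density, 0.0, 1.0)
mean = area(lambda x: x * density(x), 0.0, 1.0)   # balance point

print(f"median = {median:.4f}, mean = {mean:.4f}")
```

For this curve the long tail is on the right, and the mean (1/3) comes out larger than the median (about 0.293), as the skewed-curve bullet describes.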
Normal Distributions
Normal Curves
• Model that shows up over and over in statistics
• Appropriate for distributions whose shapes are unimodal and roughly
symmetric.
• Density curves that are approximately symmetric, single – peaked, and
bell – shaped.
• These curves describe normal distributions of data.
Notation = N(µ, σ)
• Normal Distribution (mean, standard deviation)
• µ = mean = not a numerical summary of the data but part of the model
• σ = standard deviation = not a numerical summary of the data but part of the model
Properties for all Normal Distributions
§ Same overall shape.
§ Shape is completely determined by mean and standard deviation
§ Normal density curves are described by mean and standard deviation
§ Spread controlled by standard deviation
Normal Distributions
• Good descriptions of some real data.
• Certain data tend to follow a normal curve
o Heights of males or females
o Blood pressure
o Amount of sleep students get per night.
• Good approximations to the results of many kinds of chance outcomes.
• Statistical inference procedures are based on normal distributions.
• Give an idea of how extreme a value is by telling us how likely it is to find a value (or observation) that far from the mean.
Empirical Rule
o All normal distributions follow this rule.
o Using the rule provides approximations and can be used when
appropriate.
o Exact values can be calculated, which is the preferred practice.
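The Empirical Rule is commonly stated as 68-95-99.7: about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. The sketch below compares those rounded figures with calculated areas, using Python's `statistics.NormalDist` as a stand-in for the table or calculator.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Area under the standard normal curve within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} sd: {100 * p:.2f}%")
```

Because standardizing turns every normal distribution into this one, the same three proportions apply to any N(µ, σ).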
In – Class Examples:
In the 2006 Winter Olympics men’s combined event, Jean – Baptiste Grange of
France skied the slalom in 88.46 seconds, about 1 standard deviation faster than
the mean. If a normal model is useful in describing slalom times, about how
many of the 35 skiers finishing the event would you expect to have skied the slalom
faster than Jean – Baptiste?
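One way to reason through this example, offered as a sketch rather than the official solution: a faster time is a smaller time, so the question asks for the area more than 1 standard deviation below the mean. The Empirical Rule puts about 68% within 1 standard deviation, leaving about 16% in each tail.

```python
from statistics import NormalDist

n_skiers = 35

# Empirical Rule: (100% - 68%) / 2 = 16% lie more than 1 sd below the mean.
rule_estimate = (1 - 0.68) / 2 * n_skiers

# Calculated area to the left of z = -1 for comparison.
exact_estimate = NormalDist().cdf(-1) * n_skiers

print(f"Empirical Rule: about {rule_estimate:.1f} skiers")
print(f"Calculated:     about {exact_estimate:.1f} skiers")
```

Both approaches suggest roughly 5 or 6 of the 35 skiers would be expected to be faster.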
Normal Distribution Calculations
• The proportion of observations that lie in some range of values can be found by finding the area under the curve.
• Because all normal distributions are the same when we standardize, we can find areas under any normal curve from a single table, a table that gives areas under the curve for the standard normal distribution.
• Z – table = use it to answer any question about proportions of observations in a normal distribution that has been standardized. (Old method; we use the calculator.)
Outline of the method for finding the proportion of the distribution in any region:
• Draw a picture of the distribution and shade the area of interest under the curve.
• Standardize the given value in terms of a standard normal variable “z”. (Find the z – score)
• Find the proportion of observations that satisfy the shaded area of interest. (use calculator)
• Answer the question within the context of the problem. The value you find represents a topic of interest.
Example: Scores on the SAT Verbal test in recent years follow approximately the N (505,
110) distribution. How high must a student score in order to place in the top 20% of all
students taking the SAT? top 5%?
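This example runs the method backwards: the proportion is known and the score is wanted. A possible check, using Python's `statistics.NormalDist` in place of the calculator: the top 20% starts where 80% of scores lie at or below, and the top 5% starts where 95% lie at or below.

```python
from statistics import NormalDist

sat = NormalDist(mu=505, sigma=110)   # N(505, 110) from the example

top20_cutoff = sat.inv_cdf(0.80)   # 80% of scores fall at or below this value
top5_cutoff = sat.inv_cdf(0.95)    # 95% of scores fall at or below this value

print(f"top 20% starts near {top20_cutoff:.1f}")
print(f"top 5% starts near {top5_cutoff:.1f}")
```

In context, a student would need a score of roughly 598 or higher to be in the top 20%, and roughly 686 or higher to be in the top 5%.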
Section 2.2: Using your Calculator for Normal Distributions
• Use the Normalcdf feature on your graphing calculator
• Use the invNorm feature on your graphing calculator
normalcdf() = Normal Cumulative Distribution Feature
• DOES NOT display a picture of the normal distribution and the problem’s corresponding shaded region
• JUST provides the proportion of observations: “specified area under the curve”
• Use with nonstandardized information from a given problem.
• Must specify an interval (which will change depending on the problem)
o normalcdf(observation, upper bound, mean, standard deviation)
o normalcdf(lower bound, observation, mean, standard deviation)
• To determine the upper/lower bound:
o Select a number very far into the tail of the distribution (over 5 standard deviations from the mean)
Use the calculator’s “normalcdf” function to verify your answers to the
following.
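For checking calculator answers away from the calculator, the sketch below defines a Python function that imitates the normalcdf(lower bound, upper bound, mean, standard deviation) call pattern. It is an analog built on `statistics.NormalDist`, not the calculator's own routine, and the two height questions are hypothetical uses of the N(64.5, 2.5) model.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

MU, SIGMA = 64.5, 2.5      # young women's heights, in inches
FAR = 6 * SIGMA            # "very far into the tail": more than 5 sd

# Proportion shorter than 62 inches: the lower bound is far in the left tail.
shorter_than_62 = normalcdf(MU - FAR, 62, MU, SIGMA)

# Proportion taller than 68 inches: the upper bound is far in the right tail.
taller_than_68 = normalcdf(68, MU + FAR, MU, SIGMA)

print(round(shorter_than_62, 4), round(taller_than_68, 4))
```

Choosing a bound 6 standard deviations out changes the answer only in the ninth decimal place or so, which is why any number far into the tail works.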
invNorm() feature:
• Calculates the raw data value.
• OR it can calculate the standardized normal value (z-score)
o invNorm(proportion of observations, mean, standard deviation)
o invNorm(proportion of observations)
• The proportion entered is the area to the left of the value, so no upper/lower bound is needed.
Use the calculator’s “invNorm” function to verify your answers to the following.
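The two invNorm call patterns can be imitated in Python for practice. This is an analog built on `statistics.NormalDist.inv_cdf`, not the calculator's routine; the function name `invnorm`, the 90th-percentile question, and the use of the N(64.5, 2.5) height model are illustrative assumptions. The proportion passed in is the area to the left of the value.

```python
from statistics import NormalDist

def invnorm(area_left, mean=0.0, sd=1.0):
    """Value with the given proportion of observations at or below it."""
    return NormalDist(mean, sd).inv_cdf(area_left)

# One argument: returns a z-score from the standard normal distribution.
z_90 = invnorm(0.90)

# Three arguments: returns a raw data value, here a height in inches.
height_90 = invnorm(0.90, 64.5, 2.5)

print(f"z = {z_90:.4f}, height = {height_90:.2f} in.")
```

The two results are linked by un-standardizing: the raw value equals mean + z × standard deviation.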
Section 2.2 (again): Assessing Normality
• Use two methods to assess the normality of a distribution.
• Construct a normal probability plot with your calculator
• Interpret a normal probability plot
2 Methods for Assessing Normality
• Inference procedures in later chapters are based on the condition (sometimes assumption) that the population is approximately normally distributed.
• When normality is a condition of an inference procedure, you must show/state/calculate how you know the population is approximately normal.
Method 1 – Apply Empirical Rule using the values for µ and σ
• Make a histogram or stemplot of data
• More exact in the information gathered.
• More time consuming
• Small data sets rarely fit the Empirical Rule, even when the larger population is normal.
Method 2 – Normal Probability Plot
• Visual (graphs data on calculator in a plot)
• Provides an assessment of the adequacy of the normal model for a data
set.
• Interpret shape of the plot to assess normality
Method 1: Apply Empirical Rule using the values for µ and σ
• Make a histogram or stemplot of data
• Put data into a graph (histogram, stemplot, boxplot, etc.) to verify it is approximately bell – shaped and symmetric about the mean.
• The graph will show the shape (symmetry or skewness) as well as any unusual features (outliers, gaps, clusters)
• Set up the horizontal axis to reflect x̄, x̄ ± 1s, x̄ ± 2s, x̄ ± 3s.
• Compare observations in each interval to the Empirical Rule.
• Check to see if the % of observations within each number of standard deviations is roughly in line with the Empirical Rule.
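The counting part of Method 1 can be sketched as follows. The 20 data values are hypothetical (made up to look roughly bell-shaped), and the graphing step is left to the calculator; the code only counts how many observations fall within 1, 2, and 3 sample standard deviations of the sample mean.

```python
import statistics

# Hypothetical sample of 20 observations (not from the notes).
data = [58.1, 60.3, 61.0, 61.8, 62.4, 62.9, 63.3, 63.7, 64.0, 64.3,
        64.6, 64.9, 65.2, 65.6, 66.0, 66.5, 67.1, 67.8, 68.7, 70.6]

x_bar = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation

def proportion_within(k):
    """Proportion of observations between x_bar - k*s and x_bar + k*s."""
    lo, hi = x_bar - k * s, x_bar + k * s
    return sum(lo <= x <= hi for x in data) / len(data)

proportions = {k: proportion_within(k) for k in (1, 2, 3)}

for k, p in proportions.items():
    print(f"within {k}s of the mean: {100 * p:.0f}%")
```

These percentages are then compared with the Empirical Rule's 68%, 95%, and 99.7%. With only 20 observations, each data point is worth 5 percentage points, so only rough agreement can be expected.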
Method 2: Normal Probability Plot (technology toolbox on Page 125)
• Use calculator to make a quick normal probability plot (located in the
same place as a histogram, but choose last graph option)
• Takes each data value and plots it against the z –score expected for that
point to have if the distribution were perfectly normal.
• X-axis = data values (observations)
• Y-axis = z-scores of a perfectly normal distribution.
• When these values match up well, the points form a straight line and the data appear
to come from a Normal distribution/model.
• Probability plots can be skewed
• Normal distributions produce roughly a diagonal straight line of the data
set.
Interpreting - Normal Probability Plot
• Normal Distribution – plotted points will form roughly a diagonal straight line
• Non - normal Distribution – plotted points will form a nonlinear trend.
• Outliers will appear on the plot as obvious deviations from the overall pattern.
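The points behind a normal probability plot can be computed by hand, as in this sketch. It pairs each sorted data value with the z-score expected at its rank, then uses the correlation of those pairs as a rough numeric stand-in for "how straight is the line". Several things here are assumptions: both data sets are hypothetical, the plotting position (i − 0.5) / n is one common convention and may not be the one the calculator uses, and the correlation summary is an illustration rather than part of the method in the notes.

```python
from statistics import NormalDist, mean

def normal_probability_points(data):
    """Pair each sorted data value with the z-score expected at its rank."""
    n = len(data)
    std_normal = NormalDist()
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest value.
    return [(x, std_normal.inv_cdf((i - 0.5) / n))
            for i, x in enumerate(sorted(data), start=1)]

def correlation(points):
    """Pearson correlation between data values and expected z-scores."""
    xs, zs = zip(*points)
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5

# Hypothetical data sets: one roughly bell-shaped, one strongly right-skewed.
roughly_normal = [58.1, 60.3, 61.0, 61.8, 62.4, 62.9, 63.3, 63.7, 64.0, 64.3,
                  64.6, 64.9, 65.2, 65.6, 66.0, 66.5, 67.1, 67.8, 68.7, 70.6]
right_skewed = [1, 1, 2, 2, 3, 3, 4, 5, 7, 10, 15, 25, 40, 80]

r_normal = correlation(normal_probability_points(roughly_normal))
r_skewed = correlation(normal_probability_points(right_skewed))

print(f"roughly normal data: r = {r_normal:.3f}")
print(f"right-skewed data:   r = {r_skewed:.3f}")
```

The roughly normal data give points that lie close to a straight line, while the skewed data bend away from it, which is the nonlinear trend described above.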
Normal Probability Plot
• Use only when the distribution is unimodal and symmetric
• Meaning – CHECK the data quickly in a histogram, stemplot, boxplot, etc.