Stats: Modeling the World
Chapter 6
The Standard Deviation as a Ruler and The Normal Model
Shifting and Scaling Data
The Rules…
When we shift data by adding the same constant to (or subtracting it from) every data value,
measures of location/center change, but measures of spread stay the same.
When we rescale data by multiplying or dividing every data value by the same constant,
measures of location/center AND measures of spread change.
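A quick way to see both rules is to try them on a small list of numbers. The sketch below uses made-up values and Python's standard statistics module; the helper name summarize is just for illustration.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Return (mean, median, standard deviation, range) for a list of numbers."""
    return (mean(values), median(values), stdev(values), max(values) - min(values))

data = [2, 4, 4, 6, 9]             # made-up values, for illustration only
shifted = [x + 10 for x in data]   # shift: add the same constant to every value
rescaled = [x * 3 for x in data]   # rescale: multiply every value by the same constant

for label, values in (("original", data), ("shifted", shifted), ("rescaled", rescaled)):
    m, med, sd, rng = summarize(values)
    print(f"{label:9s} mean={m:.2f} median={med:.2f} sd={sd:.2f} range={rng}")
```

The shifted row should show the center moved by 10 with the same spread, while the rescaled row should show center and spread both multiplied by 3.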
Who is the best athlete?
How do we compare??
We are going to use something called a z-score to help us make comparisons.
This z-score is simply a measure of “how many standard deviations from the mean” a data value is.
What can z-scores tell us??
A negative z-score tells us that the data value is below the mean.
A positive z-score tells us that the data value is above the mean.
A z-score of 0 tells us a data value is right at the mean.
The further a z-score is from 0, the more unusual it is…
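As a rough sketch of how z-scores let us compare results from different events, the code below computes z = (value - mean) / standard deviation for two athletes. The events, means, and standard deviations are hypothetical numbers chosen only to illustrate the idea.

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical results: each athlete competes in a different event.
long_jump_z = z_score(value=7.5, mean=7.0, sd=0.25)   # metres
shot_put_z = z_score(value=16.0, mean=15.0, sd=1.0)   # metres

print("long jump z-score:", long_jump_z)   # 2.0
print("shot put z-score: ", shot_put_z)    # 1.0

# The long jump is further from 0, so it is the more unusual performance.
better = "long jump" if long_jump_z > shot_put_z else "shot put"
print("more impressive relative to the field:", better)
```

For events where a lower value is better (such as race times), the more negative z-score would be the more impressive one.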
Another Example
Looking at that formula again…
When we convert a data value to a z-score, we are:
-- shifting it by subtracting the mean (which sets the NEW mean at 0)
-- rescaling it by dividing by the standard deviation (which sets the NEW standard deviation to 1)
Standardizing does not change the shape of the distribution!
Standardizing sets the center (mean) at 0.
Standardizing sets the spread (st dev) to 1.
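These three facts can be checked on any dataset. The sketch below standardizes a small made-up, right-skewed list; the function name standardize is an illustrative choice.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert every value to a z-score using the data's own mean and SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

data = [3, 7, 8, 10, 22]   # made-up, right-skewed values
z = standardize(data)

print("z-scores:", [round(v, 2) for v in z])
print("new mean:", round(mean(z), 10))    # 0 (up to rounding error)
print("new SD:  ", round(stdev(z), 10))   # 1 (up to rounding error)

# Shape is unchanged: the values keep their order, and the largest value
# is still far out in the right tail after standardizing.
print("order preserved:", z == sorted(z))
```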
The Normal Model
If we have a distribution that is “bell-shaped”, symmetric, and unimodal, it can possibly be
modeled by the Normal Distribution.
N(μ, σ) indicates a Normal model with population mean, μ, and population standard
deviation, σ.
μ and σ represent the population parameters of our model. x̄ and s
represent the sample’s statistics.
Making Connections…
If we take a Normal model (like IQs) and standardize it using z-scores, we
get a special distribution called the
Standard Normal Distribution.
The Standard Normal Distribution is the Normal model N(0, 1).
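As a minimal sketch of this connection, the code below standardizes an IQ model with Python's statistics.NormalDist. The parameters N(100, 15) are an assumption (a commonly quoted IQ scale) and do not come from the slides.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # assumed IQ model, not from the slides

# Subtract the mean, then divide by the standard deviation.
standardized = (iq - 100) / 15
print(standardized)                 # a Normal model with mean 0 and SD 1

standard_normal = NormalDist(0, 1)
for score in (85, 100, 130):
    z = (score - iq.mean) / iq.stdev
    # The proportion below a score in N(100, 15) matches the proportion
    # below its z-score in N(0, 1).
    print(score, round(z, 2), round(iq.cdf(score), 4), round(standard_normal.cdf(z), 4))
```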
The Empirical Rule
Percentile Ranks:
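As a rough check, the 68-95-99.7 percentages of the Empirical Rule and the percentile ranks at whole-number z-scores can be computed from the Standard Normal model. This sketch uses Python's statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)

# Empirical Rule: proportion of a Normal model within 1, 2, and 3 SDs of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD of the mean: {p:.1%}")   # about 68%, 95%, 99.7%

# Percentile rank: the percent of the distribution at or below a given z-score.
percentile_rank = {z: std_normal.cdf(z) for z in range(-3, 4)}
for z, p in percentile_rank.items():
    print(f"z = {z:+d}: percentile rank is about {p:.1%}")
```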
An Example
A forester measured 27 of the trees in a large woods that is up for sale. He found a mean diameter of 10.4
inches and a standard deviation of 4.7 inches. Suppose that these trees provide an accurate description of the
whole forest and that a Normal model applies.
What size would you expect the central 95% of all trees to be?
About what percent of the trees should be less than an inch in diameter?
About what percent of the trees should be between 5.7 and 10.4 inches in diameter?
About what percent of the trees should be over 15 inches in diameter?
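One way to work these four questions is to convert each cutoff to a z-score and then read the answer off the Empirical Rule. The sketch below does that arithmetic, assuming the model N(10.4, 4.7) from the example; the variable names are illustrative.

```python
MEAN, SD = 10.4, 4.7   # tree diameters in inches, from the example

def z(diameter):
    """z-score of a diameter under the N(10.4, 4.7) model."""
    return (diameter - MEAN) / SD

# Central 95%: within 2 SDs of the mean.
central_95 = (MEAN - 2 * SD, MEAN + 2 * SD)
print("central 95%:", [round(d, 1) for d in central_95])   # about 1.0 to 19.8 inches

# 1 inch is 2 SDs below the mean, so half of the 5% outside the central 95%.
print("z for 1 in:", round(z(1), 2))
below_1 = (100 - 95) / 2          # about 2.5%

# 5.7 inches is 1 SD below the mean, so half of the central 68%.
print("z for 5.7 in:", round(z(5.7), 2))
between_57_and_104 = 68 / 2       # about 34%

# 15 inches is close to 1 SD above the mean (15.1), so half of what lies outside 68%.
print("z for 15 in:", round(z(15), 2))
over_15 = (100 - 68) / 2          # about 16%

print(below_1, between_57_and_104, over_15)
```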
Instead of estimating, we can find probabilities for a Normal Model using a table.
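Software can stand in for the table. As a sketch, the code below reworks the forester questions with Python's statistics.NormalDist in place of a printed z-table, again assuming N(10.4, 4.7).

```python
from statistics import NormalDist

trees = NormalDist(mu=10.4, sigma=4.7)   # model from the forester example

# cdf(x) gives the proportion of the model at or below x.
below_1 = trees.cdf(1)
between = trees.cdf(10.4) - trees.cdf(5.7)
over_15 = 1 - trees.cdf(15)

# inv_cdf(p) goes the other way: the diameter with proportion p below it.
central_95 = (trees.inv_cdf(0.025), trees.inv_cdf(0.975))

print(f"less than 1 inch:        {below_1:.1%}")
print(f"between 5.7 and 10.4 in: {between:.1%}")
print(f"over 15 inches:          {over_15:.1%}")
print(f"central 95%: {central_95[0]:.2f} to {central_95[1]:.2f} inches")
```

These values should land close to the Empirical Rule estimates; the central 95% interval is slightly narrower because the exact multiplier is about 1.96 standard deviations rather than 2.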
Are you Normal???
To determine if a dataset follows a Normal model, either:
-- Look at a histogram or stemplot and check for non-Normal features (like gaps, outliers, and skewness)
-- Compare your actual data to the Empirical Rule
-- Look at a Normality Plot and check if the plot approximates a diagonal straight line
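The second and third checks can be sketched in code. The example below compares a dataset to the Empirical Rule and builds the points of a Normal probability plot. The data are simulated, the helper names are illustrative, and (i - 0.5) / n is only one common choice of plotting position.

```python
from statistics import NormalDist, mean, stdev

def fraction_within(values, k):
    """Fraction of the data within k standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return sum(1 for v in values if abs(v - m) <= k * s) / len(values)

def normal_plot_points(values):
    """Pair each sorted value with the z-score a Normal model expects at that rank."""
    n = len(values)
    std_normal = NormalDist(0, 1)
    return [(std_normal.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(sorted(values), start=1)]

# Simulated data for illustration only (not real measurements).
sample = NormalDist(10.4, 4.7).samples(200, seed=1)

# Compare with the Empirical Rule: expect roughly 68%, 95%, 99.7%.
for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(sample, k):.1%}")

# For nearly Normal data these (expected z, actual value) points fall
# close to a diagonal straight line when plotted.
points = normal_plot_points(sample)
print(points[:3], "...", points[-3:])
```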
Assessing Normality
Nearly Normal data have a histogram and a Normal probability plot that look
somewhat like this example:
Non-Normal Data
A skewed distribution might have a histogram and Normal probability plot like this:
What Can Go Wrong?
Only use the Normal model for symmetric and unimodal distributions!
Be careful of outliers. Remember that the mean and standard deviation are