Stats: Modeling the World
Chapter 6
The Standard Deviation as a Ruler and The Normal Model
Shifting and Scaling Data
The Rules…
When we shift data by adding the same constant to (or subtracting it from) every data value,
measures of location/center change, but measures of spread stay the same.
When we rescale data by multiplying or dividing every data value by the same constant,
measures of location/center AND measures of spread change.
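A quick way to see both rules is to try them on a small dataset. The sketch below uses made-up values (not from the slides), adds 10 to every value, then multiplies every value by 3, and prints the mean, median, standard deviation, and range each time.

```python
from statistics import mean, median, stdev

# Hypothetical data values, chosen only for illustration.
data = [2.0, 4.0, 4.0, 6.0, 9.0]

shifted = [y + 10 for y in data]   # add the same constant to every value
rescaled = [y * 3 for y in data]   # multiply every value by the same constant

def summary(values):
    """Return (mean, median, standard deviation, range) of the values."""
    return (mean(values), median(values), stdev(values), max(values) - min(values))

print("original:", summary(data))
print("shifted: ", summary(shifted))
print("rescaled:", summary(rescaled))
```

In the shifted data the mean and median go up by 10 while the standard deviation and range do not move. In the rescaled data all four summaries are multiplied by 3.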
Who is the best athlete?
How do we compare??
We are going to use something called a z-score to help us make comparisons.
A z-score is simply a measure of “how many standard deviations from the
mean” a data value lies.
What can z-scores tell us??
A negative z-score tells us that the data value is below the mean.
A positive z-score tells us that the data value is above the mean.
A z-score of 0 tells us a data value is right at the mean.
The further a z-score is from 0, the more unusual the data value is…
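As a rough sketch of how z-scores allow comparisons across different events, the code below computes z = (value - mean) / (standard deviation) for one athlete's best result in two events. The event names and every number are made up for illustration; `z_score` is a helper defined here, not a library function.

```python
from statistics import mean, stdev

def z_score(value, center, spread):
    """How many standard deviations `value` lies from `center`."""
    return (value - center) / spread

# Hypothetical results for two events (made-up numbers, not from the slides).
long_jump = [5.9, 6.1, 6.3, 6.0, 6.7]       # metres; higher is better
shot_put = [12.0, 13.5, 12.8, 15.2, 13.0]   # metres; higher is better

# Compare the best long jump and the best shot put on the same scale.
z_jump = z_score(6.7, mean(long_jump), stdev(long_jump))
z_shot = z_score(15.2, mean(shot_put), stdev(shot_put))
print(round(z_jump, 2), round(z_shot, 2))
```

Whichever result has the larger positive z-score is the more impressive one relative to its own field, even though the raw units cannot be compared directly.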
Example
Another Example
Looking at that formula again…
When we convert a data value to a z-score, we are:
shifting it by subtracting the mean
(which sets the NEW mean at 0)
rescaling it by dividing by the standard deviation
(which sets the NEW standard deviation to 1)
Standardizing does not change the shape of the distribution!
Standardizing sets the center (mean) at 0.
Standardizing sets the spread (st dev) to 1.
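These three facts can be checked on any dataset. The sketch below standardizes a small, deliberately skewed set of made-up values; `standardize` is a helper written for this example.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert each value to a z-score using the sample mean and SD."""
    center = mean(values)
    spread = stdev(values)
    return [(y - center) / spread for y in values]

# Hypothetical right-skewed data, made up for illustration.
data = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 9.0]
z = standardize(data)

# Mean is 0 and SD is 1, up to floating-point rounding.
print(round(mean(z), 10), round(stdev(z), 10))

# The order and relative spacing of the values are unchanged,
# so the skewed shape is still there after standardizing.
print([round(v, 2) for v in z])
```

The value that was far out in the right tail is still far out in the right tail; only the units on the axis have changed.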
The Normal Model
If we have a distribution that is “bell-shaped”, symmetric, and unimodal, it can possibly be
modeled by the Normal Distribution.
N(μ, σ) indicates a Normal model with population mean, μ, and population standard
deviation, σ.
μ and σ represent the population parameters of our model. x̄ and s
represent the sample’s statistics.
Making Connections…
If we take a Normal model (like IQs) and standardize it using z-scores, we
have a special distribution called the
Standard Normal Distribution
The Standard Normal Distribution is the Normal model N(0, 1).
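Here is a minimal sketch of that connection using Python's standard-library `statistics.NormalDist`. The N(100, 15) model for IQ scores is an assumption made for this example; the slides do not give the parameters.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)      # assumed N(100, 15) model for IQ scores

# Standardize the whole model: shift by the mean, rescale by the SD.
standard = (iq - iq.mean) / iq.stdev
print(standard.mean, standard.stdev)

# A probability is the same on either scale:
# P(IQ < 130) equals P(z < 2) because (130 - 100) / 15 = 2.
p_iq = iq.cdf(130)
p_z = NormalDist(0, 1).cdf((130 - 100) / 15)
print(round(p_iq, 4), round(p_z, 4))
```

Because every Normal model standardizes to the same N(0, 1), one set of Standard Normal probabilities is enough for all of them.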
The Empirical Rule
Percentile Ranks (by z-score):
z = -3: 0.15th
z = -2: 2.5th
z = -1: 16th
z = 0: 50th
z = 1: 84th
z = 2: 97.5th
z = 3: 99.85th
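The Empirical Rule says that in a Normal model about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. The percentile ranks above follow from those three numbers. As a check, the sketch below computes the Standard Normal percentiles directly with `statistics.NormalDist`; the rule's values are rounded versions of these.

```python
from statistics import NormalDist

standard = NormalDist(0, 1)

# Percentile rank at each whole-number z-score from -3 to 3.
for z in range(-3, 4):
    print(z, round(100 * standard.cdf(z), 2))

# Share of the model within 1, 2, and 3 standard deviations of the mean.
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
print({k: round(p, 4) for k, p in within.items()})
```

The computed percentiles are slightly different from the rule's (for example, a little under 16 at z = -1), which is why the Empirical Rule is best treated as a quick estimate.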
An Example
A forester measured 27 of the trees in a large woods that is up for sale. He found a mean diameter of 10.4
inches and a standard deviation of 4.7 inches. Suppose that these trees provide an accurate description of the
whole forest and that a Normal model applies.
What size would you expect the central 95% of all trees to be?
About what percent of the trees should be less than an inch in diameter?
About what percent of the trees should be between 5.7 and 10.4 inches in diameter?
About what percent of the trees should be over 15 inches in diameter?
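One way to work these with the Empirical Rule is to turn each diameter into a z-score first. The sketch below does only that arithmetic, using the mean and standard deviation from the problem; the helper name `z` is ours.

```python
mean_d, sd_d = 10.4, 4.7   # inches, from the forester's sample

def z(value):
    """z-score of a tree diameter under the N(10.4, 4.7) model."""
    return (value - mean_d) / sd_d

# Central 95% is the mean plus or minus 2 standard deviations.
low, high = mean_d - 2 * sd_d, mean_d + 2 * sd_d
print(f"central 95%: {low:.1f} to {high:.1f} inches")

# z-scores for the cut-offs in the other three questions.
for diameter in (1.0, 5.7, 15.0):
    print(f"{diameter} inches -> z = {z(diameter):.2f}")
```

Reading the answers off the rule: the central 95% of trees run from about 1.0 to 19.8 inches. One inch is 2 standard deviations below the mean, so about 2.5% of trees are smaller. 5.7 inches is 1 standard deviation below the mean, so about 34% of trees fall between 5.7 and 10.4 inches. 15 inches is roughly 1 standard deviation above the mean, so about 16% of trees are larger.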
Instead of estimating, we can find probabilities for a Normal model using a table.
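Software can stand in for the table. As a sketch, the code below reworks the tree questions with `statistics.NormalDist` from Python's standard library, which gives the same probabilities a Normal table would.

```python
from statistics import NormalDist

trees = NormalDist(mu=10.4, sigma=4.7)   # model from the forester example

p_under_1 = trees.cdf(1.0)                      # P(diameter < 1 inch)
p_between = trees.cdf(10.4) - trees.cdf(5.7)    # P(5.7 < diameter < 10.4)
p_over_15 = 1 - trees.cdf(15.0)                 # P(diameter > 15 inches)

# Central 95%: the 2.5th and 97.5th percentiles of the model.
low, high = trees.inv_cdf(0.025), trees.inv_cdf(0.975)

print(round(p_under_1, 4), round(p_between, 4), round(p_over_15, 4))
print(round(low, 2), round(high, 2))
```

The results land close to the Empirical Rule estimates but not exactly on them, because the rule rounds 1.96 standard deviations to 2 and treats 15 inches as exactly 1 standard deviation above the mean.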
Are you Normal???
To determine if a dataset follows a Normal model, either:
-- Look at a histogram or stemplot and check for non-Normal features (like gaps, outliers, and
skewness)
-- Compare your actual data to the Empirical Rule
-- Look at a Normal probability plot and check if the plot approximates a diagonal straight line
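The second check is easy to automate. The sketch below counts what fraction of a dataset falls within 1, 2, and 3 standard deviations of its mean, so the result can be set beside 68%, 95%, and 99.7%. Both datasets are simulated for illustration (one bell-shaped, one strongly skewed), and `fraction_within` is a helper written for this example.

```python
import random
from statistics import mean, stdev

def fraction_within(values, k):
    """Fraction of values within k standard deviations of the mean."""
    center, spread = mean(values), stdev(values)
    inside = sum(1 for y in values if abs(y - center) <= k * spread)
    return inside / len(values)

rng = random.Random(42)   # fixed seed so the simulation is repeatable

# Simulated data: one roughly Normal sample and one right-skewed sample.
bell = [rng.gauss(50, 10) for _ in range(2000)]
skewed = [rng.expovariate(1.0) for _ in range(2000)]

for name, values in (("bell", bell), ("skewed", skewed)):
    print(name, [round(fraction_within(values, k), 3) for k in (1, 2, 3)])
```

The bell-shaped sample should come out near 68%, 95%, and 99.7%, while the skewed sample has far more than 68% of its values within 1 standard deviation, a sign that a Normal model is a poor fit.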
Assessing Normality
Nearly Normal data have a histogram that is roughly symmetric and unimodal and a
Normal probability plot that is roughly a diagonal straight line.
Non-Normal Data
A skewed distribution might have a histogram with a long tail on one side and a
Normal probability plot that bends away from a straight line.
What Can Go Wrong?
Only use the Normal model for symmetric and unimodal distributions!
Be careful of outliers. Remember that the mean and standard deviation are
non-resistant!
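To see how little it takes, the sketch below adds a single outlier to a small made-up dataset and compares the mean and standard deviation with the median, which is resistant.

```python
from statistics import mean, median, stdev

# Hypothetical data, made up for illustration.
data = [10, 11, 12, 13, 14]
with_outlier = data + [60]   # the same data plus one extreme value

for name, values in (("without outlier", data), ("with outlier", with_outlier)):
    print(name, mean(values), median(values), round(stdev(values), 2))
```

One extreme value pulls the mean from 12 up to 20 and makes the standard deviation more than ten times larger, while the median barely moves. Any z-score computed from the inflated mean and standard deviation would be misleading.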