Download Unit 3 – Exploring and Understanding Data (Chapter 6 – The

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Time series wikipedia , lookup

Transcript
Unit 3 – Exploring and Understanding Data
(Chapter 6 – The Standard Deviation as a Ruler and the Normal Model)
•
•
When we ___________ or ____________ all the data values by any
constant value, both measures of location (e.g., mean and median) and
Key Terms:
-
Rescaling data:
measures of spread (e.g., range, IQR, standard deviation) are divided and
Shifting –
multiplied by the same value.
-
Rescaling –
Example:
-
Standardizing –
-
-
Standardized Value –
-
Normal Model –
-
Parameter –
-
Statistic –
-
z-score –
-
Standard Normal Model –
-
68-95-99.7 Rule –
-
Normal percentile –
-
Changing center and spread -
Use the following summary statistics to create a new list of summary statistics
that have been multiplied by 2 and then added to by 14.
•
The trick in comparing very different-looking values is to use ___________
____________ as our rulers.
•
The standard deviation tells us how the whole collection of values
varies, so it’s a natural ruler for comparing an individual to a
group.
•
As the most common measure of variation, the standard
deviation plays a crucial role in how we look at data.
•
Shifting data:
•
•
Adding (or subtracting) a ____________ amount to each value just adds
We compare individual data values to their ____________, relative to their
standard deviation using the following formula:
(or subtracts) the same constant to (from) the mean. This is true for the
____________and other measures of position too.
•
In general, adding a constant to every data value adds the same constant
to measures of ____________ and percentiles, but leaves measures of
_____________ unchanged.
•
We call the resulting values ____________ ____________, denoted as z. They
can also be called ____________.
•
Standardized values have no units.
•
z-scores measure the distance of each data value from the mean in standard
Using the data set - 1, 1, 1, 1, 5, 6, 20, 30 do the following:
deviations.
•
EXAMPLE:
A negative z-score tells us that the data value is below the mean,
-
Find the mean
-
Subtract the mean from each data value, what is your new mean?
-
Divide each new value by “s”, what is your new standard deviation?
while a positive z-score tells us that the data value is above the
mean.
•
Standardized values have been converted from their original units to the
standard statistical unit of standard deviations from the mean.
•
Thus, we can compare values that are measured on different scales, with
different units, or from different populations.
•
EXAMPLE:
how far it is from the mean.
Mary took the SAT and scored a 1750. The mean score for the SAT is 1500 with a
standard deviation of 125. Mark took the ACT and scored a 28. The mean score for
the ACT is a 26 with a standard deviation of 1.5. Who did better on their test?
A z-score gives us an indication of how unusual a value is because it tells us
•
•
Anything more than 2 standard deviations from the mean is ___________.
•
The larger a z-score is (negative or positive), the more unusual it is.
There is no universal standard for z-scores, but there is a model that shows up
over and over in Statistics.
•
This model is called the _______________ (You may have heard of “bellshaped curves.”).
•
Normal models are appropriate for distributions whose shapes
are unimodal and roughly symmetric.
•
Standardizing data into z-scores shifts the data by subtracting the mean and
•
•
rescales the values by dividing by their standard deviation.
•
•
•
These distributions provide a measure of how extreme a z-score is.
Remember that a negative z-score tells us that the data value is below the
Standardizing into z-scores does not change the shape of the
mean, while a positive z-score tells us that the data value is above the
distribution.
mean.
Standardizing into z-scores changes the center by making the
•
There is a Normal model for every possible combination of mean and standard
mean 0.
deviation.
Standardizing into z-scores changes the spread by making the
•
standard deviation 1.
We write ________________ to represent a Normal model with a mean of
μ and a standard deviation of σ.
•
We use Greek letters because this mean and standard deviation do not come
•
from data—they are numbers (called ____________________) that specify the
model.
•
•
likely it is to find one that far from the mean.
•
Summaries of data, like the sample mean and standard deviation, are written
with Latin letters. Such summaries of data are called statistics.
Normal models give us an idea of how extreme a value is by telling us how
We can find these numbers precisely, but until then we will use a simple rule
that tells us a lot about the Normal model…
•
It turns out that in a Normal model:
When we standardize Normal data, we still call the standardized value a z-
•
about 68% of the values fall within one standard deviation of the mean;
score, and we write
•
about 95% of the values fall within two standard deviations of the mean;
and,
•
EXAMPLE:
A student calculated that it takes an average of 17 minutes with a standard
about 99.7% (almost all!) of the values fall within three standard deviations
of the mean.
•
The following shows what the 68-95-99.7 Rule tells us:
•
When we use the Normal model, we are assuming the distribution is Normal.
•
We cannot check this assumption in practice, so we check the following
deviation of 3 minutes to drive from home, park the car, and walk to an early
morning class.
a.
One day it took the student 21 minutes to get to class. How many
standard deviations from the average is that?
condition:
b.
Another day it took only 12 minutes for the student to get to class. What is
•
the z-score?
Nearly Normal Condition: The shape of the data’s distribution is unimodal
and symmetric.
•
This condition can be checked with a histogram or a Normal probability
plot (to be explained later).
•
c.
And, when we have data, make a histogram to check the Nearly Normal
Another day it took 17 minutes for the student to get to class. What is the
Condition to make sure we can use the Normal model to model the
z-score?
distribution.
•
When a data value doesn’t fall exactly 1, 2, or 3 standard deviations from the
mean, we can look it up in a table of Normal percentiles.
•
Table Z in Appendix E provides us with normal percentiles, but many calculators
and statistics computer packages provide these as well.
Example: What z-score represents the first quartile in a Normal model?
EXAMPLE:
SAT verbal scores are represented by N(500, 100). Sketch the model with the
Empirical Rule labeled. Use it to answer questions that follow.
a.
b.
Between what scores would you expect to find 68% of the data?
What score is the cut off to be in the top 2.5%?
•
Look in Table Z for an area of 0.2500.
•
The exact area is not there, but 0.2514 is pretty close.
•
This figure is associated with z = -0.67, so the first quartile is 0.67 standard
deviations below the mean.
EXAMPLE:
SAT Verbal scores are represented by N(500, 100). What proportion of SAT scores
fall between 450 and 600?
•
Table Z is the standard Normal table. We have to convert our data to z-scores
before using the table.
•
Figure 6.5 shows us how to find the area to the left when we have a z-score of
1.80:
Some colleges only admit students who have an SAT verbal score in the top 10%.
What score would make you eligible?
•
Sometimes we start with areas and need to find the corresponding z-score or
even the original data value.
Find the percent of observations that fall in given intervals:
•
When you actually have your own data, you must check to see whether a Normal
model is reasonable.
-0.7 < z < 2.4
z > 1.8
•
Looking at a histogram of the data is a good way to check that the underlying
distribution is roughly unimodal and symmetric.
•
A more specialized graphical display that can help you decide whether a Normal
model is appropriate is the Normal probability plot.
•
If the distribution of the data is roughly Normal, the Normal probability plot
approximates a diagonal straight line. Deviations from a straight line indicate that
Find the percent of observations that fall in given intervals:
-0.7 < z < 2.4
z > 1.8
the distribution is not Normal.
•
Nearly Normal data have a histogram and a Normal probability plot that look
somewhat like this example:
For each model, find the missing parameter:
1) = 1250, 35% below 1200, σ= ?
•
2)
= 0.64, 12% above 0.70, σ= ?
3) σ= 0.5, 90% above 10.0, = ?
A skewed distribution might have a histogram and Normal probability plot like this: