Notes - Normal Model and z scores

Sec 4.4 Notes – Interpreting Center and Variability: Chebyshev’s
Rule, the Empirical Rule, and z Scores
*Using standard deviation as a ruler allows us to compare values that are measured on different
variables, with different scales, with different units, or for different populations
Chebyshev’s Rule:
 Applicable to all distributions
 Notice that “at least” leaves room for error (the actual percentage could be much more)
 Since data can be skewed, it is not appropriate to divide the percentages in half based on
the mean
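The statement of the rule itself did not survive in these notes. The standard form is: for any distribution, at least 1 - 1/k² of the values fall within k standard deviations of the mean (for k > 1). A minimal Python sketch of that bound follows; the function name `chebyshev_lower_bound` is made up for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is only informative for k > 1")
    return 1 - 1 / k**2

# The bound is a floor, not an estimate: real data sets often have much more.
for k in (2, 3, 4):
    print(f"within {k} SDs: at least {chebyshev_lower_bound(k):.3f}")
```

For k = 2 the bound is 75%, and for k = 3 it is about 88.9%. Compare these with the 95% and 99.7% that the Empirical Rule gives for Normal models.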
The Empirical Rule (68-95-99.7 Rule):
 Normal models give us an idea of how extreme a value is by telling us how likely
it is to find one that far from the mean
 In a Normal model, about 68% of the values fall within 1 standard deviation of the mean;
about 95% fall within 2 standard deviations of the mean; and about 99.7% fall within 3
standard deviations of the mean
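The three percentages can be checked against the standard Normal model. This is a sketch using Python's standard-library `statistics.NormalDist`; the helper name `fraction_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a Normal model lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```

The results are about 0.6827, 0.9545, and 0.9973, which is why the rule's 68, 95, and 99.7 are described as approximate.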
**Just Checking: pg. 109
As a group, the Dutch are among the tallest people in the world. The average
Dutch man is 184 cm tall – just over 6 feet (and the average Dutch woman is 170.8 cm
tall – just over 5’7”). If a Normal model is appropriate and the standard deviation for
men is about 8 cm, what percentage of all Dutch men will be over 2 meters (6’6”) tall?
Suppose it takes you 20 minutes, on average, to drive to school, with a standard
deviation of 2 minutes. Suppose a Normal model is appropriate for the distribution of
driving times.
a) How often will you arrive at the school in less than 22 minutes?
b) How often will it take you more than 24 minutes?
c) Do you think the distribution of your driving times is unimodal and symmetric?
d) What does this say about the accuracy of your predictions? Explain.
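One way to set up the two Just Checking problems is to count standard deviations from the mean and then read the tail off the 68-95-99.7 Rule. The sketch below assumes the Normal model is appropriate, as the problems state; the names are made up for illustration.

```python
# Tail areas implied by the 68-95-99.7 Rule (approximate, using symmetry).
ABOVE_1_SD = (1 - 0.68) / 2   # about 16% lies more than 1 SD above the mean
ABOVE_2_SD = (1 - 0.95) / 2   # about 2.5% lies more than 2 SDs above the mean

def sds_from_mean(value, mean, sd):
    """How many standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / sd

# Dutch men, N(184 cm, 8 cm): 2 meters is 200 cm.
print(sds_from_mean(200, 184, 8))   # 2.0, so about 2.5% are taller than 2 m

# Driving times, N(20 min, 2 min).
print(sds_from_mean(22, 20, 2))     # 1.0, so about 84% of trips take under 22 min
print(sds_from_mean(24, 20, 2))     # 2.0, so about 2.5% of trips take over 24 min
```

The 84% comes from 100% minus the 16% that lies more than 1 SD above the mean.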
**When working with Normal models, always draw a picture!!!
**And don’t forget about checking the Nearly Normal Condition!!!
**Step-by-Step: pg. 110-111 - BVD
DO THIS!!!
Sketch Normal models using the 68-95-99.7 Rule:
 Birthweights of babies, N(7.6 lb, 1.3 lb)
 ACT scores at a certain college, N(12.2, 4.4)
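To label a sketch, mark the axis at the mean and at 1, 2, and 3 standard deviations on either side. A small Python sketch for the birthweight model (the function name `sketch_ticks` is made up for illustration):

```python
def sketch_ticks(mean, sd):
    """Axis labels at mean - 3 SD, ..., mean, ..., mean + 3 SD."""
    return [round(mean + k * sd, 2) for k in range(-3, 4)]

# Birthweights, N(7.6 lb, 1.3 lb)
print(sketch_ticks(7.6, 1.3))   # [3.7, 5.0, 6.3, 7.6, 8.9, 10.2, 11.5]
```

About 68% of the area goes between the two inner marks (6.3 and 8.9), 95% between 5.0 and 10.2, and 99.7% between 3.7 and 11.5. The same function can be used for the ACT model.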
Z-Scores:
z-scores (or standardized values):
 use the mean and standard deviation to compare data with different units
 tell us how many standard deviations above or below the mean a data value is
 have no units
 positive z-scores = above the mean; negative z-scores = below the mean
 the bigger the absolute value of a z-score (farther away from the mean), the more
unusual the data value is
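In symbols, with $\bar{y}$ for the mean of the data and $s$ for its standard deviation, the standard definition of the z-score of a data value $y$ is:

```latex
z = \frac{y - \bar{y}}{s}
```

The units of $y - \bar{y}$ and $s$ cancel, which is why z-scores have no units.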
**Just Checking – pg. 104
Your Statistics teacher has announced that the lower of your two tests will be
dropped. You got a 90 on test 1 and an 80 on test 2. You’re all set to drop the
80 until she announces that she grades “on a curve.” She standardized the
scores in order to decide which is the lower one. The mean on the first test was
88 with a standard deviation of 4, and the mean on the second was 75 with a
standard deviation of 5.
a) Which one will be dropped?
b) Does this seem “fair”?
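A quick way to set up part a) is to standardize both scores and compare. This is a sketch only; the helper name `z_score` is made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / sd

z_test1 = z_score(90, 88, 4)   # 0.5 SDs above the mean
z_test2 = z_score(80, 75, 5)   # 1.0 SD above the mean

# The test with the lower standardized score is the one that gets dropped.
dropped = "test 1" if z_test1 < z_test2 else "test 2"
print(z_test1, z_test2, dropped)   # 0.5 1.0 test 1
```

Even though 90 is the higher raw score, it is less impressive relative to its class than the 80 is.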
Shifting Data:
 When we add (or subtract) a constant to each value, all measures of position (center,
percentiles, min, max) increase (or decrease) by the same constant
 The distribution just shifts; the shape and spread are not affected
Rescaling Data:
 When we multiply (or divide) all the data values by any constant, all measures of
position (such as the mean, median, and percentiles) and measures of spread (such
as the range, the IQR, and the standard deviation) are multiplied (or divided) by that
same constant
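Both rules can be seen on a small data set. The sketch below uses made-up scores and Python's standard `statistics` module; `summarize` is a hypothetical helper.

```python
from statistics import mean, median, stdev

def summarize(values):
    """A few measures of position and spread for a list of numbers."""
    ordered = sorted(values)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "sd": stdev(ordered),
        "range": ordered[-1] - ordered[0],
    }

scores = [12, 27, 30, 32, 35, 44]       # made-up data for illustration
shifted = [y + 10 for y in scores]      # shifting: add a constant
rescaled = [2 * y for y in scores]      # rescaling: multiply by a constant

print(summarize(scores))
print(summarize(shifted))    # mean and median go up by 10; sd and range unchanged
print(summarize(rescaled))   # mean, median, sd, and range all double
```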
**Just Checking – pg. 106
In 1995 the Educational Testing Service (ETS) adjusted the scores of the SAT
tests. Before ETS re-centered the SAT Verbal test, the mean of all test scores
was 450.
a) How would adding 50 points to each score affect the mean?
b) The standard deviation was 100 points. What would the standard deviation
be after adding 50 points?
c) Suppose we drew boxplots of test takers’ scores a year before and a year
after the re-centering. How would the boxplots of the two years differ?
A company manufactures wheels for roller blades. The diameter of the wheels
has a mean of 3 inches and a standard deviation of 0.1 inches. Because so
many of its customers use the metric system, the company decided to report
their production statistics in millimeters (1 inch = 25.4 mm). They report that the
standard deviation is now 2.54 mm. A corporate executive is worried about this
increase in variation. Should they be concerned? Explain.
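The wheel question is a rescaling problem: converting inches to millimeters multiplies every diameter by 25.4. A short sketch of the arithmetic (variable names are made up for illustration):

```python
MM_PER_INCH = 25.4

mean_in, sd_in = 3.0, 0.1
mean_mm = mean_in * MM_PER_INCH   # the mean is rescaled too
sd_mm = sd_in * MM_PER_INCH

print(round(mean_mm, 2), round(sd_mm, 2))   # 76.2 2.54

# Spread relative to the mean is the same in either unit.
print(round(sd_in / mean_in, 4), round(sd_mm / mean_mm, 4))
```

The standard deviation is a bigger number only because the unit is smaller; the wheels themselves vary exactly as much as before.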
Z-scores, again…
 When finding z-scores, we shift the data values by the mean and rescale them by the
standard deviation
 Subtracting the mean of the data from each value shifts the mean of the distribution
to 0
 Dividing each value by the standard deviation also divides the standard deviation by itself,
which makes the new standard deviation 1
 Shape: unchanged
 Center: $\bar{y} = 0$
 Spread: $s = 1$
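A short check that standardizing really does give mean 0 and standard deviation 1. This sketch uses made-up data; `standardize` is a hypothetical helper.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert each value to a z-score using the data's own mean and SD."""
    y_bar, s = mean(values), stdev(values)
    return [(y - y_bar) / s for y in values]

data = [62, 65, 70, 71, 74, 78, 84]   # made-up values for illustration
z = standardize(data)

# Up to rounding error, the z-scores have mean 0 and standard deviation 1.
print(round(mean(z), 10), round(stdev(z), 10))
```

The order of the values, and so the shape of the distribution, is the same before and after.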
**Step-by-Step: pg. 107 - BVD
Normal Model:
 Appropriate for distributions whose shapes are unimodal and roughly symmetric
 Written $N(\mu, \sigma)$ to represent a Normal model with a mean of $\mu$ and a standard
deviation of $\sigma$
 The symbols are in Greek because they are not numerical summaries of data; they
are part of the model. We call these parameters.
 For the Normal model: $z = \frac{y - \mu}{\sigma}$
 Standard Normal model (or standard Normal distribution) – the Normal model
with mean 0 and standard deviation 1
 DATA MUST BE UNIMODAL AND SYMMETRIC!!!
 Nearly Normal Condition – the shape of the data’s distribution is unimodal and
symmetric. Check this by making a histogram, or a Normal probability plot
(explained later)
 NEVER use the model without checking whether the condition is satisfied.
Examples –
1. Suppose the class took a 40-point quiz. Results show a mean score of 30,
median 32, IQR 8, SD 6, min 12, and Q1 27. (Suppose YOU got a 35.) What
happens to each of the statistics if…
 I decide to weight the quiz as 50 points, and will add 10 points to every
score. Your score is now 45.
 I decide to weight the quiz as 80 points, and double each score. Your score
is now 70.
 I decide to count the quiz as 100 points; I’ll double each score and add 20
points. Your score is now 90.
Statistic     Original   y+10   2y   2y+20
Mean          30
Median        32
IQR           8
SD            6
Minimum       12
Q1            27
Your score    35         45     70   90
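The table can be filled in with the shifting and rescaling rules: measures of position are multiplied and then shifted, while measures of spread (IQR, SD) are only multiplied. A sketch follows; the function `transform` is hypothetical and assumes a positive multiplier.

```python
original = {"Mean": 30, "Median": 32, "IQR": 8, "SD": 6,
            "Minimum": 12, "Q1": 27, "Your score": 35}
SPREAD = {"IQR", "SD"}   # measures of spread ignore the added constant

def transform(stats, multiply=1, add=0):
    """Apply y -> multiply*y + add to each summary statistic (multiply > 0)."""
    return {name: value * multiply + (0 if name in SPREAD else add)
            for name, value in stats.items()}

plus_10 = transform(original, add=10)
doubled = transform(original, multiply=2)
doubled_plus_20 = transform(original, multiply=2, add=20)

for name in original:
    print(f"{name:<11}{original[name]:>5}{plus_10[name]:>6}"
          f"{doubled[name]:>5}{doubled_plus_20[name]:>7}")
```

The "Your score" row reproduces the 45, 70, and 90 stated in the three scenarios.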
2. Let’s talk about scoring the decathlon. Silly example, but suppose two
competitors tie in each of the first eight events. In the ninth event, the high jump,
one clears the bar 1 in. higher. Then in the 1500-meter run the other one runs 5
seconds faster. Who wins? It boils down to knowing whether it is harder to jump
an inch higher or run 5 seconds faster. We have to be able to compare two
fundamentally different activities involving different units. Standard deviations to
the rescue! If we knew the mean performance (by world-class athletes) in each
event, and the standard deviation, we could compute how far each performance
was from the mean in SD units (called z-scores). So consider the three athletes’
performances shown below in a three-event competition. Note that each placed
first, second, and third in an event. Who gets the gold medal? Who turned in
the most remarkable performance of the competition?
             Events
Competitor   100 m Dash   Shot Put   Long Jump
A            10.1 sec     66’        26’
B            9.9 sec      60’        27’
C            10.3 sec     63’        27’3”
Mean         10 sec       60’        26’
St Dev       0.2 sec      3’         6”
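One way to answer both questions is to convert every performance to a z-score and add them up. The sketch below makes two assumptions not spelled out in the notes: the sign is flipped for the dash (a lower time is better), and the three events count equally. All names are made up for illustration.

```python
# (mean, SD, higher_is_better) for each event; lengths are in inches.
events = {
    "100 m Dash": (10.0, 0.2, False),   # seconds: a lower time is better
    "Shot Put":   (60 * 12, 3 * 12, True),
    "Long Jump":  (26 * 12, 6, True),
}
results = {
    "A": {"100 m Dash": 10.1, "Shot Put": 66 * 12, "Long Jump": 26 * 12},
    "B": {"100 m Dash": 9.9,  "Shot Put": 60 * 12, "Long Jump": 27 * 12},
    "C": {"100 m Dash": 10.3, "Shot Put": 63 * 12, "Long Jump": 27 * 12 + 3},
}

def performance_z(event, value):
    """z-score for a performance, signed so that positive always means better."""
    mean, sd, higher_is_better = events[event]
    z = (value - mean) / sd
    return z if higher_is_better else -z

totals = {}
for name, marks in results.items():
    zs = {event: round(performance_z(event, v), 2) for event, v in marks.items()}
    totals[name] = round(sum(zs.values()), 2)
    print(name, zs, "total:", totals[name])
```

Under these assumptions the totals are 1.5 for A, 2.5 for B, and 2.0 for C, so B takes the gold. The single most remarkable performance is C's long jump, 2.5 standard deviations better than the mean.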
Finding Normal Percentiles by Hand:
*Use when you don’t have a calculator
 Z Table – pg. A78-A79
 When the value doesn’t fall exactly 1, 2, or 3 standard deviations from the mean,
look it up in a table of Normal percentiles (Z table)
 Convert to z-scores before using the table
 Find the first two digits in the vertical column to the right (or left)
 Find the third digit in the top row
 The table gives the percent of data to the left (below) the z-score
 If you are looking for the percent of data above a z-score or between two z-scores, you
will have to subtract to get the value you want
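The table lookups and subtractions can be mirrored in code. This sketch uses `statistics.NormalDist` from Python's standard library in place of the printed Z table; the helper names are made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist()   # standard Normal model: mean 0, SD 1

def percent_below(z):
    """What the Z table gives directly: area to the left of z."""
    return Z.cdf(z)

def percent_above(z):
    """Subtract the table value from 1."""
    return 1 - Z.cdf(z)

def percent_between(z_low, z_high):
    """Subtract the smaller table value from the larger one."""
    return Z.cdf(z_high) - Z.cdf(z_low)

print(round(percent_below(1.8), 4))          # 0.9641
print(round(percent_above(1.8), 4))          # 0.0359
print(round(percent_between(-0.5, 1.8), 4))
```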
Finding Normal Percentiles Using the Calculator:
**Step-by-Step: pg. 113
From Percentiles to z-scores:
 Sometimes you want to know what the cutoff value is for a certain percentile (e.g.,
what SAT score would you need to score in the 90th percentile?)
 Using the table:
o Find the area (percent) in the table
*for problems about things like the “top 15%” you will need to look up 1 minus that
percent (for the top 15%, look up .8500)
o If you can’t find the exact value, take the closest one
o Then look to the side and top to get the z-score
o You often need to convert the z-score back to a raw data value (just plug
the z-score into the equation and solve for y)
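Going from a percentile back to a raw value reverses the table lookup and then solves z = (y - μ)/σ for y, which gives y = μ + zσ. In the sketch below, the model N(500, 100) is a hypothetical SAT model chosen only for illustration, and `cutoff_value` is a made-up helper.

```python
from statistics import NormalDist

def cutoff_value(percentile, mu, sigma):
    """Raw value y at the given percentile of N(mu, sigma)."""
    z = NormalDist().inv_cdf(percentile)   # z-score with that area to its left
    return mu + z * sigma                  # solve z = (y - mu) / sigma for y

# z-score that cuts off the "top 15%" (area .8500 to the left)
print(round(NormalDist().inv_cdf(0.85), 2))    # 1.04

# 90th percentile of the hypothetical N(500, 100) model
print(round(cutoff_value(0.90, 500, 100)))     # 628
```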
**Step-by-Step: pg. 114-117
Normal Probability Plot:
 This is a way to check the Nearly Normal Condition
 If the distribution of the data is roughly Normal, the plot is roughly a diagonal straight
line.
 Deviations from a straight line indicate that the distribution is not Normal
 This plot is usually able to show deviations from Normality more clearly than the
corresponding histogram, but it’s usually easier to understand how a distribution fails
to be Normal by looking at its histogram
*sometimes you will need to look at both
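A Normal probability plot pairs each sorted data value with the z-score expected at its rank. The sketch below computes those pairs and a rough "straightness" measure (the correlation of the two coordinates). The plotting positions (i - 0.5)/n are one common convention and are an assumption here; a calculator or other software may use slightly different ones. The data sets are made up.

```python
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each sorted value with the z-score expected at its rank."""
    ordered = sorted(data)
    n = len(ordered)
    # (i - 0.5) / n is an assumed plotting-position convention.
    expected_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected_z, ordered))

def straightness(points):
    """Correlation between the two coordinates; near 1 means nearly straight."""
    xs, ys = zip(*points)
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

symmetric = [61, 64, 66, 68, 69, 70, 71, 72, 74, 76, 79]   # made-up data
skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 28, 60]              # made-up data
print(round(straightness(normal_plot_points(symmetric)), 3))
print(round(straightness(normal_plot_points(skewed)), 3))
```

The symmetric set gives a value very close to 1, while the skewed set gives a noticeably lower one. A single number is no substitute for looking at the plot, since how the points bend away from the line shows how the distribution fails to be Normal.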
**TI Tips: pg. 119
 Turn a STATPLOT On, choose the last graph icon, then specify your data list and which axis
you want the data on (often Y)
 Specify the Mark you want the plot to use
 Hit ZoomStat
Suggested Practice (New Book):
pg. 184 #4.38, 4.39, 4.41-4.44, 4.46, 4.48, 4.50, 4.51, 4.52
Suggested Practice (Old Book):
Ch. 6 #2, 6, 10-22 even, 26-38 even, 42, 46