AP Statistics 2011
Unit 2: Worksheet 2
Name:____Key_______________________
Date:_______________
NOTE: The graphs used in this key are screen shots from the calculator so they are not formatted
properly with a complete Title and adequate labels that would be expected on an assessment!
I. How Can We Assess Normality?
1) List three characteristics of all normally distributed data.
• Graph shows symmetrical, bell-shaped, unimodal distribution.
• Mean = Median
• Empirical Rule applies
2) Is any data perfectly normal? Explain.
No – it is very rare that any real world data would be perfectly normal due to variability and sample
size. Most data would be considered “approximately Normal”.
3) Given a set of data and your answer to questions 1 & 2 above, list at least 2 methods that you
already know that you could use to determine if the data is normal.
• Graph it (always!): Use a histogram, stem & leaf plot or dot plot (small data sets)
  and analyze the distribution to see if it is bell-shaped, unimodal and symmetrical
• Determine if the Empirical Rule applies: Count the number of observations that
  fall within 1, 2, and 3 standard deviations from the mean and see if they fit the
  68, 95, 99.7% pattern
4) Do you think that a large set of data is more likely to be normal than a small set of data? For
example, if we examined the heights of students in our class (a small set of data) and compared
it to the heights of high school students in the United States, would we get a different
distribution?
Yes. A large set of data drawn from a Normal population is more likely to look Normal than a
small set drawn from the same population. Small samples are subject to more sampling
variability, so their shape, mean, and standard deviation can differ noticeably from those
of the population.
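The effect of sample size can be illustrated with a small simulation. The sketch below is not part of the original worksheet. It assumes a standard Normal population, and the sample sizes, number of trials, and seed are arbitrary choices made only for illustration. It draws many samples of each size and reports how much the sample standard deviation varies from one sample to the next.

```python
import random
from statistics import stdev

def sd_spread(sample_size, trials=500, seed=1):
    """Range (max - min) of the sample SDs seen across repeated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sds = []
    for _ in range(trials):
        # Assumed population: standard Normal (mean 0, SD 1).
        sample = [rng.gauss(0, 1) for _ in range(sample_size)]
        sds.append(stdev(sample))
    return max(sds) - min(sds)

small_spread = sd_spread(10)    # small samples, e.g. one class
large_spread = sd_spread(500)   # much larger samples
print(f"n = 10:  sample SDs vary over a range of {small_spread:.2f}")
print(f"n = 500: sample SDs vary over a range of {large_spread:.2f}")
```

The small samples should show a much wider range of standard deviations, which is why a small data set from a Normal population can easily look non-Normal.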
II. Assessing Normality
United States 2009 Unemployment Rates: The following chart shows the unemployment rates
in all 50 states from November, 2009. The data is arranged from lowest (North Dakota’s 4.1%)
to highest (Michigan’s 14.7%).
(Rates in percent. Values increase down each column, then continue at the top of the next column.)

 4.1    7.0    8.6   10.6
 4.5    7.2    8.7   10.6
 5.0    7.4    8.8   10.8
 6.3    7.4    8.9   10.9
 6.3    7.4    9.1   11.1
 6.4    7.8    9.2   11.5
 6.4    8.0    9.5   12.3
 6.6    8.0    9.6   12.3
 6.7    8.2    9.6   12.3
 6.7    8.2    9.7   12.7
 6.7    8.4   10.2   14.7
 6.9    8.5   10.3
 7.0    8.5   10.5
5) Plot the data. Use a dotplot, stemplot or histogram. Describe the distribution.
The distribution of unemployment rates is unimodal
and fairly symmetrical with no apparent strong skews
to the left or to the right. The median is
approximately 8.5% and the mean is approximately
8.68%, so both measures of center are fairly
consistent. There are no apparent outliers in the
data. There is a range of approximately 10.6% in the
fifty states’ reported unemployment rates.
___________________________________________________________
6) Does the data follow the Empirical Rule? Complete the table to find out and analyze your
results against what you would expect from the Empirical Rule.
Mean = 8.68    Standard Deviation = 2.20

Interval   Low Value   High Value   Frequency   Percent of Data
μ ± 1σ     6.48        10.88        35          70%
μ ± 2σ     4.28        13.08        48          96%
μ ± 3σ     2.08        15.28        50          100%

These percentages are close to the 68%, 95%, and 99.7% expected under the Empirical Rule.
(Simply count the number of observations that fall within each category – I’ve color-coded
them above so that it is easy to see).
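The counting step can also be checked in code. The following is a minimal sketch rather than the calculator procedure the worksheet has in mind. It assumes the 50 rates exactly as listed above and uses the population standard deviation (`pstdev`), on the assumption that the 50 states are treated as the whole population. The function name `count_within` is made up for this example.

```python
from statistics import mean, pstdev

# The 50 state unemployment rates (percent), as listed in the worksheet.
rates = [
    4.1, 4.5, 5.0, 6.3, 6.3, 6.4, 6.4, 6.6, 6.7, 6.7, 6.7, 6.9, 7.0,
    7.0, 7.2, 7.4, 7.4, 7.4, 7.8, 8.0, 8.0, 8.2, 8.2, 8.4, 8.5, 8.5,
    8.6, 8.7, 8.8, 8.9, 9.1, 9.2, 9.5, 9.6, 9.6, 9.7, 10.2, 10.3, 10.5,
    10.6, 10.6, 10.8, 10.9, 11.1, 11.5, 12.3, 12.3, 12.3, 12.7, 14.7,
]

def count_within(values, k):
    """Count the observations within k standard deviations of the mean."""
    m, s = mean(values), pstdev(values)
    return sum(1 for v in values if m - k * s <= v <= m + k * s)

m, s = mean(rates), pstdev(rates)
counts = [count_within(rates, k) for k in (1, 2, 3)]

print(f"mean = {m:.2f}, standard deviation = {s:.2f}")
for k, c in zip((1, 2, 3), counts):
    print(f"within {k} SD: {c} of {len(rates)} ({100 * c / len(rates):.0f}%)")
```

For the values as listed, this gives a mean of about 8.68 and a standard deviation of about 2.20, with 35, 48, and 50 observations inside the three intervals. Using the sample standard deviation instead would widen each interval slightly.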
7) Create a box plot of the data. Describe the distribution.
The box plot shows that the data is fairly symmetrical with no
apparent outliers. The IQR is about 3.3%, showing that the middle 50% of
all states have unemployment rates between 7% and 10.3%.
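The numbers behind the box plot can be reproduced with a short sketch. It is illustrative only and makes two assumptions that are not stated in the worksheet: quartiles are taken as the medians of the lower and upper halves of the sorted data, and outliers are flagged with the common 1.5 × IQR rule. Other software may use slightly different quartile conventions.

```python
from statistics import median

# The 50 state unemployment rates (percent), as listed in the worksheet.
rates = [
    4.1, 4.5, 5.0, 6.3, 6.3, 6.4, 6.4, 6.6, 6.7, 6.7, 6.7, 6.9, 7.0,
    7.0, 7.2, 7.4, 7.4, 7.4, 7.8, 8.0, 8.0, 8.2, 8.2, 8.4, 8.5, 8.5,
    8.6, 8.7, 8.8, 8.9, 9.1, 9.2, 9.5, 9.6, 9.6, 9.7, 10.2, 10.3, 10.5,
    10.6, 10.6, 10.8, 10.9, 11.1, 11.5, 12.3, 12.3, 12.3, 12.7, 14.7,
]

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves quartile method."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]            # values below the median position
    upper = s[len(s) - half:]   # values above the median position
    return s[0], median(lower), median(s), median(upper), s[-1]

low, q1, med, q3, high = five_number_summary(rates)
iqr = q3 - q1

# Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in rates if v < lower_fence or v > upper_fence]

print(f"min={low}, Q1={q1}, median={med}, Q3={q3}, max={high}")
print(f"IQR={iqr:.1f}, outliers={outliers}")
```

Under these assumptions the quartiles are 7.0 and 10.3, the IQR is 3.3, and no state falls outside the fences.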
7) Does the data appear to be approximately normal? Why or why not? Be sure to explain the
SOCS in your answer.
Yes, the unemployment data from all 50 states appears to be approximately Normal. This is
seen in the histogram and box plot showing virtually symmetrical distributions that are not
skewed, with a single peak at approximately 6.5-8.5%. The mean (8.68%) and the median (8.5%)
are fairly close, indicating that either could be used as a measure of the center of the data.
There are no apparent outliers in the unemployment data, as seen in the box plot. The IQR is
about 3.3%, showing that the middle 50% of all states have unemployment rates between 7% and
10.3%. All unemployment data falls within a range of 10.6%, with the lowest unemployment rate
at 4.1% and the highest rate at 14.7%.
A Normal Probability Plot shows each observation (x) plotted against its expected z-score (y).
If the data were perfectly Normal, the points would fall on a straight line. Remember, however,
that virtually no data is perfectly Normal, so look for a roughly linear pattern and don't
overreact to minor wiggles in the plot. Look instead for shapes that show clear departures
from Normality.
8) Using your calculator, construct a Normal Probability Plot for the Unemployment Rate data.
Describe the data. Is it approximately normal? Why or
why not?
The Normal Probability plot shows that the data is
approximately normal because it follows a linear
pattern. The NPP graphs the unemployment rate on the X
axis against the z-score on the Y axis. There is no strong
variation away from the linear pattern in either direction.
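To make the construction of the plot concrete, the sketch below computes the points of a Normal Probability Plot for the unemployment rates. It is a hedged illustration, not the calculator's documented algorithm: the expected z-score of the i-th smallest value is taken as the standard Normal quantile at (i − 0.5)/n, which is one common convention and an assumption here. The correlation between the data and the z-scores is an added numerical summary of how linear the plot is and is not part of the worksheet.

```python
from math import sqrt
from statistics import NormalDist, mean

# The 50 state unemployment rates (percent), as listed in the worksheet.
rates = [
    4.1, 4.5, 5.0, 6.3, 6.3, 6.4, 6.4, 6.6, 6.7, 6.7, 6.7, 6.9, 7.0,
    7.0, 7.2, 7.4, 7.4, 7.4, 7.8, 8.0, 8.0, 8.2, 8.2, 8.4, 8.5, 8.5,
    8.6, 8.7, 8.8, 8.9, 9.1, 9.2, 9.5, 9.6, 9.6, 9.7, 10.2, 10.3, 10.5,
    10.6, 10.6, 10.8, 10.9, 11.1, 11.5, 12.3, 12.3, 12.3, 12.7, 14.7,
]

def normal_scores(n):
    """Expected z-scores for n sorted values, assuming p = (i - 0.5) / n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = sorted(rates)              # observations on the x-axis
zs = normal_scores(len(xs))     # expected z-scores on the y-axis
points = list(zip(xs, zs))
r = pearson_r(xs, zs)

print("first point:", points[0])
print("last point: ", points[-1])
print(f"correlation between data and z-scores: r = {r:.3f}")
```

A correlation close to 1 corresponds to a nearly straight plot. It supplements the visual check and does not replace looking at the graph.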
9) Guinea pig survival times: Scientists conducted an experiment using guinea pigs and tracked
their survival times (in days) after they were injected with infectious bacteria.
 43   45   53   56   56   57   58   66   67   73   74   79
 80   80   81   81   81   82   83   83   84   88   89   91
 91   92   92   97   99   99  100  100  101  102  102  102
103  104  107  108  109  113  114  118  121  123  126  128
137  138  139  144  145  147  156  162  174  178  179  184
191  198  211  214  243  249  329  380  403  511  522  598
a) In your calculator, construct a histogram of the data. Describe the distribution.
The distribution is heavily skewed to the right. The center
is approximately 102.5 days based on the median (the mean, about
142 days, is pulled upward by the long right tail). The data peaks
between 50 and 100 days, but the range is about 555 days (43 to 598).
There are potential outliers (380, 403, 511, 522, 598 days).
b) Using your calculator, construct a Normal Probability Plot of the data. Describe the plot.
The Normal Probability plot pattern is curved strongly to the right. This further confirms that the data
is right skewed.
c) Is the data approximately normal? Why or
why not?
No, the survival times of guinea pigs injected with infectious bacteria do not appear to
be Normal. The Normal Probability Plot is strongly curved, the histogram shows a strong
right skew, and the mean and median are not approximately equal.
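The same checks can be run numerically on the survival times. This is a sketch under stated assumptions, not the worksheet's calculator method. It uses the 72 values as listed, takes the expected z-score of the i-th smallest value as the standard Normal quantile at (i − 0.5)/n (one common convention), and reports the correlation between the data and those z-scores as an added, informal measure of how straight the Normal Probability Plot is.

```python
from math import sqrt
from statistics import NormalDist, mean, median

# Guinea pig survival times in days, as listed in the worksheet.
days = [
    43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79,
    80, 80, 81, 81, 81, 82, 83, 83, 84, 88, 89, 91,
    91, 92, 92, 97, 99, 99, 100, 100, 101, 102, 102, 102,
    103, 104, 107, 108, 109, 113, 114, 118, 121, 123, 126, 128,
    137, 138, 139, 144, 145, 147, 156, 162, 174, 178, 179, 184,
    191, 198, 211, 214, 243, 249, 329, 380, 403, 511, 522, 598,
]

def npp_correlation(values):
    """Correlation between sorted data and expected Normal z-scores."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting positions: p = (i - 0.5) / n for i = 1..n.
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)

center_mean = mean(days)
center_median = median(days)
r = npp_correlation(days)

print(f"mean = {center_mean:.1f} days, median = {center_median} days")
print(f"correlation between data and z-scores: r = {r:.3f}")
```

The mean sits well above the median, and the correlation is noticeably below 1. Both are consistent with the strong right skew seen in the histogram and the curved plot.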
10) Below is a stem and leaf plot showing NBA Free-Throw Percents.
Key: 4 | 0 = 0.40
[Stem-and-leaf plot with stems 3 through 9; the individual leaves are not recoverable from this transcript.]
a) Describe the distribution.
The distribution of NBA free-throw percents appears to be left skewed and peaks at 80-89%. The center
is approximately 68 to 70 percent, based on the median, which is more resistant to outliers and skew
than the mean. The range is approximately 63 percentage points.
b) What would you expect the Normal Probability Plot to look like? Sketch it here.
The NPP would be somewhat curved (skewed) left with most of the data clustered to the right. The
data would look similar to the NPP shown below.
11) Sketch a Normal Probability Plot for each of the following:
Normal Data
Data Skewed Left
Data Skewed Right