Ideal Answers to Chapter 2 Questions
QUESTION 2.1a: In the case of these data, the median seems like a much better indicator of central
tendency than does the mode. After all, the mode also happens to be the lowest score in these data. In
contrast, the median or middle score is in close agreement with the mean of 11.8. Unless there are
some real outliers in a set of data, the mean is generally considered the best indicator of what score is
typical in a data set.
QUESTION 2.1b: Yes, if the data were to more closely approximate a normal distribution, the mode and
the median would be more likely to be in agreement. In fact, in a perfectly normal distribution, the
mean, the median, and the mode are all exactly the same. The mode, for example, sits precisely under
the peak of the normal curve (it is the score that occurs with the highest frequency), and in a perfectly
normal distribution the mean and the median fall at that same point.
QUESTION 2.2a. The mean value for shoe size in my SPSS output file is 11.8. Further, the mode is 10.5,
and the median is 11.5. I pasted my output below.
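For readers who want to check these statistics without SPSS, here is a minimal Python sketch. The five shoe sizes shown are hypothetical values chosen only to reproduce the mean, median, and mode reported above; the actual values are in the book's data file.

```python
from statistics import mean, median, mode

# Hypothetical shoe sizes chosen to reproduce the statistics reported above;
# these are not necessarily the actual five values from the Pelham data set.
shoe_sizes = [10.5, 10.5, 11.5, 12.5, 14.0]

print(mean(shoe_sizes))    # 11.8
print(median(shoe_sizes))  # 11.5
print(mode(shoe_sizes))    # 10.5
```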
QUESTION 2.2b. As I suggested above, the mean is probably the best measure of central tendency,
followed closely by the median. If we randomly and repeatedly pulled one of these five Pelham guys out
of a very large hat (with replacement) and wanted to predict his shoe size as accurately as possible
before the selection, the mean would do the best job of minimizing errors of prediction, in the long
term.
QUESTION 2.2c. To have a better sense for whether these men tend to have big feet, it would be nice to
know the population standard deviation for shoe size. If a score of 1.8 sizes above the mean is 2
standard deviations above the mean, we could say that these guys, on average, have really big feet. In
the domain of height, for example, a man who is 2 standard deviations above the mean is roughly 6’5”
(based on a mean of 5’10” and a standard deviation of 3.5 inches). But if the shoe size standard
deviation were as high as 3.6, this would put them only half a standard deviation above the mean,
roughly like saying a guy is 5’11 ½ inches tall, certainly not a shrimp but not a giant.
Unfortunately, we probably cannot get a very good estimate of the population standard deviation for
shoe size based on these five men because they are all related. Assuming foot size has a pretty big
genetic component, this would make for a smaller standard deviation among a group of related people
than for the entire population. This being said, common sense tells us that Barry has pretty darned big
feet – and Brett is just behind him. Of course, maybe Barry is 6’5”, in which case his feet are probably
perfectly normal for a tall guy.
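The arithmetic behind this comparison is just the ratio of the gap to the standard deviation. A minimal sketch, using the 1.8-size gap discussed above and the two hypothetical population standard deviations implied there (0.9 for the 2-SD scenario, 3.6 for the half-SD scenario):

```python
def z_score(gap, sd):
    """Number of standard deviations that a given gap above the mean represents."""
    return gap / sd

# 1.8 shoe sizes above the mean, under two hypothetical population SDs
print(z_score(1.8, 0.9))  # 2.0 -> really big feet (like a 6'5" man for height)
print(z_score(1.8, 3.6))  # 0.5 -> only modestly above average (like 5'11.5")
```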
QUESTION 2.3a. This is one of those interesting cases where the mean, the median and the mode are
not in agreement at all. To begin with the median, which is the most commonly used measure when
extreme outliers are possible: the median is $20.00. So there is a sense in which the average American
reported spending $20 yesterday. In other words, about half the people reported spending less than $20
and half reported spending more. Using the median would prevent a small percentage of people who
spent a huge amount of money yesterday from inflating the value.
But of course, if we are just trying to minimize errors of prediction, the mean would work better. The
mean was $71.10. That’s way bigger than the median, and it must reflect the influence of the 11 people
(out of n = 120 who answered the question) who said they spent $200 or more yesterday. One guy even
said he spent $900! (I hope he enjoys the new flat screen Hitachi.) If you think that spending $900 isn’t
all that crazy (every once in a while), then the mean of $71.10 is not so unreasonable.
At the other extreme, our mode was actually $0.00, that is, no money spent at all. Almost 26% of these
respondents reported having spent nothing yesterday (once we ignore rent, routine bills, and the other
things they were asked to cross off the list). If you just care about what was most typical (common) for
American consumers, the mode is the most reasonable measure of central tendency. One worry with
accepting the mode as the best measure of spending, though, is that it lacks face validity. Did ¼ of the
sample really spend nothing, or did they just forget about that decaf latte they picked up at Starbucks
before heading home to sleep all day?
Taking the data at face value, I’d have to say that this is one of those cases where there is no single
answer to the question of how much money the typical American spends. There are pros and cons for
each of these three very different answers to the question. Accepting conventional economic wisdom
for measuring central tendency that involves money would probably mean settling on the median and
would make the economists happy. Median income, for example, is almost always preferred by
statisticians and economists over mean income.
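To make the contrast among the three measures concrete, here is a toy Python illustration with made-up spending reports (not the actual survey data): a handful of big spenders pulls the mean far above the median, while $0 remains the single most common answer.

```python
from statistics import mean, median, mode

# Made-up daily spending reports, in dollars (illustrative only)
spending = [0, 0, 0, 15, 20, 20, 40, 80, 250, 900]

print(mode(spending))    # 0     -> the most common single answer
print(median(spending))  # 20.0  -> half spent less, half spent more
print(mean(spending))    # 132.5 -> pulled upward by the $250 and $900 reports
```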
QUESTION 2.3b. Fully 40% of survey respondents failed to answer the spending question. This is a very
high figure and it raises worries about non-response bias. Those who were willing and able to report
their spending may have had different spending habits than those who were unwilling or unable to
answer. For example, if seniors and/or poor people are more likely to be unable or unwilling to answer
the spending question, then our estimate would be an overestimate for the total population. On the
other hand, if most Americans believe that reckless spending is frowned upon, people who spend
extravagant amounts of money might be more likely to refuse to answer the question, which would
mean that, in the absence of any other biases, our estimate would be too low.
To the degree that memory lapses rather than social desirability biases are responsible for nonresponses
to the spending question, elderly respondents might be overrepresented among nonresponders. If
elderly people tend to have lower than average incomes, or more frugal spending habits, this would
mean that our observed values overestimate spending in the total population. Because almost all
survey respondents are willing to report their age, this last possibility could be tested empirically in a
complete data set. In the best of all possible worlds these various sources of bias would all cancel one
another out. However, this lucky state of affairs seems unlikely. In the absence of empirical data on
which kind of people are least willing or able to report their spending, we can only add the caveat that
our measure of daily spending is subject to nonresponse bias.
QUESTION 2.4a. The table below shows the U.S. states with the highest and lowest percentages of
Whites, Latinos, Blacks and Asians in 2007. The table shows that there is tremendous variation across
U.S. states in their ethnic composition. Whereas more than 95% of those living in Maine were White,
less than 25% of those living in Hawaii were White. Further, almost 40% of Hawaiians in 2007 were
Asian.
States with Highest and Lowest Percentages of People in Four Ethnic Groups in 2007
Group     State with Highest Percentage     State with Lowest Percentage
Whites    Maine (95.5%)                     Hawaii (24.7%)
Latinos   New Mexico (44.4%)                West Virginia (1.1%)
Blacks    Mississippi (37.2%)               Montana (0.6%)
Asians    Hawaii (39.9%)                    MT, WV, WY (tied at 0.7%)
Note. Actual percentages for each ethnic group are listed in parentheses after each state.
QUESTION 2.4b. (a) According to our indicator of ethnic diversity, Hawaii is the most ethnically diverse
U.S. state and California is the second most diverse. The two states are very similar in that non-Hispanic
Whites are actually a statistical minority in both states (though more dramatically so in Hawaii). Further,
each state also has one other ethnic group that is represented in very large numbers: In Hawaii it is
Asians (who make up almost 40% of the population) whereas in California it is Latinos (who make up
36% of the population). African Americans are the only group not to be represented in very large
numbers in either state (about 7% in California and about 3% in Hawaii). It is also worth noting that
whereas Hawaii has very few Blacks and very few Latinos, California is more diverse than Hawaii in the
sense that Whites, Asians, and Latinos all make up at least 10% of the population. On the other hand, by
the simplest possible definition of ethnic diversity, which would be the percentage of non-Hispanic
whites in a U.S. state, Hawaii is the winner hands down.
(b) Using the measure based on the standard deviation, the two least ethnically diverse states are Maine
and Vermont. If we trust Wikipedia (http://en.wikipedia.org/wiki/List_of_U.S._states_by_income), both
of these New England states are roughly in the middle of the pack in per capita income relative to other
U.S. states, and politically speaking, both are “blue” states. Based on our own data, the two states have
strikingly similar (and strikingly homogeneous) ethnic compositions. Both are about 95% White, and
about 1% Black, 1% Latino, and 1% Asian. According to census data, one area in which these two
otherwise similar states differ somewhat is in the percentage of residents aged 25 or older with a
bachelor’s degree or higher. Vermont ranks 7th nationally whereas Maine ranks 24th. More
specifically, about 34% of Vermont residents, but only about 27% of Maine residents are college
educated. See http://www.census.gov/statab/ranks/rank19.html
(c) The two states that are the most typical in terms of ethnic diversity are Arkansas and Tennessee.
They rank 25th and 26th, respectively, on our ethnic diversity measure. Further, in both states more
than 3/4 of residents are White and about 1/6 are Black. Both have only a small percentage (3-5%) of
Latinos and almost no Asians. The same census data just cited also show that these two states are well
below the national median in the percentage of adults who are college educated. Tennessee ranks 42nd
and Arkansas ranks 48th nationally in the percentage of residents with a college diploma.
Criticism: These results would surely have differed somewhat if we had defined ethnic diversity based
on national origin (e.g., Germans, Nigerians, etc.) rather than focusing on the four most common
American ethnic groups. Further, if we had simply included data from a couple of other ethnic groups
(e.g., Native American Indians, people of mixed ethnicity) the results might have differed appreciably for
at least some states. Alaska and Oklahoma, for example, would have probably climbed higher in the
rankings based on the relatively high percentage of Native American Indians in these states.
Teaching Moment: Instructors may wish to point out that we could improve noticeably on the measure
of ethnic diversity (without having to consult the U.S. Census any further) by creating one additional
ethnic group, namely “all other ethnicities.” This would be a catch-all category for groups such as Native
American Indians, Middle Easterners who do not consider themselves Asians, and people of mixed
ethnicity. Putting all of these very small minorities in one group (by simply subtracting the 4 ethnic
groups listed from 100%) would effectively be all-inclusive and would bypass the statistical snags
associated with using, say, 12 ethnic groups, eight of which would be very tiny in most states.
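Here is a rough Python sketch of that improved measure, assuming (as the Maine/Vermont discussion above suggests) that the diversity index is simply the standard deviation across the group percentages, with a lower value indicating a more even and therefore more diverse mix. The percentages are approximate 2007 figures drawn from the discussion above; Hawaii's Latino share is a rough guess used only for illustration.

```python
from statistics import pstdev

def diversity_sd(group_percentages):
    """SD across group percentages, including a catch-all 'all other ethnicities' group.
    Lower values indicate a more even ethnic mix (more diversity)."""
    other = 100 - sum(group_percentages)      # everyone not in the four listed groups
    return pstdev(group_percentages + [other])

# Approximate percentages: White, Latino, Black, Asian
# (the Latino figure for Hawaii is a rough guess, for illustration only)
hawaii = [24.7, 8.0, 3.0, 39.9]
maine  = [95.5, 1.0, 1.0, 0.7]

print(round(diversity_sd(hawaii), 1))  # small SD  -> highly diverse
print(round(diversity_sd(maine), 1))   # large SD  -> highly homogeneous
```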
QUESTION 2.5. Here are the calculations for the margin of error based on an observed sample
percentage of 48% yes responses and a sample size of n = 1,000.
Sp = square root of [.48 x .52 / 1,000] = .0158
So the margin of error is 1.96 x .0158 = .031, and the 95% confidence interval is .48 ± .031. Converted
to percentages, that is 44.9% to 51.1%. With rounding, then, there is about a 95% chance that 45-51% of
Canadians believe that there is intelligent life elsewhere in the universe.
Going through the same calculations for a poll result of 4% rather than 48% yields,
Sp = square root of [.04 x .96 / 1,000] = .0062
So the margin of error is 1.96 x .0062 = .012, and the 95% confidence interval is .04 ± .012. Converted
to percentages, that is 2.8% to 5.2%. Rounding to the nearest percentage point, then, there is roughly a
95% chance that the true percentage of Canadians who think their taxes are too low is between 3 and
5%. We have roughly a one point margin of error here (with a 4% result) instead of a roughly 3 point
margin of error (with a 48% result).
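Both calculations are easy to reproduce in Python; the sketch below simply packages the formula used above (the standard error of a proportion multiplied by 1.96).

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion p based on n respondents."""
    sp = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
    return z * sp

for p in (0.48, 0.04):
    moe = margin_of_error(p, 1000)
    print(f"p = {p:.2f}: margin of error = {moe:.3f}, "
          f"95% CI = {p - moe:.3f} to {p + moe:.3f}")
# p = 0.48: margin of error = 0.031, 95% CI = 0.449 to 0.511
# p = 0.04: margin of error = 0.012, 95% CI = 0.028 to 0.052
```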
A look at the simple standard deviations of the two sets of 20 scores provided (Set 5 and Set 50) clarifies
why margins of error get smaller as a poll result becomes more extreme. The standard deviation of the
scores is simply smaller for more extreme results. Based on my hand calculations, the S.D. for the scores
in Set 50 is 0.500. When I asked SPSS to calculate it using “Descriptives,” by the way, I got .51299. I
assume that this is because SPSS used N-1 to estimate the population standard deviation rather than
just giving the simple descriptive S.D. My hand calculations for the S.D. for Set 5 yielded a much smaller
value of .2179. This much lower value makes sense because 19 of the 20 scores in Set 5 are identical to
one another!
So all else being equal, margins of error are smaller when a sample percentage deviates more from 50%
because the standard deviation associated with the scores gets smaller as you move further from an
even 50/50 split.
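This point can be checked directly. A minimal sketch, assuming the two sets are simple 0/1 (no/yes) scores as described above, with a 50/50 split in Set 50 and a 1-in-20 split in Set 5:

```python
from statistics import pstdev, stdev

set_50 = [1] * 10 + [0] * 10   # assumed 50/50 split of 0/1 scores
set_5  = [1] * 1  + [0] * 19   # assumed 1-in-20 split

print(pstdev(set_50))  # 0.500   -- descriptive SD (divide by N), as in the hand calculation
print(stdev(set_50))   # ~0.513  -- SPSS-style estimate (divide by N - 1)
print(pstdev(set_5))   # ~0.218  -- much smaller, since 19 of the 20 scores are identical
```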
QUESTION 2.6. The skewness score for the variable “bimodal” was exactly zero (0.00). This was the case
because the scores to the right of the mean (6.0) are a perfect mirror image of the scores to the left of
the mean (e.g., the lowest score of 1.0 is exactly 5 points below the mean, and the highest score of 11.0
is exactly 5 points above the mean). Further, the two modes (3 and 9) are each exactly three points
from the mean.
When I created a new variable whose n = 21 scores were twenty 1s and just one 7, I observed a
skewness score of 4.58, which indicates extreme positive skew. This makes sense because there is one
and only one score (7) that is an extreme positive outlier. In fact, this one outlier is more than 4
standard deviations above the mean of about 1.3.
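For a quick cross-check without SPSS, scipy's bias-corrected skewness appears to use the same adjusted coefficient SPSS reports, and it reproduces the 4.58 value for the second example above (twenty 1s plus a single 7):

```python
from scipy.stats import skew

data = [1] * 20 + [7]          # twenty 1s and one extreme positive outlier
print(skew(data, bias=False))  # ~4.58, matching the skewness statistic reported above
```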
QUESTION 2.7. The variable “normal” has a kurtosis score of 0.11, which is very close to zero. This
makes sense because “normal” very closely approximates a normal distribution. The ever so slightly
positive kurtosis score means that these scores are ever so slightly clustered (i.e., bunched together) at
the mean. In contrast, the variable “low_kurt” has a low kurtosis score of -1.89. This low score reflects
the fact that this bimodal distribution of scores is heavily clustered at the tails. The mean is 6.0, but
there is only one score of exactly 6 in the distribution whereas there are seven scores of 4 (2 points
below the mean) and seven scores of 8 (two points above the mean). Thus 14 out of 21 or 2/3 of the
scores lie in the extreme tails of the distribution. One might describe this as an “anti-normal”
distribution. The variable “high_kurt” has a high kurtosis score of +5.68, which is a reflection of the fact
that fully 17 of the 21 scores are exactly the mean score of 6.0. There are exactly two scores below the
mean (4 and 5) and exactly two scores above the mean (7 and 8). Thus, this distribution of scores is very
sharply peaked at the mean. The kurtosis score for “fingers” is even larger. It is +18.44. This much
higher kurtosis score is presumably a reflection of the fact that only two (rather than four) of the 21
scores deviate at all from the median score of 10 fingers.
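Both of these kurtosis values can be cross-checked with scipy's bias-corrected (excess) kurtosis, which reproduces the figures reported above. The "high_kurt" scores are fully described in this answer, and the two unusual "fingers" scores (9.5 and 12) are taken from Question 2.8a:

```python
from scipy.stats import kurtosis

high_kurt = [6] * 17 + [4, 5, 7, 8]   # seventeen 6s plus 4, 5, 7, and 8
fingers   = [10] * 19 + [9.5, 12]     # nineteen 10s plus the 9.5 and 12 from Question 2.8a

print(kurtosis(high_kurt, fisher=True, bias=False))  # ~ +5.68
print(kurtosis(fingers,   fisher=True, bias=False))  # ~ +18.44
```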
Supplemental Answer: To see whether kurtosis is more sensitive to the number of scores that deviate
from the central tendency or the amount by which the scores deviate, I created a couple of variables in
which I varied the number of scores that deviate from the mean as well as the degree of that deviation.
The kurtosis statistic was highly sensitive to changes in the number of scores that deviate from the mean
(at least for the extremely limited range of scores I tested) but not sensitive at all to the absolute degree
of deviation for the one or two scores that were different from the central tendency. In fact, the n=7
scores in Set A (0, 10, 10, 10, 10, 10, 20) have the same kurtosis score (+3.0) as the n=7 scores in Set B (9,
10, 10, 10, 10, 10, 11). At first blush this seemed odd, but if you first subtract 9 from all of the scores in
Set B and then multiply these new scores by 10 you get the scores in Set A. This observation seems to
support a basic principle of statistics -- a simple linear transformation of a set of scores does not change
the basic distributional properties of those scores. Just as multiplying height by two would not change
the correlation between height and weight, applying this transformation (10 * (x - 9)) to the scores in
Set B does not change their kurtosis.
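That invariance is easy to confirm. A quick check of the two sets listed above, again using scipy's bias-corrected kurtosis:

```python
from scipy.stats import kurtosis

set_a = [0, 10, 10, 10, 10, 10, 20]
set_b = [9, 10, 10, 10, 10, 10, 11]   # set_a is 10 * (x - 9) applied to each score in set_b

# A linear transformation leaves kurtosis unchanged, so both sets score +3.0.
print(kurtosis(set_a, fisher=True, bias=False))  # ~3.0
print(kurtosis(set_b, fisher=True, bias=False))  # ~3.0
```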
This suggests that by adding just a few more scores to the positive and negative tails of a distribution (so
that there is better information about the true variability of the scores), one might see that the
extremity of the scores does matter for kurtosis. The n=9 scores in Set A' (-1, 0, 10, 10, 10, 10, 10, 20,
21) are exactly the same as the scores in Set A except that they contain a single score that is one point
more negative and a single score that is one point more positive than the original low and high scores in Set
A. Although kurtosis dropped substantially (from +3.0 to +1.26), it was still positive. Contrast this with
what happened when I examined the kurtosis of the scores in Set B’. These scores were ( 8, 9, 10, 10,
10, 10, 10, 11, and 12). Note that these scores are exactly the same as the scores in Set B except that
they, too, contain a single score that is one point more negative and a single score that is one point
more positive than the original low and high scores. The kurtosis of this set of scores, however, is
actually slightly negative (-.019), meaning that this distribution is slightly flatter (more like a plateau)
than a perfectly normal distribution. This makes sense because we now know that the scores of 8, 9, 11,
and 12 are all quite close to the mean. The gap between 9 and the mean is thus the same as the gap
between 9 and 8. This is not at all true in the previous example (for Sets A and A’). The gap between
the 20 and the mean is much larger than the gap between 20 and 21.
QUESTION 2.8a. Wickens would clearly say that we should not trust this correlation or its associated p
value. If we were to delete the two people with a very unusual number of fingers, there could be no
correlation by definition because there would be no variance in number of fingers. Further, if we were
to switch the heights of the person with 9.5 fingers and the person with 12 fingers, we would observe a
correlation of r = -.48, p < .05. That’s a significant negative instead of a significant positive correlation!
Another way to think about this is that the highly leptokurtic nature of this distribution means that the
standard deviation for number of fingers is very small (0.46). Thus a score that is only a little higher or a
little lower than the typical score of 10 fingers has tremendous weight in the analysis. In fact, the score
of 12 fingers is more than 4 standard deviations above the mean. Scores in excess of three standard
deviations above the mean are usually treated as potentially worrisome outliers.
There is a sense in which we are essentially dealing with three observations. Assuming the 19
participants with 10 fingers are a representative sample, the 19 height scores of this group should be
normally distributed around the mean height of the population from which they were sampled. This
means that most of the 19 scores should be at or near the mean in height. Thus, we essentially have a
group of 19 scores that are mostly near the mean in height and all of which are precisely at the median
in number of fingers. Then we have only one person with a below average number of fingers and only
one person with an above average number of fingers. If one of these two unusually-fingered people
happens to be tall and one happens to be short, we are guaranteed to observe either a positive or a
negative correlation (as confirmed by the switching demonstration noted above). In fact, even if we
were to make the 9.5-fingered person perfectly average in height (67 inches), the observed correlation
between number of fingers and height would still be r = .49, p = .026. So in essence this entire
correlation is being driven by a single 12-fingered person. Finally, it seems obvious that a correlation
with n=3 (1 df) would not be significant at p = .05. Just to be sure I created a sample that included only 3
of these 21 participants – the person with 9.5 fingers, the person with 12 fingers, and a person of
average height (67 inches) with 10 fingers. Because of the enormous reduction of variation in height
scores among people with 10 fingers, the correlation increased dramatically to r = .966. However, with
only 1 df, even this extremely positive correlation was not significant, p = .166.
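The switching demonstration is also easy to mimic with made-up numbers. In the sketch below, the nineteen 10-fingered people get ordinary heights, the 9.5-fingered person is short, and the 12-fingered person is tall. These heights are hypothetical (they are not the book's data), but they show how swapping just those two heights reverses the sign of the correlation.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical heights (inches); only the last two people have unusual finger counts
fingers = np.array([10.0] * 19 + [9.5, 12.0])
heights = np.array([66, 67, 68, 65, 69, 67, 66, 70, 64, 68,
                    67, 66, 69, 65, 68, 67, 66, 70, 64,   # the 10-fingered group
                    62, 75], dtype=float)                 # short 9.5-fingered, tall 12-fingered

r, p = pearsonr(fingers, heights)
print(f"original: r = {r:.2f}, p = {p:.3f}")   # positive r, driven by the 12-fingered person

# Swap the two unusual people's heights and the correlation flips sign.
heights[-2], heights[-1] = heights[-1], heights[-2]
r, p = pearsonr(fingers, heights)
print(f"swapped:  r = {r:.2f}, p = {p:.3f}")   # now negative
```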
QUESTION 2.8b. If numbers of fingers were highly platykurtic and height were normally distributed, I
would not be very worried that one or two observations could create a spurious correlation between
number of fingers and height. In fact, this is a perfect description of the properties of the two variables
that are involved in a traditional independent samples t-test or one way ANOVA. The independent
variable is a categorical variable with 2 or more levels (each of which usually has roughly equal
frequencies), and the dependent measure is assumed to be normally distributed. For example, in the
simple case of gender, if there were 10 men and 11 women in these data, there obviously would be no
reason to worry that a person was an outlier on gender. Further, if we happened to sample one very tall
or very short woman, her height would merely be one of 11 scores rather than the sole score
representing her entire gender group.