Sociology 360

Part I: True/False and Multiple Choice (2 points each)
True/False questions. Circle true or false following each statement.
1. The median of a density curve is always the point that divides the area under the curve in
half.
True
False
2. Simpson’s paradox results from a variable omitted in the pooled table acting as a lurking
variable.
True
False
3. A low value of r² indicates that we should not use the regression to describe the
association between the independent and dependent variable.
True
False
4. Lurking variables are one reason we cannot infer causality from an association between
two variables.
True
False
5. The height of a density curve for a range of values gives the proportion of observations
that fall under the density curve for that range of values.
True
False
Part I: Multiple Choice (2 points each)
Circle the correct answer below. There is one correct answer to each question. Also, note that
many questions have “all of the above” or “none of the above” choices.
1. As part of a survey of college students, a researcher is interested in the number of
cigarettes smoked per day. She records a 1 if the student does not smoke, a 2 if the
student smokes at least once a week but not every day, a 3 if the student smokes at
least one cigarette per day, and a 4 if the student smokes more than a pack a day. This
variable is
a) ordered categorical
b) quantitative
c) unordered categorical
d) All of the above.
2. A description of different houses on the market includes the following three variables.
Which of the variables is quantitative?
a) The square footage of the house
b) The monthly electric bill
c) The monthly gas bill
d) All of the above.
3. When drawing a histogram it is important to
a) have a separate bin for each observation to get the most informative plot.
b) make sure the heights of the bars exceed the widths of the bins so that the bars are
true rectangles
c) label the vertical axis so that the reader can determine the counts or percent in
each bin
d) make certain the mean and median are contained in the same bin interval, so that
the correct type of skewness can be identified.
4. If a histogram has a bar that is taller than the others then
a) the bar corresponds to the bin containing the most observations
b) this is suggestive of a skewed distribution
c) the bin for this bar should be shortened for the sake of symmetry
d) all of the above.
Use the following to answer questions 5-7:
The following histogram represents the distribution of acceptance rates (percent accepted)
among 25 business schools in 1997. In each bin, the left endpoint is included but not the right.
5. What percent of the schools have an acceptance rate of under 20%?
a) 3%
b) 4%
c) 12%
d) 16%
6. What is the approximate width of each bin in this graph?
a) 10
b) 5
c) 3
d) none of the above could plausibly be the width of the bin.
7. Which of the following intervals includes the median of this distribution?
a) 30 to 40
b) 20 to 30
c) 15 to 25
d) cannot be determined from the information given
Use the following box plot of the exam scores in a statistics class to answer questions 8-10. The
boxplot is drawn per Moore (i.e., not per StataQuest).
[Box plot of exam scores; vertical axis marked 30, 45, 60, 75, 90]
8. Approximately 25% of the students scored below
a) 90
b) 65
c) 75
d) 60
9. The interquartile range of the exam scores is approximately
a) 14
b) 55
c) 65
d) 5
10. The maximum exam score is approximately
a) 75
b) 60
c) 65
d) 90
11. For the density curve below, which of the following is true?
[Density curve; horizontal axis marked 0.0, 0.25, 0.50, 0.75, 1.00]
a) the density curve is symmetric
b) the median is 0.5
c) the mean is 0.5
d) all of the above
12. If removing an observation from a data set would markedly change the position
of the regression line fit to the data, the point is called
a) robust
b) a residual
c) influential
d) a response
13. Using data from the fifty states, a researcher calculates the correlation coefficient
between X, the state's infant mortality rate in 1990 (deaths per 1000), and Y, the
percentage of 18-year-olds in the state in 1990 who had graduated from high school. The
correlation between X and Y is r = -0.54. If, instead of plotting these variables for each
of the fifty states, we plotted them for each county in the United States, we
would expect the value of the correlation r to be
a) exactly the same
b) smaller (closer to zero)
c) +0.54 (the magnitude is the same, but the sign should change)
d) higher (closer to –1)
14. Consider the following scatterplot.
[Scatterplot of Y (vertical axis 20 to 60) against X (horizontal axis 15 to 30)]
The correlation between X and Y
a) is approximately 0.999
b) is approximately 0.8
c) is approximately 0.0
d) cannot be computed because there is an outlier in the plot
15. In a statistics class with 136 students, the professor records how much money each
student has in her or his possession during the first class of the semester. The histogram
below shows the data collected.
[Histogram: frequency (vertical axis 0 to 50) by amount of money in $ (horizontal axis 0 to 100)]
From the histogram, which of the following is true?
a) The mean is much larger than the median.
b) The mean is much smaller than the median.
c) The mean and the median are approximately equal.
d) It is impossible to compare the mean and the median for these data.
16. X and Y are two categorical variables. The best way to determine if there is a relation
between them is to
a) calculate the correlation between X and Y.
b) draw a scatterplot of the X and Y values
c) make a two-way table of the X and Y values
d) all of the above
17. A study of the salaries of full professors at Upper Wabash Tech shows that the median
salary for female professors is considerably less than the median male salary. Further
investigation shows that the median salaries for male and female full professors are about
the same in every department (English, Physics, etc.) of the university. This apparent
contradiction is an example of
a) extrapolation
b) Simpson’s paradox
c) Causation
d) Correlation
Part III. Free Response
Answer all questions. In some cases, we will award partial credit for correct parts of a problem
even if the final answer is incorrect. Partial credit will only be given for work that is seen as a
step toward the (correct) final answer. Random facts relating to the problem will not get partial
credit. To get partial credit, you need to show your work.
1. Below are data from Fortune Magazine on the number of research centers in 10 American
cities.
     city              rschctrs
 1.  Memphis                 85
 2.  Denver                 302
 3.  Indianapolis            69
 4.  Los Angeles            515
 5.  Phoenix                121
 6.  San Francisco          345
 7.  Detroit                361
 8.  Minneapolis            235
 9.  Seattle                153
10.  Orlando                 33
Use this data to answer the questions below:
A. What is the five number summary for this data? (5 points)
B. What is the interquartile range? (2 points)
C. If we were to delete Los Angeles from the data, which would change more, the standard
deviation of the variable or the interquartile range? (2 points)
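For checking the arithmetic in parts A through C, a minimal Python sketch follows. It assumes Moore's quartile rule (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); other conventions, such as the default of Python's `statistics.quantiles`, can give slightly different quartiles. The helper name `five_number_summary` is our own, not part of any library.

```python
import statistics

# Research centers per city, from the Fortune Magazine table above.
centers = {
    "Memphis": 85, "Denver": 302, "Indianapolis": 69, "Los Angeles": 515,
    "Phoenix": 121, "San Francisco": 345, "Detroit": 361,
    "Minneapolis": 235, "Seattle": 153, "Orlando": 33,
}

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); Q1 and Q3 are the medians of the
    lower and upper halves, leaving out the middle value when n is odd."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

all_cities = list(centers.values())
without_la = [v for city, v in centers.items() if city != "Los Angeles"]

summary = five_number_summary(all_cities)
iqr = summary[3] - summary[1]
sd = statistics.stdev(all_cities)          # sample SD (divisor n - 1)

summary_wo = five_number_summary(without_la)
iqr_wo = summary_wo[3] - summary_wo[1]
sd_wo = statistics.stdev(without_la)

print("five-number summary:", summary)
print(f"IQR: {iqr} -> {iqr_wo} without Los Angeles")
print(f"SD:  {sd:.1f} -> {sd_wo:.1f} without Los Angeles")
```

Comparing the two printed changes addresses part C: the standard deviation uses every observation, so it is expected to react more to dropping an extreme value than the IQR, which depends only on the middle of the data.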
2. Below is a stem-and-leaf plot of the percentage of each state's population that is Christian,
for states in the Northeastern region of the United States. (Hint: the minimum is 36% Christian.)
Stem-and-leaf plot for pctchris
3|69
4|
5|599
6|1
a. What is the mean for this data? (3 points)
b. What is the standard deviation for this data? (3 points)
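A small sketch for parts a and b, assuming stems are tens and leaves are units (consistent with the hint that the minimum, 3|6, is 36). It uses the sample standard deviation with divisor n - 1; the helper `values_from_stem_leaf` is hypothetical, written only for this illustration.

```python
import statistics

# Stem-and-leaf plot for pctchris, copied from above (stem = tens, leaf = units).
plot = """3|69
4|
5|599
6|1"""

def values_from_stem_leaf(text, stem_unit=10):
    """Expand each 'stem|leaves' line into one value per leaf digit."""
    values = []
    for line in text.splitlines():
        stem, leaves = line.split("|")
        values.extend(int(stem) * stem_unit + int(leaf) for leaf in leaves.strip())
    return values

pct = values_from_stem_leaf(plot)
mean = statistics.mean(pct)
sd = statistics.stdev(pct)  # sample SD, dividing the sum of squares by n - 1
print(pct, mean, round(sd, 2))
```

Writing out the expanded values first, as the function does, is also the safest way to do the calculation by hand.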
3. Normal distribution problems
a. What proportion of the area under the standard normal curve falls to the right of
z = -.5? (3 points)
b. What proportion of the observations of a standard normal distribution falls
between –1 and 1 standard deviations from the mean? (3 points)
c. A social psychologist has developed a test to measure gregariousness. The test is
normed so that it has a mean of 70 and a standard deviation of 20, and the
gregariousness scores are normally distributed. What percentage of scores are
above 105? (4 points)
d. Scores on the California test of basic skills are normally distributed with mean 50
and standard deviation 25. What is the lowest score you would need on the
California test of basic skills to be in the top 20% of all scores? (4 points)
e. On the California test of basic skills (mean 50 and standard deviation 25) what
percentage of scores fall between 35 and 50? (4 points)
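One way to check the z-table lookups in parts a through e is Python's `statistics.NormalDist` (Python 3.8 or later), sketched below. Each non-standard question is the same standardization, z = (x - mean) / SD, that you would do by hand; answers read from a printed table, which rounds z to two decimals, may differ slightly in the last decimal place.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# (a) area to the right of z = -0.5
a = 1 - z.cdf(-0.5)
# (b) area between -1 and +1 standard deviations
b = z.cdf(1) - z.cdf(-1)
# (c) gregariousness ~ N(70, 20): proportion above 105, i.e. z > (105 - 70) / 20 = 1.75
greg = NormalDist(mu=70, sigma=20)
c = 1 - greg.cdf(105)
# (d) California test ~ N(50, 25): the top 20% starts at the 80th percentile
ctbs = NormalDist(mu=50, sigma=25)
d = ctbs.inv_cdf(0.80)
# (e) proportion between 35 and 50, i.e. between z = -0.6 and z = 0
e = ctbs.cdf(50) - ctbs.cdf(35)

for label, value in zip("abcde", (a, b, c, d, e)):
    print(label, round(value, 4))
```

Parts c and e ask for percentages, so multiply those proportions by 100.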
4. The graph below is a histogram, drawn in StataQuest, of age at first marriage for 296
married persons in the General Social Survey. Each bin is two years wide.
[Histogram of age when first married; vertical axis "fraction" (0 to .2), horizontal axis 10 to 50]
a. What proportion of persons in the sample were married at the ages of either 20 or
21? (Give your best guess based on the graph. Close will get full credit. 2 points)
b. The mean age at first marriage in this sample is 21.8. Will the median age of
marriage for the sample be greater than, less than, or equal to 21.8? (2 points)
5. A researcher regresses years of education (dependent) on number of siblings
(independent) using data on individuals from a large survey. She gets the following
regression equation:
Ŷ = -.227x + 13.48
a. Explain in one or two sentences what the slope says about the relationship
between number of siblings and years of education. (3 points)
b. A statistics professor has 5 siblings and 20 years of education. What is the
residual for the statistics professor? (4 points)
c. If the standard deviation of the number of siblings variable is 3.0 and the standard
deviation of the years of education variable is 3.15, what is the correlation
between education and number of siblings? (4 points)
d. Draw the regression line on the graph axes below. (3 points)
[Blank graph axes: vertical axis 0 to 20; horizontal axis "number of brothers and sisters", 0 to 15]
e. Place an “x” on the graph above to show where the statistics professor (of part
b) would appear if graphed on the scatterplot. Then put an “o” on the graph to
show the predicted value for the professor based on the regression. (2 points)
f. When we delete the statistics professor from the regression, the slope of the regression
line changes to -.231. Is the statistics professor acting as an influential observation? (2
points)
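A sketch of the arithmetic behind parts b, c, d, and f, reading the fitted equation as predicted education = 13.48 - 0.227 × siblings. Part c uses the least-squares identity b = r(s_y / s_x), solved for r; part f only quantifies how much the slope moves, and whether that counts as "influential" remains a judgment about the size of the change.

```python
# Fitted line from above: predicted years of education = 13.48 - 0.227 * siblings
INTERCEPT = 13.48
SLOPE = -0.227

def predict(siblings):
    """Predicted years of education for a given number of siblings."""
    return INTERCEPT + SLOPE * siblings

# (b) residual = observed - predicted, for the professor (5 siblings, 20 years)
predicted = predict(5)
residual = 20 - predicted

# (c) slope b = r * (s_y / s_x), so r = b * (s_x / s_y)
s_x, s_y = 3.0, 3.15  # SDs of siblings (x) and of years of education (y)
r = SLOPE * s_x / s_y

# (d) two points are enough to draw the line across the 0-15 sibling axis
line_points = [(x, predict(x)) for x in (0, 15)]

# (f) size of the change in slope when the professor is dropped
slope_change = abs(-0.231 - SLOPE)

print(round(predicted, 3), round(residual, 3), round(r, 4), line_points, round(slope_change, 3))
```

Because r and the slope always share a sign, a negative slope here must give a negative correlation, which is a quick check on part c.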
6. Below is a crosstabulation based on data from the 1986 General Social Survey. The two
variables are belief in life after death (based on a survey question) and the education of
the respondent in three categories (11 or fewer years of education, 12 years of education,
and 13 or more years of education).
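belief in      |        education         |
life after     |                          |
death          |   0/11     12     13+    |  Total
---------------+--------------------------+-------
yes            |    ___    380     436    |   1116
no             |     86     74     ___    |    246
---------------+--------------------------+-------
Total          |    386    ___     522    |   1362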
a. Fill in the missing (blank) frequencies in the table above. (3 points)
b. Percentage the conditional distributions assuming that education is the independent
variable and belief in life after death is the dependent variable. Write the percentages
below the corresponding frequencies above. (4 points)
c. Describe in words the association of the independent and dependent variable (mention
both the direction and strength of relationship). (4 points)
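A minimal sketch of parts a and b: each blank is a known total minus the known cells that sum to it, and because education is the independent variable, the percentages are computed within each education column. The dictionary layout is our own choice for illustration.

```python
# Cell counts from the 1986 GSS crosstab above; None marks a blank cell.
columns = ["0/11", "12", "13+"]
yes = {"0/11": None, "12": 380, "13+": 436}
no = {"0/11": 86, "12": 74, "13+": None}
col_totals = {"0/11": 386, "12": None, "13+": 522}
row_totals = {"yes": 1116, "no": 246}
GRAND_TOTAL = 1362

# (a) each blank is a known total minus the known cells that add up to it
yes["0/11"] = col_totals["0/11"] - no["0/11"]
no["13+"] = row_totals["no"] - no["0/11"] - no["12"]
col_totals["12"] = yes["12"] + no["12"]

# consistency checks against totals that were not used above
assert sum(yes.values()) == row_totals["yes"]
assert yes["13+"] + no["13+"] == col_totals["13+"]
assert sum(col_totals.values()) == GRAND_TOTAL

# (b) education is the independent variable, so percentage within each column
pct_yes = {col: 100 * yes[col] / col_totals[col] for col in columns}
for col in columns:
    print(f"{col:>4}: yes {pct_yes[col]:.1f}%   no {100 - pct_yes[col]:.1f}%")
```

For part c, compare the "yes" percentages across the education columns: the size of the gap between them indicates the strength of the association.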