Part 1. For each of the following questions, fill in the blanks. Each question is worth
2 points.
1.
The bell-shaped frequency curve is so common that if a population has this shape, the measurements
are said to follow a __________ distribution.
ANSWER: NORMAL
2.
With a frequency curve, to figure out what percentage or proportion of the population falls into a
certain range, you have to figure out the __________ under the curve over that range.
ANSWER: AREA
3.
A(n) __________ represents the number of standard deviations the observed value or score falls
above or below the mean.
ANSWER: STANDARD SCORE (OR Z-SCORE)
4.
For any normal curve, almost all of the values will fall within __________ of the mean.
ANSWER: THREE STANDARD DEVIATIONS
5.
A(n) __________ is useful for displaying the relationship between two measurement variables.
ANSWER: SCATTERPLOT
6.
A __________ can be used to represent two or three categorical variables simultaneously.
ANSWER: BAR GRAPH
7.
The __________ between two measurement variables is an indicator of how closely their values fall
to a straight line.
ANSWER: CORRELATION
8.
If there is no linear relationship between two measurement variables, the correlation is __________.
ANSWER: ZERO
9.
A data point that is far removed from the rest of the data is called a(n) __________.
ANSWER: OUTLIER
10. It is very difficult to establish a causal connection between two variables using anything
except a __________.
ANSWER: RANDOMIZED EXPERIMENT
11. A table that displays the number of individuals who fall into each combination of categorical
variables is called a(n) __________ table.
ANSWER: CONTINGENCY
12. When omitting a third variable masks the relationship between two categorical variables, this
phenomenon is called __________.
ANSWER: SIMPSON’S PARADOX
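Questions 2 and 4 rest on the idea that proportions are areas under the curve. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` (the helper name `area_within` is made up for this illustration), computes the area within 1, 2, and 3 standard deviations of the mean:

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
```

The three areas come out close to 68%, 95%, and 99.7%, which is why almost all values fall within three standard deviations of the mean.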
Part 2. For each of the following questions circle the correct response. Each
question is worth 2 points.
13. Suppose you are on a jury in a trial someday. How could you encounter Simpson’s Paradox?
a. You could see data that were collected from two different studies, giving you two different
results.
b. One side could present the data using two variables, and the other side could break the same
data down by a third variable that reverses the direction of the results.
c. One side could use counts to summarize the data, and the other side could use percentages
or rates, reversing the direction of the relationship.
d. All of the above.
ANSWER: B
14. In which case(s) should you be suspicious of a correlation that is presented?
a. When the data is likely to contain outliers.
b. When the sample size is small.
c. When removing one point in the data set actually reverses the direction of the trend.
d. All of the above
ANSWER: D
15. Assuming there is a statistical relationship between height and weight for adult females, which of the
following statements is true?
a. If we knew a woman’s height, we could predict her weight.
b. If we knew a woman’s height, we could determine the exact weight for all women with that
same height.
c. If we knew a woman’s height, we could predict the average weight for all women with that
same height.
d. All of the above are true.
ANSWER: C
16. Most researchers are willing to declare that a relationship is statistically significant if the chances of
observing the relationship in the sample when actually nothing is going on in the population are less
than what percent?
a. 5%
b. 50%
c. 95%
d. None of the above
ANSWER: A
17. Which of the following describes a strong statistical correlation?
a. The value of one measurement variable is always equal to the square of the value of another
measurement variable.
b. One measurement variable has a cause and effect relationship with another measurement
variable.
c. Two measurement variables have a strong linear relationship.
d. All of the above.
ANSWER: C
18. Suppose the correlation between two measurement variables is −1. Which of the following
statements is not true?
a. As one of the variables increases, the other decreases.
b. The data looks the same as when two variables have a deterministic linear relationship.
c. The correlation between the variables is very weak.
d. All of the above statements are true.
ANSWER: C
19. Which of the following is not a type of picture for organizing categorical data?
a. A pie chart.
b. A bar graph.
c. A pictogram.
d. A histogram.
ANSWER: D
20. Which of the following describes the entire area underneath a frequency curve?
a. The entire area is 1 or 100%.
b. The entire area is equal to the total number of individuals in the population.
c. The entire area is equal to the total percentage of individuals in the population with the
measurement being studied.
d. None of the above.
ANSWER: A
21. Suppose your score on the GRE (Graduate Record Examinations) was at the 90th percentile. What does that
mean?
a. You got 90% of the questions right.
b. 90% of the other students scored lower than you did.
c. 10% of the other students scored lower than you did.
d. None of the above.
ANSWER: B
22. Suppose one individual in a certain population had a z-score of −2. Which of the following is true?
a. This is a good thing because the individual is above average.
b. This individual’s measurement is 2 standard deviations below the mean.
c. This individual’s original measurement was a negative number.
d. All of the above are true.
ANSWER: B
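The reversal described in questions 12 and 13 can be checked numerically. The sketch below uses hypothetical, made-up counts (not data from any study named in this exam) for two treatments, A and B, broken down by a third variable, case severity:

```python
# Hypothetical (made-up) counts for illustration only: (successes, patients)
# for two treatments, broken down by a third variable (case severity).
counts = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def success_rate(successes, patients):
    return successes / patients

# Compare the two treatments within each severity group.
a_better_in_every_group = True
for severity, arms in counts.items():
    rate_a = success_rate(*arms["A"])
    rate_b = success_rate(*arms["B"])
    print(f"{severity}: A = {rate_a:.3f}, B = {rate_b:.3f}")
    a_better_in_every_group = a_better_in_every_group and rate_a > rate_b

def combined_rate(arm):
    """Success rate for one treatment when severity is ignored."""
    successes = sum(arms[arm][0] for arms in counts.values())
    patients = sum(arms[arm][1] for arms in counts.values())
    return successes / patients

combined_a = combined_rate("A")
combined_b = combined_rate("B")
print(f"combined: A = {combined_a:.3f}, B = {combined_b:.3f}")
```

With these counts, A has the higher success rate within each severity group, yet B has the higher rate once the groups are combined. That is the situation in choice (b) of question 13.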
Part 3. For each of the following questions give a short answer. Use complete
sentences. Each question is worth 2 points.
23. Suppose you took a standardized test and the scores had a bell-shaped distribution. You only need
three pieces of information in order to find your percentile in the population of test scores. What are
those three pieces of information?
ANSWER: 1) YOUR TEST SCORE; 2) THE MEAN OF THE POPULATION OF TEST
SCORES; AND 3) THE STANDARD DEVIATION OF THE POPULATION OF TEST
SCORES.
24. The Empirical Rule says that for a normal curve, approximately 68% of the values fall within 1
standard deviation of the mean in either direction, while 95% of the values fall within 2 standard
deviations of the mean in either direction. Explain why you don’t have twice as many values within 2
standard deviations as you do within 1 standard deviation.
ANSWER: BECAUSE THE NORMAL CURVE IS BELL-SHAPED, THE AREA IS NOT SPREAD EVENLY. THE MAJORITY
(68%) OF VALUES FALL CLOSE TO THE MEAN, WHERE THE “BELL” PART OF THE CURVE IS.
AS YOU MOVE AWAY FROM THE MEAN, YOU GET INTO THE TAILS OF THE CURVE, WHICH
CONTAIN LESS AREA.
25. Name three types of statistical pictures that are used to represent measurement data.
ANSWER: ANY 3 OF THE FOLLOWING ARE OK: 1) HISTOGRAM; 2) STEMPLOT; 3)
LINE GRAPH; 4) SCATTERPLOT; OR 5) BOXPLOT.
26. Name a situation in which a scatterplot is most useful for displaying measurement data.
ANSWER: FOR DISPLAYING THE RELATIONSHIP BETWEEN TWO
MEASUREMENT VARIABLES.
27. Determine whether or not the following statement could be statistically correct. If not, explain why
not. “The correlation between tree diameter and weight of fruit harvested was found to be 2.3.”
ANSWER: NO. CORRELATION MUST BE BETWEEN -1 AND +1.
28. Determine whether or not the following statement could be statistically correct. If not, explain why
not. “We found a strong correlation between gender and political party.”
ANSWER: NO. CORRELATION REFERS TO TWO MEASUREMENT VARIABLES, AND GENDER AND
POLITICAL PARTY ARE CATEGORICAL VARIABLES.
29. A number of anomalies can cause misleading correlations. Name two problems that can cause
distortion with correlations.
ANSWER: 1) OUTLIERS CAN SUBSTANTIALLY INFLATE OR DEFLATE THEM; 2)
GROUPS COMBINED INAPPROPRIATELY MAY MASK RELATIONSHIPS.
30. Give an example where a randomized experiment cannot be done, even though we know that is the
best way to try to establish a causal connection between two measurement variables.
ANSWER: ANY REASONABLE ANSWER OK. EXAMPLE: DOES SMOKING CAUSE
LUNG CANCER? (IT WOULD BE UNETHICAL TO RANDOMLY ASSIGN PEOPLE TO SMOKE.)
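The effect of an outlier on a correlation (question 29) can be seen with a small made-up data set. This is only a sketch: the data values are invented for illustration, and `pearson_r` is a hand-written helper, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with no linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 1, 3]
r_without_outlier = pearson_r(xs, ys)

# The same data plus a single point far removed from the rest.
r_with_outlier = pearson_r(xs + [20], ys + [20])

print(f"without outlier: r = {r_without_outlier:.2f}")
print(f"with outlier:    r = {r_with_outlier:.2f}")
```

One extra point moves the correlation from essentially zero to a value close to +1, even though the other five points show no linear pattern. The result always stays between -1 and +1, as question 27 requires.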
Part 4. Make sure to show all work in the following questions!
31. GRE scores are normally distributed with a mean of 497 and a standard deviation
of 115.
a) (4 points) Draw a picture of the GRE scores showing the cutoff values for the middle 99.7% of
scores.
Answer: The picture should show a bell-shaped curve centered at 497, with the left and right
ends marked 152 and 842 (99.7% of the area is within 3 standard
deviations of the mean: 497 ± 3 × 115).
b) (4 points) A student had a GRE score of 687. Find and interpret the standard score for
this student.
z = (687 − 497)/115 = 190/115 ≈ 1.65. The student scored about 1.65 standard deviations above the mean.
c) (4 points) Use the Empirical Rule to approximate the percentage of students with GRE
scores below 382.
z = (382 − 497)/115 = −1. The Empirical Rule states that about 68% of all GRE scores
will be within 1 standard deviation of the mean. That leaves 32% for the two tails, so about
16% of all scores are below 382 (because of the symmetry of the normal curve).
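The arithmetic in question 31 can be checked with a short script. This is a sketch using Python's standard-library `statistics.NormalDist`; the variable and function names are chosen for illustration, and the exact normal areas are shown only as a comparison with the Empirical Rule approximation the question asks for.

```python
from statistics import NormalDist

MEAN, SD = 497, 115
gre = NormalDist(mu=MEAN, sigma=SD)

# Part a: cutoffs for the middle 99.7% (3 standard deviations either side).
low_cutoff = MEAN - 3 * SD    # 152
high_cutoff = MEAN + 3 * SD   # 842

def z_score(score):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - MEAN) / SD

# Part b: standard score for a GRE score of 687.
z_687 = z_score(687)

# Part c: 382 is exactly 1 standard deviation below the mean.
z_382 = z_score(382)
empirical_below_382 = (1 - 0.68) / 2   # Empirical Rule: half of the remaining 32%
exact_below_382 = gre.cdf(382)         # exact area under the normal curve

# Percentile of the student in part b, from score, mean, and standard deviation.
percentile_687 = gre.cdf(687)

print(low_cutoff, high_cutoff)
print(f"z for 687: {z_687:.2f}")
print(f"below 382: empirical {empirical_below_382:.2f}, exact {exact_below_382:.4f}")
print(f"proportion scoring below 687: {percentile_687:.3f}")
```

The last line also illustrates question 23: the score, the population mean, and the population standard deviation are enough to find a percentile.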
32. A regression equation relating study time (X, in hours) to exam score (Y,
out of 100 points) is: Y = 21 + 4.5X
a) (2 points) What is the predicted score for 2 hours of study time?
Y = 21 + 4.5(2) = 30 points
b) (2 points) How many hours of study are required to get 93 points?
93 = 21 + 4.5X, so X = (93 − 21)/4.5 = 16 hours
c) (4 points) Explain clearly what the slope of 4.5 means in this situation.
For every one-hour increase in study time, the predicted test score increases by 4.5 points.
d) (4 points) Would the correlation between study time and exam score be positive or
negative? Explain.
Positive, since Y increases as X increases (the slope is greater than 0).
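The regression calculations in question 32 can be sketched in a few lines of Python. The function names `predicted_score` and `hours_for_score` are made up for this illustration.

```python
INTERCEPT = 21.0   # predicted score with zero hours of study
SLOPE = 4.5        # predicted points gained per additional hour of study

def predicted_score(hours):
    """Predicted exam score from the regression equation Y = 21 + 4.5X."""
    return INTERCEPT + SLOPE * hours

def hours_for_score(score):
    """Solve Y = 21 + 4.5X for X, the study time that predicts a given score."""
    return (score - INTERCEPT) / SLOPE

print(predicted_score(2))                       # part a
print(hours_for_score(93))                      # part b
print(predicted_score(3) - predicted_score(2))  # part c: one more hour adds the slope
```

Because the slope is positive, the predicted score rises with study time, which matches the positive correlation in part d.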
33. A study examined whether giving children in developing countries large doses of
vitamin A prevents night blindness and subsequently reduces the mortality rate
resulting from night blindness. 25,200 children participated in the study, with the
following results: of the 12,991 who received Vitamin A, 101 had died after 1 year, and
of the 12,209 who received Placebo, 130 had died after 1 year. The rest survived.
a. (2 points) Which of the variables is the explanatory variable and which is the response
variable?
Survival (Dead or Alive) is the response variable; Treatment (Vitamin A or Placebo) is the
explanatory variable.
b. (4 points) Construct a contingency table for the data.
                    SURVIVAL
TREATMENT      Dead     Alive     Total
Vitamin A       101     12890     12991
Placebo         130     12079     12209
Total           231     24969     25200
c. (3 points) Compute the risk of death for each of the two treatment groups (Vitamin A
and Placebo). Keep results to 4 decimal places. Interpret the result.
Vitamin A: 101/12991 = .0078, or about 0.78%
Placebo: 130/12209 = .0106, or about 1.1%
The risk of death is lower for the Vitamin A group, but not by very much.
d. (3 points) Compute the odds of surviving for each of the treatment groups (Vitamin
A and Placebo).
Vitamin A: (1 − .0078)/.0078 = .9922/.0078 = 127.2, which gives about 127 to 1 (computing it
directly as 12890/101 gives 127.6, essentially the same result).
Placebo: (1 − .011)/.011 = .989/.011 = 89.9, which gives about 90 to 1 (computing it
directly as 12079/130 gives 92.9, or about 93 to 1; the difference comes from rounding the risk to .011).
e. (3 points) Compute the relative risk of dying for the Placebo group versus the Vitamin A group.
Interpret the result.
.011/.0078 = 1.41. The risk of dying for the Placebo group is about 1.41 times the risk for the
Vitamin A group. (Using the less heavily rounded risks, .01065/.00777, gives about 1.37.)
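The table, risks, odds, and relative risk in question 33 can be recomputed directly from the counts given in the problem. This is a sketch with made-up function names; because it works from the unrounded counts, its results can differ slightly from hand calculations that start from rounded risks.

```python
# Counts stated in the problem: deaths and group sizes after 1 year.
counts = {
    "Vitamin A": {"dead": 101, "total": 12991},
    "Placebo":   {"dead": 130, "total": 12209},
}

def alive(group):
    return counts[group]["total"] - counts[group]["dead"]

def risk_of_death(group):
    """Proportion of the group that died."""
    return counts[group]["dead"] / counts[group]["total"]

def odds_of_survival(group):
    """Number surviving for every one that died."""
    return alive(group) / counts[group]["dead"]

total_dead = sum(c["dead"] for c in counts.values())
total_alive = sum(alive(g) for g in counts)
relative_risk = risk_of_death("Placebo") / risk_of_death("Vitamin A")

for group in counts:
    print(f"{group}: risk = {risk_of_death(group):.4f}, "
          f"odds of survival = {odds_of_survival(group):.1f} to 1")
print(f"column totals: dead = {total_dead}, alive = {total_alive}")
print(f"relative risk (Placebo vs. Vitamin A) = {relative_risk:.2f}")
```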