STATISTICS FOR THE SOCIAL AND BEHAVIORAL SCIENCES
MIDTERM 1
Answer key
Problem 1
1.1 a)
Note that x is neighborhood median income measured in thousands of dollars,
whereas y is simply the number of reported crimes. Thus, a one-unit increase in x
means an increase of 1 thousand dollars in neighborhood median income, which
is associated with a decrease of 30 (the slope b = -30) in y (number of reported crimes).
1.2 b)
We substitute 40 for x in our linear equation to find the value of y (predicted
number of crimes).
y = 2000 – 30(40) = 800
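The slope interpretation in 1.1 and the substitution in 1.2 can be checked with a minimal Python sketch. The intercept 2000 and slope -30 come from the equation above; the helper name `predicted_crimes` is an illustrative assumption.

```python
# Fitted line from Problem 1: y = 2000 - 30x,
# where x is neighborhood median income in thousands of dollars.
INTERCEPT = 2000
SLOPE = -30


def predicted_crimes(income_thousands):
    """Hypothetical helper: predicted number of reported crimes."""
    return INTERCEPT + SLOPE * income_thousands


# Prediction for a neighborhood with median income of 40 thousand dollars.
prediction_at_40 = predicted_crimes(40)

# Change in the prediction for a one-unit (1 thousand dollar) increase in x.
change_per_unit = predicted_crimes(41) - predicted_crimes(40)

print(prediction_at_40, change_per_unit)
```

The difference between two predictions one unit apart is always the slope, which is why b can be read directly as the change in y per thousand dollars.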
1.3 d)
Taking the square root of R squared we find:
r = ±√0.82 ≈ ±0.91
Since b is negative, we know that the relationship between y (neighborhood
crime) and x (neighborhood median income) is negative, and thus the correlation
takes the negative root: it is negative (but not zero).
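As a rough check, the sign rule can be written as a short Python sketch: the correlation has magnitude √(R squared) and takes the sign of the slope. The function name `correlation_from_r_squared` is an assumption made for illustration; the values 0.82 and -30 are taken from Problem 1.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Hypothetical helper: r has magnitude sqrt(R^2) and the sign of the slope."""
    magnitude = math.sqrt(r_squared)
    # copysign attaches the sign of the slope to the magnitude.
    return math.copysign(magnitude, slope)


r = correlation_from_r_squared(0.82, -30)
print(round(r, 2))
```

This only holds for simple regression with one explanatory variable, where R squared equals the square of the correlation.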
1.4 b)
R squared tells us the percentage of variation in y that is explained by our model.
We know that R squared = Explained Sum of Squares/Total Sum of Squares.
Thus, the Explained Sum of Squares is equal to 82% of the Total Sum of Squares.
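Written out as a short derivation, and assuming the usual decomposition TSS = ESS + SSE (consistent with the formula for R squared used in Problem 5), the statement above reads:

```latex
R^2 = \frac{ESS}{TSS} = 0.82
\quad\Longrightarrow\quad
ESS = 0.82 \, TSS ,
\qquad
SSE = TSS - ESS = 0.18 \, TSS .
```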
Problem 2
2.1 e)
X is the number of years of education, so it is a quantitative variable. It is
discrete because it is measured in completed years of education: there is no
value such as 1.5 years. An individual either completed 2 years or, if they did
not complete the second year, is coded as having completed 1 year.
2.2 c)
The IQR cannot be higher than 26, since the education variable takes values in
[0,26] and therefore has a range of 26. Note the IQR is the 75th percentile – 25th
percentile, and both percentiles lie inside [0,26].
The IQR must be lower than or equal to 26.
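The bound can be illustrated with a small Python sketch. The data below are hypothetical values in [0, 26] invented for the example, not the exam's data set, and the choice of `method="inclusive"` is an assumption made so that the quartiles are interpolated strictly between the smallest and largest observations.

```python
import statistics

# Hypothetical years of education (illustration only, not the exam data).
years_of_education = [0, 8, 10, 12, 12, 12, 14, 16, 16, 18, 20, 26]

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(years_of_education, n=4, method="inclusive")

iqr = q3 - q1
data_range = max(years_of_education) - min(years_of_education)

print(iqr, data_range)
```

Because both quartiles sit between the minimum and the maximum, their difference can never exceed the range.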
2.3 c)
If the distribution is right skewed, that means that the median is lower than the
mean (the mean is pulled towards the skew, the longer tail).
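A minimal sketch of this pattern, using hypothetical data invented for the example (not the exam's data set): a single large value in the right tail pulls the mean up while the median stays put.

```python
import statistics

# Hypothetical right-skewed data: one large value forms the long right tail.
years = [1, 2, 2, 3, 3, 3, 4, 5, 20]

mean_years = statistics.mean(years)
median_years = statistics.median(years)

print(mean_years, median_years)
```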
2.4 c)
If the sample was drawn via random sampling, it does not suffer from sampling
bias. Response bias could still be present if subjects are not honest about their
level of education. Nonresponse bias could still be present if individuals drawn
into the sample do not respond. Sampling error is present whenever we use a
sample to make inferences about an entire population.
2.5 c)
The mean of x is indeed a statistic since it describes a sample, not a population.
Problem 3
3.1 b) and c)
Nonresponse bias is the only bias that does not affect the study since all students
of the sample answered.
3.2 d)
Out of all of them, d) is the best answer possible. Having 24 friends is 3
standard deviations above the mean, and thus we can safely say that almost no
student had more than that.
3.3 c)
We know that about 68% of the observations fall within 1 standard deviation of
the mean, that is, between 8 friends (the mean minus 1 standard deviation) and
16 friends (the mean plus 1 standard deviation).
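The intervals used in 3.2 and 3.3 can be reproduced with a short Python sketch. The mean of 12 and standard deviation of 4 are not stated directly in this answer key; they are inferred here from the 8-to-16 interval and from 24 being 3 standard deviations above the mean. The helper names are illustrative assumptions.

```python
# Inferred from the text: 8 and 16 are one standard deviation either side
# of the mean, so the mean is 12 and the standard deviation is 4.
MEAN_FRIENDS = 12
SD_FRIENDS = 4

# Approximate shares from the empirical rule for bell-shaped distributions.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def interval(k):
    """Hypothetical helper: (mean - k*sd, mean + k*sd)."""
    return (MEAN_FRIENDS - k * SD_FRIENDS, MEAN_FRIENDS + k * SD_FRIENDS)


def z_score(value):
    """Number of standard deviations between a value and the mean."""
    return (value - MEAN_FRIENDS) / SD_FRIENDS


for k, share in EMPIRICAL_RULE.items():
    print(k, interval(k), share)
print(z_score(24))
```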
3.4 d)
We know that r = (sx/sy)b, thus b = (sy/sx)r = -0.525
3.5 c)
An increase of one standard deviation (sx) in x (number of friends) is associated
with a change of (sx/sy)b standard deviations in y (number of hours studied per
week). Since b is negative, this is a decline:
|(sx/sy)b| = 0.7 standard deviations
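The conversion between the slope and the correlation in 3.4 and 3.5 can be sketched in Python. Only b = -0.525 and the magnitude 0.7 appear in this answer key; s_x = 4 is inferred from Problem 3.3, and s_y = 3 is an assumed value chosen because it is consistent with those two results. The function names are hypothetical.

```python
def slope_from_correlation(r, s_x, s_y):
    """b = (s_y / s_x) * r"""
    return r * s_y / s_x


def correlation_from_slope(b, s_x, s_y):
    """r = (s_x / s_y) * b, the slope in standard-deviation units."""
    return b * s_x / s_y


S_X = 4.0   # inferred from Problem 3.3 (friends)
S_Y = 3.0   # ASSUMPTION: not stated in the text, back-solved for illustration
R = -0.7    # negative because the slope is negative

b = slope_from_correlation(R, S_X, S_Y)
standardized_change = correlation_from_slope(b, S_X, S_Y)

print(round(b, 3), round(standardized_change, 1))
```

The two functions are inverses of each other, so converting r to b and back recovers the original correlation.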
Problem 4
4.1 e)
The observations are {1, 1, 2, 2, 2, 3, 4}. The median is 2.
4.2 c)
The mode is the stress level with the highest frequency (highest number of rats
with that stress level). That is 2 (3 rats).
4.3 c)
Stress level is measured in numbers from 0 to 4. Thus, it is a quantitative discrete
variable.
4.4 e)
\[ \text{mean} = \frac{(1 \times 2) + (2 \times 3) + (3 \times 1) + (4 \times 1)}{7} = \frac{15}{7} = 2.143 \]
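The median, mode, and mean in 4.1, 4.2, and 4.4 can be verified with Python's standard `statistics` module, using the seven observations listed in 4.1. The variable names are illustrative.

```python
import statistics

# Stress levels of the seven rats, as listed in 4.1.
stress_levels = [1, 1, 2, 2, 2, 3, 4]

median_stress = statistics.median(stress_levels)
mode_stress = statistics.mode(stress_levels)
mean_stress = statistics.mean(stress_levels)

print(median_stress, mode_stress, round(mean_stress, 3))
```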
4.5 a)
We use the formula for the standard deviation
\[ SD = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2} \]
\[ = \sqrt{\frac{1}{7} \left[ (1 - 2.143)^2 + (1 - 2.143)^2 + (2 - 2.143)^2 + (2 - 2.143)^2 + (2 - 2.143)^2 + (3 - 2.143)^2 + (4 - 2.143)^2 \right]} \]
\[ = \sqrt{\frac{1}{7} (2.613 + 0.061 + 0.734 + 3.448)} = 0.990 \]
4.6 a)
We square the standard deviation and find var = (0.990)^2 = 0.980
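A quick way to check 4.5 and 4.6 is the sketch below. It uses `statistics.pstdev` and `statistics.pvariance`, which divide by N as the formula in 4.5 does; the sample versions `stdev` and `variance` divide by N - 1 and would give larger values.

```python
import statistics

# Stress levels of the seven rats, as listed in 4.1.
stress_levels = [1, 1, 2, 2, 2, 3, 4]

# Population formulas: divide the sum of squared deviations by N.
sd_stress = statistics.pstdev(stress_levels)
var_stress = statistics.pvariance(stress_levels)

print(round(sd_stress, 3), round(var_stress, 3))
```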
Problem 5
5.1 b)
R squared = (TSS-SSE)/TSS= 0.071
5.2 b)
Sy = (Sx/r)b = (23.8/√0.071)(7) = 625.24
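The arithmetic in 5.2 can be reproduced with a minimal Python sketch. The inputs 0.071, 23.8, and 7 are taken from the lines above; the positive square root is used on the assumption that r shares the positive sign of the slope.

```python
import math

R_SQUARED = 0.071   # from 5.1
S_X = 23.8          # standard deviation of x
B_YX = 7            # slope of the regression of y on x

# r takes the sign of the slope, which is positive here.
r = math.sqrt(R_SQUARED)

# Rearranging r = (s_x / s_y) * b gives s_y = (s_x / r) * b.
s_y = (S_X / r) * B_YX

print(round(s_y, 2))
```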
5.3 b)
Residual = yi - ŷi
We can see that the only observation above 80 lies below the regression line
(its observed y is smaller than its predicted ŷ), and thus the residual will be
negative.
5.4 d)
5.5 c)
The Sum of Squared Errors (SSE) is 68,948,223. Note that the formula is
\[ SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \]
If we divide it by N and take the square root, we will have the standard
deviation of the error. Thus, the right answer is
\[ \sqrt{\frac{68{,}948{,}223}{N}} = 326.95 \]
5.6 c)
We want to find bxy
We know that
r(x,y) = (Sx/Sy)byx = (Sy/Sx)bxy
Thus,
bxy = (Sx/Sy)^2 byx = (23.8/625.24)^2 (7) = 0.010
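The last two answers can be checked with the sketch below. The sample size N is not stated in this answer key; `N_ASSUMED = 645` is an assumption back-solved from the quoted result 326.95 and should be replaced by the actual N from the exam. All other inputs come from Problem 5.

```python
import math

SSE = 68_948_223
N_ASSUMED = 645  # ASSUMPTION: not given in the text, back-solved from 326.95

# 5.5: standard deviation of the errors, sqrt(SSE / N).
error_sd = math.sqrt(SSE / N_ASSUMED)

S_X = 23.8
S_Y = 625.24  # from 5.2
B_YX = 7      # slope of y regressed on x

# 5.6: slope of x regressed on y, b_xy = (s_x / s_y)^2 * b_yx.
b_xy = (S_X / S_Y) ** 2 * B_YX

# Both slopes imply the same correlation.
r_from_byx = (S_X / S_Y) * B_YX
r_from_bxy = (S_Y / S_X) * b_xy

print(round(error_sd, 2), round(b_xy, 3))
```

Computing r from either slope gives the same value, which is the identity used in 5.6 to move from byx to bxy.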