Problem Set I: Review
Intro, Measures of Central Tendency & Variability,
Z-scores and the Normal Distribution, Correlation,
and Regression
QUESTION 1: Short answer: What is a statistic? Give a definition and an example.
Explain how the example illustrates the definition you have provided.
A statistic is a number that organizes, summarizes, and makes understandable
a collection of data. An example of a statistic is the mean.
The mean is a single number calculated on a set of data which gives an idea of
the collection of values without having to report them all individually.
QUESTION 2: A psychologist interested in the dating habits of undergraduates in
the Psychology major samples 10 students and determines the number of dates
they have had in the last six months. He knows that the mean number of dates is
7.8, and the sum of squares (SS) is 223.2. Assume a normal distribution.
In order to make ANY conclusions about proportions, we need to use the z-table.
To use z-scores, we need the mean (x̄) and standard deviation (s).
x̄ = 7.8
s = sqrt(SS/(N - 1))
s = sqrt(223.2/9)
s = 4.98
A. What percentage of all undergraduate students went on less than 4 dates in the last
six months?
After shading in the distribution, it’s clear that “less than 4 dates” refers to an AREA C.
Turn 4 into a z-score:
(4 - 7.8)/4.98 = -.76
The AREA C for z = .76 is .2236, or 22.36%.
B. If the psychologist had 10 students total, approximately how many of these students
went on between 8 and 13 dates in the last six months?
After shading in the distribution, it’s clear that “between 8 and 13” refers to a portion
of the distribution which we can only find by combining areas from the table.
Turn 13 into a z-score: (13-7.8)/4.98 = 1.04
The AREA B for z= 1.04 is .3508
Turn 8 into a z-score: (8-7.8)/4.98 = .04
The AREA B for z= .04 is .0160
.3508 - .0160 = .3348 and so 10(.3348) = 3.35 students (approximately 3)
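As a cross-check on the table lookups in parts A and B, here is a minimal Python sketch. It assumes the normal curve can be evaluated with the error function instead of a printed z-table; the helper names `normal_cdf` and `sample_sd` are illustrative and not part of the problem set. Z-scores are rounded to two places to mirror the table.

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sample_sd(ss, n):
    """Sample standard deviation from a sum of squares: sqrt(SS / (N - 1))."""
    return math.sqrt(ss / (n - 1))

mean, ss, n = 7.8, 223.2, 10
s = sample_sd(ss, n)                       # about 4.98

# Part A: proportion below 4 dates (z rounded as in a z-table lookup)
z_a = round((4 - mean) / s, 2)             # -0.76
p_below_4 = normal_cdf(z_a)                # about .2236

# Part B: proportion between 8 and 13 dates, then expected count out of 10
z_lo = round((8 - mean) / s, 2)            # 0.04
z_hi = round((13 - mean) / s, 2)           # 1.04
p_between = normal_cdf(z_hi) - normal_cdf(z_lo)   # about .335
expected_students = n * p_between          # a little over 3 students
```

The small difference from .3348 in the last step comes from the table's four-decimal rounding.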
C. What is the number of dates one must have gone on in the last six months in order to
be in the top 2.5%?
After shading in the distribution, it’s clear that the top 2.5% is in the right tail of the
distribution, and extends from some z-score and beyond (this means it’s an AREA C).
We need the z-score which has an AREA C closest to .0250 without going over.
We find an AREA C which is EXACTLY .0250 for a z-score of 1.96.
We need to turn this into a raw score, in other words, a number of dates.
Raw Score = x̄ + z(s)
Raw Score = 7.8 + 1.96(4.98)
Raw Score = 7.8 + 9.76
Raw Score = 17.56 dates
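The same cutoff can be found without a table. This is a sketch only, assuming Python 3.8 or later, where the standard library's `statistics.NormalDist` provides an inverse CDF; the variable names are illustrative.

```python
from statistics import NormalDist

mean, s = 7.8, 4.98

# z-score that leaves 2.5% in the upper tail (97.5% of scores fall below it)
z_cut = NormalDist().inv_cdf(1 - 0.025)    # about 1.96

# Convert the z-score back to a raw score: raw = mean + z * s
raw_cut = mean + z_cut * s                 # about 17.56 dates
```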
QUESTION 3: A researcher in a learning laboratory believes that the amount of
water a rat drinks before entering a maze will affect how well the rat performs in
the maze. He records the amount of water consumed by each of his 4 rats (in
ounces) and then puts them each into a maze and records how long it takes each
rat to complete the maze (in seconds). He then calculates the correlation
coefficient between these two variables, which is .48. His data can be found
below:
Water consumed (oz)    Maze Completion Time (sec)
4                      7
2                      8
7                      15
1                      12
A. As practice, find the correlation coefficient of this data by hand. Confirm that it does indeed
come out to be .48.
Raw Score Method
        x (water)   y (time)   x^2   y^2   xy
        4           7          16    49    28
        2           8          4     64    16
        7           15         49    225   105
        1           12         1     144   12
Sums    14          42         70    482   161
r = [Σxy - (Σx)(Σy)/n] / sqrt([Σx^2 - (Σx)^2/n][Σy^2 - (Σy)^2/n])
r = [161 - (14)(42)/4] / sqrt([70 - (14)^2/4][482 - (42)^2/4])
r = (161 - 147) / sqrt((70 - 49)(482 - 441))
r = 14 / sqrt((21)(41))
r = 14 / 29.34
r = .48
Z-score Method
                              MEAN     S
Water consumed (oz)           3.50     2.65
Maze Completion Time (sec)    10.50    3.70
x (water)   y (maze time)   Zx       Zy       ZxZy
4.00        7.00            0.19     -0.95    -0.18
2.00        8.00            -0.57    -0.68    0.38
7.00        15.00           1.32     1.22     1.61
1.00        12.00           -0.94    0.41     -0.38
                                     Sum =    1.43
r = ΣZxZy / (n - 1)
r = 1.43 / 3
r = .48
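Both hand methods for r can be checked with a short Python sketch. The function names `pearson_raw` and `pearson_z` are illustrative, not from the problem set, and the z-score version assumes sample standard deviations (dividing by n - 1), as in the key.

```python
import math

water = [4, 2, 7, 1]       # x, ounces
time = [7, 8, 15, 12]      # y, seconds

def pearson_raw(x, y):
    """Raw score method: sums of x, y, x^2, y^2, and xy."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = sxy - sx * sy / n
    denominator = math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return numerator / denominator

def pearson_z(x, y):
    """Z-score method: sum of Zx * Zy divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sdx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sdy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    products = (((a - mx) / sdx) * ((b - my) / sdy) for a, b in zip(x, y))
    return sum(products) / (n - 1)

r_raw = pearson_raw(water, time)   # about .477, which rounds to .48
r_z = pearson_z(water, time)       # same value, up to floating-point error
```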
B. Write out the equation of the regression line for predicting maze performance from amount of
water consumed.
b = r(sy/sx)
b = .48(3.70/2.65)
b = .67
a = ȳ - b(x̄)
a = 10.50 - .67(3.50)
a = 8.16
y = .67x + 8.16
C. Make a prediction of how long it would take a rat that drank 10 oz of water to complete this
maze.
y = .67x + 8.16
y = .67(10) + 8.16
y = 14.86 seconds
D. Write out the equation of the regression line to predict amount of water consumed from
time spent to complete the maze.
Here the roles switch: y is now water consumed and x is completion time.
b = r(sy/sx)
b = .48(2.65/3.70)
b = .34
a = ȳ - b(x̄)
a = 3.50 - .34(10.50)
a = -.07
y = .34x - .07
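The regression steps in parts B through D can be sketched in Python as follows. The helper `regression_line` is an illustrative name, and the inputs are the rounded summary values from the table. Because the code does not round the slope before computing the intercept, its intercepts differ slightly from the hand-worked ones (about 8.15 instead of 8.16, and about -.11 instead of -.07).

```python
def regression_line(r, mean_x, sd_x, mean_y, sd_y):
    """Return (slope, intercept) for predicting y from x."""
    b = r * (sd_y / sd_x)
    a = mean_y - b * mean_x
    return b, a

# Parts B and C: x = water (oz), y = maze time (sec)
b, a = regression_line(r=0.48, mean_x=3.50, sd_x=2.65, mean_y=10.50, sd_y=3.70)
predicted_time = b * 10 + a        # rat that drank 10 oz, about 14.86 sec

# Part D: roles swapped, x = maze time (sec), y = water (oz)
b_rev, a_rev = regression_line(r=0.48, mean_x=10.50, sd_x=3.70,
                               mean_y=3.50, sd_y=2.65)
```

The two slopes are not reciprocals of each other: each line minimizes error in its own predicted variable, so the line for predicting water from time has to be computed separately.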
E. What kind of relationship exists between water consumption and maze completion speed?
Is it better for the rats to have consumed a lot of water prior to entering the maze, or does it
hinder their performance?
Since there is a moderate positive correlation between water consumption and maze
completion time, the more water a rat drinks, the longer it tends to take to
complete the maze. It seems better for the rats not to consume a lot of
water, so that their completion time is quicker.
F. Calculate the coefficient of determination. What does this value tell you about how well you
are or are not able to make an accurate prediction using this regression line?
r-squared is (.48)(.48) = .2304, which means only about 23% of the variability in completion
time is accounted for by water consumed. This is a small proportion of the variation, telling us
that our predictions may not be very accurate.
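To see what "variability accounted for" means, the sketch below fits the least-squares line to the four rats and compares the leftover (residual) variation with the total variation in completion time. This is an illustration with made-up variable names, not the key's own method; with the unrounded r it gives about .228 rather than .2304.

```python
water = [4, 2, 7, 1]       # x, ounces
time = [7, 8, 15, 12]      # y, seconds
n = len(water)

mx, my = sum(water) / n, sum(time) / n
sp = sum((x - mx) * (y - my) for x, y in zip(water, time))   # sum of products
ss_x = sum((x - mx) ** 2 for x in water)
ss_y = sum((y - my) ** 2 for y in time)                      # total variation in y

# Least-squares line for predicting time from water
b = sp / ss_x
a = my - b * mx

# Variation in y left over after using the line to predict
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(water, time))

# Proportion of variation in y accounted for by x
r_squared = 1 - ss_residual / ss_y
```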
QUESTION 4: Below is a sample of scores on a new version of an IQ test. The
range of possible points on this test is 0-100.
Name     Score
Maria    78
John     90
David    50
Julia    65
Marta    100
A. Calculate the mean, standard deviation, and variance of these scores. (Do this
by hand, show your work.)
Mean = Σx/N
Σx = 383
383/5 = 76.6
Raw score method:
Name     Score (x)   x^2
Maria    78          6084
John     90          8100
David    50          2500
Julia    65          4225
Marta    100         10000
Sums     383         30909
         (Σx)        (Σx^2)
SS = Σx^2 - (Σx)^2/N
SS = 30909 - (383)^2/5
SS = 30909 - 146689/5
SS = 30909 - 29337.8
SS = 1571.2
s = sqrt(SS/(N - 1))
s = sqrt(1571.2/4)
s = sqrt(392.8)
s = 19.82
s^2 = 392.8
Deviation Method:
Name     Score (x)   xbar    x-xbar   (x-xbar)^2
Maria    78          76.6    1.4      1.96
John     90          76.6    13.4     179.56
David    50          76.6    -26.6    707.56
Julia    65          76.6    -11.6    134.56
Marta    100         76.6    23.4     547.56
Sum                                   1571.2
Σ(x-xbar)^2, aka SS
s = sqrt(SS/(N - 1))
s = sqrt(1571.2/4)
s = sqrt(392.8)
s = 19.82
s^2 = 392.8
B. What is the z-score obtained by Julia, and what does this z-score tell us about
her grade?
z = (x - x̄)/s
z = (65 - 76.6)/19.82
z = -.59
Julia’s z-score is negative, indicating she performed worse than average,
specifically .59 standard deviations below average.
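The two routes to SS, and the z-score that follows from them, can be verified with a few lines of Python. This is a minimal sketch; the dictionary and variable names are illustrative.

```python
import math

scores = {"Maria": 78, "John": 90, "David": 50, "Julia": 65, "Marta": 100}
x = list(scores.values())
n = len(x)

mean = sum(x) / n                                    # 76.6

# Raw score method: SS = sum(x^2) - (sum(x))^2 / N
ss_raw = sum(v * v for v in x) - sum(x) ** 2 / n

# Deviation method: SS = sum((x - mean)^2)
ss_dev = sum((v - mean) ** 2 for v in x)

variance = ss_dev / (n - 1)                          # about 392.8
sd = math.sqrt(variance)                             # about 19.82

# Julia's standing relative to the group
z_julia = (scores["Julia"] - mean) / sd              # about -0.59
```

Both SS values agree (about 1571.2), which is why either method may be used by hand.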
Suppose you want to know if this IQ test is in any way related to the old IQ test, so you
administer a version of the old test to each of these individuals. The following are their scores on
the old IQ test:
Name     Score
Maria    110
John     130
David    70
Julia    90
Marta    160
C. Is there a relationship between the scores on the old test and the scores on the new test?
In other words, does the new test seem to be measuring IQ in the same way? Describe the
relationship.
Raw Score Method
        x (NEW)   y (OLD)   x^2     y^2     xy
        78        110       6084    12100   8580
        90        130       8100    16900   11700
        50        70        2500    4900    3500
        65        90        4225    8100    5850
        100       160       10000   25600   16000
Sums    383       560       30909   67600   45630
r = [Σxy - (Σx)(Σy)/n] / sqrt([Σx^2 - (Σx)^2/n][Σy^2 - (Σy)^2/n])
r = [45630 - (383)(560)/5] / sqrt([30909 - (383)^2/5][67600 - (560)^2/5])
r = (45630 - 42896) / sqrt((30909 - 29337.8)(67600 - 62720))
r = 2734 / sqrt((1571.2)(4880))
r = 2734 / sqrt(7667456)
r = 2734 / 2769.02
r = .99
Yes, there is a strong positive correlation between the two versions of the
test. The higher the score on the old version, the higher the score on the new
version, thus it seems that the two tests are measuring IQ the same way.
QUESTION 5: Over the years, my students have informed me that they feel as
though I seem to grade paper assignments according to their length. To assess
this relationship, I decide to perform a correlational analysis on the number of
pages of 12 papers and the grades I assigned to them. I find that the correlation
coefficient (r) is -.90. The following is also known:
          Page length (x)   Grade (y)
Σx        89                971
Σx^2      805               79717
Mean      7.42              80.92
s         3.63              10.21
Suppose a student had access to this information and wanted to predict their grade for an
upcoming paper. Their paper is 3 pages long.
The Mean
Mean = Σx/N
Page length: 89/12 = 7.42
Grade: 971/12 = 80.92
Standard Deviation (s)
We know that s = sqrt(SS/(N - 1)) and SS = Σx^2 - (Σx)^2/N.
Page length:
SS = 805 - (89)^2/12
SS = 805 - 7921/12
SS = 805 - 660.08
SS = 144.92
s = sqrt(144.92/11)
s = sqrt(13.17)
s = 3.63
Grade:
SS = 79717 - (971)^2/12
SS = 79717 - 942841/12
SS = 79717 - 78570.08
SS = 1146.92
s = sqrt(1146.92/11)
s = sqrt(104.27)
s = 10.21
A. Write out the equation of the regression line to predict grade from paper length.
b = r(sy/sx)
b = -.90(10.21/3.63)
b = -2.53
a = ȳ - b(x̄)
a = 80.92 - (-2.53(7.42))
a = 99.69
y = -2.53x + 99.69
B. Predict the grade for this student whose paper is 3 pages long.
y = -2.53(3) + 99.69
y = 92.1
C. If someone received a grade of 100 on their paper, predict the number of pages of their
paper (this will involve multiple steps; ie find the equation of the regression line first,
then plug in to make a prediction).
Here the roles switch: y is now page length and x is grade.
b = r(sy/sx)
b = -.90(3.63/10.21)
b = -.32
a = ȳ - b(x̄)
a = 7.42 - (-.32(80.92))
a = 33.31
y = -.32x + 33.31
y = -.32(100) + 33.31 = 1.31 pages
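Question 5 starts from sums rather than raw scores, so the whole chain (means, SS, s, then both regression lines) can be sketched from the four totals. The helper `mean_and_sd` and the variable names are illustrative; values are carried at full precision, so they may differ from the hand-worked figures in the second decimal place.

```python
import math

n, r = 12, -0.90
sum_x, sum_x2 = 89, 805          # page length: sum and sum of squares
sum_y, sum_y2 = 971, 79717       # grade: sum and sum of squares

def mean_and_sd(total, total_sq, n):
    """Mean and sample SD from the sum and sum of squares of n scores."""
    ss = total_sq - total ** 2 / n
    return total / n, math.sqrt(ss / (n - 1))

mean_x, sd_x = mean_and_sd(sum_x, sum_x2, n)    # about 7.42 and 3.63
mean_y, sd_y = mean_and_sd(sum_y, sum_y2, n)    # about 80.92 and 10.21

# Parts A and B: grade predicted from page length
b_grade = r * sd_y / sd_x
a_grade = mean_y - b_grade * mean_x
grade_for_3_pages = b_grade * 3 + a_grade       # about 92.1

# Part C: page length predicted from grade (roles of x and y swapped)
b_pages = r * sd_x / sd_y
a_pages = mean_x - b_pages * mean_y
pages_for_100 = b_pages * 100 + a_pages         # about 1.31 pages
```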
QUESTION 6: You are collecting IQ data from a sample of 20 of your classmates.
You record the following IQ scores:
IQ = {120, 110, 120, 100, 120, 130, 100, 110, 130, 120, 80, 140, 110, 90, 70, 120, 120, 110, 130, 140}
A. Describe the shape of the distribution of IQ scores.
The distribution is negatively skewed and unimodal.
B. Find the Mean, Median, and Mode. Use these values to support your
judgment of the distribution’s shape in part A.
The mean of this distribution is 113.5, median and mode are both 120. The
fact that the mean is smaller than the median supports the conclusion that
the distribution is negatively skewed.
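These three values can be confirmed with Python's standard `statistics` module. This is a minimal sketch; the comparison of mean and median is only the rule of thumb used in the answer above, not a formal skewness test.

```python
from statistics import mean, median, mode

iq = [120, 110, 120, 100, 120, 130, 100, 110, 130, 120,
      80, 140, 110, 90, 70, 120, 120, 110, 130, 140]

iq_mean = mean(iq)         # 113.5
iq_median = median(iq)     # 120
iq_mode = mode(iq)         # 120

# A mean below the median suggests a longer tail on the low end (negative skew)
negatively_skewed = iq_mean < iq_median
```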