Psych 3101
Keller
1
5/6/17
Name: _____________________________________
Homework #3: Z-SCORES
Part 1: Complete the following problems using R:
Please report all R code you use so Andrew can see where your errors came from. Be careful
when pasting into R from Microsoft Word. Sometimes individual characters are changed (like
the “ and $ characters).
1) Conscientiousness was measured on all students in this class attending the first lab. It is defined
as, “A tendency to show self discipline and aim for achievement; to do planned rather than
spontaneous behavior.” We might expect that this behavior differs in some way between people
who have high GPAs vs. those with lower GPAs. Let’s begin by importing the survey data from
the website to R:
load(url("http://www.matthewckeller.com/Stats3101/Stat3101.Datasets.RData"))
A) Next, take a quick look at just the columns of interest from the survey dataset:
data <- survey[,c("GPA_major","conscientiousness")]
data
summary(data)
hist(data$GPA_major)
hist(data$conscientiousness)
mean(data$GPA_major) # note: this returns NA instead of the mean.
Can you read R’s help page (?mean) to figure out why? Hint: there are “NA’s” or
missing values in your data, and you need to use an argument inside the function in order
to tell R to ignore those values.
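A minimal sketch of how missing values affect mean(), using a made-up vector named gpa rather than the survey data:

```r
# Toy vector with one missing value (made-up numbers, not the survey data)
gpa <- c(3.2, NA, 3.8, 2.9)

mean(gpa)                 # NA: one missing value makes the whole result missing
mean(gpa, na.rm = TRUE)   # 3.3: the NA is dropped before averaging
sum(is.na(gpa))           # 1: how many values are missing
```

The same na.rm argument is accepted by sd() and median().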
B) In words, describe the central tendencies, spreads, and shapes of the distributions of
these two variables. Then, in parentheses, place relevant statistics (if possible) to back up
what you just said in words.
C) Find the z-score of conscientiousness for each person in the dataset. You will have to
create a new variable for this. Name it something like zconsc (as a shorthand way of
saying the “z-score of the conscientiousness measure”).
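As an illustration of the z-score calculation, here is a sketch on a made-up vector x (the names x, m, s, and z are placeholders, not variables from the survey dataset):

```r
# Toy scores (made up); NA marks a missing response
x <- c(10, 12, NA, 14, 16)

m <- mean(x, na.rm = TRUE)   # 13
s <- sd(x, na.rm = TRUE)     # sample SD (divides by n - 1)
z <- (x - m) / s             # the missing score stays NA

round(z, 2)                  # -1.16 -0.39    NA  0.39  1.16
mean(z, na.rm = TRUE)        # 0 (up to rounding error)
sd(z, na.rm = TRUE)          # 1
```

Whatever the original units, a set of z-scores has a mean of 0 and a standard deviation of 1.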
D) Now, attach this new variable (zconsc or whatever you named it) to your dataset, data.
To attach a new variable to your dataset, do this:
data$zconsc <- zconsc   # or whatever you named it
E) Break up the data into those with above-median GPAs vs. below median GPAs. For
example, to get all the people who have high GPAs, do this:
high.gpa <- data[data$GPA_major >= median(data$GPA_major,na.rm=TRUE),]
Write out the code for students with a low GPA below:
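A side note on this kind of row subsetting: when the comparison involves a missing value, the result is NA, and R returns an all-NA row for it. The sketch below uses a made-up data frame named toy to show the effect; wrapping the condition in which() is one way to drop such rows.

```r
# Toy data frame (made up), with one missing score
toy <- data.frame(id = 1:5, score = c(2, 5, NA, 4, 3))

cutoff <- median(toy$score, na.rm = TRUE)   # median of 2, 3, 4, 5 is 3.5
keep <- toy$score >= cutoff                 # FALSE TRUE NA TRUE FALSE

nrow(toy[keep, ])          # 3: ids 2 and 4, plus one all-NA row
nrow(toy[which(keep), ])   # 2: which() keeps only the TRUE positions
```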
F) Does conscientiousness differ between those with high GPAs vs. low GPAs?
Compare the means (of the raw scores) of the two datasets you just created. Then, in
order to make this raw difference more interpretable, standardize it by dividing it by the
standard deviation of the whole sample. Put into words what this new, transformed
difference means. This is called a “Cohen’s d”.
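Cohen’s d = $\dfrac{\bar{X}_1 - \bar{X}_2}{s}$, where $\bar{X}_1$ and $\bar{X}_2$ are the two group means and $s$ is the standard deviation of the whole sample.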
HINT: To get the conscientiousness scores for students in the two groups, you might need
to refer to the technique you used on last week’s homework where we looked at how
often UPPERclassmen versus LOWerclassmen checked their email.
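To make the Cohen’s d arithmetic concrete, here is a sketch on two made-up groups, g1 and g2 (placeholder names, not the survey data). It follows this handout’s definition and divides by the standard deviation of the whole sample; some textbooks divide by a pooled within-group standard deviation instead.

```r
# Two made-up groups (not the survey data)
g1 <- c(6, 7, 8, 9)
g2 <- c(4, 5, 6, 7)

raw.diff <- mean(g1) - mean(g2)   # 7.5 - 5.5 = 2 raw points
s.all <- sd(c(g1, g2))            # SD of everyone combined, about 1.60
d <- raw.diff / s.all             # about 1.25

d
```

A d of about 1.25 says that the group means are roughly one and a quarter standard deviations apart.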
G) Find the mean of the z-scores of conscientiousness for the high GPA group and for the
low GPA group. How does the difference between the mean z-scores compare to the
Cohen’s d above?
H) Because it is very unlikely that the means of two samples will be exactly the same,
almost every time we look for a mean difference on scores between two samples, we will
find one. That’s not interesting! What we’d really like to know is whether the difference
seen in our samples suggests that the populations themselves differ. In other words, we’d
like to know whether the difference we see is “just due to chance,” caused by the particular
samples we drew (technically called “sampling error”), or whether it reflects a true
difference in the population. In this case, the population would be all possible CU
Psychology undergraduate students, now and in the future, with low vs. high GPAs.
What is your opinion on whether the mean difference between high vs. low GPA groups
is “real” (reflective of a true population difference) or “just due to chance”? What factors
(e.g., sample sizes, different variances than those observed, bigger mean differences than
those observed, etc.) would make you more confident of your opinion?
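One way to get a feel for sampling error is to simulate it. The sketch below assumes a standard normal population and uses a hypothetical helper, one.diff, that draws two samples from that same population and returns the difference between their means. Because the true difference is 0, every nonzero result is due to chance alone.

```r
set.seed(1)   # arbitrary seed so the run is repeatable

# Hypothetical helper: mean difference between two samples of size n
# drawn from the SAME population (true difference = 0)
one.diff <- function(n) mean(rnorm(n)) - mean(rnorm(n))

diffs.small <- replicate(2000, one.diff(10))    # 10 people per group
diffs.large <- replicate(2000, one.diff(100))   # 100 people per group

sd(diffs.small)   # theory: sqrt(2/10), about 0.45
sd(diffs.large)   # theory: sqrt(2/100), about 0.14
```

Chance differences are typically much larger when the samples are small.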
I) Can we conclude that there is a causal relationship between conscientiousness and
grades (i.e., that conscientiousness leads to higher grades or that higher grades lead to
higher conscientiousness)? Why or why not?
Part 2: Complete by hand:
1)
A) Describe in words your grandmother could understand what the standard deviation is.
B) Describe in words your grandmother could understand what a z-score of -1.5 means.
2) The following questions refer to these scores on a test given to my honors class:
65 89 92 94 70 75 83 82 90 78 73 88 94 92
Do the problems below by HAND, and show your work:
A) By hand, figure the mean, standard deviation, and z-scores for each of the scores above.
B) Draw a density histogram of these data (such that the area under the curve = 1). The x-axis should be the original scores.
C) Under the original scores on the x-axis, write the z-scores for each bin. In other words,
transform the x-axis to a “standardized scale,” and put this new standardized scale under
the original.
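For checking a hand-drawn figure afterwards, here is a sketch in R of a density histogram with a second, standardized axis. It uses made-up scores and an arbitrary bin width of 2, not the test scores above.

```r
x <- c(2, 3, 3, 4, 5, 5, 5, 6, 8, 9)   # made-up scores
brk <- seq(0, 10, by = 2)              # bin edges (arbitrary choice)

h <- hist(x, breaks = brk, plot = FALSE)
h$counts                          # 1 3 4 1 1
h$density                         # counts / (n * bin width)
sum(h$density * diff(h$breaks))   # 1: the bar areas sum to one

# Draw it, then add a second x-axis in z-score units under the first
hist(x, breaks = brk, freq = FALSE)
z.brk <- (brk - mean(x)) / sd(x)
axis(1, at = brk, labels = round(z.brk, 2), line = 2.5)
```

Each bar’s height is its count divided by (n × bin width), which is what makes the total area equal 1.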
3) For a sample mean of M=15 and standard deviation s=10, find the X value corresponding to the
following 3 z-values: 0, -2, 0.5.
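Converting a z-score back to a raw score reverses the z-score formula: X = M + z × s. A small sketch with made-up values (M = 50, s = 4, deliberately different from the problem above):

```r
M <- 50                # made-up sample mean
s <- 4                 # made-up standard deviation
z <- c(-1, 0, 2.5)

X <- M + z * s         # 46 50 60
X

(X - M) / s            # -1.0 0.0 2.5: standardizing again recovers z
```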