Biostat 510
Homework 4
Key
Due Thursday, February 11, 2010
This homework uses the SAS dataset, Allgroups.sas7bdat, that you created for homework 1. You can use your
own version of the data, or you can download the dataset from my web page at:
http://www.umich.edu/~kwelch/510/2010/homework_2010.html.
Remember to submit a libname statement so you can use this dataset in SAS.
libname b510 "C:\Documents and Settings\kwelch\Desktop\b510";
options formchar="|----|+|---+=|-/\<>*";
/*Question 1: Create format*/
proc format;
value ranfmt 1="Ran"
0="Didn't Run";
run;
/*Question 2: Descriptives for Ran and didn't run*/
title "Descriptives for those who ran and didn't run";
proc means data=b510.allgroups;
class ran;
format ran ranfmt.;
run;
/*Question 3: Side-by-side boxplots for HR1 and HR2*/
title "Boxplots for those who ran and didn't run";
proc sgplot data=b510.allgroups;
vbox hr1/category=ran;
format ran ranfmt.;
run;
title "Boxplots for those who ran and didn't run";
proc sgplot data=b510.allgroups;
vbox hr2/category=ran;
format ran ranfmt.;
run;
/*Question 4: Independent samples t-test*/
title "Independent Samples t-test";
proc ttest data=b510.allgroups;
class ran;
var hr1 hr2;
format ran ranfmt.;
run;
/*Question 5: Paired t-test for all students*/
title "Paired samples t-test";
proc ttest data=b510.allgroups;
paired hr2*hr1;
run;
/*Question 6: Paired t-test BY RAN*/
proc sort data=b510.allgroups;
by ran;
run;
title "Paired t-test by RAN";
proc ttest data=b510.allgroups;
paired hr2*hr1;
by ran;
format ran ranfmt.;
run;
/*Question 7: Histogram of HR1 using Proc SGPLOT*/
title "Histogram of HR1 for all students";
proc sgplot data=b510.allgroups;
histogram hr1;
run;
/*Question 8: One-sample t-test for mu HR1=72*/
title "One-sample t-test for HR1 (H0: mu = 72)";
proc ttest data=b510.allgroups h0=72;
var hr1;
run;
1. Generate a user-defined format for the variable RAN. This will be used in later questions.
The students can set up the format name any way they want, and they can label the values as they wish,
as long as it works.
2. Get descriptive statistics for all numeric variables in your allgroups dataset for those who ran and those
who did not run. Use the format for RAN for this question.
a) What are the mean and standard deviation for HR1 for those who did not run? For those who
ran?
Mean of HR1 for those who didn't run = 75.11, SD = 11.87.
Mean of HR1 for those who did run = 71.98, SD = 10.68.
b) What are the mean and standard deviation for HR2 for those who did not run? For those who
ran?
Mean of HR2 for those who didn't run = 71.89, SD = 14.13.
Mean of HR2 for those who did run = 93.67, SD = 17.65.
 Include the descriptive statistics output from this question in your homework write-up.
Descriptives for those who ran and didn't run
The MEANS Procedure

                 N
Ran            Obs  Variable  Label     N          Mean       Std Dev       Minimum       Maximum
---------------------------------------------------------------------------------------------------
Didn't Run      44  group     group    44     3.7727273     1.5974345     1.0000000     6.0000000
                    ID        ID       44     8.5681818     5.1732664             0    20.0000000
                    AgeYR     AgeYR    44    25.6818182     4.3389674    20.0000000    43.0000000
                    AgeMO     AgeMO    44   172.8522727   158.9036047             0   448.0000000
                    HR1       HR1      44    75.1136364    11.8718269    49.0000000   121.0000000
                    HR2       HR2      44    71.8863636    14.1276834    27.0000000   119.0000000

Ran             45  group     group    45     3.7777778     1.5795409     1.0000000     6.0000000
                    ID        ID       45     8.0444444     5.1254958     1.0000000    19.0000000
                    AgeYR     AgeYR    45    24.3555556     3.1417681    21.0000000    41.0000000
                    AgeMO     AgeMO    45   168.1777778   151.2902522             0   496.0000000
                    HR1       HR1      45    71.9777778    10.6823748    56.0000000   107.0000000
                    HR2       HR2      45    93.6666667    17.6493626    58.0000000   143.0000000
---------------------------------------------------------------------------------------------------
3. Create a side-by-side boxplot of HR1 for those who ran and those who didn't run. Create another side-by-side
boxplot of HR2 for those who ran and didn't run. Answer the questions below based on the
boxplots. Use the format for RAN for this question.
a) Compare the distribution of HR1 for those who ran and didn't run in terms of location (the
median), variability, outliers and skewness.
b) Compare the distribution of HR2 for those who ran and didn't run in terms of location (the
median), variability, outliers and skewness.
 Include the boxplots for HR1 and HR2 in your homework write-up.
The boxplots for HR1 look very similar for those who ran and those who didn't run. The spread is about the
same. There are a few outliers in both groups. For HR2, those who ran have in general a higher mean and
median, and there is greater spread or variability for those who ran than for those who didn't.
4. Carry out independent samples t-tests to compare the mean of HR1 and HR2 for those who ran vs.
those who did not run. Use the format for RAN for this question.
a) Write out the null and alternative hypotheses for HR1 and HR2.
The null and alternative hypotheses are the same for both HR1 and HR2.
H0: µRan = µDidn't Run
HA: µRan ≠ µDidn't Run
Be sure they use a two-sided alternative hypothesis.
b) Which t-test (equal or unequal variances) is appropriate for HR1 and for HR2 and why?
It is OK to use the equal variances t-test for both HR1 and HR2. The Folded F test for equality of
variances is not significant for either variable (HR1: F(43, 44) = 1.24, p = 0.4883; HR2: F(44, 43) = 1.56,
p = 0.1468), so we don't reject H0 that the variances are equal. We use α = .10 for this test to be more conservative.
c) How many cases are included in the t-test for HR1? For HR2? N=89 for both tests.
d) What do you conclude for HR1, for HR2?
• There is no significant difference in the mean heart rate at time 1 for those who ran vs.
those who didn't run, t(87) = 1.31, p = 0.1935.
• There is a significant difference in the mean heart rate at time 2 for those who ran vs.
those who didn't run, t(87) = -6.42, p < .0001. Those who ran had a higher mean heart
rate at time 2 than those who didn't run.
I don't care so much about the format they use to write out their t-test results, just that they include the test
statistic, degrees of freedom, and p-value.
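To check the Pooled rows of the SAS output below by hand, you can rebuild the equal-variance t statistic from the group summaries in the PROC MEANS output. This is a minimal sketch in Python rather than SAS, and it assumes the rounded values SAS printed. The helper name `pooled_t` is hypothetical. The sketch computes the pooled SD, the standard error, t and df, but not the p-value.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance (pooled) two-sample t from summary statistics.

    Returns (t, df, pooled_sd, se) for the difference mean1 - mean2.
    """
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    # Standard error of the difference between the two means
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se
    return t, df, pooled_sd, se

# (n, mean, SD) for Didn't Run first, then Ran, copied from PROC MEANS
res_hr1 = pooled_t(44, 75.1136364, 11.8718269, 45, 71.9777778, 10.6823748)
res_hr2 = pooled_t(44, 71.8863636, 14.1276834, 45, 93.6666667, 17.6493626)

for name, (t, df, sp, se) in [("HR1", res_hr1), ("HR2", res_hr2)]:
    print(f"{name}: pooled SD = {sp:.4f}, SE = {se:.4f}, t({df}) = {t:.2f}")
```

Because the difference is taken as Didn't Run minus Ran (SAS's "Diff (1-2)"), the HR2 statistic comes out negative. A negative sign here means the runners had the higher mean.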
Independent Samples t-test
The TTEST Procedure

Variable: HR1 (HR1)

Ran             N       Mean    Std Dev    Std Err    Minimum    Maximum
Didn't Run     44    75.1136    11.8718     1.7897    49.0000      121.0
Ran            45    71.9778    10.6824     1.5924    56.0000      107.0
Diff (1-2)            3.1359    11.2859     2.3928

Ran           Method             Mean       95% CL Mean        Std Dev     95% CL Std Dev
Didn't Run                    75.1136    71.5043   78.7230     11.8718    9.8088   15.0419
Ran                           71.9778    68.7684   75.1871     10.6824    8.8434   13.4940
Diff (1-2)    Pooled           3.1359    -1.6200    7.8917     11.2859    9.8294   13.2532
Diff (1-2)    Satterthwaite    3.1359    -1.6268    7.8985

Method           Variances        DF    t Value    Pr > |t|
Pooled           Equal            87       1.31      0.1935
Satterthwaite    Unequal      85.602       1.31      0.1940

Equality of Variances
Method      Num DF    Den DF    F Value    Pr > F
Folded F        43        44       1.24    0.4883

Variable: HR2 (HR2)

Ran             N       Mean    Std Dev    Std Err    Minimum    Maximum
Didn't Run     44    71.8864    14.1277     2.1298    27.0000      119.0
Ran            45    93.6667    17.6494     2.6310    58.0000      143.0
Diff (1-2)          -21.7803    16.0059     3.3935

Ran           Method             Mean       95% CL Mean        Std Dev     95% CL Std Dev
Didn't Run                    71.8864    67.5912   76.1816     14.1277   11.6726   17.9001
Ran                           93.6667    88.3642   98.9691     17.6494   14.6111   22.2947
Diff (1-2)    Pooled         -21.7803   -28.5252  -15.0354     16.0059   13.9403   18.7958
Diff (1-2)    Satterthwaite  -21.7803   -28.5121  -15.0485

Method           Variances        DF    t Value    Pr > |t|
Pooled           Equal            87      -6.42      <.0001
Satterthwaite    Unequal      83.758      -6.43      <.0001

Equality of Variances
Method      Num DF    Den DF    F Value    Pr > F
Folded F        44        43       1.56    0.1468
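You can also reproduce the Folded F rows and the Satterthwaite (unequal variances) rows above from the group standard deviations. This is a minimal sketch in Python, assuming the rounded PROC MEANS summaries. The helper names `folded_f` and `satterthwaite_t` are hypothetical. It does not compute p-values, because the Python standard library has no F or t distribution functions.

The Folded F places the larger sample variance in the numerator. That is why the numerator df is 43 for HR1 (Didn't Run has the larger SD) and 44 for HR2 (Ran has the larger SD).

```python
import math

def folded_f(n1, sd1, n2, sd2):
    """Folded F: larger sample variance over smaller, with (numerator df, denominator df)."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

def satterthwaite_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unequal-variance t statistic and Satterthwaite approximate df."""
    a, b = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard error of each group mean
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# (n, mean, SD) for Didn't Run, then Ran, from the PROC MEANS output
hr1_summary = (44, 75.1136364, 11.8718269, 45, 71.9777778, 10.6823748)
hr2_summary = (44, 71.8863636, 14.1276834, 45, 93.6666667, 17.6493626)

for name, (n1, m1, s1, n2, m2, s2) in [("HR1", hr1_summary), ("HR2", hr2_summary)]:
    f, num_df, den_df = folded_f(n1, s1, n2, s2)
    t, df = satterthwaite_t(n1, m1, s1, n2, m2, s2)
    print(f"{name}: F({num_df}, {den_df}) = {f:.2f}; Satterthwaite t({df:.3f}) = {t:.2f}")
```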
Be sure to write out your conclusions carefully. Don't just give the p-value of the test, and say whether you
reject the null hypothesis or not. Instead, describe your results clearly in words, stating which group had a
higher mean, and also give the value of the t-test statistic, the degrees of freedom, and the p-value. See the end
of the homework for an example.
 Include the results of the independent sample t-tests for HR1 and HR2 in your homework write-up.
5. Carry out a paired t-test to compare the mean of HR1 and HR2 for all students.
a) Write out the null and alternative hypotheses.
H0: µHR2 - µHR1 = 0
HA: µHR2 - µHR1 ≠ 0
b) Set up your SAS code so the paired t-test gives you HR2 minus HR1. What is the mean
difference between HR2 and HR1 for all students? The sample mean difference (HR2 - HR1) is 9.37 beats per
minute, so the mean of HR2 is higher than the mean of HR1 for all students overall.
c) How many students are included in this t-test? There are 89 students in this test.
d) What do you conclude? Again, be sure to write out your conclusion in words and also give the
test statistic, degrees of freedom, and p-value. There is a significant difference between the mean
of HR2 and the mean of HR1 for all students, t(88) = 5.26, p < .0001, with the mean of HR2 higher than
the mean of HR1.
 Include the output from the paired t-test in your homework write-up.
Paired samples t-test
The TTEST Procedure
Difference: HR2 - HR1

 N      Mean    Std Dev    Std Err     Minimum    Maximum
89    9.3708    16.8121     1.7821    -37.0000    58.0000

  Mean        95% CL Mean       Std Dev      95% CL Std Dev
9.3708    5.8293    12.9123     16.8121    14.6531    19.7230

DF    t Value    Pr > |t|
88       5.26      <.0001
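A paired t-test is a one-sample t-test on the per-student differences d = HR2 - HR1, with H0: mean(d) = 0. The sketch below reproduces the standard error and t value above from N, the mean and the SD of the differences. It is a minimal Python sketch using the rounded SAS output. The helper name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(n, mean, sd, mu0=0.0):
    """t statistic, df and standard error for H0: population mean = mu0."""
    se = sd / math.sqrt(n)
    return (mean - mu0) / se, n - 1, se

# Paired t = one-sample t on d = HR2 - HR1 with H0: mean(d) = 0.
# N, mean and SD of the differences come from the PROC TTEST output.
t, df, se = one_sample_t(89, 9.3708, 16.8121)
print(f"SE = {se:.4f}, t({df}) = {t:.2f}")
```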
6. Carry out a paired t-test to compare the mean of HR1 and HR2 separately for students who ran and
those who did not run. You can do this using a By statement. Use your format for RAN for this question.
a) What is the sample size, mean difference, and standard deviation of the difference between HR1
and HR2 for those who did not run? Which variable had a higher mean, HR1 or HR2 for those
who did not run?
There were 44 students who didn't run. The mean difference (HR2 - HR1) was -3.23 beats per minute,
with a standard deviation of the differences of 6.61. The mean of HR1 was therefore higher than the mean of HR2
for those who didn't run, t(43) = -3.24, p = 0.0023. This was rather unexpected: I expected no change
in heart rate between time 1 and time 2 for those who didn't run.
Paired t-test by RAN
------------------------------------------- Ran=Didn't Run ------------------------------------------
The TTEST Procedure
Difference: HR2 - HR1

 N       Mean    Std Dev    Std Err     Minimum    Maximum
44    -3.2273     6.6117     0.9968    -37.0000     4.0000

   Mean        95% CL Mean       Std Dev     95% CL Std Dev
-3.2273    -5.2374    -1.2171     6.6117     5.4627    8.3772

DF    t Value    Pr > |t|
43      -3.24      0.0023
What do you conclude about the difference in the mean of HR1 and HR2 for those who did not run?
There was a slight but significant decrease in the mean heart rate from time 1 to time 2 for those who didn't
run.
b) What is the sample size, mean difference, and standard deviation of the difference between HR1
and HR2 for those who did run? Which variable had a higher mean, HR1 or HR2 for those who
ran?
There were 45 students who ran. The mean difference (HR2 - HR1) in heart rate for those who ran was 21.69 beats
per minute, with a standard deviation of the differences of 14.45, so HR2 had a higher mean than HR1. This was a
significant increase, t(44) = 10.07, p < .0001.
Paired t-test by RAN
---------------------------------------------- Ran=Ran ----------------------------------------------
The TTEST Procedure
Difference: HR2 - HR1

 N       Mean    Std Dev    Std Err     Minimum    Maximum
45    21.6889    14.4455     2.1534    -19.0000    58.0000

   Mean        95% CL Mean       Std Dev      95% CL Std Dev
21.6889    17.3490    26.0288    14.4455    11.9588    18.2476

DF    t Value    Pr > |t|
44      10.07      <.0001
c) What do you conclude about the difference in the mean of HR1 and HR2 for those who did run?
For those who did run, there was a substantial and significant increase in mean heart rate from time 1 to
time 2.
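The two BY-group results fit together with Question 5. Each group's t value is its mean difference divided by SD/sqrt(N). Because every student belongs to exactly one RAN group, the all-student mean difference of 9.37 is the N-weighted average of a small drop among non-runners and a large rise among runners. This is a minimal Python sketch using the rounded SAS output. The dictionary layout is just an illustration.

```python
import math

# (N, mean, SD) of d = HR2 - HR1 in each RAN group, from the BY-group PROC TTEST output
groups = {
    "Didn't Run": (44, -3.2273, 6.6117),
    "Ran": (45, 21.6889, 14.4455),
}

results = {}
for name, (n, dbar, sd) in groups.items():
    se = sd / math.sqrt(n)  # standard error of the mean difference
    results[name] = (dbar / se, n - 1)
    print(f"{name}: t({n - 1}) = {dbar / se:.2f}")

# The all-student mean difference from Question 5 is the N-weighted
# average of the two group mean differences.
total_n = sum(n for n, _, _ in groups.values())
overall = sum(n * dbar for n, dbar, _ in groups.values()) / total_n
print(f"Overall mean difference = {overall:.4f} (N = {total_n})")
```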
 Include the paired t-test results from SAS for those who ran and did not run in your homework write-up.
7. Get a histogram of HR1 for all students using Proc Sgplot.
 Include the histogram of HR1 in your homework write-up.
8. Carry out a one-sample t-test to decide if the mean of resting heartrate (HR1) for all students is equal to
72 beats per minute.
a) What is the mean of HR1 for all students? The mean of HR1 for all students was 73.53 beats per
minute.
b) What do you conclude? There is not sufficient evidence, based on this sample, to reject the null
hypothesis that the mean heart rate at time 1 is equal to 72 beats per minute, t(88) = 1.27,
p = 0.2067.
 Include the output from the one-sample t-test in your homework write-up.
The TTEST Procedure
Variable: HR1 (HR1)

 N       Mean    Std Dev    Std Err    Minimum    Maximum
89    73.5281    11.3319     1.2012    49.0000      121.0

   Mean        95% CL Mean       Std Dev     95% CL Std Dev
73.5281    71.1410    75.9152    11.3319     9.8767   13.2939

DF    t Value    Pr > |t|
88       1.27      0.2067
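The overall HR1 summary used in this one-sample test can be recovered from the two RAN groups in Question 2. The overall mean is the N-weighted mean of the group means. The overall sum of squares is the within-group sum of squares plus the between-group sum of squares. The t statistic is then (mean - 72) / (SD/sqrt(N)). This is a minimal Python sketch assuming the rounded PROC MEANS values. It computes t but not the p-value.

```python
import math

# HR1 summaries by RAN group, from the PROC MEANS output: (n, mean, SD)
groups = [(44, 75.1136364, 11.8718269),   # Didn't Run
          (45, 71.9777778, 10.6823748)]   # Ran

n = sum(g[0] for g in groups)
mean_hr1 = sum(ni * mi for ni, mi, _ in groups) / n

# Total sum of squares = within-group SS + between-group SS
ss_within = sum((ni - 1) * si ** 2 for ni, _, si in groups)
ss_between = sum(ni * (mi - mean_hr1) ** 2 for ni, mi, _ in groups)
sd_hr1 = math.sqrt((ss_within + ss_between) / (n - 1))

# One-sample t for H0: mu = 72
mu0 = 72.0
se = sd_hr1 / math.sqrt(n)
t = (mean_hr1 - mu0) / se
print(f"N = {n}, mean = {mean_hr1:.4f}, SD = {sd_hr1:.4f}, SE = {se:.4f}, t({n - 1}) = {t:.2f}")
```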
9. Save your SAS commands as homework4.sas. Re-run all of the commands in this command file and
make sure there are no errors in your log.
NB: Please write out your answers in complete sentences; be sure it is easy to see what questions you are
answering. When you are including a statistical test, describe the test results in words. Don't just say
whether the result is significant or not. Also, include the test statistic (e.g., t-statistic), degrees of freedom,
and p-value for the test.
Below is an example of how you might write up a result. This example was taken from "Reporting Statistics in
APA Style" by Jeffrey Kahn, Illinois State University, http://my.ilstu.edu/~jhkahn/apastats.html:
"There was a significant effect for gender, t(54) = 5.43, p < .001, with men receiving higher scores than
women."
Professor Kahn recommends reporting the p-value as p<.05, or p<.01, etc., while I would like you to report the
actual p-value, as displayed in SAS.
You will be graded on the SAS commands, the output, and your write-up.
 Be sure to include your SAS commands as the first part of the homework.
 The appearance of the output will be part of your homework grade.
 Be sure you submit the SAS command file, fonts.sas, at the start of your homework assignment so you
will have nice-looking tables and other output.
OPTIONS FORMCHAR="|----|+|---+=|-/\<>*";

Try to keep your SAS output results to a minimum length by judiciously editing it!