Biostat 510 Homework 4 Key
Due Thursday, February 11, 2010

This homework uses the SAS dataset, Allgroups.sas7bdat, that you created for homework 1. You can use your own version of the data, or you can download the dataset from my web page at: http://www.umich.edu/~kwelch/510/2010/homework_2010.html. Remember to submit a libname statement so you can use this dataset in SAS.

libname b510 "C:\Documents and Settings\kwelch\Desktop\b510";
options formchar="|----|+|---+=|-/\<>*";

/*Question 1: Create format*/
proc format;
   value ranfmt 1="Ran"
                0="Didn't Run";
run;

/*Question 2: Descriptives for Ran and didn't run*/
title "Descriptives for those who ran and didn't run";
proc means data=b510.allgroups;
   class ran;
   format ran ranfmt.;
run;

/*Question 3: Side-by-side boxplots for HR1 and HR2*/
title "Boxplots of HR1 for those who ran and didn't run";
proc sgplot data=b510.allgroups;
   vbox hr1/category=ran;
   format ran ranfmt.;
run;

title "Boxplots of HR2 for those who ran and didn't run";
proc sgplot data=b510.allgroups;
   vbox hr2/category=ran;
   format ran ranfmt.;
run;

/*Question 4: Independent samples t-test*/
title "Independent Samples t-test";
proc ttest data=b510.allgroups;
   class ran;
   var hr1 hr2;
   format ran ranfmt.;
run;

/*Question 5: Paired t-test for all students*/
title "Paired samples t-test";
proc ttest data=b510.allgroups;
   paired hr2*hr1;
run;

/*Question 6: Paired t-test BY RAN*/
/*The data must be sorted by RAN before using a BY statement*/
proc sort data=b510.allgroups;
   by ran;
run;

title "Paired t-test by RAN";
proc ttest data=b510.allgroups;
   paired hr2*hr1;
   by ran;
   format ran ranfmt.;
run;

/*Question 7: Histogram of HR1 using Proc SGPLOT*/
title "Histogram of HR1 for all students";
proc sgplot data=b510.allgroups;
   histogram hr1;
run;

/*Question 8: One-sample t-test for mu HR1=72*/
title "One-sample t-test of HR1 for all students";
proc ttest data=b510.allgroups h0=72;
   var hr1;
run;

1. Generate a user-defined format for the variable RAN. This will be used in later questions.
The students can set up the format name any way they want, and they can label the values as they wish, as long as it works.

2. Get descriptive statistics for all numeric variables in your allgroups dataset for those who ran and those who did not run. Use the format for RAN for this question.

a) What are the mean and standard deviation for HR1 for those who did not run? For those who ran?
Mean of HR1 for those who didn't run = 75.11, SD = 11.87.
Mean of HR1 for those who did run = 71.98, SD = 10.68.

b) What are the mean and standard deviation for HR2 for those who did not run? For those who ran?
Mean of HR2 for those who didn't run = 71.89, SD = 14.13.
Mean of HR2 for those who did run = 93.67, SD = 17.65.

Include the descriptive statistics output from this question in your homework write-up.

Descriptives for those who ran and didn't run
The MEANS Procedure

Ran         N Obs  Variable  Label    N         Mean      Std Dev     Minimum      Maximum
------------------------------------------------------------------------------------------
Didn't Run     44  group     group   44    3.7727273    1.5974345   1.0000000    6.0000000
                   ID        ID      44    8.5681818    5.1732664           0   20.0000000
                   AgeYR     AgeYR   44   25.6818182    4.3389674  20.0000000   43.0000000
                   AgeMO     AgeMO   44  172.8522727  158.9036047           0  448.0000000
                   HR1       HR1     44   75.1136364   11.8718269  49.0000000  121.0000000
                   HR2       HR2     44   71.8863636   14.1276834  27.0000000  119.0000000

Ran            45  group     group   45    3.7777778    1.5795409   1.0000000    6.0000000
                   ID        ID      45    8.0444444    5.1254958   1.0000000   19.0000000
                   AgeYR     AgeYR   45   24.3555556    3.1417681  21.0000000   41.0000000
                   AgeMO     AgeMO   45  168.1777778  151.2902522           0  496.0000000
                   HR1       HR1     45   71.9777778   10.6823748  56.0000000  107.0000000
                   HR2       HR2     45   93.6666667   17.6493626  58.0000000  143.0000000
------------------------------------------------------------------------------------------

3. Create a side-by-side boxplot of HR1 for those who ran and those who didn't run. Create another side-by-side boxplot of HR2 for those who ran and didn't run. Answer the questions below based on the boxplots.
Use the format for RAN for this question.

a) Compare the distribution of HR1 for those who ran and didn't run in terms of location (the median), variability, outliers and skewness.

b) Compare the distribution of HR2 for those who ran and didn't run in terms of location (the median), variability, outliers and skewness.

Include the boxplots for HR1 and HR2 in your homework write-up.

The boxplots for HR1 look very similar for those who ran and those who didn't run. The spread is about the same. There are a few outliers in both groups. For HR2, those who ran have in general a higher mean and median, and there is greater spread or variability for those who ran than for those who didn't.

4. Carry out independent samples t-tests to compare the mean of HR1 and HR2 for those who ran vs. those who did not run. Use the format for RAN for this question.

a) Write out the null and alternative hypotheses for HR1 and HR2.
Same null and alternative hypothesis for both HR1 and HR2.
H0: µRan = µDidn't Run
HA: µRan ≠ µDidn't Run
Be sure they use a two-sided alternative hypothesis.

b) Which t-test (equal or unequal variances) is appropriate for HR1 and for HR2 and why?
It is OK to use the equal variances t-test for both HR1 and HR2, because the F-test for equality of variances is not significant for either variable, so we don't reject H0 that the variances are equal (using α = .10 to be more conservative for this test).

c) How many cases are included in the t-test for HR1? For HR2?
N = 89 for both tests.

d) What do you conclude for HR1, for HR2?
There is no significant difference in the mean heart rate at time 1 for those who ran vs. those who didn't run, t(87) = 1.31, p = 0.1935. There is a significant difference in the mean heart rate at time 2 for those who ran vs. those who didn't run, t(87) = 6.42, p < .001, with those who ran having a higher mean heart rate at time 2 than those who didn't run.
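As a check on the HR1 result in 4d, the pooled t statistic can be reproduced by hand from the group sizes, means, and standard deviations in the PROC TTEST output (44 and 45 students; means 75.1136 and 71.9778; standard deviations 11.8718 and 10.6824). The symbols s_p, n_1, n_2 and SE are notation introduced here only for illustration, with group 1 taken to be Didn't Run, and the arithmetic is rounded.

```latex
\begin{align*}
s_p &= \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}
     = \sqrt{\frac{43(11.8718)^2 + 44(10.6824)^2}{87}} \approx 11.286 \\
\mathrm{SE} &= s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}
     = 11.286\sqrt{\frac{1}{44}+\frac{1}{45}} \approx 2.393 \\
t &= \frac{\bar{x}_1-\bar{x}_2}{\mathrm{SE}}
     = \frac{75.1136-71.9778}{2.393} \approx 1.31, \qquad df = n_1+n_2-2 = 87
\end{align*}
```

The same steps for HR2, using the difference of -21.7803 and standard error of 3.3935 from the output, give t ≈ -6.42. The sign is negative because SAS reports the difference as Didn't Run minus Ran.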
I don't care so much about the format they use to write out their t-test results, just that they include the test statistic, degrees of freedom, and p-value.

Independent Samples t-test
The TTEST Procedure

Variable: HR1 (HR1)

Ran            N      Mean     Std Dev   Std Err   Minimum   Maximum
Didn't Run    44   75.1136    11.8718    1.7897   49.0000     121.0
Ran           45   71.9778    10.6824    1.5924   56.0000     107.0
Diff (1-2)          3.1359    11.2859    2.3928

Ran          Method             Mean       95% CL Mean       Std Dev    95% CL Std Dev
Didn't Run                   75.1136   71.5043   78.7230    11.8718    9.8088  15.0419
Ran                          71.9778   68.7684   75.1871    10.6824    8.8434  13.4940
Diff (1-2)   Pooled           3.1359   -1.6200    7.8917    11.2859    9.8294  13.2532
Diff (1-2)   Satterthwaite    3.1359   -1.6268    7.8985

Method          Variances       DF   t Value   Pr > |t|
Pooled          Equal           87      1.31     0.1935
Satterthwaite   Unequal     85.602      1.31     0.1940

Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F       43       44      1.24   0.4883

Variable: HR2 (HR2)

Ran            N      Mean     Std Dev   Std Err   Minimum   Maximum
Didn't Run    44   71.8864    14.1277    2.1298   27.0000     119.0
Ran           45   93.6667    17.6494    2.6310   58.0000     143.0
Diff (1-2)        -21.7803    16.0059    3.3935

Ran          Method             Mean       95% CL Mean       Std Dev    95% CL Std Dev
Didn't Run                   71.8864   67.5912   76.1816    14.1277   11.6726  17.9001
Ran                          93.6667   88.3642   98.9691    17.6494   14.6111  22.2947
Diff (1-2)   Pooled         -21.7803  -28.5252  -15.0354    16.0059   13.9403  18.7958
Diff (1-2)   Satterthwaite  -21.7803  -28.5121  -15.0485

Method          Variances       DF   t Value   Pr > |t|
Pooled          Equal           87     -6.42     <.0001
Satterthwaite   Unequal     83.758     -6.43     <.0001

Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F       44       43      1.56   0.1468

Be sure to write out your conclusions carefully. Don't just give the p-value of the test, and say whether you reject the null hypothesis or not. Instead, describe your results clearly in words, stating which group had a higher mean, and also give the value of the t-test statistic, the degrees of freedom, and the p-value. See the end of the homework for an example.

Include the results of the independent sample t-tests for HR1 and HR2 in your homework write-up.

5.
Carry out a paired t-test to compare the mean of HR1 and HR2 for all students.

a) Write out the null and alternative hypotheses.
H0: µHR2 - µHR1 = 0
HA: µHR2 - µHR1 ≠ 0

b) Set up your SAS code so the paired t-test is giving you HR2 minus HR1. What is the mean difference between HR2 and HR1 for all students?
The sample mean difference is 9.37 beats per minute, with HR2 having the higher mean than HR1 for all students overall.

c) How many students are included in this t-test?
There are 89 students in this test.

d) What do you conclude? Again, be sure to write out your conclusion in words and also give the test statistic, degrees of freedom, and p-value.
There is a significant difference between the mean of HR2 and HR1 for all students, t(88) = 5.26, p < 0.001, with the mean of HR2 being higher than the mean of HR1.

Include the output from the paired t-test in your homework write-up.

Paired samples t-test
The TTEST Procedure

Difference: HR2 - HR1

 N      Mean    Std Dev   Std Err    Minimum   Maximum
89    9.3708    16.8121    1.7821   -37.0000   58.0000

  Mean      95% CL Mean       Std Dev    95% CL Std Dev
9.3708   5.8293   12.9123    16.8121   14.6531  19.7230

DF   t Value   Pr > |t|
88      5.26     <.0001

6. Carry out a paired t-test to compare the mean of HR1 and HR2 separately for students who ran and those who did not run. You can do this using a BY statement. Use your format for RAN for this question.

a) What is the sample size, mean difference, and standard deviation of the difference between HR1 and HR2 for those who did not run? Which variable had a higher mean, HR1 or HR2, for those who did not run?
There were 44 students who didn't run. The mean difference between HR2 and HR1 was -3.23 beats per minute, which means the mean of HR1 was higher than the mean of HR2 for those who didn't run, t(43) = -3.24, p = 0.0023. This was rather unexpected. I expected there to be no change in HR between time 1 and time 2 for those who didn't run.
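The paired t statistic for those who did not run can be checked by hand from the values in the PROC TTEST output for that group (n = 44, mean difference -3.2273, standard deviation of the differences 6.6117). The notation \bar{d}, s_d and SE is introduced here only for illustration, and the arithmetic is rounded.

```latex
\begin{align*}
\mathrm{SE} &= \frac{s_d}{\sqrt{n}} = \frac{6.6117}{\sqrt{44}} \approx 0.9968 \\
t &= \frac{\bar{d}}{\mathrm{SE}} = \frac{-3.2273}{0.9968} \approx -3.24, \qquad df = n - 1 = 43
\end{align*}
```

Because the difference is defined as HR2 minus HR1, the negative t value corresponds to HR1 having the higher mean.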
Paired t-test by RAN
--------------------------------- Ran=Didn't Run ---------------------------------
The TTEST Procedure

Difference: HR2 - HR1

 N       Mean    Std Dev   Std Err    Minimum   Maximum
44    -3.2273     6.6117    0.9968   -37.0000    4.0000

   Mean       95% CL Mean       Std Dev    95% CL Std Dev
-3.2273   -5.2374   -1.2171     6.6117    5.4627   8.3772

DF   t Value   Pr > |t|
43     -3.24     0.0023

What do you conclude about the difference in the mean of HR1 and HR2 for those who did not run?
There was a slight, but significant, decrease in the mean heart rate from time 1 to time 2 for those who didn't run.

b) What is the sample size, mean difference, and standard deviation of the difference between HR1 and HR2 for those who did run? Which variable had a higher mean, HR1 or HR2, for those who ran?
The mean difference in the heart rate for those who ran was 21.69 beats per minute. This was a significant increase, t(44) = 10.07, p < 0.0001.

Paired t-test by RAN
------------------------------------ Ran=Ran ------------------------------------
The TTEST Procedure

Difference: HR2 - HR1

 N       Mean    Std Dev   Std Err    Minimum   Maximum
45    21.6889    14.4455    2.1534   -19.0000   58.0000

   Mean       95% CL Mean       Std Dev    95% CL Std Dev
21.6889   17.3490   26.0288    14.4455   11.9588  18.2476

DF   t Value   Pr > |t|
44     10.07     <.0001

c) What do you conclude about the difference in the mean of HR1 and HR2 for those who did run?
For those who did run, there was a substantial increase in their heart rate from time 1 to time 2, and the increase was significant.

Include the paired t-test results from SAS for those who ran and did not run in your homework write-up.

7. Get a histogram of HR1 for all students using Proc Sgplot. Include the histogram of HR1 in your homework write-up.

8. Carry out a one-sample t-test to decide if the mean of resting heart rate (HR1) for all students is equal to 72 beats per minute.

a) What is the mean of HR1 for all students?
The mean of HR1 for all students was 73.53 beats per minute.

b) What do you conclude?
There is not sufficient evidence, based on this sample, to reject the null hypothesis that the mean heart rate at time 1 was equal to 72 beats per minute, t(88) = 1.27, p = 0.2067.

Include the output from the one-sample t-test in your homework write-up.

The TTEST Procedure

Variable: HR1 (HR1)

 N       Mean    Std Dev   Std Err   Minimum   Maximum
89    73.5281    11.3319    1.2012   49.0000     121.0

   Mean       95% CL Mean       Std Dev    95% CL Std Dev
73.5281   71.1410   75.9152    11.3319    9.8767  13.2939

DF   t Value   Pr > |t|
88      1.27     0.2067

9. Save your SAS commands as homework4.sas. Re-run all of the commands in this command file and make sure there are no errors in your log.

NB: Please write out your answers in complete sentences; be sure it is easy to see what questions you are answering. When you are including a statistical test, describe the test results in words. Don't just say whether the result is significant or not. Also, include the test statistic (e.g., t-statistic), degrees of freedom, and p-value for the test.

Below is an example of how you might write up a result. This example was taken from "Reporting Statistics in APA Style" by Jeffrey Kahn, Illinois State University, http://my.ilstu.edu/~jhkahn/apastats.html

"There was a significant effect for gender, t(54) = 5.43, p < .001, with men receiving higher scores than women."

Professor Kahn recommends reporting the p-value as p<.05, or p<.01, etc., while I would like you to report the actual p-value, as displayed in SAS.

You will be graded on the SAS commands, the output, and your write-up. Be sure to include your SAS commands as the first part of the homework. The appearance of the output will be part of your homework grade. Be sure you submit the SAS command file, fonts.sas, at the start of your homework assignment so you will have nice-looking tables and other output.

OPTIONS FORMCHAR="|----|+|---+=|-/\<>*";

Try to keep your SAS output results to a minimum length by judiciously editing it!
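One possible way to shorten the output before editing it by hand is an ODS SELECT statement, sketched here for the one-sample t-test from Question 8. This is not part of the key's required code. It assumes the b510 libname has already been submitted, that only the summary statistics and t-test tables are wanted, and that Statistics and TTests are the ODS table names PROC TTEST uses in your SAS version (submitting ods trace on; first will list the names in the log).

```sas
/* Sketch only: restrict PROC TTEST output to two tables.            */
/* Assumes libname b510 has already been submitted.                  */
/* Table names are assumed; confirm them with "ods trace on;".       */
ods select Statistics TTests;   /* applies to the next procedure only */

title "One-sample t-test of HR1 for all students";
proc ttest data=b510.allgroups h0=72;
   var hr1;
run;
```

The selection resets after the procedure finishes, so later procedures print their full default output unless another ODS SELECT statement is submitted.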