Erya Huang, Diana Vargas

Assignment Data Summary:

1. Cupples, L. Adrienne, Qiong Yang, Serkalem Demissie, et al. "Description of the Framingham Heart Study data for Genetic Analysis Workshop 13." BMC Genetics 2003, 4(Suppl 1):S2.

2. The goal of the study is to identify common factors or characteristics that contribute to cardiovascular disease (CVD). It does this by following the development of CVD over a long period of time in a large group of participants who had not yet developed overt symptoms of CVD or suffered a heart attack or stroke. The study was motivated by the fact that cardiovascular disease is the leading cause of death in the United States, and at the time little was known about the causes of heart disease. It continues today, with more than 50 years of follow-up on the original cohort. The study has found that "high blood pressure, high blood cholesterol, low HDL cholesterol, smoking, obesity, diabetes," and lack of exercise are the major CVD risk factors.

3. Members of the original cohort returned every 2 years for a "detailed medical history, physical examination, and laboratory tests." The offspring cohort was examined at 4-year intervals and underwent the same exams.

4. This was an observational study. Framingham, MA was chosen because it had a relatively stable population and was thought to be representative of most U.S. towns at the time. The Original Cohort of the Framingham Heart Study consisted of 5,209 respondents drawn in 1948 from a random sample, by household, of two-thirds of the adult population of Framingham, Massachusetts, aged 30 to 62. The Offspring Study was initiated in 1971 and recruited 5,124 men and women, consisting of the offspring of the Original Cohort and their spouses.

5. People who had already developed obvious symptoms of CVD or had suffered a heart attack or stroke were not included in the study.

6.
Variables:
sex: gender, coded 1 = male, 2 = female;
sbp: systolic blood pressure (SBP), in mm Hg;
dbp: diastolic blood pressure (DBP), in mm Hg;
scl: serum cholesterol (SCL), in mg/100 ml;
chdfate: 1 if the subject developed CHD by the end of follow-up, 0 otherwise;
followup: the subject's follow-up time, in days;
age: age, in years;
bmi: body mass index (BMI) = weight/height^2, in kg/m^2;
month: month of the year in which the baseline exam occurred;
id: a patient identification variable (numbered 1 to 4699).

7. It is surprising that this study continues to this day, with more than 50 years of follow-up.

Part 2:

1. The average age is 46.04 years, with a standard deviation of 8.50 years. There are 2049 men and 2650 women (4699 subjects in total). The average follow-up is 8061.313 days, with a standard deviation of 3595.3 days. 1473 subjects (31.3%) developed CHD and 3226 did not. The mean SBP is 132.77 mm Hg with a standard deviation of 22.80 mm Hg; by sex, the mean SBP is 132.12 mm Hg (SD 19.73 mm Hg) for males and 133.27 mm Hg (SD 24.91 mm Hg) for females. The mean DBP is 82.54 mm Hg with a standard deviation of 12.74 mm Hg; by sex, the mean DBP is 83.47 mm Hg (SD 12.11 mm Hg) for males and 81.82 mm Hg (SD 13.16 mm Hg) for females.

2. [Figure: two boxplots, left to right. (1) Serum cholesterol (SCL, mg/100 ml) for subjects who developed CHD by the end of follow-up (1) and all others (0). (2) Body mass index (BMI, kg/m^2) for the same two groups.]
From the plots, subjects who developed CHD appear to have higher SCL and BMI than the others, which suggests that high SCL and high BMI may be CHD risk factors.

3.
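As a quick arithmetic check on the summary statistics in Part 2, the sketch below (Python, standard library only) recomputes the share of subjects who developed CHD from the reported counts and recombines the sex-specific SBP and DBP means into overall means, weighting each group mean by its group size. It assumes that all 4699 subjects have an SBP and a DBP reading; the helper name `weighted_mean` is illustrative and not part of the original analysis.

```python
# Counts reported in Part 2, item 1
n_men, n_women = 2049, 2650
n_chd, n_no_chd = 1473, 3226
n_total = n_men + n_women  # 4699 subjects
assert n_chd + n_no_chd == n_total  # the CHD split must cover every subject


def weighted_mean(means_and_counts):
    """Overall mean recovered from group means weighted by group size.

    Assumes every subject in each group contributed to that group's mean.
    """
    total = sum(n for _, n in means_and_counts)
    return sum(m * n for m, n in means_and_counts) / total


chd_share = n_chd / n_total
sbp_all = weighted_mean([(132.12, n_men), (133.27, n_women)])
dbp_all = weighted_mean([(83.47, n_men), (81.82, n_women)])

print(f"CHD: {n_chd}/{n_total} = {chd_share:.1%}")
print(f"SBP overall mean: {sbp_all:.2f} mm Hg")
print(f"DBP overall mean: {dbp_all:.2f} mm Hg")
```

The recombined means agree with the reported overall means (132.77 and 82.54 mm Hg) to two decimals, so the group summaries look internally consistent, and the CHD share works out to about 31.3% of all subjects.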
A logistic regression can be used to predict whether a subject develops CHD and to estimate how the odds of CHD change with a one-unit change in each variable. The fitted model is

$$\log\frac{\hat p}{1-\hat p} = -3.338 + 0.00834\,\mathrm{scl} + 0.0683\,\mathrm{bmi} - 0.7436\,\mathrm{sex},$$

where $\hat p$ is the estimated probability that chdfate = 1. All four p-values are reported as < 2e-16, indicating that SCL, BMI, and sex are each strongly associated with CHD in this model.

Call:
glm(formula = chdfate ~ scl + bmi + sex, family = "binomial")

Deviance Residuals:
Min: -1.8311, 1Q: -0.8844, Median: -0.6693, 3Q: 1.2258, Max: 2.1489

Coefficients:
(Intercept)  Estimate -3.3384435  Std. Error 0.2761265  z value -12.090  Pr(>|z|) <2e-16 ***
scl          Estimate  0.0083400  Std. Error 0.0007544  z value  11.054  Pr(>|z|) <2e-16 ***
bmi          Estimate  0.0682871  Std. Error 0.0080608  z value   8.471  Pr(>|z|) <2e-16 ***
sex          Estimate -0.7436432  Std. Error 0.0659622  z value -11.274  Pr(>|z|) <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 5800.7 on 4657 degrees of freedom
Residual deviance: 5446.7 on 4654 degrees of freedom
(41 observations deleted due to missingness)
AIC: 5454.7

Number of Fisher Scoring iterations: 4

Given these p-values, we reject the null hypothesis that SCL, BMI, and sex are not associated with CHD (that is, that each of their coefficients equals zero).

95% CI for β(SCL):
0.0083400 + 0.0007544*1.96 = 0.009818624
0.0083400 - 0.0007544*1.96 = 0.006861376
We are 95% confident that the odds ratio for CHD associated with a 1-unit increase in SCL is between exp(0.006861376) = 1.006885 and exp(0.009818624) = 1.009867.

95% CI for β(BMI):
0.0682871 + 0.0080608*1.96 = 0.08408627
0.0682871 - 0.0080608*1.96 = 0.05248793
We are 95% confident that the odds ratio for CHD associated with a 1-unit increase in BMI is between exp(0.05248793) = 1.053890 and exp(0.08408627) = 1.087723.

95% CI for β(SEX):
-0.7436432 + 0.0659622*1.96 = -0.6143573
-0.7436432 - 0.0659622*1.96 = -0.8729291
Because sex is coded 1 = male and 2 = female, a 1-unit increase in sex compares women with men. We are 95% confident that the odds ratio of CHD for women relative to men is between exp(-0.8729291) = 0.4177 and exp(-0.6143573) = 0.5410. This interval excludes one, so at the 95% level women have significantly lower odds of CHD than men, after adjusting for SCL and BMI.
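The interval arithmetic for the logistic regression can be reproduced with a short script. This is a minimal sketch, assuming the Wald interval used in the text (estimate plus or minus 1.96 standard errors, then exponentiated); the estimates and standard errors are copied from the glm() summary. The helper names `wald_or_ci` and `predicted_prob`, and the example subject (scl = 250, bmi = 27, male), are hypothetical and chosen only for illustration.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile, as used in the text

# (estimate, standard error) copied from the glm() summary
coefs = {
    "(Intercept)": (-3.3384435, 0.2761265),
    "scl": (0.0083400, 0.0007544),
    "bmi": (0.0682871, 0.0080608),
    "sex": (-0.7436432, 0.0659622),
}


def wald_or_ci(beta, se, z=Z_95):
    """Odds ratio and Wald CI: exponentiate beta and beta -/+ z*se."""
    lo, hi = beta - z * se, beta + z * se
    # exp() is increasing, so the lower log-odds bound gives the lower OR bound
    return math.exp(beta), math.exp(lo), math.exp(hi)


def predicted_prob(scl, bmi, sex):
    """P(chdfate = 1): inverse logit of the fitted linear predictor."""
    eta = (coefs["(Intercept)"][0]
           + coefs["scl"][0] * scl
           + coefs["bmi"][0] * bmi
           + coefs["sex"][0] * sex)
    return 1.0 / (1.0 + math.exp(-eta))


for name in ("scl", "bmi", "sex"):
    odds_ratio, lo, hi = wald_or_ci(*coefs[name])
    print(f"{name}: OR = {odds_ratio:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")

# Hypothetical subject, values chosen only for illustration (sex = 1 is male)
print(f"P(CHD | scl=250, bmi=27, male) = {predicted_prob(250, 27, 1):.3f}")
```

For sex the script gives an odds ratio of about 0.475 with an interval of roughly (0.418, 0.541), which lies entirely below one. Exponentiating both bounds, rather than negating one of them, is what keeps the interval on the correct side of one. The last line shows how the fitted equation turns a subject's covariates into a predicted probability through the inverse logit, 1/(1 + e^(-eta)).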