Download Framingham Heart Study Data

Erya Huang
Diana Vargas
Assignment Data Summary:
1. Cupples, L Adrienne, Qiong Yang, Serkalem Demissie, et al. “Description of the
Framingham Heart Study data for Genetic Analysis Workshop 13.” BMC Genetics 2003,
4(Suppl 1):S2.
2. The goal of the study is to identify common factors or characteristics that contribute to
cardiovascular disease (CVD) by following its development over a long period of time in
a large group of participants who had not yet developed overt symptoms of CVD or
suffered a heart attack or stroke. The study was motivated by the fact that cardiovascular
disease is the leading cause of death in the United States and that, at the time, little was
known about the causes of heart disease. It continues today with more than 50 years of
follow-up on the original cohort. The study has found that “high blood pressure, high blood
cholesterol, low HDL cholesterol, smoking, obesity, diabetes,” and lack of exercise are
the major CVD risk factors.
3. Members of the original cohort returned every 2 years for a “detailed medical history,
physical examination, and laboratory tests.” The offspring cohort was observed in 4-year
increments and underwent the same exams.
4. This was an observational study. Framingham, MA was chosen because it had a
relatively stable population and was thought to be representative of most U.S. towns at
the time. The Original Cohort of the Framingham Heart Study consisted of 5,209
respondents, 30 to 62 years of age, drawn in 1948 from a random sample (by household)
of 2/3 of the adult population of Framingham, Massachusetts. The Offspring Study was
initiated in 1971. A sample of 5,124 men and women, consisting of the offspring of the
Original Cohort and their spouses, was recruited.
5. People who had already developed obvious symptoms of CVD or had suffered a heart
attack or stroke were not included in the study.
6. Variables
sex = gender, coded as
    1 = if subject is male;
    2 = if subject is female;
sbp = systolic blood pressure (SBP) in mm Hg;
dbp = diastolic blood pressure (DBP) in mm Hg;
scl = serum cholesterol (SCL) in mg/100ml;
chdfate = CHD outcome, coded as
    1 = if the patient develops CHD by the end of follow-up;
    0 = otherwise;
followup = the subject’s follow-up in days;
age = age in years;
bmi = body mass index (BMI) = weight/height^2 in kg/m^2;
month = month of year in which baseline exam occurred;
id = a patient identification variable (numbered 1 to 4699).
7. It is surprising that this study continues to this day, with more than 50 years of
follow-up.
Part 2:
1.
The average age is 46.04 years with a standard deviation of 8.50 years. There are 2049 men
and 2650 women. The average follow-up is 8061.313 days with a standard deviation
of 3595.3 days. 1473 (31.3%) of the 4699 patients developed CHD and 3226 did not.
The mean sbp is 132.77 mm Hg with a standard deviation of 22.80 mm Hg. In
particular, the mean sbp for males is 132.12 mm Hg with a standard deviation of 19.73
mm Hg; the mean sbp for females is 133.27 mm Hg with a standard deviation of
24.91 mm Hg.
The mean dbp is 82.54 mm Hg with a standard deviation of 12.74 mm Hg. In
particular, the mean dbp for males is 83.47 mm Hg with a standard deviation of 12.11
mm Hg; the mean dbp for females is 81.82 mm Hg with a standard deviation of
13.16 mm Hg.
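The counts and sex-specific means reported above can be cross-checked with a few lines of arithmetic. The Python sketch below is only a consistency check on the published numbers, not the original analysis. It assumes that no blood pressure values are missing, so that the group sizes equal the sex counts. Note that the share of subjects with CHD is cases divided by all subjects, whereas cases divided by non-cases gives the odds.

```python
# Counts and sex-specific means as reported in the summary above.
n_men, n_women = 2049, 2650
n_chd, n_no_chd = 1473, 3226

total = n_men + n_women
assert total == n_chd + n_no_chd  # both splits should cover every subject

share_chd = n_chd / total    # proportion of subjects who developed CHD
odds_chd = n_chd / n_no_chd  # odds of CHD: cases per non-case


def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Weighted average of two group means."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


# Assumption: no missing blood pressure values, so group sizes equal sex counts.
pooled_sbp = pooled_mean(132.12, n_men, 133.27, n_women)
pooled_dbp = pooled_mean(83.47, n_men, 81.82, n_women)

print(f"n = {total}, CHD share = {share_chd:.3f}, CHD odds = {odds_chd:.3f}")
print(f"pooled SBP = {pooled_sbp:.2f} mm Hg, pooled DBP = {pooled_dbp:.2f} mm Hg")
```

Under that assumption the pooled values should agree with the overall means of 132.77 and 82.54 mm Hg to within rounding.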
2.
[Figure: two side-by-side boxplots, each grouped by CHD status (0, 1)]
From left to right:
1. Boxplot of serum cholesterol (SCL) in mg/100ml of residents who develop CHD
by the end of follow-up (1) and others (0);
2. Boxplot of body mass index (BMI) in kg/m^2 of residents who develop CHD by
the end of follow-up (1) and others (0).
From the plot, we can see that patients who develop CHD tend to have higher SCL and BMI
than others, which suggests that high SCL and high BMI might be CHD risk factors.
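A boxplot is drawn from a five-number summary (minimum, lower quartile, median, upper quartile, maximum) of each group. The sketch below shows how such a summary could be computed per CHD group with the Python standard library. The function name and the SCL values are hypothetical toy numbers used only to illustrate the calculation; they are not taken from the Framingham data.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the quantities a boxplot displays."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical toy SCL values (mg/100ml), NOT from the Framingham data.
toy_scl_by_chd = {
    0: [180, 200, 220, 240, 260],
    1: [200, 220, 240, 260, 280],
}

toy_summaries = {
    group: five_number_summary(values)
    for group, values in toy_scl_by_chd.items()
}
for group, summary in toy_summaries.items():
    print(f"chdfate = {group}: {summary}")
```

Comparing the medians and quartiles of the two groups is the numerical counterpart of comparing the boxes by eye.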
3.
A logistic regression can be used to predict whether a subject develops CHD, as well as to
estimate the change in the odds of CHD associated with a one-unit change in each variable.
The fitted model, where p is the probability of developing CHD, is
log(p/(1-p)) = -3.338 + 0.00834*scl + 0.0683*bmi - 0.7436*sex
All p-values are <2e-16, indicating that SCL, BMI and sex are all statistically
significant predictors in this logistic regression.
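To make the fitted equation concrete, the Python sketch below treats its left-hand side as the log-odds of CHD and converts it to a probability with the inverse logit. The coefficients are the ones reported in the regression output. The example subject (SCL of 200 mg/100ml, BMI of 25 kg/m^2) is hypothetical and chosen only for illustration.

```python
import math

# Coefficients from the fitted model; sex is coded 1 = male, 2 = female.
COEF = {
    "intercept": -3.3384435,
    "scl": 0.0083400,
    "bmi": 0.0682871,
    "sex": -0.7436432,
}


def chd_log_odds(scl, bmi, sex):
    """Linear predictor of the logistic model (log-odds of CHD)."""
    return (COEF["intercept"] + COEF["scl"] * scl
            + COEF["bmi"] * bmi + COEF["sex"] * sex)


def chd_probability(scl, bmi, sex):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-chd_log_odds(scl, bmi, sex)))


# Hypothetical subject, not taken from the data: SCL 200, BMI 25.
p_male = chd_probability(200, 25, sex=1)
p_female = chd_probability(200, 25, sex=2)

# Odds ratio for women relative to men, holding SCL and BMI fixed.
odds_male = p_male / (1 - p_male)
odds_female = p_female / (1 - p_female)
odds_ratio_female_vs_male = odds_female / odds_male

print(f"P(CHD | male)   = {p_male:.3f}")
print(f"P(CHD | female) = {p_female:.3f}")
print(f"odds ratio (female vs male) = {odds_ratio_female_vs_male:.3f}")
```

Whatever SCL and BMI values are plugged in, the female-to-male odds ratio equals exp(-0.7436432), because the other terms cancel.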
**
Call:
glm(formula = chdfate ~ scl + bmi + sex, family = "binomial")

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8311  -0.8844  -0.6693   1.2258   2.1489

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.3384435  0.2761265 -12.090   <2e-16 ***
scl          0.0083400  0.0007544  11.054   <2e-16 ***
bmi          0.0682871  0.0080608   8.471   <2e-16 ***
sex         -0.7436432  0.0659622 -11.274   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5800.7 on 4657 degrees of freedom
Residual deviance: 5446.7 on 4654 degrees of freedom
  (41 observations deleted due to missingness)
AIC: 5454.7

Number of Fisher Scoring iterations: 4
**
Given the associated p-values, we reject the null hypothesis that SCL, BMI, and sex are
not associated with CHD.
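The z values and p-values in the output follow from each estimate and its standard error. As a rough check, the Python sketch below recomputes the Wald statistics from the printed numbers, together with the deviance drop and the AIC. Because the printed estimates are rounded, the recomputed z values may differ from the R output in the last digit.

```python
import math

# (estimate, standard error) pairs copied from the coefficient table.
coefficients = {
    "(Intercept)": (-3.3384435, 0.2761265),
    "scl": (0.0083400, 0.0007544),
    "bmi": (0.0682871, 0.0080608),
    "sex": (-0.7436432, 0.0659622),
}


def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


wald_results = {
    name: wald_test(est, se) for name, (est, se) in coefficients.items()
}
for name, (z, p) in wald_results.items():
    print(f"{name:12s} z = {z:8.3f}   p = {p:.2e}")

# Deviance figures as printed in the model summary.
null_deviance, residual_deviance = 5800.7, 5446.7
lr_statistic = null_deviance - residual_deviance  # drop in deviance
lr_df = 4657 - 4654                               # predictors added to the null model
aic = residual_deviance + 2 * len(coefficients)   # deviance + 2 * number of parameters

print(f"deviance drop = {lr_statistic:.1f} on {lr_df} df, AIC = {aic:.1f}")
```

The large drop in deviance on 3 degrees of freedom points the same way as the individual Wald tests: the three predictors together fit much better than the intercept-only model.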
95% CI for β(SCL)
0.0083400 - 0.0007544*1.96 = 0.006861376
0.0083400 + 0.0007544*1.96 = 0.009818624
Exponentiating the endpoints, we are 95% confident that the odds ratio for CHD associated
with a 1 unit increase in SCL is between 1.006885 and 1.009867.
95% CI for β(BMI)
0.0682871 - 0.0080608*1.96 = 0.05248793
0.0682871 + 0.0080608*1.96 = 0.08408627
Exponentiating the endpoints, we are 95% confident that the odds ratio for CHD associated
with a 1 unit increase in BMI is between 1.053890 and 1.087723.
95% CI for β(SEX)
-0.7436432 - 0.0659622*1.96 = -0.8729291
-0.7436432 + 0.0659622*1.96 = -0.6143573
Exponentiating the endpoints, we are 95% confident that the odds ratio of CHD for women
relative to men is between 0.4177 and 0.5410. This interval does not include one, so we
are 95% confident that women have lower odds of CHD than men, holding SCL and BMI
fixed.
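The interval calculations above all follow the same recipe: take the estimate plus or minus 1.96 standard errors on the log-odds scale, then exponentiate both endpoints to get an interval for the odds ratio. A minimal Python sketch of that recipe, using the estimates and standard errors from the regression output, is given below. The function names are illustrative only.

```python
import math

Z_95 = 1.96  # normal critical value used in the calculations above


def wald_ci(estimate, std_error, z=Z_95):
    """Confidence interval for a coefficient on the log-odds scale."""
    return estimate - z * std_error, estimate + z * std_error


def odds_ratio_ci(estimate, std_error, z=Z_95):
    """Exponentiate both endpoints to get an interval for the odds ratio."""
    lower, upper = wald_ci(estimate, std_error, z)
    return math.exp(lower), math.exp(upper)


# (estimate, standard error) pairs from the regression output.
fitted = {
    "scl": (0.0083400, 0.0007544),
    "bmi": (0.0682871, 0.0080608),
    "sex": (-0.7436432, 0.0659622),
}

or_intervals = {name: odds_ratio_ci(est, se) for name, (est, se) in fitted.items()}
for name, (lower, upper) in or_intervals.items():
    excludes_one = upper < 1.0 or lower > 1.0
    print(f"{name}: OR in ({lower:.6f}, {upper:.6f}), excludes 1: {excludes_one}")
```

Because exp is increasing, an interval that lies entirely below zero on the log-odds scale (as for sex) maps to an odds-ratio interval entirely below one, and the lower endpoint always stays below the upper one.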