DATA MINING
Shanghai China, July 5, 2015
Yawei Zhang, MD, PhD, MPH
Associate Professor
Yale University School of Public Health
Data Mining
• Data mining is a process of finding anomalies, patterns, and correlations within large data sets to predict outcomes
• More information does not mean more knowledge
• Data mining allows us to sift through all the chaotic and repetitive noise, understand what is relevant, and then make good use of that information to assess likely outcomes
Databases
• Registry databases
  – Tumor registry
  – Birth registry
  – Mortality registry
• Health insurance databases
• Medical records
• Research survey databases
• Individual epidemiologic study databases
Data Mining
• Knowledge discovery in databases
• Foundation
  – Statistics: numeric study of data relationships
  – Artificial intelligence: human-like intelligence displayed by software and/or machines
  – Machine learning: algorithms that can learn from data and make predictions
Lanzhou Birth Cohort Study
Gansu Provincial Maternity and Child Care Hospital
Eligible Study Population (N=14,359)
• Came to the hospital for delivery in 2010-2012
• Ages 18 years or older
• Gestational age ≥20 weeks
• No mental illness
Participants (N=10,542)
• 3,712 refused to participate
• 105 did not complete in-person interview
Data sources
• Questionnaire
  – Demographic and lifestyle
  – Residential history
  – Medical and reproductive
  – Diet and supplements
• Biosamples
  – Maternal blood
  – Cord blood
• Medical records
  – Birth outcomes (birthweight, gestational age, birth length, head circumference, defects, preterm birth, SGA, LGA, low BW)
  – Maternal complications (gestational hypertension, preeclampsia, gestational diabetes, thyroid diseases)
• Air pollutants
  – PM10
  – SO2
  – NO2
  – Temperature
  – Humidity
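The enrollment counts in the cohort flowchart can be checked with simple arithmetic. The sketch below uses only the numbers given on the slide; the variable names are illustrative.

```python
# Enrollment counts from the Lanzhou Birth Cohort Study slide.
eligible = 14_359
refused = 3_712
incomplete_interview = 105

# Participants are the eligible women minus the two exclusion groups.
participants = eligible - refused - incomplete_interview
participation_rate = participants / eligible

print(participants)                  # number of enrolled participants
print(f"{participation_rate:.1%}")   # share of eligible women who took part
```

The result matches the N=10,542 shown for participants, a participation rate of roughly 73%.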
Folic acid supplementation and dietary folate intake and risk of preeclampsia
(Wang et al. Eur J Clin Nutr 2015 PMID: 25626412)
• Folic acid supplements reduce blood homocysteine levels, which are elevated among women with gestational hypertension and preeclampsia
• Epidemiologic studies reached inconsistent results
  – Three studies found reduced risk associated with folic acid-containing multivitamins
  – Three studies reported no association with folic acid supplements alone
  – One study reported reduced risk associated with dietary folate intake
Study Population
• Sample size 10,041
  – Excluding women with chronic hypertension and gestational hypertension
  – Excluding women who gave birth to infants with birth defects
• Preeclampsia:
Exposure Assessment
• Folic acid supplements
  – Users: those who took folic acid supplements alone or folic acid-containing multivitamins before conception and/or during pregnancy
  – Nonusers: those who never took folic acid supplements alone or folic acid-containing multivitamins before conception or during pregnancy
• Dietary folate
  – Estimated from the frequency of consumption and portion size of food items using the Chinese Standard Tables of Food Consumption
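The dietary folate estimate described above combines how often a food is eaten, the portion size, and the folate content of that food. A minimal sketch of that calculation follows. The function name, the food items, and every folate value are hypothetical placeholders and are not taken from the Chinese food tables or from the study questionnaire.

```python
def daily_folate_ug(food_log, folate_per_100g):
    """Estimate daily folate intake (µg/day).

    food_log maps food -> (times eaten per week, portion size in grams).
    folate_per_100g maps food -> folate content in µg per 100 g.
    """
    total = 0.0
    for food, (times_per_week, portion_g) in food_log.items():
        grams_per_day = times_per_week * portion_g / 7
        total += grams_per_day * folate_per_100g[food] / 100
    return total


# Placeholder values for illustration only (NOT real composition data).
folate_per_100g = {"leafy greens": 100.0, "rice": 10.0}
food_log = {"leafy greens": (7, 50.0), "rice": (14, 150.0)}

estimate = daily_folate_ug(food_log, folate_per_100g)
print(f"{estimate:.1f} µg/day")
```

In the study, estimates of this kind were then grouped into quartiles (Q1 to Q4) for analysis.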
Folic acid supplementation and dietary folate intake and risk of preterm birth in China
(Liu et al. Eur J Nutr 2015 (in press))
• Folate plays an essential role in DNA synthesis, repair, and methylation
• Seven randomized controlled trials linking maternal folic acid supplementation to preterm birth (PB) have reported inconsistent findings
• Epidemiologic studies examining folic acid supplementation and dietary folate and PB have also reported mixed results
  – 1 positive, 10 negative, and 3 null findings
Study Population
• 10,179 women having singleton live births
• Preterm birth (<37 completed gestational weeks, N=1,019)
  – Moderate PB (32 to <37 completed weeks of gestation)
  – Very PB (<32 completed weeks)
  – Medically indicated PB
    • When a placental, uterine, fetal, or maternal condition exists prompting the medical team to proceed with delivery after the risks and benefits of continuing pregnancy versus early delivery are weighed
    • Examples of risky conditions prompting a decision include: placental abruption, placenta accreta, placenta or vasa previa, prior classical cesarean, uterine rupture or dehiscence, fetal intrauterine growth restriction, select fetal anomalies, severe preeclampsia, uncontrolled gestational or chronic hypertension, complicated pregestational diabetes, and oligohydramnios
  – Spontaneous PB
    • With or without preterm premature rupture of membranes (PPROM)
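The gestational-age cutoffs above translate directly into a small classifier. This is a sketch under the stated cutoffs only; the function name and labels are illustrative, and the medically indicated versus spontaneous split is not modeled because it depends on clinical information rather than gestational age.

```python
def classify_gestational_age(completed_weeks: float) -> str:
    """Classify a live birth by completed weeks of gestation."""
    if completed_weeks < 32:
        return "very preterm"        # <32 completed weeks
    if completed_weeks < 37:
        return "moderate preterm"    # 32 to <37 completed weeks
    return "term"                    # 37 completed weeks or more


for weeks in (30, 32, 36, 37, 40):
    print(weeks, classify_gestational_age(weeks))
```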
Table 3. Associations between folic acid supplementation and risk of preterm birth
Folic acid supplement use | Controls | Preterm (<37 weeks): cases, ORa (95% CI) | Moderate preterm (32 to <37 weeks): cases, ORa (95% CI) | Very preterm (<32 weeks): cases, ORa,c (95% CI) | Medically indicated preterm: cases, ORb,c (95% CI) | Spontaneous preterm: cases, ORb (95% CI)
Non-users | 1982 | 333, 1.00 | 252, 1.00 | 81, 1.00 | 120, 1.00 | 213, 1.00
Users | 7178 | 686, 0.80 (0.68, 0.94) | 580, 0.92 (0.77, 1.09) | 106, 0.50 (0.36, 0.69) | 218, 0.82 (0.63, 1.05) | 468, 0.77 (0.64, 0.93)
  ≤12 weeks | 4405 | 481, 0.85 (0.72, 1.01) | 411, 0.99 (0.82, 1.18) | 70, 0.50 (0.35, 0.71) | 153, 0.85 (0.65, 1.11) | 328, 0.82 (0.68, 1.00)
  >12 weeks | 2773 | 205, 0.67 (0.55, 0.83) | 169, 0.74 (0.59, 0.94) | 36, 0.49 (0.31, 0.77) | 65, 0.73 (0.52, 1.03) | 140, 0.64 (0.51, 0.82)
  P for trend | | 0.01 | 0.004 | 0.91 | 0.30 | 0.03
Preconception & during pregnancy | 2734 | 217, 0.75 (0.61, 0.92) | 183, 0.85 (0.68, 1.07) | 34, 0.47 (0.30, 0.75) | 66, 0.79 (0.56, 1.12) | 151, 0.73 (0.57, 0.93)
  ≤12 weeks | 569 | 59, 0.88 (0.64, 1.21) | 52, 1.06 (0.75, 1.49) | 7, 0.40 (0.18, 0.91) | 16, 0.78 (0.44, 1.36) | 43, 0.93 (0.65, 1.34)
  >12 weeks | 2165 | 158, 0.71 (0.57, 0.89) | 131, 0.79 (0.61, 1.01) | 27, 0.50 (0.30, 0.81) | 50, 0.80 (0.55, 1.16) | 108, 0.67 (0.52, 0.87)
  P for trend | | 0.21 | 0.098 | 0.71 | 1.00 | 0.10
Preconception only | 339 | 35, 0.88 (0.60, 1.31) | 33, 1.12 (0.75, 1.68) | 2, 0.21 (0.59, 0.88) | 8, 0.72 (0.38, 1.38) | 27, 0.98 (0.63, 1.51)
  ≤4 weeks | 89 | 12, 1.01 (0.52, 1.95) | 10, 1.17 (0.58, 2.37) | 2, 0.61 (0.14, 2.69) | 3, 0.88 (0.30, 2.60) | 9, 1.18 (0.57, 2.45)
  >4 weeks | 250 | 23, 0.83 (0.52, 1.33) | 23, 1.10 (0.69, 1.76) | 0, - | 5, 0.66 (0.30, 1.45) | 18, 0.91 (0.54, 1.52)
  P for trend | | 0.80 | 0.99 | 0.73 | 0.56 | 0.79
During pregnancy only | 4105 | 434, 0.82 (0.69, 0.97) | 364, 0.93 (0.77, 1.12) | 70, 0.53 (0.37, 0.75) | 144, 0.84 (0.64, 1.09) | 290, 0.77 (0.63, 0.94)
  ≤8 weeks | 1871 | 246, 0.94 (0.77, 1.13) | 206, 1.07 (0.87, 1.32) | 40, 0.59 (0.39, 0.88) | 80, 0.93 (0.69, 1.27) | 166, 0.91 (0.72, 1.13)
  >8 weeks | 2234 | 188, 0.70 (0.57, 0.85) | 158, 0.79 (0.63, 0.99) | 30, 0.46 (0.29, 0.72) | 64, 0.74 (0.53, 1.02) | 124, 0.64 (0.50, 0.81)
  P for trend | | 0.005 | 0.007 | 0.42 | 0.16 | 0.003
a Adjusted for maternal age, education level, smoking, parity, preeclampsia, maternal diabetes, pre-pregnancy BMI, family monthly income per capita, maternal employment during pregnancy, history of preterm birth, and dietary folate intake.
b Adjusted for all variables above except for preeclampsia and maternal diabetes.
c Estimated by using Fisher’s exact test for the number of cases in a category <5.
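For readers who want to see how an odds ratio comes out of the case and control counts, the sketch below computes a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval from the Table 3 counts for users versus non-users. This is not the method behind the table: the reported ORs are adjusted for the covariates in footnote a, so the crude value is expected to differ. The function name is illustrative.

```python
import math


def crude_odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval."""
    odds_ratio = (exp_cases * unexp_controls) / (exp_controls * unexp_cases)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / exp_cases + 1 / exp_controls + 1 / unexp_cases + 1 / unexp_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Table 3, preterm (<37 weeks): users 686 cases / 7178 controls,
# non-users 333 cases / 1982 controls.
or_crude, ci_low, ci_high = crude_odds_ratio(686, 7178, 333, 1982)
print(f"crude OR = {or_crude:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

The crude OR is about 0.57, further from 1 than the adjusted 0.80 in the table, which illustrates how much covariate adjustment can shift an estimate.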
Table 4. Associations between estimated dietary folate intake and risk of preterm birth
Dietary folate duration & intake levels (µg/day) | Controls | Preterm (<37 weeks): cases, ORa (95% CI) | Moderate preterm (32 to <37 weeks): cases, ORa (95% CI) | Very preterm (<32 weeks): cases, ORa (95% CI) | Medically indicated preterm: cases, ORb (95% CI) | Spontaneous preterm: cases, ORb (95% CI)
Preconception
  Q1 <118.6 | 2248 | 313, 1.00 | 252, 1.00 | 61, 1.00 | 109, 1.00 | 204, 1.00
  Q2 118.6-161.8 | 2236 | 246, 0.90 (0.75, 1.08) | 197, 0.91 (0.74, 1.11) | 49, 0.91 (0.62, 1.35) | 73, 0.99 (0.76, 1.29) | 173, 0.95 (0.76, 1.17)
  Q3 161.8-224.6 | 2241 | 242, 0.84 (0.70, 1.01) | 196, 0.85 (0.69, 1.04) | 46, 0.82 (0.55, 1.21) | 79, 0.76 (0.57, 1.01) | 163, 0.87 (0.70, 1.08)
  Q4 ≥224.6 | 2245 | 196, 0.68 (0.56, 0.83) | 172, 0.76 (0.61, 0.94) | 24, 0.44 (0.27, 0.71) | 67, 0.60 (0.44, 0.81) | 129, 0.69 (0.54, 0.87)
  P for trend | | <.001 | 0.009 | 0.001 | <.001 | 0.002
  Per 10 µg increase | | 0.996 (0.990, 1.001) | 0.998 (0.993, 1.003) | 0.975 (0.958, 0.992) | 0.991 (0.981, 1.000) | 0.993 (0.986, 1.000)
During pregnancy
  Q1 <155.8 | 2245 | 373, 1.00 | 285, 1.00 | 88, 1.00 | 124, 1.00 | 249, 1.00
  Q2 155.8-202.8 | 2239 | 228, 0.70 (0.59, 0.84) | 182, 0.74 (0.60, 0.90) | 46, 0.62 (0.43, 0.90) | 72, 0.53 (0.40, 0.70) | 156, 0.69 (0.56, 0.85)
  Q3 202.8-272.1 | 2245 | 212, 0.67 (0.55, 0.80) | 187, 0.78 (0.64, 0.95) | 25, 0.33 (0.21, 0.52) | 72, 0.50 (0.38, 0.67) | 140, 0.63 (0.51, 0.79)
  Q4 ≥272.1 | 2241 | 184, 0.57 (0.47, 0.70) | 163, 0.67 (0.54, 0.83) | 21, 0.28 (0.17, 0.47) | 60, 0.47 (0.34, 0.63) | 124, 0.57 (0.45, 0.71)
  P for trend | | <.001 | <.001 | <.001 | <.001 | <.001
  Per 10 µg increase | | 0.998 (0.982, 0.995) | 0.994 (0.988, 1.000) | 0.949 (0.931, 0.968) | 0.979 (0.969, 0.990) | 0.985 (0.977, 0.993)
a Adjusted for maternal age, education level, smoking, parity, preeclampsia, maternal diabetes, pre-pregnancy BMI, family monthly income per capita, maternal employment during pregnancy, history of preterm birth, and folic acid supplementation.
b Adjusted for all variables above except for preeclampsia and maternal diabetes.
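The "Per 10 µg increase" rows report an odds ratio for a small step in intake. If the relationship is assumed to be log-linear (the usual assumption behind a per-unit OR from logistic regression, though the slides do not state the model), the OR for a larger step is the per-step OR raised to a power. The function name below is illustrative.

```python
def scale_odds_ratio(or_per_step: float, step: float, increment: float) -> float:
    """Rescale a per-step odds ratio to a different increment.

    Assumes the log-odds change linearly with the exposure.
    """
    return or_per_step ** (increment / step)


# Table 4, very preterm, during pregnancy: OR 0.949 per 10 µg/day.
or_per_100ug = scale_odds_ratio(0.949, step=10, increment=100)
print(f"OR per 100 µg/day increase: {or_per_100ug:.2f}")
```

Under that assumption, a per-10 µg OR of 0.949 corresponds to an OR of about 0.59 per 100 µg/day.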
Passive Smoking and Preterm Birth in Urban China
(Qiu et al. Am J Epidemiol 2014; 180(1): 94-102 PMID: 24838804)
• Smoking is a risk factor for preterm birth
• Role of passive smoking in preterm birth is unclear
• Epidemiologic studies examining passive smoking and preterm birth reported mixed results
  – 7 positive, and 7 no association
Study Population and Exposure Assessment
• 10,094 non-smoking women having singleton live births
• Preterm birth (<37 completed gestational weeks, N=1,009)
  – Moderate PB (32 to <37 completed weeks of gestation)
  – Very PB (<32 completed weeks)
  – Medically indicated PB
  – Spontaneous PB
    • With or without preterm premature rupture of membranes (PPROM)
• Passive smokers
  – Women who were exposed to cigarette smoke at home, at work, during social and recreational activities, and/or while commuting to and from work for at least 30 minutes per week during pregnancy
Maternal exposure to environmental tobacco smoke and risk of small for gestational age among non-smoking Chinese women
(Huang et al. Paediatr Perinat Epidemiol 2015 (in press))
• Smoking is a risk factor for SGA
• Role of passive smoking in SGA is unclear
• Epidemiologic studies examining passive smoking and SGA birth reported mixed results
  – 11 positive, and 7 no association
Study Population and Exposure Assessment
• Small for gestational age (SGA): an infant born with a birth weight below the 10th percentile of the gestational age- and gender-specific birth weight standards for Chinese newborns (N=775)
• Appropriate for gestational age (AGA): neonates who weighed between the 10th and 90th percentiles (N=7,863)
• Large for gestational age (LGA): an infant born with a birth weight above the 90th percentile using the same standards (N=1,413)
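The three size-for-gestational-age categories follow from two percentile cutoffs. The sketch below shows the rule; the function name is illustrative, and the cutoff weights in the example are made-up placeholders, not values from the Chinese newborn birth weight standards, which are specific to gestational age and sex.

```python
def classify_size_for_gestational_age(birth_weight_g, p10_g, p90_g):
    """Classify birth weight against the 10th and 90th percentile cutoffs
    for the infant's gestational age and sex."""
    if birth_weight_g < p10_g:
        return "SGA"   # below the 10th percentile
    if birth_weight_g > p90_g:
        return "LGA"   # above the 90th percentile
    return "AGA"       # between the 10th and 90th percentiles


# Hypothetical cutoffs for one gestational week and sex (placeholders only).
P10_G, P90_G = 2800, 3900
for weight in (2500, 3300, 4100):
    print(weight, classify_size_for_gestational_age(weight, P10_G, P90_G))
```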
Table 3. Associations between ETS exposure and small for gestational age by exposure timing, duration, and location.
Exposure | Appropriate for gestational age, N | Small for gestational age, N (%) | OR* (95% CI)
No ETS exposure | 6,392 | 586 (8.4) | 1.00
Ever exposed to ETS during pregnancy | 1,471 | 189 (11.4) | 1.29 (1.09-1.54)
Ever exposed to ETS during the 1st trimester | 1,380 | 171 (7.8) | 1.24 (1.03-1.49)
Ever exposed to ETS during the 2nd trimester | 1,254 | 167 (11.8) | 1.33 (1.11-1.61)
Ever exposed to ETS during the 3rd trimester | 1,107 | 151 (12.0) | 1.36 (1.12-1.65)
Duration of ETS exposure (hours/day)
Ever exposed during pregnancy
  <1 | 1,075 | 132 (10.9) | 1.23 (1.01-1.51)
  ≥1 | 396 | 57 (12.6) | 1.46 (1.09-1.96)
  P for trend** | | | 0.002
Ever exposed during the 1st trimester
  <1 | 920 | 107 (10.4) | 1.15 (0.92-1.43)
  ≥1 | 460 | 64 (12.2) | 1.43 (1.08-1.89)
  P for trend** | | | 0.008
Ever exposed during the 2nd trimester
  <1 | 834 | 102 (10.9) | 1.21 (0.97-1.52)
  ≥1 | 420 | 65 (13.4) | 1.57 (1.19-2.08)
  P for trend** | | | 0.001
Ever exposed during the 3rd trimester
  <1 | 740 | 94 (11.3) | 1.25 (0.99-1.58)
  ≥1 | 367 | 57 (13.4) | 1.57 (1.17-2.11)
  P for trend** | | | <0.001
Location of ETS exposure
Ever exposed during pregnancy
  Home | 1,098 | 156 (12.4) | 1.36 (1.12-1.65)
  Other locations | 354 | 32 (8.3) | 1.10 (0.75-1.60)
Ever exposed during the 1st trimester
  Home | 1,022 | 139 (12.0) | 1.29 (1.06-1.58)
  Other locations | 336 | 32 (8.7) | 1.15 (0.79-1.67)
Ever exposed during the 2nd trimester
  Home | 925 | 138 (13.0) | 1.42 (1.16-1.74)
  Other locations | 309 | 28 (8.3) | 1.09 (0.73-1.62)
Ever exposed during the 3rd trimester
  Home | 841 | 124 (12.8) | 1.39 (1.12-1.72)
  Other locations | 256 | 26 (9.2) | 1.23 (0.81-1.87)
*Adjusted for maternal age (continuous), education, employment, parity, maternal pre-pregnancy BMI, gestational hypertension, history of delivering a low birth weight infant, and total energy intake during pregnancy.
**P for trend was estimated with duration as a continuous variable.
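The N (%) column in the ETS table above appears to give SGA births as a share of SGA plus AGA births within each exposure group. That reading is an assumption, not something the slide states; the sketch below checks it against the first two rows of the table. The function name is illustrative.

```python
def sga_percent(sga_n: int, aga_n: int) -> float:
    """Percent of SGA births among SGA + AGA births in one exposure group."""
    return 100 * sga_n / (sga_n + aga_n)


# Rows from the table: (SGA count, AGA count).
no_ets = sga_percent(586, 6_392)
ever_exposed = sga_percent(189, 1_471)

print(f"No ETS exposure: {no_ets:.1f}%")
print(f"Ever exposed during pregnancy: {ever_exposed:.1f}%")
```

Both values round to the percentages printed in those two rows (8.4 and 11.4).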
Table 4. Associations between ETS exposure and small for gestational age by trimester.
Exposure | Appropriate for gestational age, N | Small for gestational age, N (%) | OR* (95% CI)
No ETS exposure | 6,392 | 586 (8.4) | 1.00
Exposed to ETS throughout entire pregnancy | 1,030 | 133 (11.4) | 1.27 (1.03-1.55)
Exposed to ETS in any two trimesters
  The 1st and 2nd trimesters | 157 | 18 (7.8) | 1.23 (0.74-2.02)
  The 1st and 3rd trimesters | 10 | 2 (11.8) | 2.37 (0.51-11.07)
  The 2nd and 3rd trimesters | 43 | 14 (24.6) | 3.79 (2.04-7.02)
Exposed to ETS exclusively in one trimester
  The 1st trimester | 183 | 18 (9.0) | 1.03 (0.63-1.70)
  The 2nd trimester | 24 | 2 (7.7) | 0.86 (0.20-3.68)
  The 3rd trimester | 24 | 2 (7.7) | 0.85 (0.20-3.65)
*Adjusted for maternal age (continuous), education, employment, parity, maternal pre-pregnancy BMI, gestational hypertension, history of delivering a low birth weight infant, and total energy intake during pregnancy.
Ambient PM10 Exposure and Preterm Birth
Nan et al., Environ Int 2015; 76: 71-7 PMID: 25553395
• Twelve earlier studies (two in China) provided inconsistent results
• The majority were based on registry databases, including the 2 in China
• All studies (except the 2 in China) were conducted in areas with low air pollution levels (mean PM10 ranged from 13 µg/m³ to 90 µg/m³)
• Very few studies examined the associations with preterm subtypes
Figure: Locations of monitors, distribution of residences of births, and buffers of 6, 12, and 50 km from monitors (n=8969).
WHO guideline for PM10: 20 µg/m³
Earlier studies: mean PM10 ranged from 13 µg/m³ to 90 µg/m³
Of 8969 singleton live births, 677 (7.5%) were preterm and 8292 were term births. Among preterm births, moderate and very preterm births were 571 (84.3%) and 103 (15.7%), respectively. Medically indicated preterm births (n=185) accounted for 27.3% of preterm births, while spontaneous preterm births (n=492) accounted for 72.7% of all cases.
U.S. National Ambient Air Quality Standard (NAAQS) (150 µg/m³, equivalent to the China NAAQS Grade II level)
Ambient air pollution and congenital heart defects in Lanzhou, China
Jan et al., Environ Res Letter 2015 (in press)
Outcome groups | Subtypes of outcome groups | Number of cases
Congenital malformations of great arteries (Q25) | Patent ductus arteriosus | 52
 | Both patent ductus arteriosus and stenosis of pulmonary artery | 2
Congenital malformations of cardiac septa (Q21) | Isolated cases of ventricular septal defect | 8
 | Isolated cases of atrial septal defect | 10
 | Both Ventricular septal defect | 1
Other congenital malformations of heart (Q24) | | 7
Congenital malformations of cardiac chambers and connections (Q20) | | 1
Exposure to cooking fuels and birth weight
in Lanzhou, China: a birth cohort study
Jiang et al., BMC Public Health 2015 (in press)
• Exposure to household air pollution resulting from cooking fuels has
also been suggested as an important cause of low birth weight in
developing countries
• Several studies reported that exposure to biomass smoke was
associated with an increased risk of low birth weight
• However, none of these studies have controlled for gestational age
• It is unclear whether biomass smoke was associated with prematurity
or intrauterine growth restriction.
Table 3. Multiple linear regression model for mean birth weight by cooking fuel type
Fuel type | N | Mean±SD (g) | Difference from gas* (g) | 95% CI
Gas | 7907 | 3310.66±499.16 | 0.00 |
Coal | 358 | 2970.40±709.54 | -73.31 | -119.77 to -26.86
Biomass | 120 | 2804.96±803.89 | -87.84 | -164.46 to -10.76
Electromagnetic | 487 | 3150.22±613.10 | -30.20 | -69.02 to 8.63
*Adjusted for maternal age, education, family income, maternal weight gain, vitamin supplement during pregnancy, preeclampsia, caesarean section, parity, gestational week, smoking, and ventilation.
Table 4. Associations between type of fuel and risk of LBW
Fuel type | NBW | LBW | ORa (95% CI) | ORb (95% CI)
Gas | 6965 | 371 | 1.00 | 1.00
Coal | 270 | 70 | 1.92 (1.37-2.69) | 1.09 (0.67-1.78)
Biomass | 73 | 42 | 3.74 (2.35-5.94) | 2.51 (1.26-5.01)
Electromagnetic | 408 | 53 | 1.48 (1.05-2.06) | 1.14 (0.71-1.83)
a Adjusted for maternal age, education, family income, maternal weight gain, vitamin supplement during pregnancy, preeclampsia, caesarean section, parity, smoking, and ventilation.
b Additional adjustment for gestational week.
Table 5. Associations between type of fuel and risk of LBW by preterm and term births
Fuel type | NBW | LBW | ORa (95% CI) | ORb (95% CI)
Term
  Gas | 6668 | 102 | 1.00 | 1.00
  Coal | 239 | 10 | 1.00 (0.48-2.09) | 0.96 (0.46-2.03)
  Biomass | 67 | 7 | 1.87 (0.76-4.62) | 1.85 (0.72-4.71)
  Electromagnetic | 382 | 9 | 0.91 (0.44-1.89) | 0.84 (0.40-1.76)
Preterm
  Gas | 297 | 269 | 1.00 | 1.00
  Coal | 31 | 60 | 1.53 (0.88-2.64) | 1.26 (0.67-2.37)
  Biomass | 6 | 35 | 5.24 (2.03-13.53) | 3.43 (1.21-9.74)
  Electromagnetic | 26 | 44 | 1.47 (0.84-2.58) | 1.38 (0.72-2.65)
Moderate Preterm
  Gas | 292 | 205 | 1.00 | 1.00
  Coal | 30 | 41 | 1.34 (0.73-2.43) | 1.25 (0.64-2.41)
  Biomass | 6 | 24 | 4.32 (1.61-11.58) | 3.19 (1.09-9.39)
  Electromagnetic | 25 | 35 | 1.58 (0.87-2.87) | 1.48 (0.75-2.93)
Very Preterm
  Gas | 5 | 64 | 1.00 | 1.00
  Coal | 1 | 19 | 0.86 (0.06-12.80) | 0.85 (0.06-12.69)
  Biomass | 0 | 11 | - | -
  Electromagnetic | 1 | 9 | 0.41 (0.03-5.52) | 0.42 (0.03-5.83)
a Adjusted for maternal age, education, family income, maternal weight gain, vitamin supplement during pregnancy, preeclampsia, caesarean section, parity, smoking, and ventilation.
b Additional adjustment for gestational week.