Mining Event or State Sequences:
A Social Science Perspective
Gilbert Ritschard
Department of Econometrics, University of Geneva
http://mephisto.unige.ch
IIS 2008, Zakopane, Poland, June 16-18
13/7/2008gr 1/86
Example of scientific life course
My talk is about life courses, to help you understand what a social scientist does at IIS
date
1970-1979
1980-1992
1985-...
1990-1995
2000-...
2003-...
2005-...
event
Studies in econometrics
Mathematical Economics
Work with Social scientists (Family studies)
Interest in Statistics for social sciences
Interest in Neural Networks
KDD and data mining (Clustering, supervised learning)
Work with historians, demographers, psychologists
(longitudinal data)
KDD and Data mining approaches
for analysing life course data
Outline
1 Sequence Analysis in Social Sciences
2 Survival Trees
3 Visualizing and clustering sequence data
4 Mining Frequent Episodes
Sequence Analysis in Social Sciences
Motivation
Individual life course paradigm.
Following macro quantities (e.g. number of divorces, fertility rate, mean
education level, ...) over time
is insufficient for understanding social behavior.
Need to follow individual life courses.
Data availability
Large panel surveys in many countries
(SHP, CHER, SILC, GGP, ...)
Biographical retrospective surveys (FFS, ...).
Statistical matching of censuses, population registers and other
administrative data.
Need for suitable methods for discovering interesting knowledge
from these individual longitudinal data.
Social scientists use
Essentially Survival analysis (Event History Analysis)
More rarely sequential data analysis (Optimal Matching,
Markov Chain Models)
Could social scientists benefit from data-mining approaches?
Which methods?
Are there specific issues with those methods for social
scientists?
Motivation: KD in Social sciences
In KDD and data mining, the focus is on prediction and
classification.
The aim is to reduce prediction and classification errors.
In social science, the aim is understanding and explaining (social)
behaviors.
Hence the focus is on the process rather than the output.
What kind of data
What kind of data are we dealing with?
Mainly categorical longitudinal data describing life courses
An ontology of longitudinal data (Aristotelean tree).
Alternative views of Individual Longitudinal Data
Table: Time stamped events, record for Sandra
ending secondary school in 1970 | first job in 1971 | marriage in 1973

Table: State sequence view, Sandra
year            | 1969    | 1970      | 1971      | 1972      | 1973
civil status    | single  | single    | single    | single    | married
education level | primary | secondary | secondary | secondary | secondary
job             | no      | no        | first     | first     | first
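The two views of Sandra's record carry the same information, and one can be derived from the other. The following is a minimal sketch, assuming that an event changes the state from the year in which it occurs (which is what Sandra's state table suggests) and that the 1969 states are the starting point. The function name, argument names and data layout are illustrative, not taken from the talk.

```python
def state_sequences(events, years, initial):
    """Turn time-stamped events into one yearly state sequence per variable.

    events  : list of (year, variable, new_state) tuples
    years   : iterable of observed years, in increasing order
    initial : dict giving the state of each variable before any event
    """
    current = dict(initial)
    table = {variable: [] for variable in initial}
    for year in years:
        # Assumption: an event takes effect in the year it occurs.
        for event_year, variable, new_state in events:
            if event_year == year:
                current[variable] = new_state
        for variable in table:
            table[variable].append(current[variable])
    return table


sandra_events = [
    (1970, "education level", "secondary"),  # ending secondary school
    (1971, "job", "first"),                  # first job
    (1973, "civil status", "married"),       # marriage
]
sandra_initial = {
    "civil status": "single",
    "education level": "primary",
    "job": "no",
}
sandra = state_sequences(sandra_events, range(1969, 1974), sandra_initial)
for variable, states in sandra.items():
    print(variable, states)
```

Each printed row corresponds to one row of the state sequence view for the years 1969 to 1973.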
Issues with life course data
Incomplete sequences
Censored and truncated data:
Cases falling out of observation before experiencing an event of
interest.
Sequences of varying length.
Time varying predictors.
Example: When analysing time to divorce, presence of children
is a time varying predictor.
Data collected by clusters
Example: Household panel surveys.
Multi-level analysis to account for unobserved shared
characteristics of members of a same cluster.
Multi-level: Simple linear regression example
Figure: Children (vertical axis, 0 to 9) against Education (horizontal axis, 1 to 15), with four fitted lines: y = 15.6 - 0.8 x, y = 12.5 - 0.8 x, y = 6.2 - 0.8 x and y = 3.2 + 0.2 x.
Methods for Longitudinal Data
Classical statistical approaches
Survival Approaches
Survival or Event history analysis (Blossfeld and Rohwer, 2002)
Focuses on one event.
Concerned with duration until event occurs
or with hazard of experiencing event.
Survival curves: Distribution of duration until event occurs
$S(t) = p(T \ge t)$.
Hazard models: Regression-like models for $S(t, x)$ or for the hazard
$h(t) = p(T = t \mid T \ge t)$,
$h(t, x) = g\bigl(t, \beta_0 + \beta_1 x_1 + \beta_2 x_2(t) + \cdots\bigr)$.
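The two definitions are linked: with S(t) = p(T ≥ t) and h(t) = p(T = t | T ≥ t), the survival probability is the product of (1 − h(s)) over all earlier time points s < t. The following is a minimal life-table sketch, assuming discrete time units and assuming that a right-censored case still counts as at risk in its last observed period. The function name and the toy data are hypothetical.

```python
def discrete_hazard_and_survival(durations, observed):
    """Life-table estimates of h(t) = p(T = t | T >= t) and S(t) = p(T >= t).

    durations : last observed time unit of each case (integers >= 1)
    observed  : True if the event occurred at that time, False if censored
    """
    hazard, survival = {}, {}
    s = 1.0  # S(1) = 1: every case is still at risk at the first time point
    for t in range(1, max(durations) + 1):
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival[t] = s
        hazard[t] = events / at_risk if at_risk else 0.0
        s *= 1.0 - hazard[t]  # S(t + 1) = S(t) * (1 - h(t))
    return hazard, survival


# Toy data: five cases, two of them censored (observed = False).
toy_durations = [2, 3, 3, 5, 5]
toy_observed = [True, True, False, True, False]
toy_hazard, toy_survival = discrete_hazard_and_survival(toy_durations, toy_observed)
for t in sorted(toy_hazard):
    print(t, round(toy_hazard[t], 3), round(toy_survival[t], 3))
```

Censored cases leave the risk set without contributing an event, which is how survival methods cope with the incomplete sequences mentioned earlier.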
Survival curves
(Switzerland, SHP 2002 biographical survey)
Figure: Survival curves for women. Vertical axis: survival probability (0 to 1); horizontal axis: age (0 to 80 years). One curve per event: Leaving home, Last child left, Marriage, Divorce, 1st Childbirth, Widowing, Parents' death.
Analysis of sequences
Frequencies of given subsequences
Essentially event sequences.
Subsequences considered as categories ⇒ Methods for
categorical data apply (Frequencies, cross tables, log-linear
models, logistic regression, ...).
Markov chain models
State sequences.
Focuses on transition rates between states.
Does the rate also depend on previous states?
How many previous states are significant?
Optimal Matching (Abbott and Forrest, 1986).
State sequences.
Edit distance (Levenshtein, 1966; Needleman and Wunsch,
1970) between pairs of sequences.
Clustering of sequences.
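For the Markov chain item in the list above, the basic quantity is the transition rate between states. The following is a minimal sketch of a first-order estimate, assuming sequences observed at regular time points. The function name and the toy sequences (S for single, M for married) are illustrative only.

```python
from collections import Counter, defaultdict


def transition_rates(sequences):
    """Estimate first-order transition rates p(next state | current state)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each pair of consecutive states (origin, destination).
        for origin, destination in zip(seq, seq[1:]):
            counts[origin][destination] += 1
    rates = {}
    for origin, row in counts.items():
        total = sum(row.values())
        rates[origin] = {dest: n / total for dest, n in row.items()}
    return rates


toy_sequences = [list("SSSMM"), list("SSMMM"), list("SSSSS")]
toy_rates = transition_rates(toy_sequences)
print(toy_rates)
```

Whether the rate also depends on earlier states can be explored with the same counting scheme by using tuples of the last k states as the origin.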
Typology of methods for life course data
Methods are classified by issue (descriptive, causality) and by question (duration/hazard, state/event sequencing).
Issue: descriptive
  Question: duration/hazard
    • Survival curves: parametric (Weibull, Gompertz, ...) and non-parametric (Kaplan-Meier, Nelson-Aalen) estimators.
  Question: state/event sequencing
    • Optimal matching clustering
    • Frequencies of given patterns
    • Discovering typical episodes
Issue: causality
  Question: duration/hazard
    • Hazard regression models (Cox, ...)
    • Survival trees
  Question: state/event sequencing
    • Markov models
    • Mobility trees
    • Association rules among episodes
Survival Trees
The biographical SHP dataset
SHP biographical retrospective survey
http://www.swisspanel.ch
SHP retrospective survey: 2001 (860) and 2002 (4700 cases).
We consider only data collected in 2002.
Data completed with variables from 2002 wave (language).
Characteristics of retained data for divorce
(individuals who got married at least once)
                         | men   | women | Total
Total                    | 1414  | 1656  | 3070
1st marriage dissolution | 231   | 308   | 539
                         | 16.3% | 18.6% | 17.6%
Distribution by birth cohort
Figure: Histogram of birth year (1910 to 1960); vertical axis: frequency (0 to 500).
Marriage duration until divorce
Survival curves
Figure: Two panels, women and men. Vertical axis: survival probability (0.5 to 1); horizontal axis: marriage duration (0 to 40). One curve per birth cohort: 1942 and before, 1943-1952, 1953 and after.
Marriage duration until divorce
Hazard model
Discrete time model (logistic regression on person-year data)
exp(B) gives the odds ratio, i.e. the multiplicative change in the odds h/(1 − h)
when the covariate increases by 1 unit.
                   | exp(B)       | Sig.
birthyr            | 1.0088       | 0.002
university         | 1.22         | 0.043
child              | 0.73         | 0.000
language: unknwn   | 1.47         | 0.000
language: French   | 1.26         | 0.007
language: German   | 1            | ref
language: Italian  | 0.89         | 0.537
Constant           | 0.0000000004 | 0.000
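The discrete time model is fitted on person-year data: each marriage contributes one record per year at risk, and the divorce indicator equals 1 only in the year the divorce occurs. The following is a minimal sketch of that expansion and of how an odds ratio exp(B) translates into a hazard. The function names, the record layout and the baseline hazard of 0.01 are assumptions made for illustration; only the odds ratio 1.22 comes from the table.

```python
def person_years(duration, event, covariates):
    """Expand one marriage spell into one record per year at risk.

    duration   : number of observed years of marriage
    event      : True if the spell ends with a divorce, False if censored
    covariates : dict of covariate values copied onto every record
    """
    rows = []
    for year in range(1, duration + 1):
        row = dict(covariates)
        row["year"] = year
        # The response is 1 only in the final year of a spell ending in divorce.
        row["divorce"] = int(event and year == duration)
        rows.append(row)
    return rows


def apply_odds_ratio(hazard, odds_ratio):
    """Multiply the odds h / (1 - h) by an odds ratio and return the new hazard."""
    odds = hazard / (1.0 - hazard) * odds_ratio
    return odds / (1.0 + odds)


toy_rows = person_years(3, True, {"university": 1})
for row in toy_rows:
    print(row)

# Hypothetical baseline hazard of 0.01 combined with exp(B) = 1.22 (university).
toy_new_hazard = apply_odds_ratio(0.01, 1.22)
print(round(toy_new_hazard, 4))
```

The person-year layout is also what makes time varying predictors easy to handle here: a covariate such as the presence of children can simply differ from one yearly record to the next.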
Survival Tree Principle
Survival trees: Principle
Target is survival curve or some other survival characteristic.
Aim: Partition the data set into groups that
differ as much as possible (max between-class variability).
Example: Segal (1988) maximizes the difference in KM survival
curves by selecting the split with the smallest p-value of the Tarone-Ware
chi-square statistic
$TW = \dfrac{\sum_i w_i \bigl(d_{i1} - E(D_i)\bigr)}{\bigl(\sum_i w_i^2 \,\mathrm{var}(D_i)\bigr)^{1/2}}$
are as homogeneous as possible (min within-class variability).
Example: Leblanc and Crowley (1992) maximize the gain in
deviance (-log-likelihood) of relative risk estimates.
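The Tarone-Ware statistic used in the first example can be computed directly from two groups of durations. The following is a minimal sketch, assuming the usual hypergeometric mean and variance of the number of events in group 1 at each event time, and the Tarone-Ware weights w_i equal to the square root of the number at risk. The function name and the toy data are hypothetical, and this is not the implementation used for the trees in the talk.

```python
import math


def tarone_ware(times_1, events_1, times_2, events_2, weight=math.sqrt):
    """Weighted two-group comparison statistic TW.

    times_k  : observed durations in group k
    events_k : True if the event occurred, False if censored
    weight   : function of the number at risk (sqrt gives Tarone-Ware,
               a constant 1 gives the log-rank statistic)
    """
    pooled = list(zip(times_1, events_1)) + list(zip(times_2, events_2))
    event_times = sorted({t for t, observed in pooled if observed})
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for x in times_1 if x >= t)      # at risk in group 1
        n = sum(1 for x, _ in pooled if x >= t)     # at risk overall
        d1 = sum(1 for x, e in zip(times_1, events_1) if x == t and e)
        d = sum(1 for x, e in pooled if x == t and e)
        if n < 2:
            continue  # variance is undefined with a single case at risk
        w = weight(n)
        expected = d * n1 / n
        var = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        numerator += w * (d1 - expected)
        variance += w * w * var
    return numerator / math.sqrt(variance) if variance > 0 else 0.0


early = [1, 2, 3, 4]
late = [5, 6, 7, 8]
all_observed = [True] * 4
toy_tw = tarone_ware(early, all_observed, late, all_observed)
print(round(toy_tw, 3))
```

A tree-growing procedure in the spirit of the first example would evaluate this statistic for every candidate split and retain the one with the smallest p-value.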
Example
Divorce, Switzerland, Differences in KM Survival Curves I
Figure: Survival tree grown on differences in KM survival curves. Root: n = 3619, e = 622. First split on birth cohort (≤ 1940 versus > 1940), TW(1) = 54.8, p < .0001. Further splits on language, child and university lead to seven leaves (L1 to L7). Each node reports the number of cases n, the number of events e, the time at which S falls below 90%, and S(30).
Divorce, Switzerland, Differences in KM Survival Curves II
Figure: KM survival curves (vertical axis 0.5 to 1.0, horizontal axis 0 to 40) for the seven groups:
Cohort <=1940 & Non French Speaking & University
Cohort <=1940 & Non French Speaking & < University
Cohort <=1940 & French Speaking
Cohort > 1940 & No Child & University
Cohort > 1940 & No Child & < University
Cohort > 1940 & Child & German or Italian Speaking
Cohort > 1940 & Child & French or Unknown Speaking
Divorce, Switzerland, Relative risk
Figure: Survival tree based on relative risk. Root: λ = 1, n = 3619, e = 622. First split on birth cohort, ΔDev = 55.9 (≤ 1940: n = 841, e = 123; > 1940: n = 2778, e = 499). Second-level splits on language (Non French versus French) and on child (Yes versus No or missing). Each node reports the relative risk λ, the number of cases n and the number of events e.
Hazard model with interaction
Adding the interaction effects detected with the tree approach
significantly improves the fit (sig. Δχ² = 0.004)
                    | exp(B) | Sig.
born after 1940     | 1.78   | 0.000
university          | 1.22   | 0.049
child               | 0.94   | 0.619
language: unknwn    | 1.50   | 0.000
language: French    | 1.12   | 0.282
language: German    | 1      | ref
language: Italian   | 0.92   | 0.677
b_before_40*French  | 1.46   | 0.028
b_after_40*child    | 0.68   | 0.010
Constant            | 0.008  | 0.000
Social Science Issues
Issues with survival trees in social sciences
1 Dealing with time varying predictors
Segal (1992) discusses a few possibilities, none being really
satisfactory.
Huang et al. (1998) propose a piecewise constant approach
suitable for discrete variables and a limited number of changes.
Room for development ...
2 Multi-level analysis
How can we account for multi-level effects in survival trees,
and more generally in trees?
Conjecture: It should be possible to include an unobserved shared
effect in deviance-based splitting criteria.
Visualizing and clustering sequence data
Life trajectories
Sequence analysis
Survival approaches not useful in a unitary (holistic)
perspective of the whole life course.
Sequence analysis of whole collection of life events better
suited for such holistic approach (Billari, 2005).
Rendering sequences
Colorize your life courses
Results from the analysis of the retrospective Swiss Household
Panel (SHP) survey.
Focus on visualization of life course data.
Evolution tendencies in familial life course trajectories
Sequence analysis techniques make it possible to test hypotheses about
the evolution of these familial life trajectories (Elzinga and Liefbroer,
2007):
De-standardization: Some states and events of familial life are
shared by decreasing proportions of the population, occur at
more dispersed ages and their duration is also more scattered.
De-institutionalization: Social and temporal organization of
life courses becomes less driven by normative, legal or
institutional rules.
Differentiation: The number of distinct steps lived by individuals
increases.
Example: the BioFam sequential data set
Presentation of the “BioFam” data
Data from the retrospective survey conducted in 2002 by the
Swiss Household Panel (SHP)
(with support of Federal Statistical Office, Swiss National
Fund for Scientific Research, University of Neuchatel.)
Retrospective survey: 5560 individuals
Retained familial life events: Leaving Home, First childbirth,
First marriage and First divorce.
Age 15 to 45 → 2601 remaining individuals, born between
1909 and 1957.
Distribution by birth cohort
Figure: Histogram of birth year (1910 to 1960); vertical axis: frequency (0 to 500).
Creating state sequences
Example of time stamped data:
individual | LHome | marriage | childbirth | divorce
1          | 1989  | 1990     | 1992       | NA
Deriving the states
Need one state for each combination of events:
state | LHome  | marriage | childbirth | divorce
0     | no     | no       | no         | no
1     | yes    | no       | no         | no
2     | no     | yes      | yes/no     | no
3     | yes    | yes      | no         | no
4     | no     | no       | yes        | no
5     | yes    | no       | yes        | no
6     | yes    | yes      | yes        | no
7     | yes/no | yes      | yes/no     | yes
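The coding of the eight states can be written as a small function of the four event indicators. The following is a minimal sketch, applied to the time stamped record of individual 1 (LHome 1989, marriage 1990, childbirth 1992, no divorce). It assumes that an event counts from the year in which it occurs, and it works on calendar years because the birth year of individual 1 is not given. The function names are illustrative.

```python
def biofam_state(lhome, marriage, childbirth, divorce):
    """Map the four event indicators to the state codes 0 to 7 of the table."""
    if divorce:
        return 7                      # divorced, whatever the other events
    if marriage:
        if not lhome:
            return 2                  # married, still at the parental home
        return 6 if childbirth else 3
    if childbirth:
        return 5 if lhome else 4
    return 1 if lhome else 0


def state_sequence(event_years, years):
    """Build the yearly state sequence from event years (None stands for NA)."""
    sequence = []
    for year in years:
        # Assumption: an event has occurred from its own year onwards.
        occurred = {
            event: (when is not None and when <= year)
            for event, when in event_years.items()
        }
        sequence.append(
            biofam_state(
                occurred["LHome"],
                occurred["marriage"],
                occurred["childbirth"],
                occurred["divorce"],
            )
        )
    return sequence


individual_1 = {"LHome": 1989, "marriage": 1990, "childbirth": 1992, "divorce": None}
sequence_1 = state_sequence(individual_1, range(1988, 1994))
print(sequence_1)  # [0, 1, 3, 3, 6, 6]
```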
Characteristics of sequences
Definition
Entropy: measure of uncertainty regarding sequence
predictability.
$p_i$: proportion of cases (or time points) in state $i$.
Shannon: $h(p) = \sum_i -p_i \log_2(p_i)$
Other types of entropy: Quadratic (Gini), Daroczy, ...
Two ways of using entropies.
Entropy of the state at each time (age) point: Entropy
increases with the diversity of states observed at each time point
(age).
Entropy of each individual sequence: Entropy increases with the
diversity of states during the observed life course and varies
with the time spent in each state.
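Both uses of the entropy rely on the same Shannon formula, applied either to one sequence (across time points) or to one time point (across cases). The following is a minimal sketch using unnormalized base-2 entropy. The function names and the toy sequences are illustrative, and any normalization used for the plots in the talk is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Shannon entropy h(p) = sum_i -p_i log2(p_i) of a collection of states."""
    counts = Counter(states)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def entropy_by_time_point(sequences):
    """Entropy of the state distribution at each time point (equal-length sequences)."""
    return [shannon_entropy(column) for column in zip(*sequences)]


# Entropy of individual sequences: time spent in each state matters.
print(shannon_entropy([0, 0, 0, 0]))  # 0.0, a single state throughout
print(shannon_entropy([0, 0, 1, 1]))  # 1.0, two states with equal time

# Entropy of the state at each time point across two toy sequences.
toy_profile = entropy_by_time_point([[0, 0, 1], [0, 1, 1]])
print(toy_profile)  # [0.0, 1.0, 0.0]
```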
Entropy of the state at each time (age) point
Figure: Entropy of the biofam state distribution by age. Vertical axis: entropy (0.2 to 0.8); horizontal axis: age (a15 to a29).
Entropy: Minimum/maximum
Figure: Sequences 1-15, sorted by entropy (minimum, median and maximum entropy). Horizontal axis: time (A15 to A45). State legend: N/N/N/N, Y/N/N/N, N/Y/*/N, Y/Y/N/N, N/N/Y/N, Y/N/Y/N, Y/Y/Y/N, */*/*/Y.
Entropy - histogram
Figure: Histogram of entropy for the sequences in the biofam data set. Horizontal axis: entropy (0.0 to 1.5); vertical axis: frequency (0 to 500).
Hypothesis
The evolution of familial life trajectories gives rise to an increase
in the entropy of individual sequences,
because they become less predictable and more diversified.
Entropy by birth cohorts
Figure: Distribution of sequence entropy by birth cohort (1909-18, 1919-28, 1929-38, 1939-48, 1949-58). Vertical axis: sequence entropy (0.0 to 1.5).
Entropy by sex
Figure: Distribution of sequence entropy by sex (men, women). Vertical axis: sequence entropy (0.0 to 1.5).
Definition
Turbulence (Elzinga and Liefbroer, 2007): Somewhat similar
to entropy.
Turbulence accounts for state sequencing (which entropy
does not).
Turbulence accounts for the following two elements:
number of subsequences:
x = S,U,M,MC (16 subsequences) is more turbulent than
y = S,U,S,C (15 subsequences)
variance of duration in each state:
S/10 U/2 M/132 is less turbulent than
S/48 U/48 M/48
Turbulence - Minimum/maximum
Figure: Sequences 1-15, sorted by turbulence (minimum, median and maximum turbulence). Horizontal axis: time (A15 to A45). State legend: N/N/N/N, Y/N/N/N, N/Y/*/N, Y/Y/N/N, N/N/Y/N, Y/N/Y/N, Y/Y/Y/N, */*/*/Y.
Turbulence - histogram
Figure: Histogram of turbulence for the sequences in the biofam data set. Horizontal axis: turbulence (2 to 10); vertical axis: frequency (0 to 600).
Turbulence by cohorts
Figure: Distribution of sequence turbulence by birth cohort (1909-18, 1919-28, 1929-38, 1939-48, 1949-58). Vertical axis: sequence turbulence (2 to 10).
Distances between sequences: Clustering
Clustering, Multidimensional scaling and more
Once you can compute pairwise distances between
sequences you can, among other things:
Cluster sequences
Make scatter plot representation of sets of sequences using
multidimensional scaling.
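Clustering needs nothing more than the matrix of pairwise distances. The following is a minimal sketch of agglomerative clustering with average linkage, chosen here for brevity; it is a swapped-in criterion and not the Ward method of the agnes dendrogram shown in the talk. The function name and the toy distance matrix are hypothetical.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering from a symmetric distance matrix.

    dist       : list of lists with dist[i][j] the distance between i and j
    n_clusters : number of clusters at which merging stops
    Returns a list of clusters, each a sorted list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def between(a, b):
        # Average of all pairwise distances between two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = [
            (between(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        ]
        _, p, q = min(pairs)
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


toy_dist = [
    [0, 1, 6, 7],
    [1, 0, 5, 6],
    [6, 5, 0, 2],
    [7, 6, 2, 0],
]
toy_clusters = average_linkage(toy_dist, 2)
print(toy_clusters)  # [[0, 1], [2, 3]]
```

The same distance matrix is also the only input required by multidimensional scaling, which places each sequence in a low-dimensional scatter plot.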
Distances between sequences
Edit distance (known as Optimal matching in Social sciences)
(Levenshtein, 1966; Needleman and Wunsch, 1970; Abbott and
Forrest, 1986)
d(x, y): total cost of the insertions, deletions and substitutions
required to transform sequence x into y.
Different solutions depending on indel and substitution costs.
Other metrics proposed by Elzinga (2008):
LCP: Longest common prefix (also longest common postfix)
LCS: Longest common subsequence
(same as OM with indel cost = 1, and substitution cost = 2).
NMS: Number of matching subsequences
...
Elzinga (2008) proposes a nice formalization of these metrics.
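The edit distance can be computed by dynamic programming. The following is a minimal sketch, assuming a single indel cost and a single substitution cost for all pairs of distinct states. It also checks on a toy pair the relation quoted above for LCS: with indel cost 1 and substitution cost 2, the OM distance equals |x| + |y| − 2·LCS(x, y). The function names and the toy sequences are illustrative.

```python
def om_distance(x, y, indel=1.0, substitution=2.0):
    """Optimal matching (edit) distance between two state sequences."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel            # delete the first i states of x
    for j in range(1, m + 1):
        d[0][j] = j * indel            # insert the first j states of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if x[i - 1] == y[j - 1] else substitution
            d[i][j] = min(
                d[i - 1][j] + indel,       # deletion
                d[i][j - 1] + indel,       # insertion
                d[i - 1][j - 1] + cost,    # match or substitution
            )
    return d[n][m]


def lcs_length(x, y):
    """Length of the longest common subsequence of x and y."""
    n, m = len(x), len(y)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


toy_x = list("SSUMM")
toy_y = list("SUMMC")
toy_om = om_distance(toy_x, toy_y)
toy_lcs = lcs_length(toy_x, toy_y)
print(toy_om, toy_lcs)  # 2.0 4
```

Changing the two cost arguments changes which alignments are cheapest, which is why different cost settings yield different solutions.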
Figure: Dendrogram of agnes(x = dist.om1, diss = TRUE, method = "ward") for the OM1 distances (dist.om1). Vertical axis: height (0 to 1200). Agglomerative coefficient = 1.
2371
76
146
750
1224
1535
1654
316
1143
2394
648
1885
716
1443
2487
1125
1679
1691
75
2316
640
635
1400
1633
2190
879
1879
1949
81
701
568
111
602
1208
1433
601
1667
1722
960
2143
2016
344
979
580
1977
401
1982
830
1976
1398
1242
1313
10
38
99
113
164
171
187
212
213
226
228
229
302
304
386
404
427
432
454
484
494
521
523
715
760
767
773
774
775
781
797
857
980
991
1011
1023
1031
1033
1040
1087
1130
1134
1263
1276
1344
1351
1390
1437
1509
1542
1591
1661
1662
1733
1784
1785
1944
1970
2009
2012
2034
2092
2094
2100
2154
2212
2233
2353
2367
2446
2459
2475
2543
54
2321
463
1206
2124
77
2027
2417
1229
78
828
2077
1527
622
815
1575
291
1741
1541
2533
519
740
853
1394
1457
1807
600
676
1139
967
1188
2299
675
2587
678
1123
1214
1469
2024
2282
452
1196
1383
1025
1146
746
2392
861
915
1867
1113
1466
1660
2495
40
45
221
533
571
829
890
1024
1210
1246
1380
1473
1642
1832
1887
2060
2134
2203
2211
2320
2435
439
440
570
826
1268
1464
2309
2311
2569
443
577
836
843
1028
1265
1664
2099
2135
2322
619
1933
2582
239
518
2579
423
2391
274
739
2529
1828
272
1438
2171
321
973
1364
419
507
588
1037
1094
1771
1794
2132
2227
2288
566
1604
1938
2250
49
1179
1537
1804
367
444
517
548
820
1610
2553
307
738
758
971
1119
1266
1305
1412
1572
2411
909
1012
1029
606
789
1456
2252
273
449
516
868
900
927
1157
1354
1588
1608
1638
2021
2090
2300
2329
2410
2441
361
554
766
1047
1357
1414
1415
1792
1852
2051
2231
2244
2296
2368
2387
2494
2528
46
1290
390
1454
2278
278
1411
1723
2500
402
1054
1472
2263
2264
124
558
1142
2089
424
1062
1251
1883
422
1550
2514
2515
608
2293
1636
2328
2480
567
799
2138
1151
1353
346
1701
496
1308
692
1699
604
895
1910
1253
2430
2199
668
2166
377
378
931
1710
1711
2010
1611
855
2490
508
1064
624
2172
1462
993
2208
936
2008
2323
1919
2133
1939
18
1528
2054
21
995
2136
1555
2364
1538
32
2248
263
1240
2189
1831
326
1241
57
2504
84
2066
998
1007
270
987
2170
2037
1470
1948
2505
296
2222
1381
2112
477
1772
1841
2517
85
341
2031
352
1432
2345
1756
2083
93
1729
1769
1001
1974
25
1579
2548
2559
1228
1959
2436
332
1038
1496
2251
626
1061
1877
1322
1924
2507
51
1474
597
1842
2213
2445
2513
476
1774
2369
177
1133
2279
607
2053
353
579
1016
578
1018
827
994
66
1601
613
1605
754
2004
2256
436
1010
492
1180
1255
2567
1595
1903
200
338
2280
303
968
2150
201
1427
1973
2489
997
1439
1367
1665
1996
1668
2527
396
897
976
1449
2111
906
2287
1876
27
1002
1594
43
2319
1247
871
1399
903
1009
2020
141
2537
743
1446
2418
182
1997
1734
2375
254
2427
1839
1006
1923
73
1295
2352
2191
532
1424
1431
107
1989
1681
612
1500
1896
406
791
1086
1223
1960
2044
2197
106
1863
2192
250
339
2351
1227
1942
2290
166
595
28
170
2399
1465
1046
1546
294
475
391
2332
1531
1574
1918
2471
127
1871
628
2481
2076
878
1408
1994
700
1030
2061
522
1371
2210
661
1487
2470
41
1202
381
1340
340
2200
651
1330
1455
103
1670
2442
1331
2350
1413
2357
798
1352
2512
1428
4
671
2338
80
575
1174
1175
1612
1648
2268
2538
538
1311
2235
540
1590
544
2388
2580
412
2073
864
1979
1103
176
1480
1762
2473
2314
246
264
448
1285
1598
2318
2426
689
1520
1393
1968
741
865
1800
2059
225
961
1471
2078
286
2303
1743
631
2575
1494
647
2079
2230
2341
958
2412
2297
2486
882
2310
1098
1865
1657
2028
2029
5
592
1048
512
515
1628
399
1779
1935
418
2432
688
1013
1540
1613
1172
1173
2260
2405
47
950
1327
2354
618
2195
674
1372
2214
67
2534
557
761
564
1220
1532
1495
1583
2085
64
854
2247
2460
539
1325
1045
293
1280
1324
1440
611
1757
582
1901
2070
2506
614
1477
1478
1988
731
1334
1342
2524
2084
292
697
812
867
1258
351
783
884
2307
1129
1840
2406
1101
1267
1856
2308
525
620
866
885
1036
1497
2194
2216
633
928
1902
1122
1801
2063
634
654
655
1622
2434
805
1621
2123
1277
1396
2464
2511
8
1858
22
320
782
703
1307
1186
1293
1837
2182
2474
44
749
901
2225
609
1163
1954
2492
65
2413
2198
233
974
1104
1300
2013
265
398
930
1507
1822
89
1897
2301
2578
252
762
140
705
1117
2573
329
2130
1309
1904
1108
1851
1651
1846
1111
1159
1160
12
1560
1128
545
708
2035
1022
2363
1549
2180
2594
363
1079
2058
610
1127
1067
1310
350
552
2014
831
1066
1075
1571
2137
2443
2539
2156
659
2160
959
147
268
656
1058
1070
1168
1786
2502
1758
451
615
546
1102
20
732
757
1582
2080
2050
2174
630
387
1674
2140
1530
1961
34
109
694
734
823
990
1170
1213
1655
2119
149
194
287
371
657
1596
1626
1658
2440
1080
1922
193
227
290
393
742
1068
1191
1195
1706
1811
2409
2429
126
2139
1941
2131
1518
2007
2469
135
765
989
2540
288
832
1278
1448
2541
136
181
207
335
719
756
902
1199
1422
1545
1558
1616
1625
1698
1810
1827
2019
2122
2177
2479
2571
942
2532
1323
53
148
262
370
392
636
870
947
953
966
1294
1299
1345
1430
1441
1659
1715
1747
1847
2018
2049
2176
2179
2408
2458
2536
62
108
151
199
366
430
728
1245
1283
1286
1417
1630
1730
1791
1914
1984
2173
2273
2386
2425
2444
63
687
693
949
1027
1118
1370
1523
1525
1603
2104
2402
2509
2545
206
471
605
672
946
948
954
1217
1315
1366
1429
1482
1580
1639
1646
2045
2175
2196
2336
2347
2420
2570
2592
276
1782
175
327
806
1826
1212
261
735
1539
2255
245
629
1183
337
645
574
1077
1908
772
1534
852
2118
2262
2129
1536
2102
6
408
891
1649
556
1389
1553
717
407
1501
1759
2313
722
1274
1403
1688
1921
1987
2302
2404
122
134
249
1569
1619
222
1687
2462
1359
1834
2431
101
485
551
825
917
1165
1524
1600
1631
1644
244
1052
416
1629
1799
1857
1893
2379
2385
2530
1120
2468
2588
1158
1561
1551
255
257
706
874
2000
318
1643
510
768
1717
680
1514
2209
298
851
972
1635
1817
2232
1891
2025
469
709
2240
572
97
1617
128
204
2238
179
2015
2253
434
2284
2185
438
1272
616
1618
1218
1291
1421
1815
1868
1363
1929
2337
117
1719
165
1215
2572
482
1951
589
1878
984
2201
1460
1814
489
1365
1181
1853
421
542
1911
779
1498
1888
2093
1459
1504
119
1320
1751
343
590
380
1461
686
2407
617
2557
935
1348
2114
342
1930
2498
1521
839
1864
2113
453
520
1909
547
587
690
359
1407
1055
2294
2344
644
941
1298
957
1083
2239
2518
2107
2108
541
1115
667
1963
1854
1962
375
2551
2117
2526
457
458
1547
2562
1262
1445
376
713
506
2519
527
714
280
2086
1222
1781
2039
2040
2224
720
2087
1981
1256
1231
1261
963
1221
1362
2081
2082
1848
1849
7
214
317
704
804
975
1232
1321
1404
1861
2452
2453
301
355
1731
565
1275
650
920
1039
1302
1425
2485
625
808
1499
1586
1724
2265
2463
16
88
410
431
695
698
1176
1447
1486
1570
1650
1913
2229
2246
2419
2472
50
68
1835
1916
52
374
394
809
1492
1889
2046
36
2415
1869
2285
185
435
1097
1292
1319
1704
685
1563
2095
2126
2161
2162
2378
156
1934
2204
190
1682
356
1166
1185
1273
1566
1627
1824
2298
2451
295
511
638
718
822
1137
1203
1205
1529
1593
1845
2056
526
1543
623
744
1090
1761
2228
2598
232
400
486
576
988
1304
1349
1517
1685
1726
1744
2184
2274
2493
859
1556
1005
2346
2062
2152
305
306
1235
2558
11
1178
2283
349
776
1269
824
1225
1074
962
2169
1451
1906
154
529
218
1946
752
1956
1360
1475
237
271
553
1395
79
397
1675
2183
2356
560
2330
1614
1965
96
1161
2583
1405
2249
938
1162
1288
1986
357
1312
922
893
1721
2109
2433
939
1271
1808
2333
174
334
664
1548
1479
1966
1737
461
462
581
1358
2072
2520
847
1303
702
2390
1508
1947
325
1237
1493
2011
2331
1533
2549
9
202
279
308
328
470
745
786
801
1136
1141
1382
1410
1481
1585
1819
1953
2116
2242
2289
2488
2601
114
1078
2291
33
314
415
459
478
480
573
858
1026
1135
1167
1257
1282
1568
1597
1693
1718
1746
1777
1872
1995
2153
2380
2508
2576
2577
2585
115
331
336
372
771
848
1093
1198
1392
1513
1557
1818
2003
2456
2542
2586
2600
1391
23
24
72
116
188
389
450
639
682
1153
1226
1281
1318
1374
1578
1778
2286
2360
2501
56
632
699
892
899
1189
1328
1503
1770
1775
1776
1796
1975
2149
2383
2384
2461
186
875
2101
1850
2158
2362
2544
1099
1700
2450
19
247
313
368
388
474
479
490
491
495
524
681
725
726
1152
1194
1347
1376
1483
1484
1544
1577
1676
1686
1821
1940
1983
1998
1999
2103
2219
2374
2422
95
123
216
217
411
425
770
1177
1502
1641
1732
1823
2148
158
1335
1802
215
426
721
795
898
1049
1337
1409
2342
39
160
161
315
840
883
1836
1991
2591
1184
1805
683
684
733
1140
1602
1632
1716
1971
1972
2516
800
1106
58
157
178
330
433
509
679
712
785
869
1107
1306
1833
2042
2043
2091
2271
2312
2315
2397
2398
2424
2428
2599
562
1000
1057
2165
14
846
1426
1890
2065
2164
2275
105
1458
978
2266
2334
17
203
501
730
863
1043
1044
1736
1898
1969
138
219
862
1511
1587
1860
1190
69
956
1522
192
211
1035
2168
2277
98
569
1637
1816
437
467
2377
1564
1060
2566
1809
1105
48
269
466
845
964
1725
1742
932
1917
2159
955
1069
1695
1696
1705
2064
2167
2276
2292
120
191
807
2096
2400
2401
409
894
923
643
1453
2565
951
2106
1955
13
2581
504
1109
1881
778
1132
1589
1606
90
1041
1905
621
1375
670
916
753
71
500
673
921
1287
2017
382
2484
1388
198
751
1248
1624
1790
2270
2523
591
2563
727
1880
70
2499
1260
1418
1992
2030
236
723
1063
1567
1623
583
1739
1926
933
118
446
1515
445
125
1932
2069
208
209
383
2181
1131
1931
2403
2564
456
603
1096
180
934
205
1709
2550
666
2561
1708
1114
183
210
2217
2147
1434
2447
1750
2202
691
1270
2186
2187
2491
747
1420
94
1789
1745
1780
724
1505
1607
2254
144
145
1647
1707
251
977
1720
1219
1336
505
514
969
1369
1416
696
811
886
223
1844
1301
1110
1640
2295
1164
1169
2033
369
1516
669
1820
769
1788
1634
2234
1402
373
1236
1714
838
2141
986
1967
1124
1592
1766
2221
2355
2439
153
497
1259
1576
481
2178
1148
2423
275
2317
447
2041
1765
1230
2304
2395
1346
2324
2376
1749
1748
1855
1912
2121
498
1985
748
1735
658
1565
1980
531
1838
1928
2146
543
1201
844
1197
1207
1450
1200
1084
1882
2188
1095
985
550
904
2449
2022
2416
926
2455
983
1015
1034
1121
1296
0
Mining Event or State Sequences
Visualizing and clustering sequence data
Distances between sequences: Clustering
Dendrogram, OM1 versus OM3
different indel costs (1 vs 3)
Dendrogram of agnes(x = dist.om3, diss = TRUE, method = "ward")
OM3
dist.om3
Agglomerative Coefficient = 1
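The OM1 and OM3 labels refer to optimal matching (OM) distances computed with indel costs of 1 and 3. As a rough illustration of why the indel cost matters, here is a minimal Python sketch of an OM-style edit distance. The function name `om_distance`, the constant substitution cost of 2 and the two toy sequences are assumptions made for this example; the slide states only the indel costs.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Edit distance between state sequences a and b.

    indel: cost of inserting or deleting one state.
    sub:   cost of substituting one state for a different one
           (constant cost, an assumption of this sketch).
    """
    n, m = len(a), len(b)
    # prev[j] = cost of turning the first i-1 states of a into the first j of b
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            match = 0.0 if a[i - 1] == b[j - 1] else sub
            cur[j] = min(prev[j] + indel,      # delete a[i-1]
                         cur[j - 1] + indel,   # insert b[j-1]
                         prev[j - 1] + match)  # keep or substitute
        prev = cur
    return prev[m]


# Two hypothetical sequences that are shifted copies of each other.
a = [0, 1, 1, 1]
b = [1, 1, 1, 3]
print(om_distance(a, b, indel=1.0), om_distance(a, b, indel=3.0))
```

With a cheap indel the two sequences are aligned by shifting (one deletion plus one insertion); with an indel cost of 3 the two substitutions become cheaper, so the same pair ends up further apart. This is the kind of change that can alter the resulting dendrogram.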
Visualizing and clustering sequence data
Distances between sequences: Clustering
State distribution by age, within cluster
[Figure: six panels (Group 1 to Group 6) showing the state distribution by age within each cluster. x-axis: Age (A15 to A45); y-axis: Frequency (0.0 to 1.0); legend: states 0 to 7.]
Visualizing and clustering sequence data
Distances between sequences: Clustering
Most frequent sequences by cluster
[Figure: six panels (Group 1 to Group 6) showing the most frequent sequences of each cluster with their relative frequencies (in %). x-axis: Age (A15 to A43); legend: states 0 to 7.]
Visualizing and clustering sequence data
Distances between sequences: Clustering
I-plot by cluster
[Figure: I-plot of the individual sequences by cluster. x-axis: Age; legend: states 0 to 7.]
Visualizing and clustering sequence data
Distances between sequences: Clustering
Distribution by birth cohort within each cluster
[Figure: six histograms of year of birth, one per cluster (Group 1 to Group 6). x-axis: year (1910 to 1960); y-axis: Frequency.]
Visualizing and clustering sequence data
Multidimensional Scaling representation of sequences
Multidimensional Scaling: Principle
Let D be a distance matrix between sequences.
D is computed using OM, LPS, LCS, ... metrics.
Multidimensional Scaling consists in
Finding a set of real-valued variables $(f_1, f_2)$ such that the
$\delta_{ij} = \sqrt{(f_{i1} - f_{j1})^2 + (f_{i2} - f_{j2})^2}$ best approximate the
distances $d_{ij}$ between sequences.
Plotting the points in the $(f_1, f_2)$ space.
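The principle above can be sketched in a few lines of Python. The sketch uses classical (Torgerson) scaling, which is only one of several MDS variants; the slides do not say which variant was used, so this choice, the function name `classical_mds` and the toy distance matrix are assumptions. It also assumes a symmetric distance matrix whose two leading eigenvalues are positive.

```python
import math


def classical_mds(d, dims=2, iters=500):
    """Classical (Torgerson) MDS of a symmetric distance matrix d.

    Returns one list of `dims` coordinates (f_1, f_2, ...) per object.
    """
    n = len(d)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of B.
        v = [math.sin(i + 1 + k) for i in range(n)]  # arbitrary fixed start
        for _ in range(iters):
            w = [sum(bij * vj for bij, vj in zip(row, v)) for row in b]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        w = [sum(bij * vj for bij, vj in zip(row, v)) for row in b]
        lam = sum(vi * wi for vi, wi in zip(v, w))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no positive eigenvalue left; remaining coordinates stay 0
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate B so that the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Toy check: distances between the corners of a 3 x 4 rectangle.
points = [(0, 0), (3, 0), (3, 4), (0, 4)]
d = [[math.hypot(px - qx, py - qy) for qx, qy in points] for px, py in points]
f = classical_mds(d)
delta_02 = math.hypot(f[0][0] - f[2][0], f[0][1] - f[2][1])
print(round(delta_02, 6), d[0][2])
```

Because the toy distances are exactly Euclidean in two dimensions, the recovered $\delta_{ij}$ match the $d_{ij}$ up to numerical precision. With OM distances between real sequences the match would only be approximate.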
Visualizing and clustering sequence data
Multidimensional Scaling representation of sequences
Multidimensional Scaling
[Figure: scatter plot of the sequences on the first two MDS coordinates, with points marked by cluster (Group 1 to Group 6). Vertical axis: dist.om.mds$points[,2]; both axes have ticks from -30 to 30.]
Mining Frequent Episodes
Mining Frequent Episodes
What can we expect from frequent episodes mining?
GSP (Srikant and Agrawal, 1996)
MINEPI, WINEPI (Mannila et al., 1997)
TCG, TAG (Bettini et al., 1996)
SPADE (Zaki, 2001)
Are there specific issues when applying these methods in
social sciences?
Mining Frequent Episodes
What Is It About?
Frequent episodes: what are they?
Episode: Collection of events occurring frequently together.
Mining typical episodes:
Specialized case of mining frequent itemsets.
Time dimension ⇒ Partially ordered events.
More complex than unordered itemsets: User must
specify time constraints (and episode structure constraints).
select a counting method.
Mining Frequent Episodes
What Is It About?
Episode structure constraints
For people who leave home within 2 years of age 17, what are
the typical events occurring until they get married and have a first
child?
[Figure: episode structure graph with nodes LH,17, ??, C1 and M. Annotations: event constraints; node constraints (elastic, w = 1; parallel, w = 2); edge constraints (0, 1, 10), (0, 4) and (0, 3).]
Mining Frequent Episodes
What Is It About?
Counting methods
(Joshi et al., 2001)
Searching (U,C) with min gap = 1, max gap = 2, win size = 2
[Figure: timeline from 20 to 24 with three U events and three C events.]
individuals with episode: COBJ = 1
windows with episode: CWIN = 3
minimal windows with episode: CminWIN = 2
distinct occurrences: CDIS_o = 5
distinct occurrences without overlap: CDIS = 3
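To make two of these counting methods concrete, here is a minimal Python sketch of COBJ (number of individuals with at least one occurrence) and CDIS_o (number of distinct occurrences, overlaps allowed) for a two-event episode under gap constraints. The function names are hypothetical, and the event times (U at 20, 21, 22 and C at 22, 23, 24) are an assumption: the figure's exact layout is not recoverable from the transcript, and these times were chosen because they are consistent with the counts reported on the slide. The window-based counts (CWIN, CminWIN) and the non-overlapping count (CDIS) are not covered.

```python
from itertools import product


def occurrences(events, first, second, min_gap=1, max_gap=None):
    """All (t1, t2) pairs with `first` at t1, `second` at t2 and
    min_gap <= t2 - t1 <= max_gap (no upper bound if max_gap is None).

    events: list of (time, label) tuples for one individual.
    """
    t_first = [t for t, label in events if label == first]
    t_second = [t for t, label in events if label == second]
    pairs = []
    for t1, t2 in product(t_first, t_second):
        gap = t2 - t1
        if gap >= min_gap and (max_gap is None or gap <= max_gap):
            pairs.append((t1, t2))
    return pairs


def count_cobj(sequences, first, second, **gaps):
    """COBJ: number of individuals with at least one occurrence."""
    return sum(1 for ev in sequences if occurrences(ev, first, second, **gaps))


def count_cdis_o(sequences, first, second, **gaps):
    """CDIS_o: number of distinct occurrences, overlaps allowed."""
    return sum(len(occurrences(ev, first, second, **gaps)) for ev in sequences)


# One individual; event times are assumed, not read off the slide.
individual = [(20, "U"), (21, "U"), (22, "U"),
              (22, "C"), (23, "C"), (24, "C")]
print(count_cobj([individual], "U", "C", min_gap=1, max_gap=2))
print(count_cdis_o([individual], "U", "C", min_gap=1, max_gap=2))
```

Under the assumed event times the sketch gives COBJ = 1 and CDIS_o = 5, in line with the slide. The same data yields quite different supports depending on the counting method, which is why the choice has to be made explicitly.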
Mining Frequent Episodes
Example: Counting Alternate Episode Structures
Example: Counting alternate structures
(COBJ, no max gap)
[Figure: bar chart (vertical axis from 0% to 30%) with one bar for each ordering (<, =) of pairs of events among Child, Marriage, Job and Educ end, for example Child < Marriage, Marriage < Child, Child = Marriage.]
Switzerland, SHP 2002 biographical survey (n = 5560).
Mining Frequent Episodes
Issues Regarding Episode Rules
Rules between episodes
Social scientists like causal explanations.
Empirically assessed rules are valuable material in that respect.
Little attention paid to this aspect in the literature on
frequent subsequences.
Mined episodes are already structured: if (U,C) is a frequent
episode, then we know that C often follows U.
Deriving association rules from frequent ordered patterns is
similar to what is done with unordered itemsets.
Rule relevance criteria: confidence, surprisingness, implication
strength, ...
Their value depends on the selected counting method.
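As an illustration of the first relevance criterion, the confidence of a rule such as U ⇒ C can be derived from episode supports. The sketch below is in Python and uses COBJ-style counting (one count per individual, no gap constraint). The function names and the five toy sequences are hypothetical and only serve the example.

```python
def contains_in_order(events, episode):
    """True if the labels of `episode` occur in `events` in that order
    (not necessarily adjacent). `events` is a time-ordered list of labels."""
    it = iter(events)
    # `label in it` consumes the iterator up to the first match,
    # so each label must be found after the previous one.
    return all(label in it for label in episode)


def count(sequences, episode):
    """COBJ-style count: number of individuals whose sequence contains the episode."""
    return sum(1 for events in sequences if contains_in_order(events, episode))


def confidence(sequences, premise, consequence):
    """Confidence of premise => consequence:
    count(premise followed by consequence) / count(premise)."""
    n_premise = count(sequences, premise)
    if n_premise == 0:
        return None  # the rule cannot be evaluated
    return count(sequences, premise + consequence) / n_premise


# Hypothetical individual event sequences (labels only, ordered in time).
sequences = [
    ["LH", "U", "C"],
    ["U", "C", "M"],
    ["U", "M"],
    ["C", "U"],
    ["LH", "M"],
]
print(confidence(sequences, ["U"], ["C"]))
```

In this toy data four individuals experience U and two of them experience C afterwards, so the confidence is 0.5. Replacing the COBJ-style count with another counting method would change both the numerator and the denominator, and hence the value of the criterion.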
Mining Frequent Episodes
Issues Regarding Episode Rules
Issues with episode rules in social sciences
Parallel life courses:
Family events and professional life course.
Life courses of each partner of a couple.
Mining associations between frequent episodes of a sequence
and those of its parallel sequence:
Either mine frequent episodes from the mix of the 2 sequences, and then
restrict the search for rules to candidates whose premise and
consequence belong to different sequences.
Or mine frequent episodes from each sequence, and then
search for rules among candidates obtained by combining frequent
episodes from each sequence.
Accounting for multi-level effects when validating rules.
Is the rule relevant among groups, or within groups?
Summary
Summary
Data mining approaches (survival trees, clustering sequences,
frequent episodes) have a promising future in life course
analysis.
They complement classical statistical outcomes with new insights.
Their use within social sciences raises specific issues:
Accounting for multi-level effects when growing survival trees or
mining association rules.
Handling time-varying predictors in survival trees.
Selecting relevant counting methods (event dependent?).
Finding suitable criteria for measuring association strength between
frequent episodes.
...
Summary
Our TraMineR R-package
Let me finish with an ad ...
TraMineR, a free life trajectory mining tool
for the free, open-source R statistical environment,
downloadable from http://mephisto.unige.ch/biomining
and soon from CRAN.
Summary
Thank You!
Appendix
Zoomed tree
Divorce, Switzerland, Differences in KM Survival Curves
[Figure: zoomed survival tree. Root node: n = 3619, e = 622, split on Birth Cohort (≤ 1940 vs > 1940; TW(1) = 54.8, p < .0001) into nodes with n = 841, e = 123 and n = 2778, e = 499. Further splits shown: Child (TW(1) = 37.4), Language (French vs Non French; TW(1) = 22.5, p < .0001), University (TW(1) = 8.08, p = .0045) and Language (TW(1) = 9.77, p = .0018). Other node sizes shown: n = 667, e = 79; n = 174, e = 44; n = 2175, e = 361. Each node reports the time at which S falls below 90% (values 11, 21, 26) and S(30) (values 73, 74%, 75%, 86%, 89%).]
Appendix
Sub-sequences
Clusters and subsequences
[Figure: two panels (Group 1 and Group 2) with a vertical axis from 0.0 to 1.0; bar labels include m1, m5, c1, s1, e1, e5, d1 and 10.]
Appendix
Sub-sequences
Biofam data: Legend
no event
left home
married with/without child
left home, married
with child
left home, with child
left home, married, child
divorced
Appendix
For Further Reading
Abbott, A. and J. Forrest (1986). Optimal matching methods for
historical sequences. Journal of Interdisciplinary History 16,
471–494.
Bettini, C., X. S. Wang, and S. Jajodia (1996). Testing complex
temporal relationships involving multiple granularities and its
application to data mining (extended abstract). In PODS ’96:
Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART
symposium on Principles of database systems, New York, pp.
68–78. ACM Press.
Billari, F. C. (2005). Life course analysis: Two (complementary)
cultures? Some reflections with examples from the analysis of
transition to adulthood. In P. Ghisletta, J.-M. Le Goff, R. Levy,
D. Spini, and E. Widmer (Eds.), Towards an Interdisciplinary
Perspective on the Life Course, Advancements in Life Course
Research, Vol. 10, pp. 267–288. Amsterdam: Elsevier.
Blossfeld, H.-P. and G. Rohwer (2002). Techniques of Event
History Modeling, New Approaches to Causal Analysis (2nd
ed.). Mahwah NJ: Lawrence Erlbaum.
Elzinga, C. H. (2008). Sequence analysis: Metric representations
of categorical time series. Sociological Methods and Research.
forthcoming.
Elzinga, C. H. and A. C. Liefbroer (2007). De-standardization of
family-life trajectories of young adults: A cross-national
comparison using sequence analysis. European Journal of
Population 23, 225–250.
Huang, X., S. Chen, and S. Soong (1998). Piecewise exponential
survival trees with time-dependent covariates. Biometrics 54,
1420–1433.
Joshi, M. V., G. Karypis, and V. Kumar (2001). A universal
formulation of sequential patterns. In Proceedings of the
KDD’2001 workshop on Temporal Data Mining, San Francisco,
August 2001.
Leblanc, M. and J. Crowley (1992). Relative risk trees for censored
survival data. Biometrics 48, 411–425.
Levenshtein, V. (1966). Binary codes capable of correcting
deletions, insertions, and reversals. Soviet Physics Doklady 10,
707–710.
Mannila, H., H. Toivonen, and A. I. Verkamo (1997). Discovery of
frequent episodes in event sequences. Data Mining and
Knowledge Discovery 1(3), 259–289.
Needleman, S. and C. Wunsch (1970). A general method
applicable to the search for similarities in the amino acid
sequence of two proteins. Journal of Molecular Biology 48,
443–453.
Segal, M. R. (1988). Regression trees for censored data.
Biometrics 44, 35–47.
Segal, M. R. (1992). Tree-structured methods for longitudinal
data. Journal of the American Statistical Association 87 (418),
407–418.
Srikant, R. and R. Agrawal (1996). Mining sequential patterns:
Generalizations and performance improvements. In P. M. G.
Apers, M. Bouzeghoub, and G. Gardarin (Eds.), Advances in
Database Technologies – 5th International Conference on
Extending Database Technology (EDBT’96), Avignon, France,
Volume 1057, pp. 3–17. Springer-Verlag.
Zaki, M. J. (2001). SPADE: An efficient algorithm for mining
frequent sequences. Machine Learning 42(1/2), 31–60.