Clustering of Trajectory Data obtained from Soccer Game Record - A First Step to Behavioral Modeling
Shoji Hirano, Shusaku Tsumoto
Dept. of Medical Informatics, Shimane Univ. School of Medicine, Japan
Outline





Introduction
Data Structure
Method
Experimental Results
Conclusions and Future Work
Introduction

Clustering of Spatio-temporal Data



Provides a way to discover interesting characteristics
about the motion of targets
Related fields: meteorology, medical image analysis,
sports, crime research, etc.
Approaches



Spatial clustering + temporal continuity trace
(e.g. tracking of moving object)
Spatial clustering based on temporal correlation
(e.g. fMRI analysis)
Spatial clustering + observation of the temporal changes of the
clusters (e.g. Observation of the climate regimes)
Objective

Development of a clustering method for trajectories with
multiscale structural comparison scheme



Compare trajectories according to both local and global views.
Visualize common characteristics of trajectories
Application: Clustering of trajectories of passes in soccer
game records

Discovery of interesting spatio-temporal patterns of passes which
may reflect the strategy and tactics of the team


Globally similar passes: strategy of the team - ex. Attack from right side
Locally similar passes: tactics of the team - ex. Frequent use of one-two passes
Data Structure

Soccer game records
(provided for research purpose by DataStadium Inc., Japan)
Columns: Series, Time, Action, Team1, Player1, Team2, Player2, X1, Y1, X2, Y2

Series  Time      Action    Team1  X1   Y1    X2   Y2
1       10:11:01  KICK OFF  Fra    -6   0
1       10:11:01  PASS      Fra    -6   -11   6    -47
1       10:11:01  PASS      Fra    6    -47   80   71
1       10:11:03  PASS      Fra    80   71    31   841
1       10:11:04  TRAP      Fra    31   841
…

Player1 and Player2 hold player numbers (e.g. 12, 21, 17), Team1 and Team2 hold team codes (e.g. Fra, Jap); later rows include events such as THROW IN and P END.
Data Structure

Field geometry and Pass sequence
[Figure: field geometry with X ranging from -3500 to 3500 and Y from -5346 to 5346; a pass sequence is drawn over time t from "PASS start" to "IN GOAL"]
Pass sequence clustering: Problems

Irregularly-sampled spatio-temporal sequence

A data point is generated each time a player interacts with the ball
High interaction -> Dense Data
Low interaction -> Sparse Data

Need for Multiscale Observation

Strategy -> global pass feature
Tactics -> local pass feature
Both exist concurrently

[Figure: a pass trajectory on the field whose densely and sparsely sampled parts are labeled "Dense" and "Sparse"]

The comparison scale must be changed locally according to the granularity of the data and the type of events
Trajectory Mining
Preprocessing
Segmentation and Generation of
Multiscale Trajectories
Segment Hierarchy Trace
and Matching
Calculation of Dissimilarities
Clustering of Trajectories
Method: Multiscale Matching

A pattern matching method that compares structural similarity
of planar curves across multiple observation scales
[Figure: segments of Sequence A and Sequence B at several scales σ, with the matched segment pairs connected]
Able to compare objects by partly changing observation scales
Simultaneously compare both global and local similarities
Multiscale Description (Witkin et al. 1984, Mokhtarian et al. 1986)

Describe convex/concave structure at multiple scales

Sequence description:
$$c(t) = (x(t), y(t)), \quad t: \text{course parameter}$$

Sequence x(t) at scale σ: X(t, σ)
$$X(t,\sigma) = x(t) \otimes g(t,\sigma) = \int_{-\infty}^{\infty} x(u)\, \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(t-u)^2}{2\sigma^2}\right) du$$
$$c(t,\sigma) = (X(t,\sigma), Y(t,\sigma))$$

Scale σ controls the degree of smoothing
Small σ: local feature; large σ: global feature

[Figure: the original curve c(t, 0) and its smoothed version c(t, σ)]
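The smoothing step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sample index is used as the parameter t (the slides do not say how irregular sampling is parameterized), the Gaussian is truncated at three standard deviations, and the function names `gaussian_kernel` and `smooth_trajectory` are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Discrete, normalized samples of the Gaussian g(t, sigma)."""
    if radius is None:
        radius = max(1, int(np.ceil(3 * sigma)))  # assumption: truncate at 3 sigma
    u = np.arange(-radius, radius + 1)
    g = np.exp(-u**2 / (2.0 * sigma**2))
    return g / g.sum()

def smooth_trajectory(x, y, sigma):
    """Return (X(t, sigma), Y(t, sigma)) for coordinate arrays x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if sigma <= 0:
        return x.copy(), y.copy()  # scale 0 is the original curve
    g = gaussian_kernel(sigma)
    pad = len(g) // 2
    # Repeat the end points so the curve is not pulled toward zero at its ends.
    xp = np.pad(x, pad, mode="edge")
    yp = np.pad(y, pad, mode="edge")
    return np.convolve(xp, g, mode="valid"), np.convolve(yp, g, mode="valid")

# A multiscale description is then simply the curve smoothed at several scales.
x = np.array([0, 1] * 10, dtype=float)
y = np.full(20, 3.0)
multiscale = {s: smooth_trajectory(x, y, s) for s in (0, 1.0, 2.0, 4.0)}
```

Small scales keep the zigzag in `x` (local feature); large scales flatten it (global feature).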
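Multiscale Matching based on Convex/Concave Structure of Segments (Ueda et al. 1990)

Segment: partial sequence between adjacent inflection points

Curvature K(t, σ) at scale σ
$$K(t,\sigma) = \frac{X'Y'' - X''Y'}{(X'^2 + Y'^2)^{3/2}}$$
$$X^{(m)}(t,\sigma) = \frac{\partial^m X(t,\sigma)}{\partial t^m} = x(t) \otimes g^{(m)}(t,\sigma)$$

Inflection point c_i(t, σ): a point where the curvature changes sign between adjacent samples, i.e. the product of the two neighbouring curvature values is negative

Represent a sequence as a set of segments
$$A^{(\sigma)} = \{ a_i^{(\sigma)} \mid i = 1, 2, \ldots, N \}$$

[Figure: a curve at scale σ with inflection points c_i(t, σ) marked, and the scale-0 sequence A^{(0)} split into segments a_1^{(0)}, a_2^{(0)}]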
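A minimal sketch of the segmentation step follows. It is not the authors' implementation: derivatives are taken with finite differences (`numpy.gradient`) on an already smoothed curve instead of convolving with derivatives of the Gaussian, the sign change is tested against the previous sample, and near-zero curvature values are skipped through a hypothetical tolerance `eps`.

```python
import numpy as np

def curvature(X, Y):
    """K = (X'Y'' - X''Y') / (X'^2 + Y'^2)^(3/2), using finite differences."""
    dX, dY = np.gradient(X), np.gradient(Y)
    ddX, ddY = np.gradient(dX), np.gradient(dY)
    denom = (dX**2 + dY**2) ** 1.5
    K = np.zeros_like(denom)
    # Leave K at 0 wherever the curve is stationary (denominator is 0).
    np.divide(dX * ddY - ddX * dY, denom, out=K, where=denom > 0)
    return K

def inflection_indices(K, eps=1e-9):
    """Indices where the curvature sign differs from the previous non-zero sign."""
    idx, prev = [], 0.0
    for t, k in enumerate(K):
        if abs(k) <= eps:
            continue  # treat near-zero curvature as carrying no sign
        if prev * k < 0:
            idx.append(t)
        prev = k
    return idx

def split_into_segments(X, Y):
    """Cut the curve at inflection points; returns (start, end) index pairs."""
    cuts = [0] + inflection_indices(curvature(X, Y)) + [len(X) - 1]
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)
            if cuts[i + 1] > cuts[i]]

# One period of a sine wave has a single interior inflection point near t = pi.
t = np.linspace(0.0, 2.0 * np.pi, 200)
segments = split_into_segments(t, np.sin(t))
```

Running the segmentation on the same curve smoothed at increasing scales gives fewer and longer segments, which is the hierarchy that the matching step works on.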
Matching Procedure
[Figure: Sequence A and Sequence B, both ending at IN GOAL, split at their inflection points. Sequence A has segments A0 to A4 at Scale 0 and A0 to A2 at Scales 1 and 2; Sequence B has segments B0 to B6 at Scale 0, B0 to B4 at Scale 1 and B0 to B2 at Scale 2]
Segment Dissimilarity

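Dissimilarity of segments a_i^{(k)}, b_j^{(h)}
$$d(a_i^{(k)}, b_j^{(h)}) = \max\left( \left| \theta_{a_i}^{(k)} - \theta_{b_j}^{(h)} \right|,\ \left| \frac{l_{a_i}^{(k)}}{L_A^{(k)}} - \frac{l_{b_j}^{(h)}}{L_B^{(h)}} \right| \right)$$
θ: rotation angle of a segment; l: length of a segment

Dissimilarity of sequences
$$D(A, B) = \sum_{p=1}^{P} d(a_p^{(0)}, b_p^{(0)})$$
P: the number of matched pairs

[Figure: segment a_i^{(k)} with rotation angle θ_{a_i}^{(k)} and length l_{a_i}^{(k)}; segment b_j^{(h)} with rotation angle θ_{b_j}^{(h)} and length l_{b_j}^{(h)}]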
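The two dissimilarities can be written down directly once a reading of the slide formula is fixed. The sketch below assumes that the segment dissimilarity is the larger of the absolute difference in rotation angle and the absolute difference in normalized length, and that L_A and L_B are the total lengths of the two sequences; both points are assumptions, and the `Segment` class is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    theta: float   # rotation angle of the segment (radians)
    length: float  # length of the segment

def segment_dissimilarity(a, b, total_len_a, total_len_b):
    """max(|theta_a - theta_b|, |l_a / L_A - l_b / L_B|) under the assumed reading."""
    d_theta = abs(a.theta - b.theta)
    d_len = abs(a.length / total_len_a - b.length / total_len_b)
    return max(d_theta, d_len)

def sequence_dissimilarity(pairs, total_len_a, total_len_b):
    """D(A, B): sum of segment dissimilarities over the P matched pairs."""
    return sum(segment_dissimilarity(a, b, total_len_a, total_len_b)
               for a, b in pairs)

# Two matched pairs from hypothetical sequences of total length 10 each.
pairs = [
    (Segment(theta=1.0, length=2.0), Segment(theta=0.5, length=4.0)),
    (Segment(theta=0.3, length=8.0), Segment(theta=0.3, length=6.0)),
]
D = sequence_dissimilarity(pairs, total_len_a=10.0, total_len_b=10.0)
```

Dividing each length by the total length makes the length term scale-free, so a short pass sequence and a long one can still be compared by shape.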
Indiscernibility-based Clustering: Overview
1. Assignment of initial equivalence relations (ERs)
 Assign an initial ER to each of the N objects.
 Each ER independently performs a binary classification,
similar or dissimilar, based on relative proximity.
 Objects that are indiscernible under all of the N ERs form a cluster.
2. Iterative refinement of initial ERs
 For each pair of objects, compute the ratio of ERs that
are able to discriminate them (indiscernibility degree).
 If the ratio is small, assume that these ERs give too
fine a classification and disable their ability to discriminate the pair.
 Iterate step 2 until the clusters become stable.
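The two steps above can be sketched as follows. This is a strongly simplified illustration, not the published algorithm: the initial ER of each object is a fixed-radius neighbourhood (the slides only say "relative proximity"), "disabling" a discriminating ER is done by putting both objects into its "similar" class, and the parameters `radius` and `th` are hypothetical.

```python
import numpy as np

def indiscernibility_clustering(D, radius, th=0.4, max_iter=50):
    """Cluster objects from a dissimilarity matrix D; returns one label per object."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    # Step 1: R[k, j] is True when the ER of object k classifies j as "similar".
    R = D <= radius
    # Step 2: refine until no ER changes.
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                discriminating = R[:, i] != R[:, j]
                ratio = discriminating.mean()
                if 0 < ratio <= th:
                    # Few ERs separate i and j: treat those ERs as too fine.
                    R[discriminating, i] = True
                    R[discriminating, j] = True
                    changed = True
        if not changed:
            break
    # Objects with identical columns are indiscernible under all N ERs.
    labels, seen = [], {}
    for j in range(n):
        key = R[:, j].tobytes()
        labels.append(seen.setdefault(key, len(seen)))
    return labels

# Six points on a line forming two well-separated groups.
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(pts[:, None] - pts[None, :])
labels = indiscernibility_clustering(D, radius=0.15)
```

With this radius the initial ERs put every point in its own class; the refinement merges the points inside each group while keeping the two groups apart. The method needs only pairwise dissimilarities, which matters when the dissimilarity is not a metric.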
Experiments

Data



Game records of FIFA World Cup 2002
(64 games, including all heats and finals)
Number of goals: 168 (own goals excluded)
Procedure



Select series containing ‘IN GOAL’ event, and
generate a total of 168 trajectories of 2-D ball location.
For every possible pair of the trajectories, calculate
dissimilarity by using multiscale matching.
Group the trajectories by using the obtained
dissimilarities and indiscernibility-based clustering
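The first two steps of the procedure can be sketched as below. The record layout `(series, action, x, y)` and both function names are hypothetical, and the dissimilarity is passed in as a function so that multiscale matching, or any stand-in, can be plugged in.

```python
from collections import defaultdict

def goal_trajectories(records):
    """Keep the series that contain an 'IN GOAL' event and return their 2-D paths.

    records: iterable of (series, action, x, y) tuples in time order
    (a hypothetical layout; the real data has more columns).
    """
    by_series = defaultdict(list)
    for series, action, x, y in records:
        by_series[series].append((action, x, y))
    trajectories = []
    for events in by_series.values():
        if any(action == "IN GOAL" for action, _, _ in events):
            trajectories.append([(x, y) for _, x, y in events])
    return trajectories

def dissimilarity_matrix(items, dissimilarity):
    """Symmetric matrix of dissimilarities for every possible pair of items."""
    n = len(items)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dissimilarity(items[i], items[j])
    return D

# Toy records: series 1 and 3 end in a goal, series 2 does not.
records = [
    (1, "PASS", -6, -11), (1, "PASS", 6, -47), (1, "IN GOAL", 0, 5346),
    (2, "PASS", 10, 20), (2, "THROW IN", 3500, 150),
    (3, "PASS", 80, 71), (3, "IN GOAL", 0, 5346),
]
paths = goal_trajectories(records)
# Stand-in dissimilarity (difference in number of points), NOT multiscale matching.
D = dissimilarity_matrix(paths, lambda a, b: float(abs(len(a) - len(b))))
```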
Experimental Results

Cluster Constitution

Cluster  Cases    Cluster  Cases
1        87       7        3
2        24       8        3
3        17       9        2
4        16       10       2
5        8        11       2
6        4        12       1
Note: 55.2% (7839/14196) of the triplets in the dissimilarity matrix
did not satisfy the triangle inequality due to matching failure
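One way to check a dissimilarity matrix for triangle-inequality violations is to test every triplet of objects, as sketched below. How the figure on the slide was counted is not stated, so this is only an illustration; the function name is hypothetical, and a failed match is assumed to be stored as infinity.

```python
import math
from itertools import combinations

def triangle_violations(D):
    """Return (violated, total) over all triplets of objects in matrix D."""
    violated = total = 0
    for i, j, k in combinations(range(len(D)), 3):
        total += 1
        a, b, c = D[i][j], D[j][k], D[i][k]
        # A triplet violates the inequality if one side exceeds the sum of the others.
        if a > b + c or b > a + c or c > a + b:
            violated += 1
    return violated, total

# Objects 0 and 2 could not be matched, so their dissimilarity is infinite.
D = [
    [0.0, 1.0, math.inf],
    [1.0, 0.0, 2.0],
    [math.inf, 2.0, 0.0],
]
violated, total = triangle_violations(D)
```

A single failed match makes one side of a triangle infinite while the other two stay finite, so that triplet is counted as a violation. Clustering methods that rely on metric properties are therefore a poor fit for such a matrix.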
Experimental Results (cont’d)

Cluster 1 (87 cases)
Corner Kick – Goal
Europe: 45, South America: 24, Asia: 9
[Figure: matching results for example goal sequences from Turkey vs Japan and Italy vs Korea, each ending at IN GOAL]
Experimental Results (cont’d)

Cluster 2 (24 cases)
Complex Pass – Side attack – Goal
Europe: 13, South America: 7, Asia: 3
[Figure: matching results for example goal sequences from Poland vs Portugal and Germany vs Cameroon, each ending at IN GOAL]
Experimental Results (cont’d)

Cluster 4 (16 cases)
Side Change – Centering/Dribble – Goal
Europe: 10, South America: 4, Africa: 2
[Figure: matching results for example goal sequences from Slovenia vs Paraguay and China vs Turkey, each ending at IN GOAL]
Experimental Results (cont’d)

Cluster 3 (17 cases)
Side Change – Centering/Dribble – Goal
(Intermediate cases between Cluster 2 and 4)
Europe: 10, South America: 2, Africa: 2, Asia: 2
Summary of Experimental Results



Goal success patterns can be classified into 4 major
groups (with 8 minor patterns)
Patterns: complexity of pass sequences
With additional information

Dribble/Centering/Side change: European Style


However, the differences are not statistically significant.
Key is “Side Change”


Players (defenders) should watch the side of the field away from the
ball movement.
The higher the complexity of the pass sequence, the more a side change
raises the rate of goal success.
Conclusions

Presented a new scheme of spatio-temporal data
mining



Grouped similar patterns using multiscale comparison
and indiscernibility-based clustering techniques.
Visualized similar patterns using matching results.
Application to real World Cup data:

Grouping and visualization of interesting pass patterns:
ex. Complex pass -> side attack -> goal
Future Work

Technical Issues



Apply the proposed method to all path series, including non-‘IN GOAL’ series



Numerical Evaluation
Validation and improvement of segment dissimilarity measure;
inclusion of event type to dissimilarity
Differences between success and failure are very small.
This suggests that the patterns of soccer attack are simple.
Apply the proposed method to medical environment


Trajectories of Laboratory Examinations (IEEE ICDM06)
Trajectories of Patients’ Movement: Patient Safety
Matching Criteria

Criteria for determining the best set of segment pairs

Complete match: the original sequence should be correctly formed by
concatenating the selected segments without any overlaps or gaps
Minimization of total segment difference
$$D(A, B) = \sum_{p=1}^{P} d(a_p^{(0)}, b_p^{(0)})$$
P: number of matched segment pairs
d(a_p^{(0)}, b_p^{(0)}): dissimilarity of segments a_p^{(0)}, b_p^{(0)}

[Figure: sequences A (segments a1 to a5) and B (segments b1 to b5), with examples of an overlap and a gap between selected segments]
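The two criteria can be combined in a small dynamic program, sketched here under simplifying assumptions that are not taken from the slides: every candidate segment, at any scale, is written as a span `(start, end)` over the scale-0 segments it covers, and the pair cost is supplied by the caller. The function returns an infinite dissimilarity when no complete match exists.

```python
import math

def best_complete_match(n_a, n_b, spans_a, spans_b, cost):
    """Cheapest set of span pairs that tiles both sequences without gaps or overlaps.

    n_a, n_b: number of scale-0 segments in sequences A and B.
    spans_a, spans_b: candidate (start, end) spans, end exclusive.
    cost: function (span_a, span_b) -> dissimilarity of the pair.
    """
    INF = math.inf
    best = [[INF] * (n_b + 1) for _ in range(n_a + 1)]
    back = [[None] * (n_b + 1) for _ in range(n_a + 1)]
    best[0][0] = 0.0
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            for sa in (s for s in spans_a if s[1] == i):
                for sb in (s for s in spans_b if s[1] == j):
                    prev = best[sa[0]][sb[0]]
                    if prev == INF:
                        continue  # the prefixes before this pair cannot be tiled
                    total = prev + cost(sa, sb)
                    if total < best[i][j]:
                        best[i][j], back[i][j] = total, (sa, sb)
    if best[n_a][n_b] == INF:
        return INF, []  # matching failure
    pairs, i, j = [], n_a, n_b
    while (i, j) != (0, 0):
        sa, sb = back[i][j]
        pairs.append((sa, sb))
        i, j = sa[0], sb[0]
    return best[n_a][n_b], pairs[::-1]

# A has 3 base segments plus one merged span (0, 2); B has 2 base segments.
spans_a = [(0, 1), (1, 2), (2, 3), (0, 2)]
spans_b = [(0, 1), (1, 2)]
span_len_diff = lambda sa, sb: float(abs((sa[1] - sa[0]) - (sb[1] - sb[0])))
total, pairs = best_complete_match(3, 2, spans_a, spans_b, span_len_diff)
```

In this toy case the only complete match pairs the merged span (0, 2) of A with the first segment of B. Without that merged span no complete match exists, and the function reports failure by returning infinity.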
Matching Failure Problem in MSM



Theoretically, any sequence eventually becomes a single segment at a
sufficiently high scale. Therefore, any pair of sequences should be
successfully matched.
In practice, there must be an upper limit on the scale in order to
reduce computational complexity. Therefore, the number of
segments can differ even at the highest scale.
If matching is not successful, the method should return infinite
dissimilarity or a magic value that indicates matching failure.
[Figure: segment hierarchies from Scale 1 to Scale n, showing a match case and a no-match case]