Clustering of Trajectory Data obtained from Soccer Game Record
- A First Step to Behavioral Modeling -

Shoji Hirano, Shusaku Tsumoto
[email protected] [email protected]
Dept. of Medical Informatics, Shimane Univ. School of Medicine, Japan

Outline
- Introduction
- Data Structure
- Method
- Experimental Results
- Conclusions and Future Work

Introduction
Clustering of Spatio-temporal Data
- Provides a way to discover interesting characteristics of the motion of targets
- Related fields: meteorology, medical image analysis, sports, crime research, etc.
Approaches
- Spatial clustering + temporal continuity trace (e.g. tracking of moving objects)
- Spatial clustering based on temporal correlation (e.g. fMRI analysis)
- Spatial clustering + observation of the temporal changes of the clusters (e.g. observation of climate regimes)

Objective
Development of a clustering method for trajectories with a multiscale structural comparison scheme
- Compare trajectories from both local and global views
- Visualize common characteristics of trajectories
Application: clustering of trajectories of passes in soccer game records
- Discovery of interesting spatio-temporal patterns of passes that may reflect the strategy and tactics of the team
- Globally similar passes: strategy of the team (e.g. attack from the right side)
- Locally similar passes: tactics of the team (e.g. frequent use of one-two passes)

Data Structure
Soccer game records (provided for research purposes by DataStadium Inc., Japan)
[Table: sample game records. Columns: Series, Time, Action, Team1, Player1, Team2, Player2, X1, Y1, X2, Y2. The sample shows the events of series 1 starting at 10:11:01 (KICK OFF, PASS, PASS, PASS, TRAP; team Fra), followed by later events such as THROW IN and a closing P END record. The individual cell values cannot be reliably recovered from the extracted text.]

Data Structure
Field geometry and pass sequence
[Figure: field geometry with axes X (-3500 to 3500) and Y (-5346 to 5346), and an example pass sequence over time t from PASS start to IN GOAL.]

Pass sequence clustering: Problems
Irregularly sampled spatio-temporal sequence
- A data point is generated when a player interacts with the ball
- High interaction -> dense data
- Low interaction -> sparse data
Need for multiscale observation
- Strategy -> global pass feature
- Tactics -> local pass feature
- Both exist concurrently
[Figure: example pass sequence with dense and sparse regions marked.]
The comparison scale must be changed locally according to the granularity of the data and the type of events.

Trajectory Mining
- Preprocessing
- Segmentation and Generation of Multiscale Trajectories
- Segment Hierarchy Trace and Matching
- Calculation of Dissimilarities
- Clustering of Trajectories

Method: Multiscale Matching
A pattern matching method that compares the structural similarity of planar curves across multiple observation scales
[Figure: segments of Sequence A and Sequence B at scale σ, with matched pairs.]
- Able to compare objects by partly changing observation scales
- Simultaneously compares both global and local similarities

Multiscale Description (Witkin et al. 1984, Mokhtarian et al. 1986)
Describe the convex/concave structure at multiple scales
Sequence description:
  c(t) = (x(t), y(t)),  t: course parameter
Sequence x(t) at scale σ:
  X(t, \sigma) = x(t) \otimes g(t, \sigma) = \int_{-\infty}^{+\infty} x(u) \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(t-u)^2}{2\sigma^2} \right) du
  c(t, \sigma) = (X(t, \sigma), Y(t, \sigma))
Scale σ controls the degree of smoothing
- σ small: local features
- σ large: global features
[Figure: original sequence c(t, 0) and smoothed sequence c(t, σ).]

Multiscale Matching based on Convex/Concave Structure of Segments (Ueda et al. 1990)
Segment: partial sequence between adjacent inflection points
Curvature K(t, σ) at scale σ:
  K(t, \sigma) = \frac{X' Y'' - X'' Y'}{(X'^2 + Y'^2)^{3/2}}
  X^{(m)}(t, \sigma) = \frac{\partial^m X(t, \sigma)}{\partial t^m} = x(t) \otimes g^{(m)}(t, \sigma)
Inflection point c_i(t, σ): a point where the curvature changes sign between adjacent samples,
  K(t-1, \sigma) \cdot K(t, \sigma) < 0
Represent a sequence as a set of segments:
  A^{(\sigma)} = \{ a_i^{(\sigma)} \mid i = 1, 2, \ldots, N \}
[Figure: sequence A^{(0)} divided into segments a_1^{(0)}, a_2^{(0)}, ...]

Matching Procedure
[Figure: inflection points and segment hierarchies of Sequence A (segments A0 to A4 at scale 0, A0 to A2 at scales 1 and 2) and Sequence B (segments B0 to B6 at scale 0, B0 to B4 at scale 1, B0 to B2 at scale 2); both sequences run over time t and end at IN GOAL.]

Segment Dissimilarity
Dissimilarity of segments a_i^{(k)}, b_j^{(h)}:
  d(a_i^{(k)}, b_j^{(h)}) = \max\left( \left| \theta_{a_i}^{(k)} - \theta_{b_j}^{(h)} \right|, \left| \frac{l_{a_i}^{(k)}}{L_A^{(k)}} - \frac{l_{b_j}^{(h)}}{L_B^{(h)}} \right| \right)
  \theta: rotation angle of a segment, l: length of a segment, L: total length of the sequence
Dissimilarity of sequences:
  D(A, B) = \sum_{p=1}^{P} d(a_p^{(0)}, b_p^{(0)})
  P: the number of matched pairs
[Figure: segments a_i^{(k)} and b_j^{(h)} annotated with rotation angle and length.]

Indiscernibility-based Clustering: Overview
1. Assignment of initial equivalence relations (ERs)
   - Assign an initial ER to each of the N objects.
   - Each ER independently performs a binary classification, similar or dissimilar, based on relative proximity.
   - Objects that are indiscernible under all of the N ERs form a cluster.
2. Iterative refinement of initial ERs
   - For each pair of objects, compute the ratio of ERs that are able to discriminate them (indiscernibility degree).
   - If this ratio is small, assume that these ERs give too fine a classification and disable their ability to discriminate the pair.
   - Iterate step 2 until the clusters become stable.

Experiments
Data
- Game records of the FIFA World Cup 2002 (64 games, including all heats and finals)
- Number of goals: 168 (own goals excluded)
Procedure
- Select the series containing an 'IN GOAL' event and generate a total of 168 trajectories of 2-D ball location.
- For every possible pair of trajectories, calculate the dissimilarity by multiscale matching.
- Group the trajectories using the obtained dissimilarities and indiscernibility-based clustering.

Experimental Results
Cluster Constitution

  Cluster  Cases    Cluster  Cases
  1        87       7        3
  2        24       8        3
  3        17       9        2
  4        16       10       2
  5        8        11       2
  6        4        12       1

Note: 55.2% (7839/14196) of triplets in the dissimilarity matrix did not satisfy the triangle inequality due to matching failure.

Experimental Results (cont'd)
Cluster 1 (87 cases): Corner Kick – Goal
[Figure: matching results ending at IN GOAL; examples: Turkey vs Japan, Italy vs Korea.]
Europe: 45, South America: 24, Asia: 9

Experimental Results (cont'd)
Cluster 2 (24 cases): Complex Pass – Side Attack – Goal
[Figure: matching results ending at IN GOAL; examples: Poland vs Portugal, Germany vs Cameroon.]
Europe: 13, South America: 7, Asia: 3

Experimental Results (cont'd)
Cluster 4 (16 cases): Side Change – Centering/Dribble – Goal
[Figure: matching results ending at IN GOAL; examples: Slovenia vs Paraguay, China vs Turkey.]
Europe: 10, South America: 4, Africa: 2

Experimental Results (cont'd)
Cluster 3 (17 cases): Side Change – Centering/Dribble – Goal (intermediate cases between Clusters 2 and 4)
Europe: 10, South America: 2, Africa: 2, Asia: 2

Summary of Experimental Results
- Goal success patterns can be classified into 4 major groups (with 8 minor patterns).
- Patterns: complexity of pass sequences
- With additional information, Dribble/Centering/Side change appears as a European style. However, the differences are not statistically significant.
- The key is "Side Change": players (defenders) should take care of the side opposite to the ball movement.
- The higher the complexity of pass transactions, the more the rate of goal success gains from a side change.

Conclusions
- Presented a new scheme of spatio-temporal data mining
  - Grouped similar patterns using multiscale comparison and indiscernibility-based clustering techniques
  - Visualized similar patterns using matching results
- Application to real World Cup data: grouping and visualization of interesting pass patterns, e.g. complex pass -> side attack -> goal

Future Work
Technical issues
- Apply the proposed method to all path series, including non-'IN GOAL' series
- Numerical evaluation
- Validation and improvement of the segment dissimilarity measure; inclusion of event type in the dissimilarity
Differences between success and failure are very small. This suggests that the patterns of soccer attack are simple.
Apply the proposed method to medical environments
- Trajectories of laboratory examinations (IEEE ICDM06)
- Trajectories of patients' movement: patient safety

Matching Criteria
Criteria for determining the best set of segment pairs
- Complete match: the original sequence should be correctly formed by concatenating the selected segments without any overlaps or gaps
  [Figure: examples of an overlap and a gap between selected segments.]
- Minimization of total segment difference:
  D(A, B) = \sum_{p=1}^{P} d(a_p^{(0)}, b_p^{(0)})
  P: number of matched segment pairs
  d(a_p^{(0)}, b_p^{(0)}): dissimilarity of segments a_p^{(0)}, b_p^{(0)}
  [Figure: sequence A with segments a1 to a5 and sequence B with segments b1 to b5.]

Matching Failure Problem in MSM
- Theoretically, any sequence eventually becomes a single segment at a sufficiently high scale. Therefore, any pair of sequences should be successfully matched.
- Practically, there should be an upper limit on the scales in order to reduce computational complexity. Therefore, the number of segments can differ even at the highest scale.
- If matching is not successful, the method should return an infinite dissimilarity or a magic value that indicates matching failure.
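
The failure handling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of `None` to signal "no complete match found", and the toy matrix are assumptions made for illustration. It uses infinity as the failure value and shows how such entries make triplets violate the triangle inequality, which is the effect reported in the note on the dissimilarity matrix.

```python
import math
from itertools import combinations

# Sentinel for "no complete match found" (assumption: infinity is used
# rather than some other magic value).
MATCH_FAILED = math.inf


def sequence_dissimilarity(segment_costs):
    """Sum the dissimilarities of the matched segment pairs.

    `segment_costs` is a hypothetical input: a list of d(a_p, b_p) values,
    or None when the matcher found no complete set of segment pairs.
    """
    if segment_costs is None:
        return MATCH_FAILED
    return sum(segment_costs)


def triangle_violation_ratio(D):
    """Fraction of object triplets that violate the triangle inequality
    in a symmetric dissimilarity matrix D (list of lists)."""
    violated = 0
    total = 0
    for i, j, k in combinations(range(len(D)), 3):
        a, b, c = D[i][j], D[j][k], D[i][k]
        total += 1
        # A single infinite entry among finite ones always violates it.
        if a > b + c or b > a + c or c > a + b:
            violated += 1
    return violated / total if total else 0.0


# Toy matrix (made-up values): objects 0 and 2 could not be matched.
toy = [
    [0.0, 1.0, MATCH_FAILED],
    [1.0, 0.0, 1.0],
    [MATCH_FAILED, 1.0, 0.0],
]
ratio = triangle_violation_ratio(toy)
```

A clustering method that consumes such a matrix must tolerate non-metric input, which is one reason a method based on relative proximity rather than on metric properties is a natural fit here.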
[Figure: segment hierarchies at Scale 1, Scale 2, ..., Scale n, showing a match case and a no-match case.]
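
As a rough illustration of the multiscale description and of how the number of segments depends on the scale, the following Python sketch smooths a 2-D sequence with a Gaussian kernel and counts the segments between sign changes of the curvature numerator X'Y'' - X''Y'. Several points are assumptions, not taken from the slides: the sequence is treated as regularly sampled, the kernel is truncated at three standard deviations, the ends are handled by repeating the boundary values, and derivatives are taken by finite differences on the smoothed curve rather than by convolution with derivatives of the Gaussian. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Discrete Gaussian weights truncated at 3*sigma and normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(u * u) / (2 * sigma * sigma))
               for u in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, sigma):
    """Convolve a 1-D sequence with a Gaussian; sigma <= 0 returns a copy."""
    if sigma <= 0:
        return list(values)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for t in range(n):
        acc = 0.0
        for offset, w in zip(range(-radius, radius + 1), kernel):
            idx = min(max(t + offset, 0), n - 1)  # repeat boundary values
            acc += w * values[idx]
        out.append(acc)
    return out


def curvature_numerators(xs, ys):
    """X'Y'' - X''Y' at interior points, using central differences.
    Only the sign is needed, so the positive denominator is omitted."""
    result = []
    for t in range(1, len(xs) - 1):
        dx = (xs[t + 1] - xs[t - 1]) / 2
        dy = (ys[t + 1] - ys[t - 1]) / 2
        ddx = xs[t + 1] - 2 * xs[t] + xs[t - 1]
        ddy = ys[t + 1] - 2 * ys[t] + ys[t - 1]
        result.append(dx * ddy - ddx * dy)
    return result


def count_segments(xs, ys, sigma, eps=1e-9):
    """Number of convex/concave segments at the given scale: one more than
    the number of curvature sign changes (near-zero values are skipped)."""
    X, Y = smooth(xs, sigma), smooth(ys, sigma)
    segments = 1
    previous = 0
    for k in curvature_numerators(X, Y):
        sign = (k > eps) - (k < -eps)
        if sign != 0:
            if previous != 0 and sign != previous:
                segments += 1
            previous = sign
    return segments


# Toy zigzag trajectory (made-up data, not from the game records).
zigzag_x = list(range(9))
zigzag_y = [0, 1, 0, 1, 0, 1, 0, 1, 0]
segments_at_scale_0 = count_segments(zigzag_x, zigzag_y, sigma=0)
```

Calling `count_segments` on the same trajectory for a list of increasing `sigma` values gives one segment count per scale; two trajectories whose counts still differ at the largest scale tried are the situation in which matching can fail.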