Data Mining on ICDM Submission Data
Shusaku Tsumoto, Ning Zhong and Xindong Wu
ICDM 2004 Business Meeting 11/4/2004

Data Mining on ICDM Submission Data
– 38 countries, 445 submissions
– Regular Papers: 39 (8.8%)
– Short Papers: 66 (14.8%)
– High Acceptance Ratio (Regular)
  – Germany: 4/15 (26.7%)
  – Finland: 2/9 (22.2%)
  – USA: 20/109 (18.3%)

Country
Country     Regular  Short  Total  Ratio
USA              20     28    109  44.0%
China             3      4     55  12.7%
UK                1      6     39  17.9%
Japan             0      5     28  17.9%
Canada            3      3     25  24.0%
Taiwan            0      1     18   5.6%
Australia         2      1     17  17.6%
Germany           4      5     15  60.0%
France            0      2     14  14.3%
India             1      0     14   7.1%
Singapore         0      3     12  25.0%
Brazil            0      1     12   8.3%
Italy             2      1     10  30.0%
Finland           2      1      9  33.3%
Spain             0      1      7  14.3%
Hong Kong         1      1      6  33.3%
Top 15           39     63    390  26.2%
Total            39     66    445  23.8%

Data Mining on ICDM Submission Data
Top 5 Areas of Submissions:
– Data mining applications
– Data mining and machine learning algorithms and methods
– Mining text and semi-structured data, and mining temporal, spatial and multimedia data
– Data pre-processing, data reduction, feature selection and feature transformation
– Soft computing and uncertainty management for data mining
High Acceptance Ratio Areas (Regular+Short)
– Quality assessment and interestingness metrics of data mining results: 5/10 (50.0%)
– Data pre-processing, data reduction, feature selection and feature transformation: 14/35 (40.0%)
– Complexity, efficiency, and scalability issues in data mining: 4/11 (36.4%)

Topics
Regular  Short  Total  Ratio   Topic
      4     10     84  16.7%   Data mining applications
      9     20     81  35.8%   Data mining and machine learning algorithms and methods
      3      8     44  25.0%   Mining text and semi-structured data, and mining temporal, spatial and multimedia data
      7      7     35  40.0%   Data pre-processing, data reduction, feature selection and feature transformation
     3†            34   8.8%   Soft computing and uncertainty management for data mining
      2      1     26  11.5%   Foundations of data mining
      3      4     25  28.0%   Mining data streams
     1†            16   6.3%   Human-machine interaction and visual data mining
      2      1     15  20.0%   Security, privacy and social impact of data mining
      1      1     12  16.7%   Data and knowledge representation for data mining
     1†            11   9.1%   Pattern recognition and trend analysis
      2      2     11  36.4%   Complexity, efficiency, and scalability issues in data mining
      2      3     10  50.0%   Quality assessment and interestingness metrics of data mining results
     1†             9  11.1%   Statistics and probability in large-scale data mining
     1†             9  11.1%   Integration of data warehousing, OLAP and data mining
     2†             7  28.6%   Collaborative filtering/personalization
      1      1      7  28.6%   Post-processing of data mining results
     2†             6  33.3%   Others
     1†             2  50.0%   High performance and parallel/distributed data mining
                    1   0.0%   Query languages and user interfaces for mining
     39     66    445  23.8%   Total
† Single accepted count; the source does not show whether it is Regular or Short.

Correspondence Analysis (Country vs Final Decision)
[Figure: correspondence analysis map; axes r1 = 0.378 and r2 = 0.177; decisions plotted: Regular, Short, Reject; countries plotted: Slovenia, Finland, Hong Kong, Germany, USA, Italy, Canada, Australia, India, UK, France, Japan]

Correspondence Analysis (Topics vs Final Decision)
[Figure: correspondence analysis map; axes r1 = 0.280 and r2 = 0.184; decisions plotted: Regular, Short, Reject; topics plotted: Applications, Collaborative Filtering, DM Methods, Quality-assessment, Soft-computing, Preprocessing/Feature Selection, Security and privacy, Statistics and probability, High-performance, Post-processing]

Correspondence Analysis
Country vs Final Decision
– Regular: Germany, USA
– Short: ?
– Reject: Most of the countries are located near this region.
Topics vs Final Decision
– Regular: Quality Assessment, Preprocessing/Feature Selection
– Short: DM/ML Methods, Collaborative Filtering
– Reject: DM Applications

Rule Mining on ICDM Submission Data
Datasets
– Sample Size: 445
– Attributes: 5
  • Paper No.: ordered by submission date
  • # of Authors
  • # of Characters in Title
  • Country
  • Category
– Analyzed by Clementine 7.1 (and SPSS 12.0J)

Rule Mining (C5.0) on ICDM Submission Data
C5.0
– [Topic = Mining semi-structured data,…] & [129 < Paper No. <= 369] => Reject (Confidence 0.87, Support 10)
– [Country = USA] & [Topic = Mining semi-structured data,…] & [Paper No. > 369] & [# of Authors <= 3] => Accept (Confidence 0.667, Support 3)
– [Topic = Preprocessing/Feature Selection] & [# of Authors > 4] => Accept (Confidence 1.0, Support 3)
– Important features: Topic, Paper No., # of Authors

Rule Mining (GRI) on ICDM Submission Data
Generalized Rule Induction
– [# of Authors < 2] & [Paper No. < 120.5] => Rejected (Confidence 96.0%, Support 24)
– [# of Chars in Title < 27] & [Paper No. > 212] => Accepted (Confidence 100%, Support 5)
– Important features: Paper No., # of Chars in Title, # of Authors

Multidimensional Scaling (2004)
[Figure: two-dimensional MDS map of the attributes Country, Decision, Topics, Paper No., Review Score, # of Authors and # of Chars in Title]

Summary (2004) of Mining on ICDM Submission Data
Do not submit a paper too fast!
– Reflection is needed not only on the contents but also on the titles.
Mining Text/Web/Semi-structured Data is very popular.
The number of application papers is growing (but many are rejected).
Strong Topics
– Preprocessing/Feature-Selection
– Postprocessing
– Security and Privacy
Several topics are emerging in ICDM2004:
– Mining Data Streams
– Collaborative Filtering
– Quality Assessment

Comparison between 02-04 Review Scores: Box-plot
[Figure: box plots of review scores by year; x-axis: year (2002, 2003, 2004); y-axis: score (0.00 to 5.00)]

Comparison between 02-04 Countries
Country Acceptance Ratio (2002)   Country Acceptance Ratio (2003)   Country Acceptance Ratio (2004)
Hong Kong   64.7%                 Israel      55.0%                 Germany     60.0%
USA         47.9%                 Hong Kong   50.0%                 USA         44.0%
Canada      45.5%                 Japan       37.0%                 Finland     33.0%
Finland     33.3%                 USA         33.0%                 Hong Kong   33.0%
France      33.3%                 Germany     32.0%                 Italy       30.0%

Comparison between 02-04 Topics
Top 5 in 2002 (Acceptance Ratio)   Top 5 in 2003 (Acceptance Ratio)       Top 5 in 2004 (Acceptance Ratio)
Graph Mining    75.0%              Process-centric DM           80.0%     Quality Assessment                 50.0%
Temporal Data   52.6%              Security, privacy            57.0%     Preprocessing, Feature Selection   40.0%
Theory          42.9%              Statistics and Probability   47.0%     Complexity/Scalability             36.4%
Text Mining     42.1%              Visual Data Mining           38.0%     DM and ML Methods                  35.8%
Rule            41.7%              Postprocessing               41.7%     Collaborative Filtering            28.6%
                                                                          Post-processing                    28.6%

Multidimensional Scaling (2003 and 2004)
The topological structure with respect to similarities does not appear to have changed between 2003 and 2004.
[Figure: two MDS maps, one for 2004 and one for 2003, each plotting Country, Decision, Topics, Paper No., Review Score, # of Authors and # of Chars in Title]

Data Mining on ICDM Submission Data
Acknowledgements
– Many thanks to
  • PC chairs, Vice Chairs and PC members
  • All the authors
  • All the contributors to ICDM2004
– See you again in ICDM2005!

Multidimensional Scaling (2004)
[Figure: two-dimensional MDS map of the attributes Country, Decision, Topics, Paper No., Review Score, # of Authors and # of Chars in Title]

Multidimensional Scaling (2003)
[Figure: two-dimensional MDS map of the attributes Country, Decision, Topics, Paper No., Review Score, # of Authors and # of Chars in Title]
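
The Ratio column of the country table appears to be (Regular + Short) / Total, while the "High Acceptance Ratio (Regular)" figures use Regular / Total. The sketch below reproduces both for a few rows, using counts copied from the slides. The names `SUBMISSIONS`, `acceptance_ratio` and `regular_ratio` are illustrative assumptions and not part of the original analysis, which was run in Clementine and SPSS.

```python
# Counts copied from the country table: (regular, short, total submissions).
# Only a few rows are included; this is an illustration, not the full dataset.
SUBMISSIONS = {
    "USA": (20, 28, 109),
    "China": (3, 4, 55),
    "Germany": (4, 5, 15),
    "Finland": (2, 1, 9),
}


def acceptance_ratio(regular: int, short: int, total: int) -> float:
    """Accepted papers (regular + short) as a percentage of submissions."""
    if total <= 0:
        raise ValueError("total submissions must be positive")
    return 100.0 * (regular + short) / total


def regular_ratio(regular: int, total: int) -> float:
    """Regular papers only, as a percentage of submissions."""
    if total <= 0:
        raise ValueError("total submissions must be positive")
    return 100.0 * regular / total


if __name__ == "__main__":
    for country, (regular, short, total) in SUBMISSIONS.items():
        overall = acceptance_ratio(regular, short, total)
        regular_only = regular_ratio(regular, total)
        print(f"{country:8s} overall {overall:5.1f}%  regular {regular_only:5.1f}%")
```

Rounded to one decimal place, these values agree with the slide figures for the rows shown, which supports the reading of the Ratio column assumed here.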