Data Mining
on ICDM Submission Data
Shusaku Tsumoto
Ning Zhong and Xindong Wu
ICDM 2004 Business Meeting 11/4/2004
Data Mining
on ICDM Submission Data

38 countries, 445 Submissions
Regular Papers: 39 (9%)
Short Papers: 66 (14.8%)

High Acceptance Ratio (Regular)
– Germany: 4/15 (26.7%)
– Finland: 2/9 (22.2%)
– USA: 20/109 (18.3%)
Country

Country     Regular  Short  Total  Ratio
USA              20     28    109  44.0%
China             3      4     55  12.7%
UK                1      6     39  17.9%
Japan             0      5     28  17.9%
Canada            3      3     25  24.0%
Taiwan            0      1     18   5.6%
Australia         2      1     17  17.6%
Germany           4      5     15  60.0%
France            0      2     14  14.3%
India             1      0     14   7.1%
Singapore         0      3     12  25.0%
Brazil            0      1     12   8.3%
Italy             2      1     10  30.0%
Finland           2      1      9  33.3%
Spain             0      1      7  14.3%
Hong Kong         1      1      6  33.3%
Top 15           39     63    390  26.2%
Total            39     66    445  23.8%
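The ratios quoted for each country can be recomputed from the counts in the country table. The sketch below is a minimal illustration, assuming that the "High Acceptance Ratio (Regular)" figures are Regular/Total and that the table's Ratio column is (Regular+Short)/Total; the function name `acceptance_ratios` is illustrative and does not come from the slides.

```python
# (regular, short, total) for three countries, copied from the country table.
COUNTS = {
    "USA": (20, 28, 109),
    "Germany": (4, 5, 15),
    "Finland": (2, 1, 9),
}


def acceptance_ratios(regular, short, total):
    """Return (regular-only %, regular+short %) rounded to one decimal."""
    regular_pct = round(100 * regular / total, 1)
    overall_pct = round(100 * (regular + short) / total, 1)
    return regular_pct, overall_pct


for country, (regular, short, total) in COUNTS.items():
    regular_pct, overall_pct = acceptance_ratios(regular, short, total)
    print(f"{country}: regular {regular_pct}%, regular+short {overall_pct}%")
```

Under these assumptions the three rows give the same percentages that the slides list for Germany, Finland and the USA.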
Data Mining
on ICDM Submission Data

Top 5 Areas of Submissions:
– Data mining applications
– Data mining and machine learning algorithms and methods
– Mining text and semi-structured data, and mining temporal, spatial and multimedia data
– Data pre-processing, data reduction, feature selection and feature transformation
– Soft computing and uncertainty management for data mining

High Acceptance Ratio Areas (Regular+Short)
– Quality assessment and interestingness metrics of data mining results: 5/10 (50.0%)
– Data pre-processing, data reduction, feature selection and feature transformation: 14/35 (40.0%)
– Complexity, efficiency, and scalability issues in data mining: 4/11 (36.4%)
Topics

Regular  Short  Total  Ratio   Topic
      4     10     84  16.7%   Data mining applications
      9     20     81  35.8%   Data mining and machine learning algorithms and methods
      3      8     44  25.0%   Mining text and semi-structured data, and mining temporal, spatial and multimedia data
      7      7     35  40.0%   Data pre-processing, data reduction, feature selection and feature transformation
     (3)           34   8.8%   Soft computing and uncertainty management for data mining
      2      1     26  11.5%   Foundations of data mining
      3      4     25  28.0%   Mining data streams
     (1)           16   6.3%   Human-machine interaction and visual data mining
      2      1     15  20.0%   Security, privacy and social impact of data mining
      1      1     12  16.7%   Data and knowledge representation for data mining
     (1)           11   9.1%   Pattern recognition and trend analysis
      2      2     11  36.4%   Complexity, efficiency, and scalability issues in data mining
      2      3     10  50.0%   Quality assessment and interestingness metrics of data mining results
     (1)            9  11.1%   Statistics and probability in large-scale data mining
     (1)            9  11.1%   Integration of data warehousing, OLAP and data mining
     (2)            7  28.6%   Collaborative filtering/personalization
      1      1      7  28.6%   Post-processing of data mining results
     (2)            6  33.3%   Others
     (1)            2  50.0%   High performance and parallel/distributed data mining
                    1   0.0%   Query languages and user interfaces for mining
     39     66    445  23.8%   Total

(n): only one accepted count survives for this topic; the transcript does not show whether it was Regular or Short.
Correspondence Analysis
(Country vs Final Decision)
[Figure: correspondence-analysis map of countries and final decisions (Regular, Short, Reject); r1=0.378, r2=0.177. Labelled countries: Slovenia, Finland, Hong Kong, Germany, USA, Italy, Canada, Australia, India, UK, France, Japan.]
Correspondence Analysis
(Topics vs Final Decision)
[Figure: correspondence-analysis map of topics and final decisions (Regular, Short, Reject); r1=0.280, r2=0.184. Labelled topics: Applications, Collaborative Filtering, DM Methods, Quality-assessment, Soft-computing, Preprocessing/Feature Selection, Security/privacy, Statistics and probability, High-performance, Post-processing.]
Correspondence Analysis

Country vs Final Decision
– Regular: Germany, USA
– Short: ?
– Reject: Most of the countries are located near this region.

Topics vs Final Decision
– Regular: Quality Assessment, Preprocessing/Feature Selection
– Short: DM/ML Methods, Collaborative Filtering
– Reject: DM Applications
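A correspondence analysis of a country-by-decision table can be sketched in a few lines. The example below is a minimal illustration, not the presenters' procedure: it assumes that r1 and r2 on the slides are the two non-trivial singular values of the standardized residual matrix, that Reject equals Total minus Regular minus Short, and it uses only the countries listed in the country table. Because the slides analysed all 38 countries, its output is not expected to match r1=0.378 and r2=0.177. All function names are illustrative.

```python
import math

# (Regular, Short, Total) per country, copied from the country table.
COUNTRY_COUNTS = {
    "USA": (20, 28, 109), "China": (3, 4, 55), "UK": (1, 6, 39),
    "Japan": (0, 5, 28), "Canada": (3, 3, 25), "Taiwan": (0, 1, 18),
    "Australia": (2, 1, 17), "Germany": (4, 5, 15), "France": (0, 2, 14),
    "India": (1, 0, 14), "Singapore": (0, 3, 12), "Brazil": (0, 1, 12),
    "Italy": (2, 1, 10), "Finland": (2, 1, 9), "Spain": (0, 1, 7),
    "Hong Kong": (1, 1, 6),
}


def contingency_rows(counts):
    """Turn (regular, short, total) into [regular, short, reject] rows."""
    return [[r, s, t - r - s] for r, s, t in counts.values()]


def principal_correlations(table):
    """Return the two non-trivial singular values for a 3-column table."""
    n = sum(map(sum, table))
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(3)]
    # Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
    resid = [
        [(row[j] / n - ri * col_mass[j]) / math.sqrt(ri * col_mass[j])
         for j in range(3)]
        for row, ri in zip(table, row_mass)
    ]
    # Gram matrix G = S^T S (3x3). One eigenvalue is always 0, so the other
    # two follow from the trace and the sum of the principal 2x2 minors.
    g = [[sum(s[a] * s[b] for s in resid) for b in range(3)] for a in range(3)]
    trace = g[0][0] + g[1][1] + g[2][2]
    minors = (g[0][0] * g[1][1] - g[0][1] ** 2
              + g[0][0] * g[2][2] - g[0][2] ** 2
              + g[1][1] * g[2][2] - g[1][2] ** 2)
    disc = math.sqrt(max(trace * trace - 4 * minors, 0.0))
    lam1 = (trace + disc) / 2
    lam2 = max((trace - disc) / 2, 0.0)
    return math.sqrt(lam1), math.sqrt(lam2)


r1, r2 = principal_correlations(contingency_rows(COUNTRY_COUNTS))
print(f"r1={r1:.3f}, r2={r2:.3f}")
```

The squared values r1² and r2² sum to the total inertia of the table (chi-square divided by the number of submissions), which gives a quick sanity check on the result.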
Rule Mining
on ICDM Submission Data

Datasets
– Sample Size: 445
– Attributes: 5
• Paper No. : ordered by submission date
• # of Authors
• # of Characters in Title
• Country
• Category
– Analyzed by Clementine 7.1 (and SPSS12.0J)
Rule Mining (C5.0)
on ICDM Submission Data

C5.0
– [Topic=Mining semi-structured data,…] & [129 < Paper No. <= 369] => Reject (Confidence 0.87, Support 10)
– [Country=USA] & [Topic=Mining semi-structured data,…] & [Paper No. > 369] & [# of Authors <= 3] => Accept (Confidence 0.667, Support 3)
– [Topic=Preprocessing/Feature Selection] & [# of Authors > 4] => Accept (Confidence 1.0, Support 3)
– Topic, Paper No., # of Authors: Important Features
Rule Mining (GRI)
on ICDM Submission Data

Generalized Rule Induction
– [# of Authors < 2] & [Paper No. < 120.5] => Rejected (Confidence 96.0%, Support 24)
– [# of Chars in Title < 27] & [Paper No. > 212] => Accepted (Confidence 100%, Support 5)

Paper No., # of Chars in Title, # of Authors: Important Features
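The slides do not define how Support and Confidence were computed by Clementine. A common reading is sketched below, assuming that support is the number of records matching the rule's antecedent and confidence is the fraction of those records with the stated decision. The `Paper` record type, the `rule_stats` helper and the five sample records are hypothetical; the per-paper data is not part of the slides.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    paper_no: int      # ordered by submission date
    n_authors: int
    title_chars: int
    accepted: bool


def rule_stats(papers, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent."""
    matched = [p for p in papers if antecedent(p)]
    if not matched:
        return 0, 0.0
    hits = sum(1 for p in matched if consequent(p))
    return len(matched), hits / len(matched)


# Hypothetical records, for illustration only.
SAMPLE = [
    Paper(10, 1, 40, False),
    Paper(50, 1, 55, False),
    Paper(100, 1, 30, True),
    Paper(119, 3, 60, False),
    Paper(300, 1, 20, True),
]

# Same shape as the rule [# of Authors < 2] & [Paper No. < 120.5] => Rejected
support, confidence = rule_stats(
    SAMPLE,
    antecedent=lambda p: p.n_authors < 2 and p.paper_no < 120.5,
    consequent=lambda p: not p.accepted,
)
print(f"support={support}, confidence={confidence:.3f}")
```

On the toy data three records match the antecedent and two of them are rejected. Other tools report support as a fraction of all records, so the definition should be checked before comparing figures.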
Multidimensional Scaling
(2004)
[Figure: two-dimensional MDS map of the attributes Country, Decision, Topics, Paper No., Review Score, # of Authors and # of Chars in Title.]
Summary (2004) of Mining
on ICDM Submission Data

Do not submit a paper too hastily!
– Reflection is needed not only on the contents but also on the title.
Mining Text/Web/Semi-structured Data is very popular.
The number of application papers is growing (but many were rejected).
Strong Topics
– Preprocessing/Feature-Selection
– Postprocessing
– Security and Privacy

Several topics are emerging in ICDM2004:
– Mining Data Streams
– Collaborative Filtering
– Quality Assessment
Comparison between 02-04
Review Scores: Box-plot
[Figure: box plot of review score (axis from 0.00 to 5.00) by year (2002, 2003, 2004); two points are labelled 1,176 and 1,169.]
Comparison between 02-04
Countries

Country    Acceptance     Country    Acceptance     Country    Acceptance
           Ratio (2002)              Ratio (2003)              Ratio (2004)
Hong Kong  64.7%          Israel     55.0%          Germany    60.0%
USA        47.9%          Hong Kong  50.0%          USA        44.0%
Canada     45.5%          Japan      37.0%          Finland    33.0%
Finland    33.3%          USA        33.0%          Hong Kong  33.0%
France     33.3%          Germany    32.0%          Italy      30.0%
Comparison between 02-04
Topics (Acceptance Ratio)

Top 5 in 2002  Ratio    Top 5 in 2003               Ratio    Top 5 in 2004                     Ratio
Graph Mining   75.0%    Process-centric DM          80.0%    Quality Assessment                50.0%
Temporal Data  52.6%    Security, privacy           57.0%    Preprocessing, Feature Selection  40.0%
Theory         42.9%    Statistics and Probability  47.0%    Complexity/Scalability            36.4%
Text Mining    42.1%    Visual Data Mining          38.0%    DM and ML Methods                 35.8%
Rule           41.7%    Postprocessing              41.7%    Collaborative Filtering           28.6%
                                                             Post-processing                   28.6%
Multidimensional Scaling
(2003 and 2004)
The topological structure with respect to similarities appears unchanged between 2003 and 2004.
[Figure: two MDS maps, one for 2003 and one for 2004, each showing Country, Decision, Topics, Paper No., Review Score, # of Authors and # of Chars in Title.]
Data Mining
on ICDM Submission Data

Acknowledgements
– Many thanks to
• PC chairs, Vice Chairs and PC members
• All the authors
• All the contributors to ICDM2004
– See you again at ICDM2005!
Multidimensional Scaling
(2003)
[Figure: two-dimensional MDS map of the attributes Country, Decision, Topics, Paper No., Review Score, # of Authors and # of Chars in Title.]