Knowledge Discovery and Data Mining Applied to Engineering Applications
Shirley Williams
The University of Reading
25th September 2001
(c) Shirley Williams, 2001
1
Overview
• The process
  • Understanding the problem
  • Collecting and preparing the data
  • Exploring the data -> Results
  • Modelling -> Knowledge
• Iterative
• Example using telecom data
Knowledge Discovery with Engineering Data
What we want:
• To find anomalous trends
• To get answers to questions related to engineering issues
As opposed to...
Marketing analysis and fraud detection, the traditional uses in telecommunications
An Investigation Oriented KD Method
[Figure: cycle diagram of the method. Stages: understand the problem; collect and prepare data; explore data and design experiments; build and test models. Outputs: results; knowledge. Tools: Database, Spreadsheet, Data Mining.]
Understanding the Problem
EOS (End of Selection)
• EOS codes are raised to indicate a call-related event
• Providing immediate information:
  • Call progress
  • Why a call has failed
    • Dropped, set-up problems or other
“Access is easy but analysis techniques are complex, with many EOS to focus on and many mobile switches to analyse.”
Understanding the Problem
EOS – Mobile to Mobile
[Figure: mobile-to-mobile call diagram with labels Originator, Target, and two each of BSC and MSC]
Understanding the Problem
Potential Gains
• Nearly 2 million EOS events associated with problems at call set-up
  • Several may be associated with a single unsuccessful call
    • A quarter of a million unhappy customers?
    • Helping their calls succeed may stop them churning
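The two round figures on this slide imply an average number of EOS events per unsuccessful call. The arithmetic can be sketched as follows, assuming (this is not stated on the slide) that each unhappy customer corresponds to one unsuccessful call:

```python
# Round figures taken from the slide above.
eos_events = 2_000_000        # "nearly 2 million" set-up EOS events
unhappy_customers = 250_000   # "quarter million unhappy customers?"

# Assumption: one unsuccessful call per unhappy customer.
# Implied average number of EOS events raised per unsuccessful call.
events_per_failed_call = eos_events / unhappy_customers
print(events_per_failed_call)  # 8.0
```

So "several" here would mean roughly eight events per failed call, if the assumption holds.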
Collecting and Preparing the Data
First Steps
• Accessing the data
• Accessing the right databases
• Understanding the fields
• Getting the engineers to classify the EOS codes
• Creating summative data
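The "creating summative data" step can be sketched as a simple aggregation. This is a minimal illustration only: it assumes each raw EOS event is a (switch, hour, EOS code) record, and the record layout, the second place name and all counts are hypothetical rather than taken from the project's data.

```python
from collections import Counter

# Hypothetical raw EOS events: (switch, hour of day, EOS code).
events = [
    ("Bath", 9, 33),
    ("Bath", 9, 33),
    ("Bath", 9, 160),
    ("Bath", 10, 33),
    ("Reading", 9, 33),
]


def summarise(events):
    """Count how many events share the same (switch, hour, EOS code)."""
    return Counter(events)


summary = summarise(events)
for (switch, hour, code), n in sorted(summary.items()):
    print(f"{switch} {hour:02d}:00 EOS {code}: {n}")
```

Summaries of this shape are small enough to explore in a spreadsheet, which is one reason to build them before any data mining.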
Collecting and Preparing the Data
Magnitude of the Problem
• 100 million plus calls per day
• Over 10 million EOS events
  • Many of these are subscriber busy
    • We can’t fix that
    • But marketing may be able to sell them call waiting
Exploring
Calls at Bath on a Thursday
[Chart: number of calls (0 to 120,000) by hour of day (0:00 to 23:00)]
Exploring
EOS 33
[Chart: number of EOS 33 events (0 to 4,000) by hour of day (0:00 to 23:00)]
Exploring
Ratio EOS 33 to Calls
[Chart: ratio of EOS 33 events to calls (0 to 0.045) by hour of day (0:00 to 23:00)]
Exploring
EOS 160
[Chart: number of EOS 160 events (0 to 60) by hour of day (0:00 to 23:00)]
Exploring
Ratio EOS 160 to Calls
[Chart: ratio of EOS 160 events to calls (0 to 0.0007) by hour of day (0:00 to 23:00)]
Exploring
Results
• There are different peaks for calls and some EOS codes
• Ratios of EOS to Calls vary from place to place and code to code
Exploring
Experiment Design
Investigate what leads to set-up problems
• Do high values of EOS codes match high values of Calls?
• Do some places have more set-up problems than others?
• Are some times better than others?
• Are some days better than others?
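One way to approach the first question above is to correlate hourly EOS counts with hourly call counts. The sketch below uses a hand-written Pearson correlation; the choice of Pearson correlation is an assumption (the slides do not say which measure was used) and the hourly figures are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly counts for one switch on one day.
calls_per_hour = [20_000, 60_000, 100_000, 90_000, 40_000]
eos_per_hour = [500, 1_800, 3_500, 2_900, 1_000]

r = pearson(calls_per_hour, eos_per_hour)
print(f"correlation between calls and EOS events: {r:.3f}")
```

A value near 1 would suggest the EOS code simply tracks call volume; a weaker value would point to times when the code is raised more often than traffic alone explains.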
Discovery
• The health of a switch can be represented numerically as:
  • The ratio of occurrence of certain EOS codes to the number of calls
  • The ratio of combinations of EOS codes to the number of calls
  • Other quantities
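The first two health measures above can be sketched as one small function. This is an illustration under assumptions: the function name `health_ratio`, the data layout and the counts are hypothetical, and a "combination" of EOS codes is read here as the summed count of those codes.

```python
def health_ratio(eos_by_code, calls, codes):
    """Ratio of the summed counts of the given EOS codes to the number of calls."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return sum(eos_by_code.get(code, 0) for code in codes) / calls


# Hypothetical counts for one switch in one hour.
calls = 100_000
eos_by_code = {33: 3_000, 160: 50}

single = health_ratio(eos_by_code, calls, [33])         # one EOS code
combined = health_ratio(eos_by_code, calls, [33, 160])  # a combination of codes
print(single, combined)
```

Dividing by the number of calls makes busy and quiet hours, and large and small switches, comparable.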
Modelling
Running Experiments
• Data mining is then used, with targets of good health, to identify particularly poor and good:
  • Places
  • Dates
  • Times
Modelling
Approaches
Many approaches were tried using SAS Enterprise Miner™, with different parameters, including:
• Regression
• Neural Networks
• Decision Trees
It was a great help having a statistician on the team
Modelling
Results
The Engineers preferred the results they could easily understand
• Decision Trees
But similar indications were found using other techniques
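To illustrate why a decision tree is easy to read, the sketch below fits a one-level tree (a "decision stump") in plain Python. It is not the algorithm used by SAS Enterprise Miner; the function names, the hours and the health labels are all hypothetical.

```python
from collections import Counter


def minority(labels):
    """Number of labels that disagree with the majority label."""
    if not labels:
        return 0
    return len(labels) - max(Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one feature that misclassifies the fewest rows.

    Each side of the split predicts its own majority label.
    Returns (threshold, errors), or None if there is nothing to split.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        errors = minority(left) + minority(right)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical data: hour of day and whether the switch met its health target.
hours = [2, 4, 6, 10, 14, 18]
health = ["healthy", "healthy", "healthy", "unhealthy", "unhealthy", "unhealthy"]

threshold, errors = best_split(hours, health)
print(f"rule: hour <= {threshold} -> healthy, otherwise unhealthy ({errors} errors)")
```

The output is a single rule an engineer can check against experience, which is harder to do with regression coefficients or neural network weights.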
Where Next?
• The Engineers can then carry out a detailed analysis of why certain dates, times and places are unhealthy
• Further iterations of the process need to be undertaken to answer new questions
Conclusions
• Knowledge Discovery is applicable to engineering data
• Lots of results can be found using simple tools and techniques
• Deeper knowledge can be found with advanced techniques
• The knowledge needs to be acted upon