Knowledge Discovery and Data Mining Applied to Engineering Applications
Shirley Williams
The University of Reading
25th September 2001
(c) Shirley Williams, 2001

Overview
• The process
  - Understanding the problem
  - Collecting and preparing the data
  - Exploring the data -> Results
  - Modelling -> Knowledge
  - Iterative
• Example using telecomm data

Knowledge Discovery with Engineering Data
• What we want:
  - Find anomalous trends
  - To get answers to questions related to engineering issues
• As opposed to...
  - Marketing analysis and fraud detection, the traditional uses in telecommunications

An Investigation Oriented KD Method
[Diagram: cycle of steps]
• understand the problem
• collect and prepare data
• explore data and design experiments
• build and test models
• knowledge
[Diagram labels]
• Database
• Spreadsheet
• Data Mining results

Understanding the Problem: EOS (End of Selection)
• EOS codes are raised to indicate a call related event
• Providing immediate information:
  - Call progress
  - Why a call has failed: Dropped, Set up problems or Other
• “Access is easy but analysis techniques are complex with many EOS to focus on and many mobile switches to analyse.”

Understanding the Problem: EOS – Mobile to Mobile
[Diagram: mobile-to-mobile call path, with labels Originator, BSC, MSC, MSC, BSC, Target]

Understanding the Problem: Potential Gains
• Nearly 2 million EOS events associated with problems at call set up
• Several may be associated with a single unsuccessful call
• Quarter million unhappy customers?
• Helping their calls succeed may stop them churning

Collecting and Preparing the Data: First Steps
• Accessing the data
• Accessing the right databases
• Understanding the fields
• Getting the engineers to classify the EOS codes
• Creating summative data

Collecting and Preparing the Data: Magnitude of the Problem
• 100 million plus calls per day
• Over 10 million EOS events
• Many of these are subscriber busy
  - We can’t fix that
  - But marketing may be able to sell them call waiting

Exploring: Calls at Bath on a Thursday
[Chart: number of calls by hour, 0:00 to 23:00; vertical axis 0 to 120,000]

Exploring: EOS 33
[Chart: EOS 33 events by hour, 0:00 to 23:00; vertical axis 0 to 4,000]

Exploring: Ratio EOS 33 to Calls
[Chart: ratio of EOS 33 events to calls by hour, 0:00 to 23:00; vertical axis 0 to 0.045]

Exploring: EOS 160
[Chart: EOS 160 events by hour, 0:00 to 23:00; vertical axis 0 to 60]

Exploring: Ratio EOS 160 to Calls
[Chart: ratio of EOS 160 events to calls by hour, 0:00 to 23:00; vertical axis 0 to 0.0007]

Exploring: Results
• There are different peaks for calls and some EOS codes
• Ratios of EOS to Calls vary from place to place and code to code

Exploring: Experiment Design
• Investigate what leads to set up problems
  - Do high values of EOS codes match high values of Calls?
  - Do some places have more set up problems than others?
  - Are some times better than others?
  - Are some days better than others?

Discovery
• The health of a switch can be represented numerically as:
  - The ratio of occurrences of certain EOS codes to the number of calls
  - The ratio of combinations of EOS codes to the number of calls
  - Other quantities

Modelling: Running Experiments
• Data mining is then used, with targets of good health, to identify particularly poor and good:
  - Places
  - Dates
  - Times

Modelling: Approaches
• Many approaches were tried using SAS Enterprise Miner™, with different parameters, including:
  - Regression
  - Neural Networks
  - Decision Trees
• It was a great help having a statistician on the team

Modelling: Results
• The Engineers preferred the results they could easily understand
  - Decision Trees
• But similar indications were found using other techniques

Where Next?
• The Engineers can then carry out a detailed analysis of why certain dates, times and places are unhealthy
• Further iterations of the process need to be undertaken to answer new questions

Conclusions
• Knowledge Discovery is applicable to engineering data
• Lots of results can be found using simple tools and techniques
• Deeper knowledge can be found with advanced techniques
• The knowledge needs to be acted upon
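
Worked sketch: switch health as a ratio

The Discovery slide describes the health of a switch as the ratio of occurrences of certain EOS codes to the number of calls. A minimal Python sketch of that idea follows. It is an illustration only, not the method used in the study: the function names, the hourly counts and the "1.5 times the median" threshold are all assumptions invented for this example.

```python
from statistics import median


def eos_ratio(calls_by_hour, eos_by_hour):
    """Return {hour: EOS events / calls} for every hour with at least one call."""
    return {
        hour: eos_by_hour.get(hour, 0) / calls
        for hour, calls in calls_by_hour.items()
        if calls > 0
    }


def unhealthy_hours(ratios, factor=1.5):
    """Return the hours whose ratio exceeds `factor` times the median ratio.

    The median-based threshold is an assumption made for illustration;
    the slides do not say how "poor health" was defined.
    """
    if not ratios:
        return []
    limit = factor * median(ratios.values())
    return sorted(hour for hour, ratio in ratios.items() if ratio > limit)


# Hypothetical hourly counts, invented for this example (not data from the study).
calls = {8: 40_000, 9: 80_000, 10: 100_000, 11: 90_000}
eos_33 = {8: 400, 9: 1_600, 10: 4_000, 11: 1_800}

ratios = eos_ratio(calls, eos_33)
flagged = unhealthy_hours(ratios)
print(ratios)
print(flagged)  # [10]
```

With these made-up counts only the 10:00 hour is flagged, because its ratio is twice the median. Keying the same calculation by place or date instead of hour would give the other two dimensions named on the Running Experiments slide.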