W210/

DUBLIN INSTITUTE OF TECHNOLOGY
KEVIN STREET, DUBLIN 8
____________________

MSc in Information Technology
____________________

SEMESTER 1 EXAMINATIONS 2008
____________________

BUSINESS SYSTEMS INTELLIGENCE

Dr. B. Mac Namee
Prof. B. O’Shea
Mr. B. Chadwick

Time Allowed: 2 hours

Attempt any two questions.
All questions carry equal marks.

1. (a) Explain how each of the following issues can affect classification and suggest methods for dealing with each of them.

       (i) Missing values (7 marks)
       (ii) The curse of dimensionality (7 marks)
       (iii) Over-fitting (7 marks)

   (b) The MistressCard credit card processing company wants to build a classification system to help combat credit card fraud. Based on a set of features describing a transaction (such as amount, date, location, etc.) the system should be able to classify transactions into those that are genuine and those that are fraudulent. A large set of historical labelled data is available for training the system, although there are a large number of missing values in the data due to data entry problems. The system must be as accurate as possible and will be used to perform real-time checks on every transaction the company processes. The company would like to frequently update the system based on new examples of fraudulent activity as they arise.

       (i) Compare the suitability to this task of any three classification techniques with which you are familiar. Suggest which one would be the most appropriate. (16 marks)
       (ii) Suggest an appropriate approach to measuring the performance of a classification system developed for the problem described in part (i). (13 marks)

2. (a) What is a data warehouse readiness assessment and why is it important? (5 marks)

   (b) Describe how star schemas can be used for dimensional modelling. (10 marks)

   (c) Bill Inmon proposes that data in a data warehouse have four properties. Discuss these properties, illustrating each with an example and an appropriate diagram. (20 marks)

   (d) Compare and contrast on-line transaction processing (OLTP) and on-line analytical processing (OLAP). (15 marks)

3. (a) Describe the important properties of a general association rule mining algorithm. (10 marks)

   (b) It has been said that organizations are “drowning in data, but starving for knowledge”. Using appropriate examples, describe how business systems intelligence solutions attempt to address this situation. Conclude your answer with a discussion of the major challenges currently facing business systems intelligence practitioners. (25 marks)

   (c) Focusing on the CRISP-DM methodology, explore the use of standardised methodologies for data mining projects. (15 marks)