W210/
DUBLIN INSTITUTE OF TECHNOLOGY
KEVIN STREET, DUBLIN 8
____________________
MSc in Information Technology
____________________
SEMESTER 1 EXAMINATIONS 2008
____________________
BUSINESS SYSTEMS INTELLIGENCE
Dr. B. Mac Namee
Prof. B. O’Shea
Mr. B. Chadwick
Time Allowed: 2 hours
Attempt any two questions.
All questions carry equal marks.
1. (a) Explain how each of the following issues can affect classification and
suggest methods for dealing with each of them.
(i) Missing values
(7 marks)
(ii) The curse of dimensionality
(7 marks)
(iii) Over-fitting
(7 marks)
(b) The MistressCard credit card processing company wants to build a
classification system to help combat credit card fraud. Based on a set of
features describing a transaction (such as amount, date, location, etc.), the
system should be able to classify transactions into those that are genuine and
those that are fraudulent. A large set of historical labelled data is available
for training the system, although there are a large number of missing values
in the data due to data entry problems.
The system must be as accurate as possible and will be used to perform
real-time checks on every transaction the company processes. The company
would like to frequently update the system based on new examples of
fraudulent activity as they arise.
(i) Compare the suitability to this task of any three classification
techniques with which you are familiar. Suggest which one would be
the most appropriate.
(16 marks)
(ii) Suggest an appropriate approach to measuring the performance of a
classification system developed for the problem described in part (i).
(13 marks)
2. (a) What is a data warehouse readiness assessment and why is it important?
(5 marks)
(b) Describe how star schemas can be used for dimensional modelling.
(10 marks)
(c) Bill Inmon proposes that data in a data warehouse have four properties.
Discuss these properties, illustrating each with an example and an appropriate
diagram.
(20 marks)
(d) Compare and contrast on-line transaction processing (OLTP) and on-line
analytical processing (OLAP).
(15 marks)
3. (a) Describe the important properties of a general association rule mining
algorithm.
(10 marks)
(b) It has been said that organizations are “drowning in data, but starving for
knowledge”. Using appropriate examples, describe how business systems
intelligence solutions attempt to address this situation. Conclude your
answer with a discussion of the major challenges currently facing business
systems intelligence practitioners.
(25 marks)
(c) Focusing on the CRISP-DM methodology, explore the use of standardised
methodologies for data mining projects.
(15 marks)