Benha University
Faculty of Engineering- Shoubra
Electrical Engineering Department
PhD (Electronics) CME 703
Summer Course Final Exam
Date: 31/8/2016
"Selected Topics in Electronic Eng."
Duration: 3 hours
Model Answer
1- Suppose your task as a software engineer at Big-University is to design a data mining system to
examine their university course database, which contains the following information: the name,
address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and their
cumulative grade point average (GPA). Describe the architecture you would choose. What is the
purpose of each component of this architecture?
(10 marks)
A data mining architecture that can be used for this application would consist of the following
major components:
• A database, data warehouse, or other information repository, which consists of the set of
databases, data warehouses, spreadsheets, or other kinds of information repositories containing
the student and course information.
• A database or data warehouse server, which fetches the relevant data based on the users' data
mining requests.
• A knowledge base that contains the domain knowledge used to guide the search or to evaluate the
interestingness of resulting patterns. For example, the knowledge base may contain concept
hierarchies and metadata (e.g., describing data from multiple heterogeneous sources).
• A data mining engine, which consists of a set of functional modules for tasks such as
classification, association, cluster analysis, and evolution and deviation analysis.
• A pattern evaluation module that works in tandem with the data mining modules by
employing interestingness measures to help focus the search towards interesting patterns.
• A graphical user interface that provides the user with an interactive approach to the data
mining system.
2- (a) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
(5 marks)
The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
• Data cleaning, a process that removes or transforms noise and inconsistent data.
• Data integration, where multiple data sources may be combined.
• Data selection, where data relevant to the analysis task are retrieved from the database.
• Data transformation, where data are transformed or consolidated into forms appropriate for
mining.
• Data mining, an essential process where intelligent and efficient methods are applied in order to
extract patterns.
• Pattern evaluation, a process that identifies the truly interesting patterns representing knowledge
based on some interestingness measures.
• Knowledge presentation, where visualization and knowledge representation techniques are used to
present the mined knowledge to the user.
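The steps above can be sketched as a chain of small functions. This is only a minimal illustration: the two data sources, the record fields, and the 0.5 GPA-gap threshold used as an interestingness measure are all hypothetical and do not come from the exam.

```python
# Hypothetical records from two sources; names and values are illustrative only.
registrar = [
    {"name": "Ann", "status": "graduate", "gpa": "3.8"},
    {"name": "Bob", "status": "undergraduate", "gpa": None},  # missing GPA
    {"name": "Cid", "status": "graduate", "gpa": "3.4"},
]
second_source = [
    {"name": "Dee", "status": "undergraduate", "gpa": "2.9"},
    {"name": "Eve", "status": "undergraduate", "gpa": "3.1"},
]

def clean(rows):
    """Data cleaning: drop records whose GPA is missing."""
    return [r for r in rows if r["gpa"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [row for source in sources for row in source]

def select(rows):
    """Data selection: keep only the attributes relevant to the task."""
    return [{"status": r["status"], "gpa": r["gpa"]} for r in rows]

def transform(rows):
    """Data transformation: convert GPA strings to floats."""
    return [{"status": r["status"], "gpa": float(r["gpa"])} for r in rows]

def mine(rows):
    """Data mining: compute the mean GPA per student status."""
    groups = {}
    for r in rows:
        groups.setdefault(r["status"], []).append(r["gpa"])
    return {status: sum(v) / len(v) for status, v in groups.items()}

def evaluate(patterns, min_gap=0.5):
    """Pattern evaluation: keep the result only if the gap between groups is large enough."""
    gap = max(patterns.values()) - min(patterns.values())
    return patterns if gap >= min_gap else {}

def present(patterns):
    """Knowledge presentation: format the patterns as readable text."""
    return [f"{status}: mean GPA {mean:.2f}" for status, mean in sorted(patterns.items())]

rows = transform(select(integrate(clean(registrar), clean(second_source))))
report = present(evaluate(mine(rows)))
```

Each function corresponds to one bullet, so a step can be replaced (for example, a different mining method) without touching the others.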
(b) List data mining techniques and distinguish between the predictive model and the descriptive model.
(5 marks)
• Association rules
• Classification and prediction
• Clustering
• Deviation detection
• Similarity search
• Sequence mining
Predictive model:
It predicts the values of data by making use of known results from a different set of sample data.
Data mining tasks that belong to the predictive model include classification and prediction.
Descriptive model:
It determines the patterns and relationships in a sample of data. Data mining tasks that belong to the
descriptive model include association rules, clustering, and sequence mining.
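The difference between the two models can be illustrated with a small sketch. The data, the nearest-neighbour rule, and the pair-counting function below are assumptions chosen for brevity; they stand in for classification and association analysis in general and are not taken from the exam.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labelled samples: (GPA, courses completed) -> student status.
known = [((3.9, 12), "graduate"), ((3.6, 10), "graduate"),
         ((2.8, 4), "undergraduate"), ((3.0, 5), "undergraduate")]

def predict(sample):
    """Predictive: return the label of the nearest known sample (1-nearest neighbour)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(known, key=lambda pair: sq_dist(pair[0], sample))
    return label

# Hypothetical course enrolments, one set of courses per student.
enrolments = [{"DB", "AI"}, {"DB", "AI", "OS"}, {"DB", "OS"}, {"AI", "DB"}]

def frequent_pairs(baskets, min_count=2):
    """Descriptive: course pairs that occur together in at least min_count baskets."""
    counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

Here predict((3.7, 11)) uses known results to label a new sample, while frequent_pairs(enrolments) only summarises relationships already present in the data.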
(c) What is meant by a pattern? List some applications of data mining.
(5 marks)
A pattern represents knowledge if it is easily understood by humans; valid on test data with some degree
of certainty; and potentially useful, novel, or confirms a hunch about which the user was curious.
Measures of pattern interestingness, either objective or subjective, can be used to guide the discovery
process.
Applications: agriculture, biological data analysis, call record analysis, decision support systems
(DSS), business intelligence systems, etc.
(d) What is a data warehouse, and what are its benefits? Differentiate between data mining and data
warehousing.
(5 marks)
A data warehouse is a repository of multiple heterogeneous data sources organized under a unified
schema at a single site to facilitate management decision making.
(or) A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data
in support of management's decision-making process.
Benefits: a data warehouse integrates data and stores them historically, so that different aspects of the
business (performance, trends, predictions, etc.) can be analyzed over a given time frame and the
results used to improve the efficiency of business processes.
Data warehousing is merely extracting data from different sources, cleaning the data, and storing it in
the warehouse, whereas data mining aims to examine or explore the data using queries. These queries
can be run against the data warehouse. Exploring the data in this way helps with reporting, planning
strategies, and finding meaningful patterns.
3- (a) Write short notes on "On-line Transaction Processing" and "On-line Analytical Processing"

(5 marks)
OLTP (On-line Transaction Processing) is characterized by a large number of short on-line
transactions (INSERT, UPDATE, and DELETE). The main emphasis for OLTP systems is
on very fast query processing, on maintaining data integrity in multi-access environments,
and on effectiveness measured by the number of transactions per second. An OLTP database
holds detailed, current data, and the schema used to store transactional data is the entity
model (usually 3NF).

OLAP (On-line Analytical Processing) is characterized by a relatively low volume of
transactions. Queries are often very complex and involve aggregations. For OLAP systems,
response time is the effectiveness measure. OLAP applications are widely used by data
mining techniques. An OLAP database holds aggregated, historical data stored in
multidimensional schemas (usually a star schema).
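The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is a simplified illustration: a single in-memory database stands in for both systems, and the table names, columns, and values are hypothetical. The fact_enrolment table plays the role of the central table of a small star schema, with dim_course as one dimension.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_course (course_id INTEGER PRIMARY KEY, dept TEXT);
    CREATE TABLE fact_enrolment (
        student_id INTEGER,
        course_id  INTEGER REFERENCES dim_course(course_id),
        year       INTEGER,
        grade      REAL
    );
""")
with db:
    db.executemany("INSERT INTO dim_course VALUES (?, ?)",
                   [(1, "EE"), (2, "EE"), (3, "CS")])

# OLTP style: many short write transactions, each touching a few rows.
new_rows = [(100, 1, 2015, 3.0), (101, 1, 2015, 4.0),
            (100, 2, 2016, 3.5), (102, 3, 2016, 2.5)]
for row in new_rows:
    with db:  # one transaction per insert
        db.execute("INSERT INTO fact_enrolment VALUES (?, ?, ?, ?)", row)
with db:
    db.execute("UPDATE fact_enrolment SET grade = 3.5 "
               "WHERE student_id = 100 AND course_id = 1")

# OLAP style: one complex read that aggregates historical data by dimension.
summary = db.execute("""
    SELECT d.dept, f.year, COUNT(*), AVG(f.grade)
    FROM fact_enrolment AS f
    JOIN dim_course AS d ON d.course_id = f.course_id
    GROUP BY d.dept, f.year
    ORDER BY d.dept, f.year
""").fetchall()
```

The write loop is judged by how many transactions it completes per second, whereas the aggregate query is judged by its response time.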
(b) What are the characteristics of a data warehouse? What is meant by a fact table? (5 marks)
• Integrated.
• Non-volatile.
• Subject oriented.
• Time variant.
(c) Show the operation of a clustering algorithm.
(5 marks)
A clustering algorithm groups data with similar characteristics into sets called clusters. These
clusters help in making faster decisions and in exploring data. The algorithm first identifies
relationships in a dataset and then generates a series of clusters based on those relationships. The
process of creating clusters is iterative: the algorithm redefines the groupings to create clusters that
better represent the data.
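The iterative refinement described above can be illustrated with k-means, one common clustering algorithm that fits this description. The answer does not name a specific algorithm, so the choice of k-means, the sample points, and the starting centroids are assumptions made for the sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, max_iter=100):
    """Assign points to the nearest centroid, recompute centroids, repeat until stable."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # groupings no longer change
            break
        centroids = updated
    return centroids, clusters

# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Each pass through the loop redefines the groupings, and the loop stops once a pass leaves the centroids unchanged.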
(d) What do the following abbreviations stand for (CURE, ETL, and KDD)?
(5 marks)
CURE
CURE stands for Clustering Using Representatives. Clustering algorithms generally work well on
spherical clusters of similar size. CURE overcomes this restriction and is more robust with
respect to outliers.
ETL
ETL (extraction/transformation/loading) tools allow users to specify transforms through a
graphical user interface (GUI). These tools typically support only a restricted set of transforms so
that, often, we may also choose to write custom scripts for this step of the data cleaning process.
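A custom script of the kind mentioned above might look like the following minimal sketch. The CSV content, the cleaning rules, and the student table are hypothetical; only the standard csv, io, and sqlite3 modules are used.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with stray spaces, mixed case, and a missing GPA.
RAW_CSV = """name,status,gpa
 Ann ,Graduate,3.8
Bob,UNDERGRADUATE,
Cid,graduate,3.4
"""

def extract(text):
    """Extraction: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: trim names, normalise status, skip rows without a GPA."""
    cleaned = []
    for r in rows:
        if not r["gpa"].strip():
            continue
        cleaned.append((r["name"].strip(), r["status"].strip().lower(), float(r["gpa"])))
    return cleaned

def load(rows, conn):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS student (name TEXT, status TEXT, gpa REAL)")
    with conn:
        conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute("SELECT name, status, gpa FROM student ORDER BY name").fetchall()
```

Writing the transform step by hand allows rules that a GUI tool with a restricted set of transforms may not offer.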
KDD
KDD is the abbreviation of Knowledge Discovery in Databases. It can be defined as the process
of finding useful information and patterns in data.
4- (a) Explain the types of data mining and show the importance of "modeling" in data mining.
(5 marks)
• Audio data mining
• Video data mining
• Image data mining
• Scientific and statistical data mining
Models in Data mining help the different algorithms in decision making or pattern matching. The second
stage of data mining involves considering various models and choosing the best one based on their predictive
performance.
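Choosing among candidate models by predictive performance can be sketched as follows. The threshold models, the hold-out samples, and the use of accuracy as the performance measure are all assumptions made for illustration.

```python
# Hypothetical hold-out samples: (GPA, whether the student passed a course).
validation = [(3.9, True), (3.2, True), (2.1, False),
              (2.8, True), (1.9, False), (2.4, False)]

def make_threshold_model(threshold):
    """A very simple candidate model: predict a pass when GPA >= threshold."""
    return lambda gpa: gpa >= threshold

candidates = {
    "threshold_2.0": make_threshold_model(2.0),
    "threshold_2.5": make_threshold_model(2.5),
    "threshold_3.5": make_threshold_model(3.5),
}

def accuracy(model, samples):
    """Fraction of samples whose predicted outcome matches the known outcome."""
    hits = sum(1 for x, y in samples if model(x) == y)
    return hits / len(samples)

# Score every candidate on the same data and keep the best performer.
scores = {name: accuracy(model, validation) for name, model in candidates.items()}
best = max(scores, key=scores.get)
```

In practice the candidates would be scored on data that was not used to build them, so that the comparison reflects predictive rather than memorised performance.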
(b) Imagine a research plan that combines wireless sensor networks with data mining.
(5 marks)
By Student
Good Luck
BOARD OF EXAMINERS
Prof. Dr Mahmoud Mohanna
Dr. Moataz Elsherbini