Benha University
Faculty of Engineering - Shoubra
Electrical Engineering Department
PhD (Electronics)
CME 703 "Selected Topics in Electronic Eng."
Summer Course Final Exam
Date: 31/8/2016
Duration: 3 hours

Model Answer

1- Suppose your task as a software engineer at Big-University is to design a data mining system to examine their university course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and their cumulative grade point average (GPA). Describe the architecture you would choose. What is the purpose of each component of this architecture? (10 marks)

A data mining architecture that can be used for this application would consist of the following major components:
• A database, data warehouse, or other information repository, which consists of the set of databases, data warehouses, spreadsheets, or other kinds of information repositories containing the student and course information.
• A database or data warehouse server, which fetches the relevant data based on the users' data mining requests.
• A knowledge base that contains the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. For example, the knowledge base may contain concept hierarchies and metadata (e.g., describing data from multiple heterogeneous sources).
• A data mining engine, which consists of a set of functional modules for tasks such as classification, association, cluster analysis, and evolution and deviation analysis.
• A pattern evaluation module that works in tandem with the data mining modules by employing interestingness measures to help focus the search towards interesting patterns.
• A graphical user interface that provides the user with an interactive approach to the data mining system.

2- (a) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
(5 marks)

The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
• Data cleaning, a process that removes or transforms noise and inconsistent data.
• Data integration, where multiple data sources may be combined.
• Data selection, where data relevant to the analysis task are retrieved from the database.
• Data transformation, where data are transformed or consolidated into forms appropriate for mining.
• Data mining, an essential process where intelligent and efficient methods are applied in order to extract patterns.
• Pattern evaluation, a process that identifies the truly interesting patterns representing knowledge based on some interestingness measures.
• Knowledge presentation, where visualization and knowledge representation techniques are used to present the mined knowledge to the user.

(b) List data mining techniques and distinguish between the predictive model and the descriptive model. (5 marks)
• Association rules
• Classification and prediction
• Clustering
• Deviation detection
• Similarity search
• Sequence mining

Predictive model: It is used to predict the values of data by making use of known results from a different set of sample data. Data mining tasks that belong to the predictive model:

Descriptive model: It is used to determine the patterns and relationships in sample data. Data mining tasks that belong to the descriptive model:

(c) What is meant by a pattern? List some applications of data mining. (5 marks)

A pattern represents knowledge if it is easily understood by humans; valid on test data with some degree of certainty; and potentially useful, novel, or validates a hunch about which the user was curious. Measures of pattern interestingness, either objective or subjective, can be used to guide the discovery process.
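
As an illustration of objective interestingness measures, the sketch below computes support and confidence for an association rule over a small set of course enrolments. The answer above does not name specific measures, so the choice of support and confidence, the sample data, and the function names are all assumptions made for this example.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical transactions: the set of courses taken by each student.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"databases", "data_mining", "statistics"}),
    frozenset({"databases", "data_mining"}),
    frozenset({"databases", "networks"}),
    frozenset({"statistics", "data_mining"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    sup_a = support(a, transactions)
    if sup_a == 0:
        return 0.0  # rule never applies, so report zero confidence
    return support(both, transactions) / sup_a


# Rule under test: students who take "databases" also take "data_mining".
rule_support = support({"databases", "data_mining"}, TRANSACTIONS)
rule_confidence = confidence({"databases"}, {"data_mining"}, TRANSACTIONS)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A pattern evaluation step could, under these assumptions, keep only rules whose support and confidence exceed user-chosen thresholds.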
Applications: agriculture, biological data analysis, call record analysis, DSS, business intelligence systems, etc.

(d) What is a data warehouse and what are its benefits? Differentiate between data mining and data warehousing. (5 marks)

A data warehouse is a repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making. (or) A data warehouse is a subject-oriented, time-variant and nonvolatile collection of data in support of management's decision-making process.

A data warehouse helps to integrate data and store them historically so that we can analyze different aspects of the business, including performance analysis, trends, and prediction, over a given time frame and use the results of the analysis to improve the efficiency of business processes.

Data warehousing is merely extracting data from different sources, cleaning the data and storing it in the warehouse, whereas data mining aims to examine or explore the data using queries. These queries can be fired on the data warehouse. Exploring the data in data mining helps in reporting, planning strategies, and finding meaningful patterns.

3- (a) Write short notes on "On-line Transaction Processing" and "On-line Analytical Processing". (5 marks)

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, and DELETE). The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments, and an effectiveness measured by the number of transactions per second. In an OLTP database there is detailed and current data, and the schema used to store transactional databases is the entity model (usually 3NF).

OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is an effectiveness measure.
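
The contrast can be sketched with Python's built-in sqlite3 module: a few short transactional writes followed by one aggregate query. The table name, column names, and sample rows are hypothetical, and a single in-memory SQLite database is used only for illustration; it is not meant to represent how an actual warehouse is deployed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolment (student TEXT, course TEXT, year INTEGER, grade REAL)"
)

# OLTP-style work: short transactions that insert or update a few rows.
# Using the connection as a context manager commits on success.
with conn:
    conn.executemany(
        "INSERT INTO enrolment VALUES (?, ?, ?, ?)",
        [
            ("s1", "data_mining", 2015, 3.5),
            ("s2", "data_mining", 2015, 3.0),
            ("s3", "databases", 2016, 4.0),
        ],
    )
with conn:
    conn.execute(
        "UPDATE enrolment SET grade = ? WHERE student = ? AND course = ?",
        (4.0, "s1", "data_mining"),
    )

# OLAP-style work: one read-only query that aggregates over many rows.
rows = conn.execute(
    "SELECT course, year, COUNT(*), AVG(grade) "
    "FROM enrolment GROUP BY course, year ORDER BY course, year"
).fetchall()

for course, year, n_students, avg_grade in rows:
    print(course, year, n_students, avg_grade)
```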
OLAP applications are widely used by data mining techniques. In an OLAP database there is aggregated, historical data, stored in multidimensional schemas (usually a star schema).

(b) What are the characteristics of a data warehouse, and what is meant by a fact table? (5 marks)
• Integrated
• Non-volatile
• Subject oriented
• Time variant

(c) Show the operation of a clustering algorithm. (5 marks)

A clustering algorithm is used to group sets of data with similar characteristics; these groups are called clusters. Clusters help in making faster decisions and in exploring data. The algorithm first identifies relationships in a dataset, then generates a series of clusters based on those relationships. The process of creating clusters is iterative: the algorithm redefines the groupings to create clusters that better represent the data.

(d) What do the following abbreviations stand for (CURE, ETL and KDD)? (5 marks)

CURE: CURE stands for Clustering Using Representatives. Clustering algorithms generally work best on spherical clusters of similar size. CURE overcomes this restriction and is more robust with respect to outliers.

ETL: ETL (extraction/transformation/loading) tools allow users to specify transforms through a graphical user interface (GUI). These tools typically support only a restricted set of transforms, so we may often also choose to write custom scripts for this step of the data cleaning process.

KDD: KDD is the abbreviation of Knowledge Discovery in Databases. It can be defined as the process of finding useful information and patterns in data.

4- (a) Explain the types of data mining, and show the importance of "Modeling" in data mining. (5 marks)
• Audio data mining
• Video data mining
• Image data mining
• Scientific and statistical data mining

Models in data mining help the different algorithms in decision making or pattern matching.
The second stage of data mining involves considering various models and choosing the best one based on their predictive performance.

(b) Imagine a research plan that combines wireless sensor networks with data mining. (5 marks)

By Student

Good Luck

BOARD OF EXAMINERS
Prof. Dr Mahmoud Mohanna
Dr. Moataz Elsherbini