CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE APR2016
ASSESSMENT_CODE MC0088_APR2016

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 5227
QUESTION_TEXT Distinguish between the features of OLTP and OLAP.
SCHEME OF EVALUATION
Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives and analysts. (2 marks)
Data contents: An OLTP system manages current data that typically are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making. (2 marks)
Database design: An OLTP system usually adopts an entity-relationship data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model and a subject-oriented database design. (2 marks)
View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media. (2 marks)
Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations, although many could be complex queries.
(2 marks) (Total 10 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 5229
QUESTION_TEXT What is divisive clustering? Write the algorithmic steps for divisive clustering.
SCHEME OF EVALUATION
Divisive clustering, also called top-down clustering, is a variant of hierarchical clustering. We start at the top with all documents in one cluster. The cluster is split using a flat clustering algorithm. This procedure is applied recursively until each document is in its own singleton cluster. (2 marks)
Top-down clustering is conceptually more complex than bottom-up clustering, since we need a second, flat clustering algorithm as a subroutine. It has the advantage of being more efficient if we do not generate a complete hierarchy all the way down to individual document leaves. For a fixed number of top levels, using an efficient flat algorithm like k-means, top-down algorithms are linear in the number of documents and clusters. (3 marks)
Algorithm: Divisive clustering starts by placing all objects into a single group. Before we start the procedure, we need to decide on a threshold distance. Once this is done, the procedure is: (1 mark)
1. The distance between all pairs of objects within the same group is determined, and the pair with the largest distance is selected. (1 mark)
2. This maximum distance is compared to the threshold distance. (2 marks)
a. If it is larger than the threshold, this group is divided in two. This is done by placing the selected pair into different groups and using them as seed points. All other objects in this group are examined and placed into the new group with the closest seed point. The procedure then returns to step 1. (1 mark)
b. If the distance between the selected objects is less than the threshold, the divisive clustering stops.
(Total 10 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 5231
QUESTION_TEXT Explain the data warehouse models from the architecture point of view.
SCHEME OF EVALUATION
Enterprise warehouse: An enterprise warehouse collects all of the information about subjects spanning the entire organization. It provides corporate-wide data integration, usually from one or more operational systems or external information providers, and is cross-functional in scope. It typically contains detailed data as well as summarized data, and can range in size from a few gigabytes to hundreds of gigabytes, terabytes or beyond. An enterprise data warehouse may be implemented on traditional mainframes, UNIX superservers, or parallel architecture platforms. It requires extensive business modeling and may take years to design and build. (3.5 marks)
Data mart: A data mart contains a subset of corporate-wide data that is of value to a specific group of users. The scope is confined to specific selected subjects. For example, a marketing data mart may confine its subjects to customer, item, and sales. The data contained in data marts tend to be summarized. Depending on the source of data, data marts can be categorized as independent or dependent. Independent data marts are sourced from data captured from one or more operational systems or external information providers, or from data generated locally within a particular department or geographic area. Dependent data marts are sourced directly from enterprise data warehouses. (3.5 marks)
Virtual warehouse: A virtual warehouse is a set of views over operational databases. For efficient query processing, only some of the possible summary views may be materialized. A virtual warehouse is easy to build but requires excess capacity on operational database servers. (3 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117789
QUESTION_TEXT What is data cleaning? Explain the missing-value methods for data cleaning.
SCHEME OF EVALUATION
Data cleaning routines attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. (2 marks)
Missing-value methods are:
1. Ignore the tuple
2. Fill in the missing value manually
3. Use a global constant to fill in the missing value (4 marks)
4. Use the attribute mean to fill in the missing value
5. Use the attribute mean for all samples belonging to the same class as the given tuple
6. Use the most probable value to fill in the missing value (4 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117793
QUESTION_TEXT List and explain the various criteria used to compare the classification methods.
SCHEME OF EVALUATION
Predictive accuracy
Speed
Robustness
Scalability
Interpretability
5×2=10 marks

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117794
QUESTION_TEXT List and explain the web content mining challenges.
SCHEME OF EVALUATION
Data/Information extraction
Web information integration and schema matching
Opinion extraction from online sources
Knowledge synthesis
Segmenting web pages and detecting noise
5×2=10 marks
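
Illustrative note for QUESTION_ID 117789: some of the missing-value methods listed in that scheme are mechanical enough to sketch in a few lines of Python. The sketch below is only an illustration and is not part of the marking scheme. The toy data, the `income` attribute, the class labels and all function names are hypothetical. It covers ignoring the tuple, filling with a global constant, filling with the attribute mean, and filling with the mean of the tuple's own class. Filling a value manually and predicting the most probable value (which needs a trained model) are left out.

```python
from statistics import mean

# Hypothetical toy data: each tuple is (class_label, income).
# None marks a missing attribute value.
rows = [
    ("low", 20.0),
    ("low", None),
    ("low", 30.0),
    ("high", 80.0),
    ("high", None),
    ("high", 100.0),
]


def ignore_tuples(rows):
    """Drop every tuple whose attribute value is missing."""
    return [r for r in rows if r[1] is not None]


def fill_constant(rows, constant="Unknown"):
    """Replace each missing value with one global constant."""
    return [(c, constant if v is None else v) for c, v in rows]


def fill_attribute_mean(rows):
    """Replace each missing value with the mean of all known values."""
    overall = mean(v for _, v in rows if v is not None)
    return [(c, overall if v is None else v) for c, v in rows]


def fill_class_mean(rows):
    """Replace each missing value with the mean of its own class.

    Assumes every class has at least one known value.
    """
    known = {}
    for c, v in rows:
        if v is not None:
            known.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in known.items()}
    return [(c, class_means[c] if v is None else v) for c, v in rows]


if __name__ == "__main__":
    print(ignore_tuples(rows))
    print(fill_constant(rows))
    print(fill_attribute_mean(rows))
    print(fill_class_mean(rows))
```

On this toy data the overall mean is 57.5, while the class means are 25.0 for "low" and 90.0 for "high", which shows why the class-conditional fill usually stays closer to the other tuples of the same class.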
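
Illustrative note for QUESTION_ID 5229: the threshold-based divisive clustering steps in that scheme can be sketched as follows. This is one possible reading of the steps, not a reference implementation. It assumes objects are numeric tuples compared by Euclidean distance, it treats each group independently with a work list (a group stops splitting once its largest pairwise distance is no greater than the threshold), and the names `farthest_pair` and `divisive_clustering` are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def farthest_pair(group):
    """Step 1: return (distance, a, b) for the most distant pair in a group."""
    return max((dist(a, b), a, b) for a, b in combinations(group, 2))


def divisive_clustering(points, threshold):
    """Split groups top-down until no group is wider than the threshold."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    pending = [list(points)]  # start with all objects in a single group
    finished = []
    while pending:
        group = pending.pop()
        if len(group) < 2:
            finished.append(group)  # a singleton cannot be split further
            continue
        widest, seed_a, seed_b = farthest_pair(group)
        if widest <= threshold:
            finished.append(group)  # step 2b: stop splitting this group
            continue
        # Step 2a: the selected pair become seed points of two new groups,
        # and every object joins the group of its closest seed.
        near_a, near_b = [], []
        for p in group:
            if dist(p, seed_a) <= dist(p, seed_b):
                near_a.append(p)
            else:
                near_b.append(p)
        pending.extend([near_a, near_b])  # return to step 1 for both halves
    return finished


if __name__ == "__main__":
    # Hypothetical sample: two well-separated blobs of three points each.
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for cluster in divisive_clustering(sample, threshold=3.0):
        print(sorted(cluster))
```

Because each seed is at distance zero from itself, both new groups are non-empty whenever a split happens, so every split strictly shrinks the groups and the loop terminates. A larger threshold yields fewer, wider clusters; a threshold below the smallest pairwise distance yields singletons.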